Thursday, March 22, 2012

Binary collation

Is there a code page independent binary collation? I want a collation that
will sort binary order that will not force translations from other collations.
My problem is this: I want the user to be able to use any collation they
want. My process is going to work with their data in binary order. Even if
I could retrieve the collation they used, I don't know how to, in a program,
to convert that collation into a binary collation. For some collations, its
obvious. If their collation was SQL_CZECH_CP1350_CI_AS I could probably
construct Czech_BIN from that.
Hello Michael,
Did you try to cast the value to binary? Here's a sample i've just written:
create table mytable (Col1 nvarchar(20));
insert into mytable values (N'Die Nu, die Nsse');
select CAST ( Col1 AS binary (40)) from mytable
The query will return the following value as binary. Note that there is no
collation on binary.
0x44006900650020004E007500DF002C002000640069006500 20004E00FC0073007300650000
000000
"Michael Bauers" <MichaelBauers@.discussions.microsoft.com> wrote in message
news:54F4F3E1-0FD9-4209-9301-1182139A7DC1@.microsoft.com...
> Is there a code page independent binary collation? I want a collation
that
> will sort binary order that will not force translations from other
collations.
> My problem is this: I want the user to be able to use any collation they
> want. My process is going to work with their data in binary order. Even
if
> I could retrieve the collation they used, I don't know how to, in a
program,
> to convert that collation into a binary collation. For some collations,
its
> obvious. If their collation was SQL_CZECH_CP1350_CI_AS I could probably
> construct Czech_BIN from that.
|||What if the values come from another table (not constant values.)
"Michael Thomas [Microsoft]" wrote:

> Hello Michael,
> Did you try to cast the value to binary? Here's a sample i've just written:
> create table mytable (Col1 nvarchar(20));
> insert into mytable values (N'Die Nu?, die Nüsse');
> select CAST ( Col1 AS binary (40)) from mytable
>
>
> The query will return the following value as binary. Note that there is no
> collation on binary.
>
> 0x44006900650020004E007500DF002C002000640069006500 20004E00FC0073007300650000
> 000000
>
> "Michael Bauers" <MichaelBauers@.discussions.microsoft.com> wrote in message
> news:54F4F3E1-0FD9-4209-9301-1182139A7DC1@.microsoft.com...
> that
> collations.
> if
> program,
> its
>
>
|||What about storing the character data as Unicode (nvarchar, nchar, and
following http://support.microsoft.com/?id=239530)? Once it's Unicode it
won't undergo any code page converstions.
If you're set on using non-Unicode data types and don't want the
possibility of code page conversions when the text data flows between
environments with different code pages, the proper thing to do is to store
the data in varbinary or image columns. char/varchar is considered to be
string data, and SQL will always attempt to retain the "meaning" of the
characters as they move between different code page environments. If you
want to remove this functionality, then you're essentially asking SQL to
treat the data as a raw binary byte stream anyway. Don't attempt to store
code page X character data in a code page Y varchar column.
Can you clarify your scenario a bit? Is your app intended to work with
arbitrary database schemas (like a generic data transfer tool), or is the
database schema yours? What's the primary problem you're trying to avoid?
Pulling code page X data out of the data into a non-Unicode variable in a
code page Y application, and getting data loss as a result?
Would it help to be able to determine the code page associated with each
collation so you could pick an appropriate binary collation?
SELECT name, COLLATIONPROPERTY (name, 'CodePage')
FROM ::fn_helpcollations ()
WHERE name LIKE '%_BIN%'
Bart
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no rights.
| Thread-Topic: Binary collation
| thread-index: AcTMGjaHGpqnfmm4Qp2RM3AZmAHZXg==
| X-WBNR-Posting-Host: 12.24.200.251
| From: "=?Utf-8?B?TWljaGFlbCBCYXVlcnM=?="
<MichaelBauers@.discussions.microsoft.com>
| References: <54F4F3E1-0FD9-4209-9301-1182139A7DC1@.microsoft.com>
<udSTNlBzEHA.3656@.TK2MSFTNGP09.phx.gbl>
| Subject: Re: Binary collation
| Date: Tue, 16 Nov 2004 12:24:02 -0800
| Lines: 47
| Message-ID: <EA0BBCAD-D82F-4393-8FC0-12DFABAC5DCB@.microsoft.com>
| MIME-Version: 1.0
| Content-Type: text/plain;
| charset="Utf-8"
| Content-Transfer-Encoding: 8bit
| X-Newsreader: Microsoft CDO for Windows 2000
| Content-Class: urn:content-classes:message
| Importance: normal
| Priority: normal
| X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
| Newsgroups: microsoft.public.sqlserver.server
| NNTP-Posting-Host: TK2MSFTNGXA03.phx.gbl 10.40.1.29
| Path: cpmsftngxa10.phx.gbl!TK2MSFTNGXA03.phx.gbl
| Xref: cpmsftngxa10.phx.gbl microsoft.public.sqlserver.server:367631
| X-Tomcat-NG: microsoft.public.sqlserver.server
|
| What if the values come from another table (not constant values.)
|
|
| "Michael Thomas [Microsoft]" wrote:
|
| > Hello Michael,
| >
| > Did you try to cast the value to binary? Here's a sample i've just
written:
| >
| > create table mytable (Col1 nvarchar(20));
| >
| > insert into mytable values (N'Die Nu?, die Nüsse');
| >
| > select CAST ( Col1 AS binary (40)) from mytable
| >
| >
| >
| >
| >
| > The query will return the following value as binary. Note that there is
no
| > collation on binary.
| >
| >
| >
| >
0x44006900650020004E007500DF002C002000640069006500 20004E00FC0073007300650000
| > 000000
| >
| >
| > "Michael Bauers" <MichaelBauers@.discussions.microsoft.com> wrote in
message
| > news:54F4F3E1-0FD9-4209-9301-1182139A7DC1@.microsoft.com...
| > > Is there a code page independent binary collation? I want a collation
| > that
| > > will sort binary order that will not force translations from other
| > collations.
| > >
| > > My problem is this: I want the user to be able to use any collation
they
| > > want. My process is going to work with their data in binary order.
Even
| > if
| > > I could retrieve the collation they used, I don't know how to, in a
| > program,
| > > to convert that collation into a binary collation. For some
collations,
| > its
| > > obvious. If their collation was SQL_CZECH_CP1350_CI_AS I could
probably
| > > construct Czech_BIN from that.
| >
| >
| >
|
|||Michael,
Just saw your other post on this topic where you provided a bit more detail
about what you are doing:
> I should clarify the situation. We are inserting rows into a table with
> LATIN1_BIN collation from a table with CZECH_BIN collation.
From this I gather that the problem you are facing is data loss as the
strings undergo a code page conversion.
My first question is: why not use the same collation for the destination
table?
My second question would be: Can you use Unicode types for the destination
table? A Unicode column can store data from any other Unicode or
non-Unicode column regardless of collation, without data loss.
Bart
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no rights.
| Newsgroups: microsoft.public.sqlserver.server
| From: bartd@.online.microsoft.com (Bart Duncan [MSFT])
| Organization: Microsoft
| Date: Wed, 17 Nov 2004 18:45:21 GMT
| Subject: Re: Binary collation
| X-Tomcat-NG: microsoft.public.sqlserver.server
| MIME-Version: 1.0
| Content-Type: text/plain
| Content-Transfer-Encoding: 7bit
|
| What about storing the character data as Unicode (nvarchar, nchar, and
| following http://support.microsoft.com/?id=239530)? Once it's Unicode it
| won't undergo any code page converstions.
|
| If you're set on using non-Unicode data types and don't want the
| possibility of code page conversions when the text data flows between
| environments with different code pages, the proper thing to do is to
store
| the data in varbinary or image columns. char/varchar is considered to be
| string data, and SQL will always attempt to retain the "meaning" of the
| characters as they move between different code page environments. If you
| want to remove this functionality, then you're essentially asking SQL to
| treat the data as a raw binary byte stream anyway. Don't attempt to
store
| code page X character data in a code page Y varchar column.
|
| Can you clarify your scenario a bit? Is your app intended to work with
| arbitrary database schemas (like a generic data transfer tool), or is the
| database schema yours? What's the primary problem you're trying to
avoid?
| Pulling code page X data out of the data into a non-Unicode variable in a
| code page Y application, and getting data loss as a result?
|
| Would it help to be able to determine the code page associated with each
| collation so you could pick an appropriate binary collation?
| SELECT name, COLLATIONPROPERTY (name, 'CodePage')
| FROM ::fn_helpcollations ()
| WHERE name LIKE '%_BIN%'
|
| Bart
| --
| Bart Duncan
| Microsoft SQL Server Support
|
| Please reply to the newsgroup only - thanks.
| This posting is provided "AS IS" with no warranties, and confers no
rights.
|
|
| --
| | Thread-Topic: Binary collation
| | thread-index: AcTMGjaHGpqnfmm4Qp2RM3AZmAHZXg==
| | X-WBNR-Posting-Host: 12.24.200.251
| | From: "=?Utf-8?B?TWljaGFlbCBCYXVlcnM=?="
| <MichaelBauers@.discussions.microsoft.com>
| | References: <54F4F3E1-0FD9-4209-9301-1182139A7DC1@.microsoft.com>
| <udSTNlBzEHA.3656@.TK2MSFTNGP09.phx.gbl>
| | Subject: Re: Binary collation
| | Date: Tue, 16 Nov 2004 12:24:02 -0800
| | Lines: 47
| | Message-ID: <EA0BBCAD-D82F-4393-8FC0-12DFABAC5DCB@.microsoft.com>
| | MIME-Version: 1.0
| | Content-Type: text/plain;
| | charset="Utf-8"
| | Content-Transfer-Encoding: 8bit
| | X-Newsreader: Microsoft CDO for Windows 2000
| | Content-Class: urn:content-classes:message
| | Importance: normal
| | Priority: normal
| | X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
| | Newsgroups: microsoft.public.sqlserver.server
| | NNTP-Posting-Host: TK2MSFTNGXA03.phx.gbl 10.40.1.29
| | Path: cpmsftngxa10.phx.gbl!TK2MSFTNGXA03.phx.gbl
| | Xref: cpmsftngxa10.phx.gbl microsoft.public.sqlserver.server:367631
| | X-Tomcat-NG: microsoft.public.sqlserver.server
| |
| | What if the values come from another table (not constant values.)
| |
| |
| | "Michael Thomas [Microsoft]" wrote:
| |
| | > Hello Michael,
| | >
| | > Did you try to cast the value to binary? Here's a sample i've just
| written:
| | >
| | > create table mytable (Col1 nvarchar(20));
| | >
| | > insert into mytable values (N'Die Nu?, die Nüsse');
| | >
| | > select CAST ( Col1 AS binary (40)) from mytable
| | >
| | >
| | >
| | >
| | >
| | > The query will return the following value as binary. Note that there
is
| no
| | > collation on binary.
| | >
| | >
| | >
| | >
|
0x44006900650020004E007500DF002C002000640069006500 20004E00FC0073007300650000
| | > 000000
| | >
| | >
| | > "Michael Bauers" <MichaelBauers@.discussions.microsoft.com> wrote in
| message
| | > news:54F4F3E1-0FD9-4209-9301-1182139A7DC1@.microsoft.com...
| | > > Is there a code page independent binary collation? I want a
collation
| | > that
| | > > will sort binary order that will not force translations from other
| | > collations.
| | > >
| | > > My problem is this: I want the user to be able to use any collation
| they
| | > > want. My process is going to work with their data in binary order.
| Even
| | > if
| | > > I could retrieve the collation they used, I don't know how to, in a
| | > program,
| | > > to convert that collation into a binary collation. For some
| collations,
| | > its
| | > > obvious. If their collation was SQL_CZECH_CP1350_CI_AS I could
| probably
| | > > construct Czech_BIN from that.
| | >
| | >
| | >
| |
|

No comments:

Post a Comment