In Valid Logic

Endlessly expanding technology

One of these things is not like the other, except to SQL

Was recently working on an issue that was so perplexing, that I almost had David Penton, our resident SQL expert stumped.

We had an issue with some username handling and it ended up boiling down to there we could go into SQL Server, run this query, and get a result when you’d normally expect it to not match:

[sql light=”true”]select 1 where N’ß’ = N’ss’[/sql]

After that, I was truly stumped. I’d Googled everything I could think of any came up empty. Had I known what the German sharp S was, life would have been easier. Come to find out through some sources, that some of the SQL standards (in this case, SQL92 is referenced) call for the German sharp S is translated down to “ss” in string comparisons. So although it might seem like a bug, it is to-spec. Oracle is also reported as working this way.

To work around the behavior, one option is a workaround function for SQL Server which essentially has you convert the string to a binary column and store it. This may be outside what you can do in your application though, as in ours, we’d need to update every place that does string comparisons on the username as well as altering the ASP.NET Membership tables and stored procedures.

An alternative is to change your database collation to use one that is more binary dependent such as Latin1_General_BIN. This will treat ß and ‘ss’ separately, though also be aware since it is binary, it is case insensitive and some behaviors such as sorting may also change.

A topic like this is important for any developer for a few reasons:

First, being in the US, it is easy to be ignorant of other culture’s language handling. You may work to add resource labels so your application can be translated and such, but every so often you will find behaviors related to internalization that perplex you because you’ve never heard of the German sharp S (as in this case).

Second, when its an issue in SQL and you’re in doubt, look at collation first. SQL Server has a huge list of collations, and each one does something different. Its either a behavior with the one you’re using, or the user is using a collation you haven’t tried (tested) before.

And finally… need to try harder to stump Penton. It can be done!

Monday, February 08, 2010

blog comments powered by Disqus