Hi,

I've been wondering a bit about how Firebird handles transliteration of 
various parts of a query, in particular regarding (quoted) identifiers.

My situation is that I have a database with default charset UTF8 and all 
char/varchar columns use this charset. I also always use UTF8 as 
connection charset.

I would assume that this means that Firebird expects to receive query 
strings encoded in UTF8, including identifiers and string literals that 
appear in the query.

At the same time, I know that the identifiers are stored in columns with 
charset Unicode_FSS, which as far as I understand is identical to UTF8 
except that 1) it will accept malformed strings and 2) it will allocate a 
buffer that fits 4 x maxlength bytes and will accept any string that 
fits in that buffer even if the number of Unicode characters > maxlength.

Are there any other differences between Unicode_FSS and UTF8?
Are all valid UTF8 strings < maxlength identical with the corresponding 
Unicode_FSS string?
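
To make the two differences concrete, here is a rough Python sketch of my 
understanding. It is not Firebird code: the function names are made up, 
and the 4 bytes per character figure is just my assumption from above.

```python
# Sketch of my understanding only; names and the 4-byte factor are assumptions.

def is_well_formed_utf8(raw: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def fits_by_chars(raw: bytes, maxlength: int) -> bool:
    """UTF8-style check: must be well formed and at most maxlength characters."""
    return is_well_formed_utf8(raw) and len(raw.decode("utf-8")) <= maxlength

def fits_by_buffer(raw: bytes, maxlength: int, bytes_per_char: int = 4) -> bool:
    """Unicode_FSS-style check as I understand it: only the byte buffer matters."""
    return len(raw) <= bytes_per_char * maxlength

twelve_ascii = b"a" * 12   # 12 characters, 12 bytes
truncated = b"\xc3"        # first byte of a 2-byte sequence, malformed on its own

print(fits_by_chars(twelve_ascii, 3), fits_by_buffer(twelve_ascii, 3))
print(fits_by_chars(truncated, 3), fits_by_buffer(truncated, 3))
```

In this model, both test values are rejected by the character-based check 
but accepted by the buffer-based one.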

Also, string literals can be specified to be some other charset than 
UTF8 - does this mean that the query buffer sent to the server actually 
contains segments with different encodings? Or is the query buffer 
always 100% encoded in the connection charset?

I tried this with UTF8 connection charset:
select _win1252 'asdfö' "Test"
from rdb$database

It returns this:
asdfö

So, it seems the string literal is encoded in UTF8 and sent that way to 
the server, which interprets it as encoded in WIN1252. So, it seems the 
buffer itself is 100% UTF8. Right?
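
For reference, this is what reinterpreting the UTF8 bytes of that literal 
as WIN1252 looks like at the byte level. It is a Python sketch, not 
Firebird code, and I am assuming Python's cp1252 codec is a fair stand-in 
for Firebird's WIN1252.

```python
# Sketch only: cp1252 is assumed to match Firebird's WIN1252 charset.
literal = "asdfö"

# Bytes a client would put in the query buffer with a UTF8 connection charset.
wire_bytes = literal.encode("utf-8")

# The same bytes read as if they were WIN1252, one character per byte.
as_win1252 = wire_bytes.decode("cp1252")

print(wire_bytes.hex(" "))
print(len(literal), len(wire_bytes), len(as_win1252))
```

The ö takes two bytes (c3 b6) in UTF8, so a byte-wise WIN1252 reading 
yields six characters instead of five.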

What about identifiers? Assume I have an identifier "Åäöü€ÉÈÏÿñ". Is 
there any situation in which Firebird could get into trouble with this, 
assuming I always quote it and always use UTF8 connection charset?
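
In case it matters for the answer, here is a small Python sketch of how 
that identifier looks in UTF8. It only counts characters and bytes; 
whether Firebird's identifier limit is counted in characters or bytes is 
exactly the kind of thing I am unsure about.

```python
# Character and byte counts for the example identifier, encoded as UTF-8.
identifier = "Åäöü€ÉÈÏÿñ"

encoded = identifier.encode("utf-8")
bytes_per_char = [len(ch.encode("utf-8")) for ch in identifier]

print(len(identifier))      # number of Unicode characters
print(len(encoded))         # number of UTF-8 bytes
print(max(bytes_per_char))  # widest character in bytes (the euro sign)
```

The identifier is 10 characters but 21 bytes, with € being the only 
3-byte character.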


