Re: [sqlite] Some clarification needed about Unicode

2009-10-30 Thread A.J.Millan
- Original Message - From: John Crenshaw johncrens...@priacta.com To: General Discussion of SQLite Database sqlite-users@sqlite.org Sent: Thursday, October 29, 2009 10:55 PM Subject: Re: [sqlite] Some clarification needed about Unicode No, I mean which encoding. You can't give a UTF-16

Re: [sqlite] Some clarification needed about Unicode

2009-10-30 Thread John Crenshaw
http://codesnipers.com/?q=utf-8-versus-windows-unicode The author asserts that .NET is the only platform that offers full UTF-16 support in the Windows API. The author is half mistaken, as was I. Michael Kaplan and Raymond Chen (big MS names many will recognize) clarified this. For Win2k, only

[sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
Hi list: After some years using this wonderful tool, I embraced the internationalization of an application, and despite some reading on this list and my own (inconclusive) tests, I still have some obscure corners. [1] Supposing some textual data already inserted as UTF-8 (default mode) in

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Christophe Deschamps
[1] Supposing some textual data already inserted as UTF-8 (default mode) in a dBase, and a connection opened with sqlite3_open(): Does sqlite3_column_text16 retrieve correct UTF-16 content? That is to say, does SQLite do the conversion internally? [2] Assuming the previous -or a UTF-16 content

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of A.J.Millan Sent: Thursday, October 29, 2009 5:14 AM To: sqlite-users@sqlite.org Subject: [sqlite] Some clarification needed about Unicode Hi list: After some years using this wonderful tool, I embraced

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Denis Muys
On 10/29/09 10:51 , John Crenshaw johncrens...@priacta.com wrote: 2. UTF-8 is NOT the same as ASCII for values greater than 127. ASCII only uses 7 bits values, so no larger representation can be the same as ASCII for values greater than 127. This may be seen as nit picking, but when
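The point about values above 127 can be checked directly. The following is a minimal sketch in Python (not part of the original thread; the helper name encoded_lengths is made up for illustration). It shows that UTF-8 and ASCII agree byte for byte only for code points up to 127, and that anything above needs two or more UTF-8 bytes and has no ASCII representation at all.

```python
def encoded_lengths(ch):
    """Return (code point, UTF-8 byte length, representable in ASCII) for one character."""
    utf8 = ch.encode("utf-8")
    return ord(ch), len(utf8), ch.isascii()


# 'A' is plain ASCII; 'ñ' (U+00F1) and '€' (U+20AC) are not.
for ch in ["A", "\u00F1", "\u20AC"]:
    code_point, utf8_len, in_ascii = encoded_lengths(ch)
    print(f"U+{code_point:04X}: {utf8_len} UTF-8 byte(s), representable in ASCII: {in_ascii}")
```

A single-byte code page such as Latin-1 stores 'ñ' as one byte (0xF1), which is not valid UTF-8 on its own; that mismatch is the usual source of confusion with "extended ASCII" data.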

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
From: Jean-Denis Muys jdm...@kleegroup.com Sent: Thursday, October 29, 2009 11:10 AM Subject: Re: [sqlite] Some clarification needed about Unicode This may be seen as nit picking, but when discussing character encodings and representations, the issues can become so subtle and confusing

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Dan Kennedy
On Oct 29, 2009, at 4:41 PM, Jean-Christophe Deschamps wrote: [1] Supposing some textual data already inserted as UTF-8 (default mode) in a dBase, and a connection opened with sqlite3_open(): Does sqlite3_column_text16 retrieve correct UTF-16 content? That is to say, does SQLite do the
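The conversion asked about in [1] can be observed without writing C. The following is a rough sketch, assuming Python's standard sqlite3 module, which as far as I know binds and reads text through SQLite's UTF-8 interface. It shows the mirror image of the question: the database is created with a UTF-16 internal encoding, and the text still round-trips, because SQLite converts between the stored encoding and the one the caller asks for. The function, table and column names are hypothetical.

```python
import sqlite3


def roundtrip_through_utf16_db(text):
    """Store text in a UTF-16 database and read it back through the UTF-8 interface."""
    conn = sqlite3.connect(":memory:")
    # The encoding can only be chosen before the first table is created.
    conn.execute("PRAGMA encoding = 'UTF-16'")
    conn.execute("CREATE TABLE t (s TEXT)")
    conn.execute("INSERT INTO t VALUES (?)", (text,))
    encoding = conn.execute("PRAGMA encoding").fetchone()[0]
    stored = conn.execute("SELECT s FROM t").fetchone()[0]
    conn.close()
    return encoding, stored


encoding, stored = roundtrip_through_utf16_db("Espa\u00F1a \U0001D11E")
print(encoding, stored == "Espa\u00F1a \U0001D11E")
```

In the C API the same mechanism works in the other direction: a UTF-8 database read with sqlite3_column_text16() yields UTF-16 in the machine's native byte order.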

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
Kennedy Sent: Thursday, October 29, 2009 6:39 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] Some clarification needed about Unicode On Oct 29, 2009, at 4:41 PM, Jean-Christophe Deschamps wrote: [1] Supposing some textual data already inserted as UTF-8 (default mode

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
John: 2. UTF-8 is NOT the same as ASCII for values greater than 127. Similarly, UTF-16 is NOT the same as UCS-2 (the wide Unicode chars used by MS APIs), though it looks the same at low values. UTF-16 is a multibyte character set, while UCS-2 is always 2 bytes per character. You have to

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
John Crenshaw wrote: My main point is that you can't take the UTF-16 string and safely supply it to APIs which want UCS-2 encoded text, such as Win32 APIs (including things like SetWindowText()). What makes you believe Win32 API, and SetWindowText in particular, does not support surrogate

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
A.J.Millan wrote: Really, here you touched tangentially the core of my question. Besides all those great theories, at last I have UTF-8 encoded data in a dBase, and the UCS-2 encoded data of the MS Win32 API (w_chars in my C++ app). The question is: What is the concrete way to and from that

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
John: From: John Crenshaw johncrens...@priacta.com To: General Discussion of SQLite Database sqlite-users@sqlite.org Sent: Thursday, October 29, 2009 11:46 AM Subject: Re: [sqlite] Some clarification needed about Unicode My main point is that you can't take the UTF-16 string and safely supply

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
John Crenshaw wrote: Similarly, UTF-16 is NOT the same as UCS-2 (the wide Unicode chars used by MS APIs) Win32 API does too support UTF-16. What makes you believe otherwise? though it looks the same at low values. UTF-16 is a multibyte character set, while UCS-2 is always 2 bytes per

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
A.J.Millan wrote: Now, do you know about some library to convert to and from UTF-8 or UTF-16 to UCS-2? John's claims notwithstanding, you don't want or need UCS-2. It's a strict subset of UTF-16. Every valid UCS-2 string is also a UTF-16 string, but the converse is not true. UCS-2 is of
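The subset relationship described above can be made concrete. The following is a small Python sketch (the helper names utf16_units and is_valid_ucs2 are invented for illustration). It treats a string as UCS-2 compatible when every character fits in one 16-bit unit, and shows that a character outside the Basic Multilingual Plane needs a surrogate pair in UTF-16.

```python
def utf16_units(text):
    """Return the 16-bit code units of text encoded as UTF-16 (little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_valid_ucs2(text):
    """True if no character needs a surrogate pair, i.e. all code points are <= U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


bmp_text = "Espa\u00F1a"      # every character fits in one 16-bit unit
clef = "\U0001D11E"           # MUSICAL SYMBOL G CLEF, outside the BMP

print(is_valid_ucs2(bmp_text), [hex(u) for u in utf16_units(bmp_text)])
print(is_valid_ucs2(clef), [hex(u) for u in utf16_units(clef)])
```

For text that stays within the BMP, the UTF-16 and UCS-2 byte sequences are identical, so no separate conversion library is needed.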

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
To: sqlite-users@sqlite.org Sent: Thursday, October 29, 2009 12:51 PM Subject: Re: [sqlite] Some clarification needed about Unicode A.J.Millan wrote: Really, here you touched tangentially the core of my question. Besides all those great theories, at last I have UTF-8 encoded data in a dBase

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Denis Muys
On 10/29/09 12:55 , A.J.Millan a...@zator.com wrote: Now, do you know about some library to convert to and from UTF-8 or UTF-16 to UCS-2? [4-1b] convert with WideCharToMultiByte(CP_UTF8) On 10/29/09 12:51 , Igor Tandetnik itandet...@mvps.org wrote: You can use WideCharToMultiByte(CP_UTF8)

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
A.J.Millan wrote: Thanks for your answer; let me see if I understood correctly the process: [1] Read the actual textual data with sqlite3_column_blob() [2] Assuming the system code page matches the one used when the data was originally inserted, convert with mbstowcs() [3] (Doubt) The

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
- Original Message - From: Igor Tandetnik itandet...@mvps.org To: sqlite-users@sqlite.org Sent: Thursday, October 29, 2009 1:45 PM Subject: Re: [sqlite] Some clarification needed about Unicode The only Win32 API function that can handle UTF-8 strings is MultiByteToWideChar (when

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Christophe Deschamps
My main point is that you can't take the UTF-16 string and safely supply it to APIs which want UCS-2 encoded text, such as Win32 APIs (including things like SetWindowText()). Odds are that the only library you are using which supports UTF-16 is SQLite. You should always be converting the text

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Christophe Deschamps
Hi, Please follow Igor's advice; he is right. [1] Read the actual textual data with sqlite3_column_blob() Which you can directly convert to TEXT if, as you say, you entered only 7-bit ASCII or UTF-8 compliant data. [2] Assuming the system code page matches the one used when the data was

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread A.J.Millan
-Christophe Deschamps j...@q-e-d.org To: General Discussion of SQLite Database sqlite-users@sqlite.org Sent: Thursday, October 29, 2009 3:04 PM Subject: Re: [sqlite] Some clarification needed about Unicode Hi, Please follow Igor's advice; he is right. [1] Read the actual textual data

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
there must exist zillions [working] wrappers to VC++. You would think. In fact, there are only a few, and most are not very good. I used the wrapper at Code Project as a base, then added handling for SQLITE_LOCKED, a date class, better blob handling, transaction support, and other useful

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Christophe Deschamps
Hi, Despite that, I'm aware that I have more than pure US-ASCII in the blob objects; in fact I'm close to your situation because I used the Spanish language and have 8-bit extended ASCII with some special characters (accented characters and so on). So the question is Yes, I have upper-ANSI

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jean-Christophe Deschamps Sent: Thursday, October 29, 2009 9:18 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] Some clarification needed about Unicode My main point is that you can't take the UTF-16 string and safely supply

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
John Crenshaw johncrens...@priacta.com wrote: 2. MultiByteToWideChar supports a MB_COMPOSITE flag, which appears to give UTF-16 output. MB_COMPOSITE has nothing to do with surrogate pairs, and everything to do with whether, say, Latin-1 character Á (A with acute) is converted to a single
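The distinction between precomposed and composite characters is separate from surrogate pairs. The following Python sketch illustrates it, using Unicode normalization from the standard unicodedata module as a stand-in. This is an analogy only, not the Win32 code path behind MB_COMPOSITE, and the function name both_forms is hypothetical. The same letter can be one code point or a base letter plus a combining mark, and in both forms every code point fits in a single 16-bit unit.

```python
import unicodedata


def both_forms(ch):
    """Return (precomposed, composite) spellings of one character via NFC and NFD."""
    return unicodedata.normalize("NFC", ch), unicodedata.normalize("NFD", ch)


precomposed, composite = both_forms("\u00C1")  # LATIN CAPITAL LETTER A WITH ACUTE

print([f"U+{ord(c):04X}" for c in precomposed])
print([f"U+{ord(c):04X}" for c in composite])

# Neither form needs a surrogate pair: all code points are within the BMP.
print(all(ord(c) <= 0xFFFF for c in precomposed + composite))
```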

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Christophe Deschamps
Hi John, Microsoft never seems to clearly identify whether the wide APIs should be given UTF-16 or UCS-2. Their guide on internationalization would seem to suggest that UCS-2 must be used, however, there is some reason to believe that perhaps UTF-16 is handled correctly as well. Couldn't find

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Igor Tandetnik Sent: Thursday, October 29, 2009 5:08 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] Some clarification needed about Unicode John Crenshaw johncrens...@priacta.com wrote: 2. MultiByteToWideChar supports a MB_COMPOSITE

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
Don't worry: we're all confused by MS wording! From what I understand, having also tried to sort out the question myself, there is a line drawn: before XP (W2K), the included Unicode support was nothing more than UCS-2. XP and post-XP systems include Unicode 5.1 and use UTF-16

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Jean-Christophe Deschamps
Thanks for the link. That clarifies things a lot. So, for the OP, if you are targeting Win2k, it would be a good idea to use UCS-2, not UTF-16, with any wide API calls. XP and above should (according to Kaplan and Chen) support UTF-16 for API calls. W2k is clearly something of the past. But

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread Igor Tandetnik
John Crenshaw johncrens...@priacta.com wrote: No, I mean which encoding. You can't give a UTF-16 string to an API that only knows how to handle UCS-2 encoded data Well, most of the time, you can. Only in rare cases do you need to treat surrogate pairs in special way. One such case, relevant

Re: [sqlite] Some clarification needed about Unicode

2009-10-29 Thread John Crenshaw
No, I mean which encoding. You can't give a UTF-16 string to an API that only knows how to handle UCS-2 encoded data Well, most of the time, you can. Only in rare cases do you need to treat surrogate pairs in special way. One such case, relevant to this discussion, is converting UTF-16