Re: [sqlite] UTF support

Teg Tue, 07 Oct 2014 06:41:31 -0700

Hello J,

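        string_t        sTest;
        // First call has no output buffer, so it returns the size needed in bytes.
        int nLengthNeeded = WideCharToMultiByte(CP_UTF8, 0, pszWide, nLength, 0, 0, 0, 0);
        if( !nLengthNeeded )
        {
                ASSERT(0);
                return(E_ABORT);
        }

        // Second call does the conversion; the extra 16 bytes leave room for the terminator.
        sTest.resize(nLengthNeeded + 16);
        nLength = WideCharToMultiByte(CP_UTF8, 0, pszWide, nLength,
                reinterpret_cast<char*>(&sTest[0]), (uint32_t)sTest.size(), 0, 0);
        sTest[nLength] = 0;
        // Compare what Windows produced with this class's own conversion.
        ASSERT(!strcmp(sTest.c_str(),(char*)(*this)));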


The code above is what I used to use to convert from UTF-16 to UTF-8 on
Windows. There are similar functions for converting in the opposite
direction. Internally my program is 100% UTF-8. I translate to UTF-16
right at the point where I display the strings in Windows.
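
For the opposite direction, here is a minimal portable sketch in plain C.
It is not the Windows API and not the code from my program; the name
utf8_to_utf16 and its buffer contract are made up for illustration. It
assumes a NUL-terminated input and replaces anything malformed with
U+FFFD. Note that it rejects overlong forms, so a code point below U+0800
(all of the Greek block, for instance) is only accepted in its two-byte
form, never as three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): decode NUL-terminated UTF-8
 * from `in` into UTF-16 code units in `out`. `out` must have room for
 * strlen(in) + 1 units. Returns the number of units written, not
 * counting the terminating 0. Malformed input becomes U+FFFD. */
size_t utf8_to_utf16(const char *in, uint16_t *out)
{
    /* Smallest code point allowed for 0, 1, 2, 3 continuation bytes. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*s) {
        unsigned char b = *s++;
        uint32_t cp;
        int extra, k, ok = 1;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; } /* bad lead byte */

        for (k = 0; k < extra; k++) {
            if ((*s & 0xC0) != 0x80) { ok = 0; break; } /* missing continuation */
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        /* Reject truncated, overlong, surrogate and out-of-range values. */
        if (!ok || cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x10000) {
            out[n++] = (uint16_t)cp;
        } else {                      /* needs a surrogate pair */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    out[n] = 0;
    return n;
}
```

On Windows, MultiByteToWideChar with CP_UTF8 does the same job; a
hand-written version like this is mainly useful for cross-checking.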

This code is actually some test code I use today to compare the
conversions I do manually to what Windows generates. In debug mode, it
does two conversions and compares the two.
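
A manual UTF-16 to UTF-8 conversion of the kind being compared could look
roughly like the sketch below. This is not my production code; the name
utf16_to_utf8 and the buffer sizing rule are assumptions for the example.
The key point is that the number of output bytes depends on the code
point: below U+0080 is one byte, below U+0800 is two bytes (Greek falls
here), below U+10000 is three bytes, and surrogate pairs become four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): encode `n` UTF-16 code units
 * from `in` as UTF-8 into `out`. `out` must hold at least 3 * n + 1
 * bytes. Returns the number of bytes written, not counting the
 * terminating NUL. Unpaired surrogates are replaced with U+FFFD. */
size_t utf16_to_utf8(const uint16_t *in, size_t n, char *out)
{
    unsigned char *p = (unsigned char *)out;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 1 < n && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00u);
                i++;                                 /* consumed the low half */
            } else {
                cp = 0xFFFD;                         /* unpaired */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            cp = 0xFFFD;
        }

        if (cp < 0x80) {                             /* 1 byte */
            *p++ = (unsigned char)cp;
        } else if (cp < 0x800) {                     /* 2 bytes */
            *p++ = (unsigned char)(0xC0 | (cp >> 6));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                   /* 3 bytes */
            *p++ = (unsigned char)(0xE0 | (cp >> 12));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {                                     /* 4 bytes */
            *p++ = (unsigned char)(0xF0 | (cp >> 18));
            *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *p = 0;
    return (size_t)(p - (unsigned char *)out);
}
```

In a debug build, the result of a routine like this can be compared with
strcmp against what WideCharToMultiByte returns for the same input.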

Tuesday, October 7, 2014, 8:59:07 AM, you wrote:

JD> On Tue, Oct 7, 2014 at 5:39 AM, Richard Hipp <[email protected]> wrote:

>> On Tue, Oct 7, 2014 at 12:06 AM, J Decker <[email protected]> wrote:
>>
>> > I saw a few things go by about unicode... and understand that it should
>> > just work to store the data as characters...
>> >
>> > I'm getting an unrecognized token... and think this page isn't right...
>> > I was playing with greek translation of 'mary had a little lamb'
>> >
>> >
>> I ran the following script through the sqlite3 command-line shell and it
>> works fine:
>>
>> CREATE TABLE option4_values(option_id, string, segment);
>> REPLACE INTO option4_values(`option_id`,`string`,`segment`)
>>  VALUES('8b377a68-4358-11e4-ace4-3085a9903449','Μαίρη είχε ένα μικρό
>> αρνί',0);
>> SELECT * FROM option4_values;
>>
>> Hmm... wonder what it's getting....


>> I suggest that the problem is in your programming language, or in the
>> wrapper that links your programming language to SQLite, not in SQLite
>> itself.  Can you tell us what programming language and what operating
>> system you are using?
>>
>> C, visual studio 2012 build, windows.
JD> built with UNICODE enabled... instead of multi-byte character set....
JD> it could be my conversion routine... I'm using wcstombs_s  with _MSC_VER
JD> set... before it was just failing, because wcstombs_s doesn't convert
JD> anything with a high bit set... so I added a handler to replace it with a
JD> utf-8 16 bit character encode (expands to 3 bytes  as described here
JD> http://en.wikipedia.org/wiki/UTF-8#Description  )

JD> if( err == 42 )
JD> {
JD> (*ch++) = 0xE0 | ((unsigned char*)wch)[1] >> 4;
JD> (*ch++) = 0x80 | ( ( ((unsigned char*)wch)[1] & 0xF ) << 2 ) | ( ( ((unsigned char*)wch)[0] ) >> 6 );
JD> (*ch++) = 0x80 |  ( ((unsigned char*)wch)[0] & 0x3F );
JD> }

JD> which works... if I mouse-over on char * string it shows the right unicode
JD> characters.
JD> The logging that I included in the first message was converted from
JD> wchar_t* to char* and then the sqlite3_strerror() is expanded from char *
JD> to wchar_t * and still shows the right characters....

JD>  I just cannot identify the unrecognized token... it's obviously not at
JD> character 0... (that's gotten by comparing the pzTail result of
JD> sqlite3_prepare_v2 )...




>> --
>> D. Richard Hipp
>> [email protected]
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>



-- 
Best regards,
 Teg                            mailto:[email protected]

