Hello, SQLite developers!

Sorry for my bad English.

I use SQLite (great program!) and have encountered some Unicode problems.
I think these problems are not in the DBMS kernel but in the
command-line utility.

Since SQLite supports Unicode, I want at least to enter Russian
strings into table rows and (probably) to create tables and columns
with Russian names. I use SQLite from Python via PySQLite 2.0.4
(SQLite version 3.2.5), and it works without any problem -- I can
create Russian strings and Russian-named tables.

But when I open the database with the command-line utility (SQLite
version 3.2.6, Windows 2000), I see unreadable junk instead of Russian
characters.

Interestingly, I can create a database in the command-line utility,
enter Russian strings and create tables with Russian names, and
everything works without any visible errors (such as 'incorrect symbols
in table name'), but the database contains strings in the native 'cp866'
encoding instead of UTF-8. When I try to open this file from Python via
PySQLite, I get a UnicodeDecodeError.

( There are two Russian 8-bit encodings in Windows. Most programs
  store data in the 'cp1251' (also known as 'Windows-1251') encoding,
  but the command prompt window uses the legacy 'cp866' encoding. )
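
To make the difference between the encodings concrete, here is a minimal sketch in Python 3 (an assumption; the original setup used PySQLite 2.0.4). It encodes the same Russian word in cp866, cp1251, and UTF-8, then shows that cp866 bytes are not valid UTF-8, which is the kind of failure reported above. The variable names are illustrative only.

```python
# The same word as it would be stored under each encoding.
word = "Русский"

cp866_bytes = word.encode("cp866")    # what the command prompt produces
cp1251_bytes = word.encode("cp1251")  # what most Windows programs produce
utf8_bytes = word.encode("utf-8")     # what SQLite expects for TEXT

print("cp866 :", cp866_bytes.hex())
print("cp1251:", cp1251_bytes.hex())
print("utf-8 :", utf8_bytes.hex())

# Treating cp866 bytes as UTF-8 fails, because 0x90 cannot start
# a UTF-8 sequence.
decode_error = None
try:
    cp866_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("decode failed:", exc)

# Reading cp1251 bytes as cp866 does not fail, but yields junk,
# because every byte value maps to some other cp866 character.
misread = cp1251_bytes.decode("cp866")
print("cp1251 read as cp866:", misread)
```

The second case is why the junk appears silently on screen: an 8-bit code page accepts any byte, so only the UTF-8 decoder notices that something is wrong.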

It seems that today's sqlite command-line utility is completely unaware
of non-ASCII characters and always keeps them in the current encoding
instead of converting them to Unicode.

The attached file 'db1' is a correct database file created via PySQLite.
All Russian strings are properly encoded as UTF-8. It does not work with
the command-line utility (all Russian strings are printed as unreadable
junk).

The attached file 'db2' is an incorrect database with the same structure,
created via the command-line utility. All Russian strings are erroneously
encoded as cp866. It prints perfectly in the command line but does not
work with PySQLite (a UnicodeDecodeError is raised when I try to execute
a query).
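
A database like 'db2' can, in principle, be read and repaired from Python by asking the driver for raw bytes. The sketch below is only an illustration under several assumptions: it uses the `sqlite3` module from the Python 3 standard library (the descendant of PySQLite), an in-memory database instead of the attached file, and a `CAST` of a blob to simulate the cp866 bytes that the command-line utility stored as TEXT. It repairs column values only; table and column names stored in cp866 would need separate handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a integer, b text)")

# Simulate the broken database: raw cp866 bytes stored as TEXT.
# SQLite does not validate the bytes, so this succeeds silently.
conn.execute(
    "INSERT INTO t1 VALUES (2, CAST(? AS TEXT))",
    ("Русский".encode("cp866"),),
)

# Ask the driver for raw bytes instead of decoding TEXT as UTF-8.
conn.text_factory = bytes
raw_rows = conn.execute("SELECT a, b FROM t1").fetchall()

# Decode with the encoding the data was really written in and
# write the values back; str parameters are stored as UTF-8.
conn.text_factory = str
for a, raw in raw_rows:
    conn.execute("UPDATE t1 SET b = ? WHERE a = ?", (raw.decode("cp866"), a))

fixed = conn.execute("SELECT b FROM t1 WHERE a = 2").fetchone()[0]
print(fixed)
```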

I think the command-line utility should determine the current encoding
(in my case, cp866) and perform the transformation
native_encoding <-> utf-8.
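
The proposed transformation can be sketched as a pair of helper functions. This is a Python illustration of the idea, not the way the sqlite shell is implemented; the function names and the default of cp866 are assumptions made for the example. A real utility would query the console code page from the operating system instead of hard-coding it.

```python
def console_to_utf8(raw: bytes, console_encoding: str = "cp866") -> bytes:
    """Convert bytes typed at the console into UTF-8 for the database."""
    return raw.decode(console_encoding).encode("utf-8")


def utf8_to_console(raw: bytes, console_encoding: str = "cp866") -> bytes:
    """Convert UTF-8 from the database into bytes the console can show.

    Characters missing from the console code page become '?'.
    """
    return raw.decode("utf-8").encode(console_encoding, errors="replace")


typed = "Русский".encode("cp866")     # bytes as the console delivers them
stored = console_to_utf8(typed)       # bytes as they should be stored
shown = utf8_to_console(stored)       # bytes as they should be printed
print(stored.hex(), shown.hex())
```

Using `errors="replace"` on output is a design choice for the sketch: a database may legitimately contain characters that the console code page cannot represent, and printing a placeholder is less disruptive than failing.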

---------------------------------------------------------------------

The attached text files are SQL scripts containing Russian characters in
different encodings. I tried executing these files in the command-line
utility. The files have the same content but different encodings.

'cp866.txt' generates an incorrect database which appears to work normally
from the command-line utility but keeps non-ASCII strings as cp866 instead
of Unicode.

'cp1251.txt' is ordinary Russian text on Windows. Almost all programs
generate Russian text in this encoding. When I read this file
from the command-line utility, the resulting database contains
cp1251-encoded strings instead of Unicode. Screen output is unreadable
in the command line, and the database does not work in Python
(a 'UnicodeDecodeError' is raised).

The main Russian encoding in Windows is cp1251. I think the ideal solution
for Russian strings is to read and write dump files as cp1251 but produce
screen output as cp866. As a minimal solution, all data and files could be
processed as cp866, with internal conversion to UTF-8.
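
The "ideal solution" above (cp1251 for files, UTF-8 inside the database, cp866 on screen) might look like the following sketch. It assumes Python 3 and its standard `sqlite3` module, an in-memory database, and a script built in the code as a stand-in for reading 'cp1251.txt' from disk; `run_script` is a hypothetical helper name.

```python
import sqlite3

SCRIPT_TEXT = """BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
COMMIT;
"""

# Stand-in for the bytes of a cp1251-encoded dump file.
script_bytes = SCRIPT_TEXT.encode("cp1251")


def run_script(raw: bytes, file_encoding: str) -> sqlite3.Connection:
    """Decode a script from its file encoding and execute it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(raw.decode(file_encoding))
    return conn


conn = run_script(script_bytes, "cp1251")
rows = conn.execute("SELECT b FROM t1 ORDER BY a").fetchall()

# Inside Python the values are Unicode; encode only at the screen boundary.
screen_bytes = [value.encode("cp866") for (value,) in rows]
print(rows, screen_bytes)
```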

'utf8-bom.txt' is a Unicode file created with the standard Windows text
editor (Notepad). Notepad prefixes the UTF-8 files it saves with
0xEF 0xBB 0xBF (the UTF-8 'byte order mark', used to distinguish them
from plain 8-bit text). sqlite.exe reports an error when it reads this file.
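
Skipping the byte order mark before parsing is straightforward. The sketch below shows two ways to do it in Python 3, as an illustration of what a reader of such files could do; `strip_utf8_bom` is a hypothetical helper, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte order mark, if any."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


sql = "INSERT INTO t1 VALUES(2, 'Русский');"
with_bom = codecs.BOM_UTF8 + sql.encode("utf-8")  # as Notepad would save it

# Option 1: remove the three bytes by hand, then decode as plain UTF-8.
text_manual = strip_utf8_bom(with_bom).decode("utf-8")

# Option 2: let the 'utf-8-sig' codec drop the mark while decoding.
text_codec = with_bom.decode("utf-8-sig")

print(text_manual == text_codec)
```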

'utf8.txt' is the same file as 'utf8-bom.txt' with the header deleted
manually (standard Windows editors don't allow deleting this header).
It works well and generates a correct database. The printed Russian
strings are unreadable in the sqlite command-line utility.

'utf16be.txt' and 'utf16le.txt' are SQL scripts in 16-bit Unicode
format (big-endian and little-endian, respectively). Today's
sqlite command-line utility can't read these files.
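
A reader that accepts all of the Unicode files could pick the decoder from the byte order mark. The following is a minimal sketch in Python 3; `decode_script` and the `fallback` parameter are names invented for the example, and files without a mark are simply assumed to be in the fallback encoding. UTF-32 is not considered.

```python
import codecs

# Byte order marks and the codec that consumes each of them.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),  # the 'utf-16' codec reads the mark
    (codecs.BOM_UTF16_LE, "utf-16"),  # and picks the byte order itself
]


def decode_script(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode a script, choosing the codec from its byte order mark."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding)
    return raw.decode(fallback)


sql = "INSERT INTO t1 VALUES(2, 'Русский');"
samples = {
    "utf8-bom": codecs.BOM_UTF8 + sql.encode("utf-8"),
    "utf8": sql.encode("utf-8"),
    "utf16be": codecs.BOM_UTF16_BE + sql.encode("utf-16-be"),
    "utf16le": codecs.BOM_UTF16_LE + sql.encode("utf-16-le"),
}
decoded = {name: decode_script(raw) for name, raw in samples.items()}
print(all(text == sql for text in decoded.values()))
```

Files in an 8-bit code page such as cp1251 carry no mark, so they can only be handled by passing the right `fallback` explicitly.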

---------------------------------------------------------------------

I think the ideal command-line utility should correctly read ALL of these
files except 'cp866.txt', and keep strings as Unicode.
Russian screen output on Windows should be generated in cp866.


Best regards,
 Alexander                          mailto:[EMAIL PROTECTED]
BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
CREATE TABLE Таблица (Колонка1, Колонка2);
INSERT INTO Таблица VALUES('Строка1', 'Строка2');
COMMIT;
BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
CREATE TABLE Таблица (Колонка1, Колонка2);
INSERT INTO Таблица VALUES('Строка1', 'Строка2');
COMMIT;
BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
CREATE TABLE Таблица (Колонка1, Колонка2);
INSERT INTO Таблица VALUES('Строка1', 'Строка2');
COMMIT;
BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
CREATE TABLE Таблица (Колонка1, Колонка2);
INSERT INTO Таблица VALUES('Строка1', 'Строка2');
COMMIT;
BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
CREATE TABLE Таблица (Колонка1, Колонка2);
INSERT INTO Таблица VALUES('Строка1', 'Строка2');
COMMIT;

BEGIN TRANSACTION;
CREATE TABLE t1 (a integer, b text);
INSERT INTO t1 VALUES(1, 'English');
INSERT INTO t1 VALUES(2, 'Русский');
CREATE TABLE Таблица (Колонка1, Колонка2);
INSERT INTO Таблица VALUES('Строка1', 'Строка2');
COMMIT;
