Length() is perfectly consistant with the value's type. As you've demonstrated, Sqlite does not respect the column types in tables as declared in the CREATE statement. Length(text_value) always returns the number of characters and length(blob_value) always returns the number of bytes. The type of the value depends solely on its type at the time of INSERT or UPDATE. I believe the Sqlite author calls this feature "manifest typing".
When coding in C you can easily get the number of bytes in a TEXT or BLOB value with appropriate manipulations of sqlite3_value_type(), sqlite3_result_int() and sqlite3_result_text(). So no problem there. I do not think that Sqlite SQL writers (i.e., non-C progammers) care to know the number of bytes their objects take as long as their assignments and comparison operations work within the framework of the SQL language. 'Bytes' is more of a low-level C concept. But if for some reason you absolutely must need to know the number of bytes that a text or blob value takes from within the Sqlite SQL language, just refer to the documentation of typeof(), CAST, CASE and quote() and you can come up with a pure Sqlite SQL solution if you really wanted to. --- Nuno Lucas <[EMAIL PROTECTED]> wrote: > On 10/28/05, Joe Wilson <[EMAIL PROTECTED]> wrote: > > --- Nuno Lucas <[EMAIL PROTECTED]> wrote: > > > On 10/25/05, Joe Wilson <[EMAIL PROTECTED]> wrote: > > > > The built-in Sqlite length() function works just fine on blobs: > > > > > > > > sqlite> select length(X'0000000000'); > > > > 5 > > > > > > > > Why do you think it doesn't? > > > > > > I remember a few months ago noticing it would count UTF-8 chars, not > > > bytes, so it would not return the right length for blobs in all cases > > > (a '\0' is a valid UTF-8 char, but try with the '(c)' [copyright] sign, > > > which is 2 bytes). > > > > > > I don't have the code in front of me, and can be just my memory, but I > > > don't see that behaviour changing or would break a lot of SQL during > > > normal text manipulation. > > > > > > Regards, > > > ~Nuno Lucas > > > > You are mistaken. Text and blobs are different. > > TEXT is of type "SQLITE_TEXT". BLOB is of type "SQLITE_BLOB". > > length() has always correctly returned the size of a blob - look at the > > code: > > > > [EMAIL PROTECTED]:~/src/sqlite$ sqlite3 test.db3 > SQLite version 3.2.1 > Enter ".help" for instructions > sqlite> create table x ( a TEXT, b BLOB ); > sqlite> insert into x values ( 'N�o', 'N�o' ); > sqlite> select * from x; > N�o|N�o > sqlite> select length(a), length(b) from x; > 3|3 > sqlite> .q > [EMAIL PROTECTED]:~/src/sqlite$ hexdump -c test.db3 > 0000000 S Q L i t e f o r m a t 3 \0 > 0000010 004 \0 001 001 \0 @ \0 \0 \0 002 \0 \0 \0 \0 > 0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 001 > 0000030 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 \0 > 0000040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 > * > 0000060 \0 \0 \0 \0 \r \0 \0 \0 001 003 317 \0 003 317 \0 \0 > 0000070 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 > * > 00003c0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 / > 00003d0 001 006 027 017 017 001 O t a b l e x x 002 C > 00003e0 R E A T E T A B L E x ( > 00003f0 a T E X T , b B L O B ) > 0000400 \r \0 \0 \0 001 003 363 \0 003 363 \0 \0 \0 \0 \0 \0 > 0000410 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 > * > 00007f0 \0 \0 \0 \v 001 003 025 025 N 303 243 o N 303 243 o > 0000800 > > As you see from the hexdump, "N�o" is encoded as UTF-8. > > It's true it works if you insert a literal BLOB, like with > x'00112233', but it only means sqlite lacks a byte counting function > for the general case. > > I don't complain about it, as I only use the SQLite C API, but it > means there is no coherency if different programs are used to > manipulate the database (that could insert the data in different > ways). > > Also note that I've seen a lot of sqlite wrappers/managers in the past > that don't care the encoding used (still many people think they can > live with only the first 127 ASCII chars), inserting Latin-1 or others > literally, making the length function returning the wrong value when > accented chars or other symbols - like the (c) symbol - are found. It > would be nice if we could make a quick sanity check on the DB by > comparing the byte count against the string length. > > For me, it's just a low priority feature request, but I believe there > are others who may think otherwise. > > > Regards, > ~Nuno Lucas > > > > /* > > ** Implementation of the length() function > > */ > > static void lengthFunc( > > sqlite3_context *context, > > int argc, > > sqlite3_value **argv > > ){ > > int len; > > > > assert( argc==1 ); > > switch( sqlite3_value_type(argv[0]) ){ > > case SQLITE_BLOB: > > case SQLITE_INTEGER: > > case SQLITE_FLOAT: { > > sqlite3_result_int(context, sqlite3_value_bytes(argv[0])); > > break; > > } > > case SQLITE_TEXT: { > > const char *z = sqlite3_value_text(argv[0]); > > for(len=0; *z; z++){ if( (0xc0&*z)!=0x80 ) len++; } > > sqlite3_result_int(context, len); > > break; > > } > > default: { > > sqlite3_result_null(context); > > break; > > } > > } > > } > __________________________________ Yahoo! Mail - PC Magazine Editors' Choice 2005 http://mail.yahoo.com