Re: [sqlite] built-in functrion suggestion: size of blob

Joe Wilson Mon, 31 Oct 2005 06:49:40 -0800

Length() is perfectly consistant with the value's type. As you've 
demonstrated, Sqlite does not respect the column types in tables 
as declared in the CREATE statement.
Length(text_value) always returns the number of characters and 
length(blob_value) always returns the number of bytes. The type 
of the value depends solely on its type at the time of INSERT 
or UPDATE. I believe the Sqlite author calls this feature 
"manifest typing".


When coding in C you can easily get the number of bytes in a TEXT 
or BLOB value with appropriate manipulations of sqlite3_value_type(), 
sqlite3_result_int() and sqlite3_result_text(). So no problem there.

I do not think that Sqlite SQL writers (i.e., non-C progammers) care 
to know the number of bytes their objects take as long as their 
assignments and comparison operations work within the framework of 
the SQL language. 'Bytes' is more of a low-level C concept. But if 
for some reason you absolutely must need to know the number of bytes 
that a text or blob value takes from within the Sqlite SQL language, 
just refer to the documentation of typeof(), CAST, CASE and quote() 
and you can come up with a pure Sqlite SQL solution if you really 
wanted to.


--- Nuno Lucas <[EMAIL PROTECTED]> wrote:

> On 10/28/05, Joe Wilson <[EMAIL PROTECTED]> wrote:
> > --- Nuno Lucas <[EMAIL PROTECTED]> wrote:
> > > On 10/25/05, Joe Wilson <[EMAIL PROTECTED]> wrote:
> > > > The built-in Sqlite length() function works just fine on blobs:
> > > >
> > > >   sqlite> select length(X'0000000000');
> > > >   5
> > > >
> > > > Why do you think it doesn't?
> > >
> > > I remember a few months ago noticing it would count UTF-8 chars, not
> > > bytes, so it would not return the right length for blobs in all cases
> > > (a '\0' is a valid UTF-8 char, but try with the '(c)' [copyright] sign,
> > > which is 2 bytes).
> > >
> > > I don't have the code in front of me, and can be just my memory, but I
> > > don't see that behaviour changing or would break a lot of SQL during
> > > normal text manipulation.
> > >
> > > Regards,
> > > ~Nuno Lucas
> >
> > You are mistaken. Text and blobs are different.
> > TEXT is of type "SQLITE_TEXT". BLOB is of type "SQLITE_BLOB".
> > length() has always correctly returned the size of a blob - look at the 
> > code:
> >
> 
> [EMAIL PROTECTED]:~/src/sqlite$ sqlite3 test.db3
> SQLite version 3.2.1
> Enter ".help" for instructions
> sqlite> create table x ( a TEXT, b BLOB );
> sqlite> insert into x values ( 'Nï¿½o', 'Nï¿½o' );
> sqlite> select * from x;
> Nï¿½o|Nï¿½o
> sqlite> select length(a), length(b) from x;
> 3|3
> sqlite> .q
> [EMAIL PROTECTED]:~/src/sqlite$ hexdump -c test.db3
> 0000000   S   Q   L   i   t   e       f   o   r   m   a   t       3  \0
> 0000010 004  \0 001 001  \0   @          \0  \0  \0 002  \0  \0  \0  \0
> 0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0 001
> 0000030  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0  \0
> 0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> *
> 0000060  \0  \0  \0  \0  \r  \0  \0  \0 001 003 317  \0 003 317  \0  \0
> 0000070  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> *
> 00003c0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   /
> 00003d0 001 006 027 017 017 001   O   t   a   b   l   e   x   x 002   C
> 00003e0   R   E   A   T   E       T   A   B   L   E       x       (
> 00003f0   a       T   E   X   T   ,       b       B   L   O   B       )
> 0000400  \r  \0  \0  \0 001 003 363  \0 003 363  \0  \0  \0  \0  \0  \0
> 0000410  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> *
> 00007f0  \0  \0  \0  \v 001 003 025 025   N 303 243   o   N 303 243   o
> 0000800
> 
> As you see from the hexdump, "Nï¿½o" is encoded as UTF-8.
> 
> It's true it works if you insert a literal BLOB, like with
> x'00112233', but it only means sqlite lacks a byte counting function
> for the general case.
> 
> I don't complain about it, as I only use the SQLite C API, but it
> means there is no coherency if different programs are used to
> manipulate the database (that could insert the data in different
> ways).
> 
> Also note that I've seen a lot of sqlite wrappers/managers in the past
> that don't care the encoding used (still many people think they can
> live with only the first 127 ASCII chars), inserting Latin-1 or others
> literally, making the length function returning the wrong value when
> accented chars or other symbols - like the (c) symbol - are found. It
> would be nice if we could make a quick sanity check on the DB by
> comparing the byte count against the string length.
> 
> For me, it's just a low priority feature request, but I believe there
> are others who may think otherwise.
> 
> 
> Regards,
> ~Nuno Lucas
> 
> 
> > /*
> > ** Implementation of the length() function
> > */
> > static void lengthFunc(
> >   sqlite3_context *context,
> >   int argc,
> >   sqlite3_value **argv
> > ){
> >   int len;
> >
> >   assert( argc==1 );
> >   switch( sqlite3_value_type(argv[0]) ){
> >     case SQLITE_BLOB:
> >     case SQLITE_INTEGER:
> >     case SQLITE_FLOAT: {
> >       sqlite3_result_int(context, sqlite3_value_bytes(argv[0]));
> >       break;
> >     }
> >     case SQLITE_TEXT: {
> >       const char *z = sqlite3_value_text(argv[0]);
> >       for(len=0; *z; z++){ if( (0xc0&*z)!=0x80 ) len++; }
> >       sqlite3_result_int(context, len);
> >       break;
> >     }
> >     default: {
> >       sqlite3_result_null(context);
> >       break;
> >     }
> >   }
> > }
> 



        
                
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

Re: [sqlite] built-in functrion suggestion: size of blob

Reply via email to