Re: [sqlite] UTF8 and NUL

J Decker Fri, 26 Jan 2018 16:18:56 -0800

On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:

> On 2018-01-26, at 17:05, J Decker <d3c...@gmail.com> wrote:
> > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
> > peter.dasi...@flightaware.com> wrote:
> >> Sqlite uses NUL as the string terminator internally, the published API
> >> specifies has stuff like this all over the place:
>
> >>> In those routines that have a fourth argument, its value is the number
> of bytes in the parameter. To be clear: the value is the number of bytes in
> the value, not the number of characters. If the fourth parameter to
> sqlite3_bind_text() or sqlite3_bind_text16() is negative, then the length
> of the string is the number of bytes UP TO THE FIRST ZERO TERMINATOR.
>
> > You stressed the wrong part there - *IS NEGATIVE*
>
> Why? Passing -1 as the length is a common way to tell sqlite3 to calculate
> the length itself. It's a documented and widely used part of the API.

Exactly, so on neither side, input or output is there a problem storing a
length of valid characters.
The deficiency is 1) the command line tool for diagnostics
2) always scanning for a nul in prepare() unless the length is before
that.  It's simple to add an option that could change that behavior; or
move the string measuring up to prepare[_v2,_v3,_v4] and even add a V5 that
just passes the length passed without a scan.

The input is read by a tokenizer that returns in-buffer references to the
next SQL token by length.
Some tokens can be quoted, and those end up being a copy of the original;
but the length of the SQL statement should already be known, so it doesn't
need to scan for 0.

Once tokenized it's converted into expressions; those expressions (have
previously) stored only the char*.  It's not a lot of places to change to
include storing the length; which is often known unless the mprintf
internals are used; then any token passed through that does not pass %s.
So %s cannot be used for UTF8 strings; but rather the literal string
fwrite( buf, 1, stringlen, <output) ; gets all the standard character
treatment as the file was opened with (O_BINARY or not, "b" or "t"
specifiers for fopen, or stderr ).

fprintf( out, "%s", (string) );
is exactly the same as
fwrite( out, 1, strlen( string ), string );

(Can anyone dispute that?  I doubt that's specified)

Other than, the fwrite will include outputing the NUL character and trust
the length given to it. \n will still get promoted to \r\n depending on
platform and C library personality.

> Therefore:
>

>
> >> It would be a huge push-up to change this, it would break everything,
> >> including extensions. I don't think it would be possible until something
> >> like sqlite4.
>

maybe I don't understand what you're saying 'it' is.

> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] UTF8 and NUL

Reply via email to