Re: [sqlite] UTF8 and NUL

J Decker Fri, 26 Jan 2018 16:43:44 -0800

On Fri, Jan 26, 2018 at 4:36 PM, Keith Medcalf <kmedc...@dessus.com> wrote:


>
> I do not understand this at all.
>
> If the definition of a C-String is a 
> "bunch-a-non-zero-byes-terminated-by-a-zero-byte",
> then how is it possible to have a zero/null byte "embedded" within a
> C-Style String?
>
this is about SQL not C.  C is easy \0.  Lots of \\, support. \x00

char thing[] = "Hello\0world";
#define strlen_thing(a)  ( sizeof(a)/(sizeof(a[0])-1 ) )



>
> Similarly, if a C-Style-Wide-String is defined as a
> "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it
> possible to have a zero/null word "embedded" within a C-Style-Wide-String?
>
> Given that SQLite3 is written in C and uses C-Strings or
> C-Style-Wide-Strings, then you cannot have zero/null bytes embedded in
> those strings.
>
> It certainly can.  It's isolated from the language by working on tokens
and expression pieces for which it has(or can have) the length.



> You may of course argue that perhaps SQLite3 should use something other
> than C-Style-Strings, however, this is not what seems to be proposed.  It
> seems to be proposing the use of some magical C-Style-String that is not
> actually a C-Style-String, without explicitly stating this.
>
>
They don't use c style strings though, the strings passed are SQL, which
only have a single escape for the type of quote it starts with.  If it used
'c style strings' it would behave like MySQL/TSQL but it's not; it's PSQL.

"Hello""World"
'Hello''world'


> SQLite3 does handle non-C-Ctyle-Strings.  They are called "blobs".
>
> It does, and has for a decade or more as char.



> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of J Decker
> >Sent: Friday, 26 January, 2018 17:18
> >To: SQLite mailing list
> >Subject: Re: [sqlite] UTF8 and NUL
> >
> >On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva <
> >peter.dasi...@flightaware.com> wrote:
> >
> >> On 2018-01-26, at 17:05, J Decker <d3c...@gmail.com> wrote:
> >> > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
> >> > peter.dasi...@flightaware.com> wrote:
> >> >> Sqlite uses NUL as the string terminator internally, the
> >published API
> >> >> specifies has stuff like this all over the place:
> >>
> >> >>> In those routines that have a fourth argument, its value is the
> >number
> >> of bytes in the parameter. To be clear: the value is the number of
> >bytes in
> >> the value, not the number of characters. If the fourth parameter to
> >> sqlite3_bind_text() or sqlite3_bind_text16() is negative, then the
> >length
> >> of the string is the number of bytes UP TO THE FIRST ZERO
> >TERMINATOR.
> >>
> >> > You stressed the wrong part there - *IS NEGATIVE*
> >>
> >> Why? Passing -1 as the length is a common way to tell sqlite3 to
> >calculate
> >> the length itself. It's a documented and widely used part of the
> >API.
> >
> >
> >Exactly, so on neither side, input or output is there a problem
> >storing a
> >length of valid characters.
> >The deficiency is 1) the command line tool for diagnostics
> >2) always scanning for a nul in prepare() unless the length is before
> >that.  It's simple to add an option that could change that behavior;
> >or
> >move the string measuring up to prepare[_v2,_v3,_v4] and even add a
> >V5 that
> >just passes the length passed without a scan.
> >
> >The input is read by a tokenizer that returns in-buffer references to
> >the
> >next SQL token by length.
> >Some tokens can be quoted, and those end up being a copy of the
> >original;
> >but the length of the SQL statement should already be known, so it
> >doesn't
> >need to scan for 0.
> >
> >Once tokenized it's converted into expressions; those expressions
> >(have
> >previously) stored only the char*.  It's not a lot of places to
> >change to
> >include storing the length; which is often known unless the mprintf
> >internals are used; then any token passed through that does not pass
> >%s.
> >So %s cannot be used for UTF8 strings; but rather the literal string
> >fwrite( buf, 1, stringlen, <output) ; gets all the standard character
> >treatment as the file was opened with (O_BINARY or not, "b" or "t"
> >specifiers for fopen, or stderr ).
> >
> >fprintf( out, "%s", (string) );
> >is exactly the same as
> >fwrite( out, 1, strlen( string ), string );
> >
> >(Can anyone dispute that?  I doubt that's specified)
> >
> >Other than, the fwrite will include outputing the NUL character and
> >trust
> >the length given to it. \n will still get promoted to \r\n depending
> >on
> >platform and C library personality.
> >
> >
> >
> >
> >
> >> Therefore:
> >>
> >
> >
> >>
> >> >> It would be a huge push-up to change this, it would break
> >everything,
> >> >> including extensions. I don't think it would be possible until
> >something
> >> >> like sqlite4.
> >>
> >
> >maybe I don't understand what you're saying 'it' is.
> >
> >
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> >users
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] UTF8 and NUL

Reply via email to