Re: [sqlite] NUL handling bugs (was Re: c-api document suggestion)

David Garfield Mon, 26 Sep 2011 09:44:26 -0700

Roger Binns writes:
> On 09/23/2011 05:51 PM, David Garfield wrote:
> >> SQLite's API supports both (mostly).  Internally, you must use one or
> >> the other (or hideously duplicate code),
> 
> Not really.  If your own code only uses NUL termination then use that form
> of APIs.  If you use counted strings then use that form.  As a developer
> using SQLite you do not have to use both.  And you can mix and match with
> what is most convenient at each call, although supporting embedded NUL
> requires the counted form for obvious reasons.


As the implementer of an API, you must use one form or the other, or
hideously duplicate code.  It is TRIVIAL to convert from the
NUL-terminated form to the counted form, so that is what is usually
done.  (I will admit to being surprised that SQLite goes into two
levels of internal function calls before it converts, especially since
by that time it has put the text and text16 forms together.)
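
To illustrate the point above, here is a minimal sketch of that conversion in C.  It assumes the common convention that a negative length means "NUL-terminated"; the names counted_str, to_counted and count_nul_bytes are hypothetical and are not part of SQLite.  All real work is written once against the counted form, and the NUL-terminated form is only a thin front end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view; this is NOT an SQLite type. */
typedef struct {
    const char *data;
    size_t len;
} counted_str;

/* Accept either calling convention: n < 0 means "z is NUL-terminated",
 * otherwise n is the exact byte count and z may contain embedded NULs. */
static counted_str to_counted(const char *z, int n)
{
    counted_str s;
    s.data = z;
    s.len = (n < 0) ? strlen(z) : (size_t)n;
    return s;
}

/* Stand-in for "the real work", written only against the counted form.
 * Here it just counts the NUL bytes inside the counted range. */
static size_t count_nul_bytes(const char *z, size_t n)
{
    size_t i, nuls = 0;
    for (i = 0; i < n; i++) {
        if (z[i] == '\0') nuls++;
    }
    return nuls;
}
```

Calling to_counted("ab\0cd", -1) yields a length of 2, because strlen() stops at the first NUL, while to_counted("ab\0cd", 5) keeps all five bytes.  That difference is the whole reason embedded NULs require the counted form.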

> >> and SQLite uses the second --
> >> except for some functions (which use the hybrid model).  That
> >> exception is the bug.
> 
> The "bug" is that a performance optimisation is mentioned in the doc.  The
> internal SQL parsing code always stops at a NUL, and requires that a string
> be NUL terminated.  If you do not explicitly provide one then it will copy
> the string in order to NUL terminate it.  Sure this is a little messy and
> could be explained a little better but it isn't a bug.  The internal code
> could also change in the future to avoid the NUL requirement but I'd expect
> that to be *really* low in the list of priorities.

I went and looked for that performance optimization, and that
documentation.  It applies only to sqlite3_prepare*().  The various
data methods do not have the same clause.  And the data methods are
the only ones where I would really expect NUL handling to be right.

> >> Correction: with the exception of a number of BUILT IN functions.
> 
> I meant user defined functions in the sense of components of a SQL statement
> (like verbs, operators and collations are components).  Yet another pesky
> ambiguation introduced by the user word!
> 
> Note that you can override all built in user defined functions - just
> register one with the same name.  You do however have to ensure that you
> register variants for the different Unicode encodings.
> 
> ie you can make your installation of SQLite behave exactly how you want.
> Should the built in implementations be fixed?  IMHO yes, but it isn't a
> priority. In the 6 years since SQLite 3 has been available you are only the
> second person to complain.  (I was the first :-)

So is it in the bug system?  Or is it being dismissed or ignored?  I
would have been willing to accept that this was a known and
documented bug, but that isn't what I'm hearing here.  Indeed, Richard
claimed that this was a "feature", and likened it to ignoring UTF-8
encoding errors.

And I am not the second.  I started this subject as a follow-up to a
message from "Mira Suk" <mira....@centrum.cz> complaining about the
same thing, so I am at least the third.  There are probably more.

> >> sqlite3_value_*() and sqlite3_result_*() are fully capable of using
> >> the counted model,
> 
> Indeed.  It is how I ensure my code is NUL safe/correct.  A far bigger bug
> for those functions is that they use int for the size of data rather than
> size_t.  I did a survey a few years back using google code search and every
> instance I could find where -1 was not passed in as the length treated them
> as though they used size_t and would result in (silent) truncation on 64 bit
> machines.  My own code explicitly makes sure the values about to be passed
> in are less than 2GB.

Using int instead of size_t A) unnaturally limits data length,
possibly to 32K where int is only 16 bits, and B) doesn't describe
the data well.  Beyond that, it isn't necessarily a bug.  The language
needed size_t to provide a type guaranteed to be sufficient to hold
any sizeof() result, but no API need use it.
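
For callers who hold lengths in size_t, the truncation Roger describes can be avoided with an explicit range check before calling an API that takes int.  The following is only a sketch of such a guard; length_to_int is a hypothetical helper name, not something SQLite provides.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size_t length to int only if it fits.
 * Returns 1 and stores the value in *out on success.
 * Returns 0 and leaves *out untouched if n exceeds INT_MAX. */
static int length_to_int(size_t n, int *out)
{
    if (n > (size_t)INT_MAX) {
        return 0;   /* caller should report "too big" rather than truncate */
    }
    *out = (int)n;
    return 1;
}
```

A caller would test the return value and fail loudly on 0, rather than letting an implicit size_t-to-int conversion quietly change the length.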

> >> Of course, the SQLite shell does it anyway.  So "cannot" is not really
> >> correct.
> 
> Well you can always spew arbitrary bytes to stdout which generally works for
> people who only ever use ASCII.  But the rule really is that bytes cannot be
> converted to characters without knowing the encoding.

Right.  The SQLite shell just assumes you are running in a uniform
character encoding environment (usually safe), and that it is UTF-8
(safe for many users). 
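
As a small illustration of why the encoding has to be known, the same bytes give a different character count under different assumptions.  This sketch uses two hypothetical helpers, count_as_utf8 and count_as_latin1; the UTF-8 one assumes well-formed input and does no validation.

```c
#include <stddef.h>

/* Count code points, ASSUMING s holds well-formed UTF-8.
 * Every byte that is not a continuation byte (bit pattern 10xxxxxx)
 * starts a new code point. */
static size_t count_as_utf8(const unsigned char *s, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) count++;
    }
    return count;
}

/* Count characters, ASSUMING a single-byte encoding such as Latin-1:
 * one byte is one character. */
static size_t count_as_latin1(const unsigned char *s, size_t n)
{
    (void)s;
    return n;
}
```

The two bytes 0xC3 0xA9 are one character (e with acute accent) if read as UTF-8, but two characters if read as Latin-1.  Nothing in the bytes themselves says which reading is intended.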

> > The SQLite shell isn't particularly well structured for easy developer
> > extension.
> >> I've seen that...  ouch.
> 
> It is best to think of the shell as a convenience tool for the SQLite
> developers to throw commands at the library as they add them, not as some
> formal tool for SQL access.  It works reasonably well.
> 
> >> And your python wrapper is probably implemented using the counted
> >> string form exclusively.  :-)
> 
> Originally it used whichever forms were most convenient at that place in the
> code.  This is further complicated by Python being compilable with two
> different forms of Unicode character size and SQLite having UTF8 and UTF16
> apis.  Then one day I discovered that SQLite allows embedded NUL in string
> values and made sure my code always works correctly - I'm OCD like that with
> my wrapper.
> 
> Roger

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
