[sqlite] Re: fts1 corruption debugging.

Scott Hess Fri, 12 Jan 2007 16:46:28 -0800

http://www.sqlite.org/cvstrac/tktview?tn=2166


I'm probably not going to be back on this until Monday or Tuesday,
unfortunately.

-scott


On 1/12/07, Scott Hess <[EMAIL PROTECTED]> wrote:

[Find attached the file I'm using to debug this.]

I think I've found the difference causing this, but I don't understand
why it matters.  All of this should also apply to fts2, since the code
in question didn't change in a way likely to affect this.

When an insert is done against an fts1 table, index_insert() is called
with the list of sqlite3_values passed in from sqlite code.  It in turn
calls content_insert() with those values, which runs an insert
statement binding each value to the appropriate parameter.  Then
index_insert() calls insertTerms() to tokenize the data and insert the
terms into the fulltext index.

When an update is done, index_update() is called with the list of
sqlite3_values.  Here, it calls insertTerms() to insert the terms into
the fulltext index, then content_update() to write the data into the
content table.

The important point is that insertTerms() calls sqlite3_value_text()
on the values.  This call appears to destructively convert a
UTF16LE-encoded value to a UTF8-encoded value.  So, on insert the
values bound are UTF16 values, while on update the values bound are
UTF8 values.

At this time I don't understand why this would be the case; I would
expect sqlite to convert things as needed (the enc variable in the
sqlite3_value _appears_ to be correct).  But, indeed, if I swap the
order of the calls to insertTerms() and content_update() in fts1.c
index_update(), so that content_update() runs first, things work as
expected.
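
To make the ordering issue concrete, here is a rough toy model in C.  It
is only a sketch under stated assumptions, not SQLite or fts1 code:
ToyValue, toy_value_text(), toy_bind(), toy_insert_terms() and the three
*_order() functions are all hypothetical names invented for illustration.
The one behaviour it assumes is the one described above, namely that
reading a value as text converts it from UTF16LE to UTF8 in place.  It
handles ASCII input only, to keep the conversion trivial.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for sqlite3_value. NOT SQLite's real structure. */
typedef enum { ENC_UTF8, ENC_UTF16LE } Enc;

typedef struct {
    Enc enc;                /* current encoding of buf */
    unsigned char buf[64];  /* raw bytes */
    size_t n;               /* bytes used in buf */
} ToyValue;

/* Build a UTF-16LE value from an ASCII string (extra input is truncated). */
ToyValue toy_utf16le(const char *ascii) {
    ToyValue v;
    memset(&v, 0, sizeof v);
    v.enc = ENC_UTF16LE;
    for (size_t i = 0; ascii[i] != '\0' && v.n + 2 <= sizeof v.buf; i++) {
        v.buf[v.n++] = (unsigned char)ascii[i];  /* low byte */
        v.buf[v.n++] = 0;                        /* high byte */
    }
    return v;
}

/* Assumed behaviour: asking for text converts the value in place. */
const unsigned char *toy_value_text(ToyValue *v) {
    assert(v->n <= sizeof v->buf);
    if (v->enc == ENC_UTF16LE) {
        size_t out = 0;
        for (size_t i = 0; i + 1 < v->n; i += 2) {
            v->buf[out++] = v->buf[i];  /* keep the low byte (ASCII only) */
        }
        v->buf[out] = '\0';
        v->n = out;
        v->enc = ENC_UTF8;              /* the value itself has changed */
    }
    return v->buf;
}

/* Stand-in for the bind step: reports the encoding it was handed. */
Enc toy_bind(const ToyValue *v) { return v->enc; }

/* Stand-in for insertTerms(): the tokenizer reads the value as text. */
void toy_insert_terms(ToyValue *v) { (void)toy_value_text(v); }

/* Insert path as described: bind first, then tokenize. */
Enc insert_order(ToyValue v) {
    Enc seen = toy_bind(&v);
    toy_insert_terms(&v);
    return seen;
}

/* Update path as described: tokenize first, then bind. */
Enc update_order(ToyValue v) {
    toy_insert_terms(&v);
    return toy_bind(&v);
}

/* Update path with the two calls swapped, matching the insert path. */
Enc update_rearranged(ToyValue v) {
    Enc seen = toy_bind(&v);
    toy_insert_terms(&v);
    return seen;
}
```

In this model, insert_order(toy_utf16le("abc")) and
update_rearranged(toy_utf16le("abc")) both return ENC_UTF16LE, while
update_order(toy_utf16le("abc")) returns ENC_UTF8.  That mirrors the
difference in what gets bound, though it says nothing about why the
difference should matter to sqlite.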

-scott


