A while ago I  reported
<http://sqlite.1065341.n5.nabble.com/What-can-be-deduced-from-integrity-check-td70451.html>
  
about DB corruption issues that we occasionally receive from our users. They
always have the same pattern:
- A few rowid's missing from a few indexes, and
- A few "wrong # of entries in index xxx" errors.
No other integrity errors, just these two types.

In an earlier  reply
<http://sqlite.1065341.n5.nabble.com/What-can-be-deduced-from-integrity-check-tp70451p70454.html>
  
Mr. Hipp said that "the fact that the corruption is restricted to an index
is probably a coincidence".

Well, based on the number of reports we received it hardly can be a
coincidence. Does anybody have any theory how these things happen? Based on
my limited understanding of the sqlite source code it looks like (my
speculation) as if all b-trees (tables and indexes) were stored correctly,
except the contents of some indexes referred to some point in the past.

A few details:

- We use sqlite 3.7.15.2, custom encryption and custom collation. The
collation shouldn't be the culprit because the majority of the indexes
corrupted relate to non-textual columns.

- Sqlite setup:
  -- WAL mode
  -- PRAGMA synchronous=Normal    // Should we change this to FULL?
  -- iOS only: PRAGMA fullfsync=1     // Important. Substantially decreases
# of corruptions.

- In case we get sqlite-related exception containing the word "malformed",
we perform an integrity_check and log its output.

- Problems happen on iOS platform. This may be a coincidence because the
majority of our users use that platform.

- DB writes are grouped into transactions. What is strange - corrupted
indexes do not relate to a single transaction. In other words, several write
transaction would have to fail in order to produce described corruption.

- There is one potentionally risky operation that our app performs: The data
exchange with a remote WebService. This can take long (10+ min). Users often
switch to email reading or similar activity, the app then runs on the
background and can be eventually killed by OS. This might happen during an
unfinished transaction. 

- (Continution of the previous point) What is worse, WAL logs can be huge:
In one case I saw a 900MB log file (the whole DB was roughly 2/3 of that).
This means that we cannot exclude that the OS kills the application while
running a checkpoint operation.

On another note. If we cannot avoid occasional corruption, it would be great
if we had a kind of autorepair. Specifically in our case, when the integrity
checks indicate that a REINDEX operation should help.

Suppose that integrity_check
a) does not fail with a fatal error
b) lists only 2 types of messages:
- rowid ... missing from index ...
- wrong # of entries in index ...

then rebuilding of all indexes listed in the report should repair the
database. (Note that I am trying to avoid full REINDEX because it is a slow
operation.)

Does this make sense?



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Integrity-check-tp77519.html
Sent from the SQLite mailing list archive at Nabble.com.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to