A while ago I reported <http://sqlite.1065341.n5.nabble.com/What-can-be-deduced-from-integrity-check-td70451.html> about DB corruption issues that we occasionally receive from our users. They always have the same pattern: - A few rowid's missing from a few indexes, and - A few "wrong # of entries in index xxx" errors. No other integrity errors, just these two types.
In an earlier reply <http://sqlite.1065341.n5.nabble.com/What-can-be-deduced-from-integrity-check-tp70451p70454.html> Mr. Hipp said that "the fact that the corruption is restricted to an index is probably a coincidence". Well, based on the number of reports we received it hardly can be a coincidence. Does anybody have any theory how these things happen? Based on my limited understanding of the sqlite source code it looks like (my speculation) as if all b-trees (tables and indexes) were stored correctly, except the contents of some indexes referred to some point in the past. A few details: - We use sqlite 3.7.15.2, custom encryption and custom collation. The collation shouldn't be the culprit because the majority of the indexes corrupted relate to non-textual columns. - Sqlite setup: -- WAL mode -- PRAGMA synchronous=Normal // Should we change this to FULL? -- iOS only: PRAGMA fullfsync=1 // Important. Substantially decreases # of corruptions. - In case we get sqlite-related exception containing the word "malformed", we perform an integrity_check and log its output. - Problems happen on iOS platform. This may be a coincidence because the majority of our users use that platform. - DB writes are grouped into transactions. What is strange - corrupted indexes do not relate to a single transaction. In other words, several write transaction would have to fail in order to produce described corruption. - There is one potentionally risky operation that our app performs: The data exchange with a remote WebService. This can take long (10+ min). Users often switch to email reading or similar activity, the app then runs on the background and can be eventually killed by OS. This might happen during an unfinished transaction. - (Continution of the previous point) What is worse, WAL logs can be huge: In one case I saw a 900MB log file (the whole DB was roughly 2/3 of that). This means that we cannot exclude that the OS kills the application while running a checkpoint operation. On another note. If we cannot avoid occasional corruption, it would be great if we had a kind of autorepair. Specifically in our case, when the integrity checks indicate that a REINDEX operation should help. Suppose that integrity_check a) does not fail with a fatal error b) lists only 2 types of messages: - rowid ... missing from index ... - wrong # of entries in index ... then rebuilding of all indexes listed in the report should repair the database. (Note that I am trying to avoid full REINDEX because it is a slow operation.) Does this make sense? -- View this message in context: http://sqlite.1065341.n5.nabble.com/Integrity-check-tp77519.html Sent from the SQLite mailing list archive at Nabble.com. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users