On Thu, Apr 10, 2014 at 12:04 PM, Grzegorz Sikorski <g.sikor...@kelvatek.com
> wrote:

> Hi,
>
> We developed an application which base on SQLite3. It is running on ARM
> processor with Linux 2.6.37 (no, there is no easy way to upgrade it). We
> put our database on microSD card (industrial grade, very reliable)
> formatted with ext4 partition. The database is used in WAL mode with
> default settings. Database schema is relatively simple: two tables, from
> which one stores only point names and the other is heavily used to store
> incoming data points. To speed up searching the data table, we setup R*tree
> indexes separately for each name (let say around 200 R*tree indexes). The
> problem is we occasionally observe database corruption and whole database
> becomes rubbish. The problem is very rare and occurs only (as far as we
> were able to confirm so far) if there is a power lose during write to the
> database. We have chosen WAL mode to allow read access to database during
> writing and also for performance reasons (database is on SD card and we run
> on embedded system). It is not yet confirmed for sure, but most likely the
> problem occurs only if reader and writer access the database at the same
> time and power lose occurs.
>
> We admit there may be a problem with SD drivers, POSIX mutex
> implementation or ext4 driver (we are running relatively old version of
> Linux kernel), however we have never observed corruption of any other file
> on the same partition, even though this partition is heavily used for other
> purposes (syslog, all temporary data, configuration, some exported data
> files in zip format). We also use POSIX mutexes in other programs and have
> never discovered any issue (apart from invalid use in developer's code;)
> Does anyone know if there is any "unstable/unknown" part of SQLite3 code
> which may cause this issue? We recently upgraded SQLite to the latest
> version, but it did not help. I am aware we use sort of new (maybe edge
> case) features, like WAL mode and R*tree indexes. Any advice will be
> appreciated.
>

There are no unstable/unknown parts of the code.  SQLite3 is used daily by
billions, literally.

For further information on how to corrupt an SQLite database see
http://www.sqlite.org/howtocorrupt.html

My guess is that the cause of corruption is that your hardware/os
combination is ignoring fsync()s during a checkpoint operation, causing
writes to occur out-of-order.   During a checkpoint operation, SQLite
copies content from the WAL file into the main database.  Then it
fsync()s.  Then it truncates and overwrites the header of the WAL file.
Now suppose writes are occurring out-of-order and there is a power loss.
Some of the content that overwrites the main database makes it onto the SD
card, but some does not, resulting in hopelessly scrambed main database.
But the truncation and overwrite of the WAL file does occur, so that the
checkpoint cannot be repeated to clear the error.  You are left with a
corrupted database file and no way to recover it.  fsync() is very
important.

Log files and ZIP archives do not commonly go corrupt on a power loss
because they are append-only.  They are not written randomly.

One other thing:  Are you sure that the power loss didn't corrupt the
filesystem?  Did you run fsck?


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to