Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

Jim Wilcoxson Mon, 14 Feb 2011 08:48:55 -0800

On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when database and journal were different not by one
> transaction but by several ones. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which makes multi-page
> journal files) then these pages can be way out of sync with each other
> (including pages inside journal). And this will easily lead to
> database corruption.
>
>
> Pavel


Hmm... I guess I am not quite convinced. :)

I believe (though am not sure) that the drive can reorder individual
sector/record writes between syncs however it likes, as long as it
doesn't allow writes to cross a sync and also executes the sync groups
(for lack of a better term - the group of writes between syncs) in
order.

As an example, say you are inserting records on each commit instead of
updating.  Maybe the last record inserted was 1000 and the plug is
pulled.  When plugged back in, if you only have 800 records in the db,
it's not so bad.  However, if there is a corrupt db instead, it is
much worse.

Maybe Max would be kind enough to re-run his test with an insert
instead of an update and see if there ever is corruption or if just
the last n inserts are missing.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>>>
>>> > > But I thought about how it would be possible to test this explanation .
>>> > I'm
>>> > > going to do some tests that works like this. The same id updating, but 
>>> > > in
>>> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
>>> > say
>>> > > that it can't commit particular insert and I can see what is the value 
>>> > > of
>>> > > this insert. After replugging two variants possible:
>>> > > - I will find the previous value in the base. If for several dozens 
>>> > > tests
>>> > > the result is the same, that would mean that the situation is still
>>> > needed
>>> > > to be explained.
>>> > > - I will find some value less than previous to the one sqlite could not
>>> > > commit at least in some of the tests (maybe -3, -4 to the failed one). 
>>> > > In
>>> > > this case the explanation will be confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> > >
>>> > >
>>> > Sounds like a clever test to me!
>>> >
>>> >
>>> >
>>> Thanks for supporting :) now the results.
>>>
>>> I switched off the journal:
>>>  PRAGMA journal_mode=OFF;
>>> As I described, the only record contained id=1 and the sql query was
>>>  UPDATE TestTable SET Id=Id + 1
>>> The cord was unplugged in the middle of the 10,000 operations when about
>>> 5000 records were updated. The hd was bus-powered external hard drive and I
>>> repeated the test several times. No test showed expected value that confirms
>>> the Jim's explanation about postponed writing. The actual values are below
>>> (the first one is the expected value to be found after reopening, the next
>>> one is actual value and the difference is self-explainable)
>>>
>>> 5094 -> 5086 = -8
>>> 5084 -> 5083 = -1
>>> 5070 -> 5049 = -21
>>> 5082 -> 5069 = -13
>>> 5095 -> 5086 = -9
>>> 5072 -> 5033 = -39
>>> 5118 -> 5053 = -65
>>> 5081 -> 5075 = -6
>>>
>>> So the maximum of non-flushed commits was 65
>>>
>>> I also made a couple of tests with journal on and see what is the difference
>>> between expected, non-restored value, and restored value. One of the
>>> results:
>>> 5078 (expeced) -> 5077 (non restored) -> 5021 (restored)
>>> It seems that for correctly operating os/hardware the numbers should be
>>> 5078->5077->5077 or no journal present depending on the moment of
>>> unplugging. So this postponing also made possible existing of some prior
>>> version of journal file.
>>>
>>> So, the next question was 'where?'. Is this software or hardware to blame.
>>> Richard always mentioned hardware in this context, I decided to check this
>>> with another device. This time it was a harddisk box with separated bus and
>>> power (the hd was Maxtor IDE 80Gb drive).
>>>
>>> The first variation was similar to the one with bus-powered, this time I
>>> used hard button on the box that is equivalent to unplugging both connection
>>> and power. The difference for a single test was really impressive 5355 ->
>>> 4445 = -910. And when I calculated numbers for non-interrupted test the
>>> drive showed about 300 commits per second.
>>>
>>> The second variation was just unplugging the cord but keeping the power
>>> intact, so if it's drive that caches, it would end its operations
>>> completely. This time the results were perfect, for example 4822 -> 4822,
>>> and even 5371 -> 5372 = +1 that actually would mean the process was
>>> interrupted after all data is written but before sqlite made winsync, os
>>> reported failure, but the data was already there.
>>>
>>> So the sad news about faulty hardware is probably true once again.
>>
>> Hi Max - this turned out to be a good test you devised!  You know, you
>> can never learn *too* much. :)
>>
>> Your test confirms that the USB drive is saying data is written to the
>> drive when it really isn't, which seems bad.  But it got me thinking a
>> little more.  Your drive for sure has lost the Durability property of
>> ACID: when SQLite comes back from a commit, the data just written may
>> or may not  be on the drive, as you proved.
>>
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?  There
>> are many use cases where a drive like yours would be usable by SQLite:
>> it would be as if the async vfs thing was being used.  But if using
>> this drive could lead to a corrupted database, the drive isn't safe to
>> use at all with SQLite, IMO.
>>
>> It would be interesting to know the difference, and even better, if
>> SQLite or an SQLite app could somehow run a test on a drive to figure
>> out whether the drive is:
>>
>> a) perfectly safe: every commit is on the metal
>> b) somewhat safe: the last n commits may be missing, but the db will
>> never be corrupted after a power loss
>> c) not safe: the db may be corrupted on power loss
>>
>> I think the difference between a) and b) has to do with write
>> ordering.  If you issue the write sequence:
>>
>> a1 a2 a3 sync
>> b1 b2 sync
>> c1 c2 c3 sync
>>
>> then (I think) as long as all a's are written before b's and b's are
>> written before c's, the drive should be "somewhat safe" even it lies
>> and says everything is on the metal.  If the drive writes data in any
>> order it wants, regardless of syncs, then I think it could lead to a
>> corrupted db on a power loss.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

Reply via email to