On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations where the database and journal differed not by one
> transaction but by several. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which produces
> multi-page journal files), then these pages can end up way out of
> sync with each other (including pages inside the journal). And this
> will easily lead to database corruption.
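
A concrete version of the scenario Pavel describes, one commit that
rewrites rows on several pages and therefore needs a multi-page
rollback journal, is easy to set up (a sketch only; the path and table
name are made up):

import sqlite3

con = sqlite3.connect('/Volumes/TM/multi.db')   # illustrative path
con.execute('create table if not exists big (k integer primary key, v blob)')
with con:   # fill ~100 KB so the table spans many pages
    con.executemany('insert or replace into big values (?, ?)',
                    [(i, b'x' * 1024) for i in range(100)])
with con:   # one commit that rewrites rows on many of those pages,
            # forcing the journal to hold a copy of each original page
    con.execute('update big set v = zeroblob(1024)')
con.close()

If those journal and database pages reach the platter out of order,
recovery after a power cut can produce exactly the corruption Pavel
describes.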

You are right, I think.  I wrote my own test program and ran it on a
MacBook Pro with a Seagate USB drive.  Here is the Python test
program:

import os, sqlite3, time

path = '/Volumes/TM/file.db'
exists = os.path.exists(path)
con = sqlite3.connect(path)
if exists:
    # report the highest key that survived the previous run
    k = con.execute('select max(k) from t').fetchone()[0]
    print('Previous run:', k)
    con.execute('drop table t')
    con.commit()
con.execute('create table t (k integer primary key)')

t0 = time.time()
try:
    # insert and commit one row at a time; each commit is
    # supposed to be durable before the next one starts
    for i in range(5000):
        con.execute('insert into t values(?)', (i,))
        con.commit()
    con.close()
except Exception as err:
    print(err, 'at record', i)

et = time.time() - t0
print(i, 'records inserted in', et, 'seconds;', i / et, 'recs/sec')


Here are the results:

1. First run, no unplug:

[jim@mb backup]$ time py t.py
4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec

real    0m17.794s
user    0m0.729s
sys     0m2.139s

2. Unplugged during the run:

[jim@mb backup]$ time py t.py
Previous run: 4999
disk I/O error at record 2681
2681 records inserted in 9.2398519516 seconds; 290.156164194 recs/sec

real    0m9.294s
user    0m0.410s
sys     0m1.193s

3. This shows that only 161 records made it to disk in the previous
run, not the 2681 that were confirmed:

[jim@mb backup]$ time py t.py
Previous run: 161
disk I/O error at record 977
977 records inserted in 3.34149599075 seconds; 292.384010845 recs/sec

real    0m3.453s
user    0m0.162s
sys     0m0.432s

4. This time there was no unplug, but SQLite could not recover the db:

[jim@mb backup]$ time py t.py
Previous run: 161
database disk image is malformed at record 1323
1323 records inserted in 4.26392698288 seconds; 310.277358246 recs/sec

real    0m4.341s
user    0m0.193s
sys     0m0.535s
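
To confirm the damage in run 4 directly, something like this can be
run against the same file (a sketch; PRAGMA quick_check would also
work):

import sqlite3

con = sqlite3.connect('/Volumes/TM/file.db')
# integrity_check scans the whole file and returns the single row 'ok'
# when the b-trees are consistent, otherwise a list of problems
for (line,) in con.execute('PRAGMA integrity_check'):
    print(line)
con.close()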

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
>
> Pavel
>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>>>
>>> > > But I thought about how it would be possible to test this
>>> > > explanation. I'm going to do some tests that work like this: the
>>> > > same id updating, but in the middle of the 10,000 operations I
>>> > > will unplug the cord. SQLite will say that it can't commit a
>>> > > particular insert, and I can see what the value of that insert
>>> > > is. After replugging, two outcomes are possible:
>>> > > - I find the previous value in the database. If the result is
>>> > > the same over several dozen tests, the situation would still
>>> > > need to be explained.
>>> > > - In at least some of the tests, I find a value smaller than the
>>> > > one before the insert SQLite could not commit (maybe -3 or -4
>>> > > from the failed one). In this case the explanation would be
>>> > > confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> >
>>> > Sounds like a clever test to me!
>>> >
>>> Thanks for the support :) Now the results.
>>>
>>> I switched off the journal:
>>>  PRAGMA journal_mode=OFF;
>>> As I described, the only record contained id=1 and the sql query was
>>>  UPDATE TestTable SET Id=Id + 1
>>> The cord was unplugged in the middle of the 10,000 operations, when
>>> about 5,000 updates had been done. The hd was a bus-powered external
>>> hard drive, and I repeated the test several times. No test showed
>>> the expected value, which confirms Jim's explanation about postponed
>>> writing. The actual values are below (the first number is the value
>>> expected after reopening, the second is the actual value, and the
>>> difference is self-explanatory):
>>>
>>> 5094 -> 5086 = -8
>>> 5084 -> 5083 = -1
>>> 5070 -> 5049 = -21
>>> 5082 -> 5069 = -13
>>> 5095 -> 5086 = -9
>>> 5072 -> 5033 = -39
>>> 5118 -> 5053 = -65
>>> 5081 -> 5075 = -6
>>>
>>> So the maximum number of non-flushed commits was 65.
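>>>
>>> (Each run boils down to this loop; a Python sketch, not my actual
>>> harness, and the database path is made up:)
>>>
>>> import sqlite3
>>>
>>> con = sqlite3.connect('test.db')
>>> con.execute('PRAGMA journal_mode=OFF')
>>> try:
>>>     for _ in range(10000):
>>>         con.execute('UPDATE TestTable SET Id = Id + 1')
>>>         con.commit()   # the cord is pulled around update 5,000
>>> except sqlite3.OperationalError as err:
>>>     # the Id that failed here is the 'expected' value above
>>>     print('commit failed:', err)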
>>>
>>> I also made a couple of tests with the journal on, to see the
>>> difference between the expected value, the non-restored value, and
>>> the restored value. One of the results:
>>> 5078 (expected) -> 5077 (non-restored) -> 5021 (restored)
>>> It seems that for a correctly operating os/hardware the numbers
>>> should be 5078->5077->5077, or there should be no journal present,
>>> depending on the moment of unplugging. So this postponing also makes
>>> it possible for some prior version of the journal file to exist.
>>>
>>> So, the next question was 'where?': is it the software or the
>>> hardware that's to blame? Richard has always mentioned hardware in
>>> this context, so I decided to check with another device. This time
>>> it was a hard-disk box with separate bus and power connections (the
>>> hd was a Maxtor IDE 80 GB drive).
>>>
>>> The first variation was similar to the bus-powered one, but this
>>> time I used the hard power button on the box, which is equivalent to
>>> unplugging both the connection and the power. The difference for a
>>> single test was really impressive: 5355 -> 4445 = -910. And when I
>>> measured an uninterrupted run, the drive showed about 300 commits
>>> per second.
>>>
>>> The second variation was unplugging just the data cord while keeping
>>> the power intact, so if it's the drive that caches, it could finish
>>> its pending operations completely. This time the results were
>>> perfect, for example 4822 -> 4822, and even 5371 -> 5372 = +1, which
>>> would mean the process was interrupted after all the data was
>>> written but before SQLite finished winSync: the os reported a
>>> failure, but the data was already there.
>>>
>>> So the sad news about faulty hardware is probably true once again.
>>
>> Hi Max - this turned out to be a good test you devised!  You know, you
>> can never learn *too* much. :)
>>
>> Your test confirms that the USB drive is saying data is written to the
>> drive when it really isn't, which seems bad.  But it got me thinking a
>> little more.  Your drive for sure has lost the Durability property of
>> ACID: when SQLite comes back from a commit, the data just written may
>> or may not be on the drive, as you proved.
>>
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?  There
>> are many use cases where a drive like yours would be usable by SQLite:
>> it would be as if the async vfs thing was being used.  But if using
>> this drive could lead to a corrupted database, the drive isn't safe to
>> use at all with SQLite, IMO.
>>
>> It would be interesting to know the difference, and even better, if
>> SQLite or an SQLite app could somehow run a test on a drive to figure
>> out whether the drive is:
>>
>> a) perfectly safe: every commit is on the metal
>> b) somewhat safe: the last n commits may be missing, but the db will
>> never be corrupted after a power loss
>> c) not safe: the db may be corrupted on power loss
>>
>> I think the difference between a) and b) has to do with write
>> ordering.  If you issue the write sequence:
>>
>> a1 a2 a3 sync
>> b1 b2 sync
>> c1 c2 c3 sync
>>
>> then (I think) as long as all a's are written before b's and b's are
>> written before c's, the drive should be "somewhat safe" even if it
>> lies and says everything is on the metal.  If the drive writes data in any
>> order it wants, regardless of syncs, then I think it could lead to a
>> corrupted db on a power loss.
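>>
>> A crude way to probe this (a sketch only; the file name and record
>> format are made up): keep appending numbered groups of writes with a
>> sync between groups, pull the plug, then inspect what survived. If
>> any piece of group n+1 is present while group n is incomplete, the
>> drive reorders writes across syncs and lands in category c):
>>
>> import os
>>
>> fd = os.open('/Volumes/TM/order.bin', os.O_WRONLY | os.O_CREAT)
>> group = 0
>> while True:
>>     # each group is several writes followed by one sync barrier,
>>     # mimicking the a1 a2 a3 sync pattern above
>>     for part in (1, 2, 3):
>>         os.write(fd, ('%08d.%d ' % (group, part)).encode())
>>     os.fsync(fd)   # drive now claims the whole group is durable
>>     group += 1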
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com