On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when the database and journal differed not by one
> transaction but by several. So if one makes changes to several
> database pages (located in different disk sectors) and/or runs
> transactions touching several database pages (which produces
> multi-page journal files), then these pages can end up way out of sync
> with each other (including the pages inside the journal). And this
> will easily lead to database corruption.
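The corruption Pavel describes can be detected after the fact with SQLite's built-in integrity check. A minimal sketch, assuming Python 3 (the `check_db` helper is an illustrative name, not part of the thread's test script):

```python
import sqlite3

def check_db(path):
    """Return True if SQLite's integrity check finds the database
    internally consistent (pages, indexes, and freelist all sane)."""
    con = sqlite3.connect(path)
    try:
        # integrity_check returns a single row ('ok',) when no problems
        # are found; otherwise each row describes one inconsistency.
        return con.execute('PRAGMA integrity_check').fetchall() == [('ok',)]
    finally:
        con.close()

print(check_db(':memory:'))  # → True for a fresh, uncorrupted database
```

Note that integrity_check only proves corruption when it fails; a drive that merely dropped the last few durable commits can still pass it.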
You are right, I think. I wrote my own test program and ran it on a
MacBook Pro with a Seagate USB drive. Here is the Python test program:

import os, sqlite3, time

path = '/Volumes/TM/file.db'
exists = os.path.exists(path)
con = sqlite3.connect(path)
if exists:
    k = con.execute('select max(k) from t').fetchone()[0]
    print 'Previous run:', k
    con.execute('drop table t')
    con.commit()
con.execute('create table t (k integer primary key)')
t0 = time.time()
try:
    for i in range(5000):
        con.execute('insert into t values(?)', (i,))
        con.commit()
    con.close()
except Exception, err:
    print err, 'at record', i
et = time.time() - t0
print i, 'records inserted in', et, 'seconds;', i/et, 'recs/sec'

Here are the results:

1. First run, no unplug:

[jim@mb backup]$ time py t.py
4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec

real    0m17.794s
user    0m0.729s
sys     0m2.139s

2. Unplugged during the run:

[jim@mb backup]$ time py t.py
Previous run: 4999
disk I/O error at record 2681
2681 records inserted in 9.2398519516 seconds; 290.156164194 recs/sec

real    0m9.294s
user    0m0.410s
sys     0m1.193s

3. This shows that only 161 records made it to disk in the previous
run, not 2681:

[jim@mb backup]$ time py t.py
Previous run: 161
disk I/O error at record 977
977 records inserted in 3.34149599075 seconds; 292.384010845 recs/sec

real    0m3.453s
user    0m0.162s
sys     0m0.432s

4.
This time there was no unplug, but SQLite could not recover the db:

[jim@mb backup]$ time py t.py
Previous run: 161
database disk image is malformed at record 1323
1323 records inserted in 4.26392698288 seconds; 310.277358246 recs/sec

real    0m4.341s
user    0m0.193s
sys     0m0.535s

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
> Pavel
>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>>>
>>> > > But I thought about how it would be possible to test this
>>> > > explanation. I'm going to do some tests that work like this. The
>>> > > same id updating, but in the middle of the 10,000 operations I
>>> > > will unplug the cord; sqlite will say that it can't commit a
>>> > > particular insert, and I can see what the value of this insert
>>> > > is. After replugging, two variants are possible:
>>> > > - I will find the previous value in the base. If for several
>>> > >   dozen tests the result is the same, that would mean the
>>> > >   situation still needs to be explained.
>>> > > - I will find some value less than the one sqlite could not
>>> > >   commit, at least in some of the tests (maybe -3, -4 relative
>>> > >   to the failed one). In this case the explanation will be
>>> > >   confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> >
>>> > Sounds like a clever test to me!
>>>
>>> Thanks for the support :) now the results.
>>>
>>> I switched off the journal:
>>>
>>> PRAGMA journal_mode=OFF;
>>>
>>> As I described, the only record contained id=1 and the sql query was
>>>
>>> UPDATE TestTable SET Id=Id + 1
>>>
>>> The cord was unplugged in the middle of the 10,000 operations, when
>>> about 5000 records were updated. The hd was a bus-powered external
>>> hard drive, and I repeated the test several times.
>>> No test showed the expected value, which confirms Jim's explanation
>>> about postponed writing. The actual values are below (the first is
>>> the value expected after reopening, the next is the actual value,
>>> and the difference is self-explanatory):
>>>
>>> 5094 -> 5086 = -8
>>> 5084 -> 5083 = -1
>>> 5070 -> 5049 = -21
>>> 5082 -> 5069 = -13
>>> 5095 -> 5086 = -9
>>> 5072 -> 5033 = -39
>>> 5118 -> 5053 = -65
>>> 5081 -> 5075 = -6
>>>
>>> So the maximum of non-flushed commits was 65.
>>>
>>> I also made a couple of tests with the journal on, to see what the
>>> difference is between the expected, non-restored, and restored
>>> values. One of the results:
>>>
>>> 5078 (expected) -> 5077 (non-restored) -> 5021 (restored)
>>>
>>> It seems that for a correctly operating os/hardware the numbers
>>> should be 5078 -> 5077 -> 5077, or no journal present, depending on
>>> the moment of unplugging. So this postponing also made possible the
>>> existence of some prior version of the journal file.
>>>
>>> So the next question was 'where?' Is software or hardware to blame?
>>> Richard always mentioned hardware in this context, so I decided to
>>> check with another device. This time it was a hard disk box with
>>> separate bus and power (the hd was a Maxtor IDE 80Gb drive).
>>>
>>> The first variation was similar to the one with the bus-powered
>>> drive; this time I used the hard button on the box, which is
>>> equivalent to unplugging both connection and power. The difference
>>> for a single test was really impressive: 5355 -> 4445 = -910. And
>>> when I calculated the numbers for a non-interrupted test, the drive
>>> showed about 300 commits per second.
>>>
>>> The second variation was just unplugging the cord but keeping the
>>> power intact, so if it's the drive that caches, it would finish its
>>> operations completely.
>>> This time the results were perfect, for example 4822 -> 4822, and
>>> even 5371 -> 5372 = +1, which would actually mean the process was
>>> interrupted after all data was written but before sqlite made its
>>> winSync: the os reported failure, but the data was already there.
>>>
>>> So the sad news about faulty hardware is probably true once again.
>>
>> Hi Max - this turned out to be a good test you devised! You know, you
>> can never learn *too* much. :)
>>
>> Your test confirms that the USB drive is saying data is written to the
>> drive when it really isn't, which seems bad. But it got me thinking a
>> little more. Your drive has for sure lost the Durability property of
>> ACID: when SQLite returns from a commit, the data just written may
>> or may not be on the drive, as you proved.
>>
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed? There
>> are many use cases where a drive like yours would be usable by SQLite:
>> it would be as if the async vfs were being used. But if using this
>> drive could lead to a corrupted database, the drive isn't safe to use
>> at all with SQLite, IMO.
>>
>> It would be interesting to know the difference, and even better, if
>> SQLite or an SQLite app could somehow run a test on a drive to figure
>> out whether the drive is:
>>
>> a) perfectly safe: every commit is on the metal
>> b) somewhat safe: the last n commits may be missing, but the db will
>>    never be corrupted after a power loss
>> c) not safe: the db may be corrupted on power loss
>>
>> I think the difference between a) and b) has to do with write
>> ordering.
>> If you issue the write sequence:
>>
>> a1 a2 a3 sync
>> b1 b2 sync
>> c1 c2 c3 sync
>>
>> then (I think) as long as all a's are written before b's and b's are
>> written before c's, the drive should be "somewhat safe" even if it
>> lies and says everything is on the metal. If the drive writes data in
>> any order it wants, regardless of syncs, then I think it could lead
>> to a corrupted db on a power loss.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
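Jim's write sequence maps directly onto POSIX writes separated by fsync barriers. A minimal Python 3 sketch of the same pattern (the temp file and two-byte chunks are illustrative; whether the barrier actually reaches the platter is exactly what the thread shows some drives get wrong):

```python
import os
import tempfile

# Jim's sequence: each sync is a barrier. A "somewhat safe" drive may
# drop whole trailing groups on power loss, but must never put any b's
# on the metal before all a's.
fd, path = tempfile.mkstemp()
try:
    for group in ([b'a1', b'a2', b'a3'], [b'b1', b'b2'], [b'c1', b'c2', b'c3']):
        for chunk in group:
            os.write(fd, chunk)
        os.fsync(fd)  # ask the OS/drive to put this group on stable storage
finally:
    os.close(fd)

size = os.path.getsize(path)
print(size)  # → 16 (eight 2-byte chunks, written in order)
os.remove(path)
```

SQLite's rollback journal relies on this ordering: the journal must be durable before the database pages are overwritten, so a drive that acknowledges fsync while reordering writes can leave the journal and database mutually inconsistent, which is case c) above.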