Re: [sqlite] Hashing 2 SQLite db files with the same data

Webdude Mon, 02 Apr 2012 21:40:12 -0700

>>> Inserting the same data in the same order on the same platform
>>> with the same (PRAGMA) settings would result in the files
>>> matching identically.


>> Do you feel that the platform (hardware, OS, or some other factor) could
>> influence the way SQLite performed its sequence?

 SQLite stores data inside its files in blocks called 'pages'.
 When you create a new database file SQLite has to pick a page size.
 The page size it picks depends on some details about the hard disk
 the file will be created on (and also on some compilation settings).
 To optimize speed it might, for instance, make pages the size of the
 disk's sectors.  So you can run code on a computer, one time writing
 your file to one hard disk, and another time writing to a hard disk
 with a different sector size, and end up with files with different
 page sizes, and these files will, of course, have different hashes.

 For details, see

 <http://www.sqlite.org/pragma.html#pragma_page_size>
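A quick way to see which page size SQLite chose on a particular machine is to create a small database without setting the pragma, then compare PRAGMA page_size with the 2-byte big-endian field at offset 16 of the 100-byte file header (a stored value of 1 there stands for 65536). This is a minimal sketch using Python's built-in sqlite3 module. It assumes a throwaway file in a temporary directory, and the function name and the table t are hypothetical:

```python
import os
import sqlite3
import struct
import tempfile

def default_page_size(path):
    """Create a tiny database WITHOUT setting page_size and report the size SQLite chose."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")  # first write fixes the page size
        conn.commit()
        (pragma_size,) = conn.execute("PRAGMA page_size").fetchone()
    finally:
        conn.close()
    with open(path, "rb") as f:
        header = f.read(100)  # the database header is the first 100 bytes of the file
    (raw,) = struct.unpack(">H", header[16:18])  # big-endian 16-bit page size at offset 16
    header_size = 65536 if raw == 1 else raw  # the value 1 encodes a 65536-byte page
    return pragma_size, header_size

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(default_page_size(os.path.join(d, "probe.db")))
```

Running it on machines with different disks or SQLite builds and comparing the printed numbers would show whether the default really varies in a given setup.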



Cool...
This still sounds promising, though. The first part of the paragraph states...

"When a new database is created, SQLite assigns a default page size based on
information received from the xSectorSize and xDeviceCharacteristics methods of
the sqlite3_io_methods <http://www.sqlite.org/c3ref/io_methods.html> object of the
newly created database file.
The page_size pragma will only cause an immediate change in the page size if it is
issued while the database is still empty, prior to the first CREATE TABLE
statement."

...and the second paragraph describes how it arrives at the default value,
but doesn't imply that it will override a value you have explicitly set with
PRAGMA page_size.
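The documented behaviour can be checked directly: a page size set on an empty database sticks, and a new PRAGMA page_size issued once tables exist is only applied by a later VACUUM (and, per the same documentation, not at all in WAL mode). This is a sketch under the same assumptions as before: Python's sqlite3 module and a hypothetical temporary file and table.

```python
import os
import sqlite3
import tempfile

def page_size_after_vacuum(path):
    """Return (page size after CREATE TABLE, page size after a later change plus VACUUM)."""
    # isolation_level=None means autocommit, which VACUUM requires (it cannot run in a transaction).
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("PRAGMA page_size = 1024")  # database still empty: takes effect immediately
        conn.execute("CREATE TABLE t (x INTEGER)")
        before = conn.execute("PRAGMA page_size").fetchone()[0]

        # Too late for an immediate change; the new size is only used by the next VACUUM.
        conn.execute("PRAGMA page_size = 8192")
        conn.execute("VACUUM")  # rebuilds the file with the requested page size (rollback-journal mode)
        after = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
    return before, after

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(page_size_after_vacuum(os.path.join(d, "vacuum.db")))
```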

So it sounds like if I create a new, empty DB file, set PRAGMA
page_size=myAlwaysGoingToBeTheSameInt (before any tables are created), and then
begin the systematic, sequential dump of the data previously collected in the
editing file, then maybe, just maybe, I will be able to build a finished file
that hashes the same when created on any hardware / HDD / OS using this same
program and SQLite version?!?!?
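A minimal sketch of that plan, assuming Python's sqlite3 module, a hypothetical two-column table named item, and a hypothetical fixed page size of 4096. It deletes any existing file, sets the page size before the first CREATE TABLE, inserts the rows in sorted order inside a single transaction, and hashes the resulting file with SHA-256:

```python
import hashlib
import os
import sqlite3
import tempfile

PAGE_SIZE = 4096  # hypothetical fixed choice; must be a power of two from 512 to 65536

def build_db(path, rows):
    """Build a database from scratch with a fixed page size and a fixed insert order."""
    if os.path.exists(path):
        os.remove(path)  # always start from an empty file
    conn = sqlite3.connect(path, isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    try:
        conn.execute(f"PRAGMA page_size = {PAGE_SIZE}")  # must come before the first CREATE TABLE
        conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        conn.execute("BEGIN")
        # Sorting makes the insert order independent of how the rows were collected.
        conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)", sorted(rows))
        conn.execute("COMMIT")
    finally:
        conn.close()

def sha256_of(path):
    """Hash a file in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    rows = [(2, "beta"), (1, "alpha"), (3, "gamma")]
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.db"), os.path.join(d, "b.db")
        build_db(a, rows)
        build_db(b, list(reversed(rows)))
        print(sha256_of(a), sha256_of(a) == sha256_of(b))
```

On one machine with one SQLite library, both builds should hash the same. Whether the digest also matches on other hardware and operating systems is exactly what the planned test has to establish. Sorting is only one way (an assumption here) to pin down the insert order.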

I may get a slight file access performance hit where the page_size is not 
optimal for the system.
And it will probably lock me in to using the same SQLite version,
as this documentation indicates that the version number is also stored in the
header at offset 96...

http://www.sqlite.org/fileformat2.html#usable_size

...which is probably also the only way of ensuring that SQLite doesn't change
the way it does things in future releases, but it also means I'll have to stick
to one version that is known to be the most reliable (any suggestions?).
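The version fields can be read straight from the first 100 bytes to see exactly what the header records. Offset 96 holds the SQLITE_VERSION_NUMBER of the library that last modified the file (major * 1000000 + minor * 1000 + patch), offset 92 the version-valid-for number, and offset 24 the file change counter. This is a sketch using Python's struct module; header_versions and decode_version are hypothetical helper names:

```python
import os
import sqlite3
import struct
import tempfile

def header_versions(path):
    """Return (change counter, version-valid-for, SQLITE_VERSION_NUMBER) from a database header."""
    with open(path, "rb") as f:
        header = f.read(100)  # the header occupies the first 100 bytes of page 1
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not an SQLite 3 database file")
    (change_counter,) = struct.unpack(">I", header[24:28])  # big-endian, offset 24
    valid_for, version_number = struct.unpack(">II", header[92:100])  # offsets 92 and 96
    return change_counter, valid_for, version_number

def decode_version(number):
    """Split SQLITE_VERSION_NUMBER (e.g. 3007011) into (major, minor, patch)."""
    return number // 1_000_000, (number // 1000) % 1000, number % 1000

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "versions.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        conn.close()
        counter, valid_for, number = header_versions(path)
        print(counter, valid_for, decode_version(number), sqlite3.sqlite_version)
```

If two files built on different machines differ in the bytes at offset 96, the libraries that last wrote them were different SQLite versions.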


I will have to write up a little test case and try it on a few different
machines / OSes before I burst into raptures or tears.

Does anyone else know of any other hidden file variables or SQLite system
processes that would prevent a byte-for-byte perfect re-creation of an SQLite db
file using the same data but on two or more different machines?
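When a cross-machine hash does differ, it helps to know where the bytes diverge. The sketch below lists the first differing offsets of two database files and names the header field each one falls in, using offsets from the file format document linked above. As an assumption, only a subset of the header fields is labelled, and anything past byte 100 is reported by page number, counting from page 1. The helper names are hypothetical:

```python
import sys

# (offset, length, name) for a subset of the 100-byte database header fields.
HEADER_FIELDS = [
    (0, 16, "magic string"),
    (16, 2, "page size"),
    (18, 1, "file format write version"),
    (19, 1, "file format read version"),
    (20, 1, "reserved bytes per page"),
    (24, 4, "file change counter"),
    (28, 4, "database size in pages"),
    (32, 4, "first freelist trunk page"),
    (36, 4, "number of freelist pages"),
    (40, 4, "schema cookie"),
    (44, 4, "schema format number"),
    (48, 4, "default page cache size"),
    (56, 4, "text encoding"),
    (60, 4, "user version"),
    (68, 4, "application ID"),
    (92, 4, "version-valid-for number"),
    (96, 4, "SQLite version number"),
]

def describe_offset(offset, page_size):
    """Name the header field containing offset, or the 1-based page number past the header."""
    for start, length, name in HEADER_FIELDS:
        if start <= offset < start + length:
            return f"header: {name}"
    if offset < 100:
        return "header: other field"
    return f"page {offset // page_size + 1}"

def diff_report(a, b, page_size, limit=10):
    """Return at most limit lines describing where byte strings a and b differ."""
    report = []
    if len(a) != len(b):
        report.append(f"length differs: {len(a)} vs {len(b)}")
    for i in range(min(len(a), len(b))):
        if len(report) >= limit:
            break
        if a[i] != b[i]:
            report.append(f"offset {i}: {describe_offset(i, page_size)}")
    return report

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f:
        first = f.read()
    with open(sys.argv[2], "rb") as f:
        second = f.read()
    raw = int.from_bytes(first[16:18], "big")
    page_size = 65536 if raw == 1 else max(raw, 512)  # guard against a malformed header
    for line in diff_report(first, second, page_size):
        print(line)
```

Usage would be along the lines of "python dbdiff.py machine1.db machine2.db" (the script name is hypothetical).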

I'm sure everyone thinks I'm mad, but I still haven't seen proof of "Can't be 
done".



Cheers Simon, thanks for your time and effort, you have been very helpful.

David.



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
