On Mon, Apr 2, 2012 at 6:27 PM, Webdude <webd...@thewebdudes.com> wrote: > I am building a file comparison tool that is free as in beer and speech. > The program allows people to put certain things in an SQLite database file, > then an MD5 or SHA hash is run on the resulting file for identity of the > total package contents. > > A key part of the design requires that if another user who is using the same > program, (and probably would have to be using the same version of the SQLite > engine I suspect), if they put exactly the same items into their database > and the program runs a hash on their resulting file, then both hash values > would be the same (= both files are exactly the same). > > Does anyone know if SQLite stores additional unique internal information > such as timestamps etc. that would affect this, and if so could these > "additional to the data" variable features be disabled in any way?
You can't rely on two SQLite3 DBs with the same contents being equal files. The sequences of INSERT/UPDATE/DELETE statements that created the two files with the same contents can differ and thus result in different b-tree layouts. > Also, would they have to add the data in the same order etc? Or would/could > the data be encouraged to be internally ordered the same? No. > Parsing through the database item by item in some fixed pattern to try to > determine a unique content id is not an option, the two separately produced > files must be the exactly the same. > > In short is it possible to use or tweak SQLite so that 2 identical sets of > data produced by 2 separate people would both file hash to the same value? What I recommend is that you have the application (or triggers) keep a sum (XOR) of hashes of all the rows in each table. These running checksums, as it were, would be updated after each INSERT/UPDATE/DELETE, for each row. Then to check the contents of two DB files for equality quickly you'd compare these running XORs. You might have a single running XOR for the whole DB. You might want to use SHA-1 for this, or better: SHA-256. Nico -- _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users