On Mon, Apr 2, 2012 at 6:27 PM, Webdude <webd...@thewebdudes.com> wrote:
> I am building a file comparison tool that is free as in beer and speech.
> The program allows people to put certain things in an SQLite database file,
> then an MD5 or SHA hash is run on the resulting file for identity of the
> total package contents.
>
> A key part of the design requires that if another user who is using the same
> program, (and probably would have to be using the same version of the SQLite
> engine I suspect), if they put exactly the same items into their database
> and the program runs a hash on their resulting file, then both hash values
> would be the same (= both files are exactly the same).
>
> Does anyone know if SQLite stores additional unique internal information
> such as timestamps etc. that would affect this, and if so could these
> "additional to the data" variable features be disabled in any way?

You can't rely on two SQLite3 DBs with the same contents being equal
files.  The sequences of INSERT/UPDATE/DELETE statements that created
the two files with the same contents can differ and thus result in
different b-tree layouts.

> Also, would they have to add the data in the same order etc? Or would/could
> the data be encouraged to be internally ordered the same?

No.

> Parsing through the database item by item in some fixed pattern to try to
> determine a unique content id is not an option, the two separately produced
> files must be the exactly the same.
>
> In short is it possible to use or tweak SQLite so that 2 identical sets of
> data produced by 2 separate people would both file hash to the same value?

What I recommend is that you have the application (or triggers) keep a
sum (XOR) of hashes of all the rows in each table.  These running
checksums, as it were, would be updated after each
INSERT/UPDATE/DELETE, for each row.

Then to check the contents of two DB files for equality quickly you'd
compare these running XORs.  You might have a single running XOR for
the whole DB.

You might want to use SHA-1 for this, or better: SHA-256.

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to