On 2013/07/27 16:35, Stephen Chrzanowski wrote:
...//verification purposes.  The file time stamp "should" be enough, but there
is that one time of the year when 2am comes twice in a day in most parts of
the world, so, covering that aspect with the concept of looking at file
sizes as part of unique identification.  **As I write this, I just thought
about reading a chunk of the file on the drive and running it against an
MD5 or CRC64/128 check sum algorithm, and storing that result in the
database instead of relying on a file size.... hmmm...  That'd be a balance
between speed and accuracy.  I don't want to chew too much IO time on lower
end machines.  Both cases don't give a 100% accurate assessment to a truly
"unique" file, but I wonder which would give the better rate of accuracy?
Maybe integrating all three?  .. Sorry.. rambling.. tired... heh

It's very much possible for an MD5 hash to return the same result for two different files, more so than hitting the exact same timestamp twice on the day the clocks change, but less likely than a CRC doing the same, which in turn is less likely than a non-unique file size. A timestamp plus an MD5 hash is the way to go to ensure uniqueness, or at least to consign the likelihood of a recurrence to the annals of oblivion. In this regard, you only need to hash the first n bytes of the file to save CPU and IO time, where n need only be as large as experimentally determined to avoid hitting the same file header content more than once (the file header is likely to be quite unique from save to save); probably 4K to 8K bytes would suffice.
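
To make that concrete, here is a rough sketch of the timestamp + partial-MD5 idea in Python. The function name file_fingerprint and the 8192-byte default are my own placeholders, not anything from your application, and n should still be tuned by experiment as described above. I have included the file size as well since it costs nothing extra once you have the stat result.

```python
import hashlib
import os


def file_fingerprint(path, n_bytes=8192):
    """Return (mtime_ns, size, md5_hex) for the file at `path`.

    Only the first `n_bytes` of the file are hashed, to keep IO cheap.
    `n_bytes` is an assumed default; tune it for your own file format.
    """
    st = os.stat(path)
    with open(path, "rb") as fh:
        head = fh.read(n_bytes)  # returns fewer bytes if the file is shorter
    digest = hashlib.md5(head).hexdigest()
    return st.st_mtime_ns, st.st_size, digest
```

The three values can be stored in the database as separate columns and compared together. Note that two files which differ only beyond the first n bytes will produce the same digest, so this only works if the header really does change from save to save.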


As for saving to the SSD.. I hear ya...  I love the speed (Raid-0 dual
250gig SATA3) but hate that they'll die due to running out of good memory
blocks, but, at least not as 'randomly' as a platter drive, and ooohhh so
much quieter. ;)

Well yes, but the IO systems employed by these drives are smart: data does not get written if it doesn't change the underlying memory states, etc. Even in strenuous use you should get a good 5 years out of a modern SSD, and when it dies, it will be gradual and with a lot of warning. Beware of those "security" utilities that promise to really wipe data by overwriting it several times; they will eat through a fresh SSD in a few months. Other than that, an SSD has some longevity, and my comment about NOT logging to it has more to do with the usual space restrictions; and of course there is no need to inflict abuse on a drive known not to enjoy it.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
