When I get my act together and get un-lazy about it, I want to take a large
set of "small" files and put them into a SQLite database blobs in a table,
then have my app run some kind of CRC check
(CRC32/MD5SUM/SHA1/custom/whatever - Haven't decided on a CRC or just
byte-for byte comparisons) to identify byte exact duplicate files.  I'm not
so much concerned about the physical file size of the database file and how
the page size can cross 4k sized cluster boundaries on a disk drive, but a
possible optimization on the page size within the database for reading and
writing the data from and to the database file.

ALL of the files are going to be a multiple of 174,848 bytes.  So that,
plus that CRC check, plus an integer based auto-inc'd field, I'm wondering
if there is going to be any optimization internally from SQLite by making
the page size, say, 180k? (I know, there'll be unused bytes in the page,
but, that's a low priority thing -- I'll tighten up that page size and
re-import data when I get more to a "production" state)

My app is going to be on Windows, and I'm not concerned about constraints
of memory or CPUs or anything of the sort.  I'm just thinking about reading
in and out of the database as a POSSIBLE early (Not exactly premature I
believe) optimization.  I figure that if SQLite is going to read in the
174,848 byte blob, it'd be quicker to read in that one page file into
memory rather than take multiple stabs across a fragmented file, of course,
having the full understanding that fragmentation will happen at the file
system level.

These raw blobs are going to be sitting in a table by themselves, with a
separate many-to-one table sitting elsewhere for other nefarious reasons.
I know that this table, plus any others, are going to end up sitting in the
180k sized pages, but beyond that, is there any practical gain going to
180k instead of the default 4k?
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to