Hi Rense,

>Thanks for this idea. In fact, the purpose of my original query is
>exactly to reduce the database. The 800 mln rows were exported from
>another source, and I was hoping to be able to use sqlite to manage
>this massive amount of data (e.g., removing redundant information)
>before I proceed with my analyses.
>So, would your approach also work if I import the data, rather than 
>insert?

I'm afraid I'm not understanding your question correctly. What's the 
distinction between import (into SQLite) and insert (into SQLite)?

If 10 rows of your raw data looks like e.g.
111, 222        (this couple is direct, n1 <= n2)
115, 227                   "
143, 333                   "
254, 555                   "
567, 666                   "
410, 189        (this couple is reverse, n1 > n2)
333, 143                   "
227, 115                   "
555, 254                   "
666, 567                   "

where only (111, 222) and (189, 410) are not mirrored, you could end up 
with only 6 rows since all couple of pairs which also have a mirror 
will collapse into a single row (first value is the flags):
1, 111, 222
2, 189, 410
3, 115, 227
3, 143, 333
3, 254, 555
3, 567, 666

So yes, with a large ratio of mirrored pairs, your DB will shrink and 
more important will hold roughly half the number of rows (making 
queries slightly less than twice faster).

If your 800M rows have, say, 20M of pairs without mirror, your new 
Schema would make that:
20M orphan rows with flags in (1, 2)
(800 - 20) / 2 = 390M rows with flags = 3
All in all, that would be 410M rows.

The initial 800M-row DB holds (800M * 2) = 1600M integers, but the new 
DB would contains only (410M * 3) = 1230M integers, 410M of them taking 
only one byte of storage (flags).

If your DB is "build once, insert/update rarely and query mostly" you 
will certainly save much both in space and coffee ;-)

You can obviously apply the same schema to the alien DB as well.

--
<mailto:j...@q-e-d.org>j...@antichoc.net  

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to