Hi Rense,
> Thanks for this idea. In fact, the purpose of my original query is
> exactly to reduce the database. The 800 mln rows were exported from
> another source, and I was hoping to be able to use sqlite to manage
> this massive amount of data (e.g., removing redundant information)
> before I proceed with my analyses.
> So, would your approach also work if I import the data, rather than
> insert?

I'm afraid I'm not understanding your question correctly. What is the
distinction between importing (into SQLite) and inserting (into SQLite)?

If 10 rows of your raw data look like, e.g.:

  111, 222    (this pair is direct, n1 <= n2)
  115, 227        "
  143, 333        "
  254, 555        "
  567, 666        "
  410, 189    (this pair is reversed, n1 > n2)
  333, 143        "
  227, 115        "
  555, 254        "
  666, 567        "

where only (111, 222) and (189, 410) are not mirrored, you could end up
with only 6 rows, since every pair that also has a mirror collapses into
a single row (the first value is the flags column):

  1, 111, 222
  2, 189, 410
  3, 115, 227
  3, 143, 333
  3, 254, 555
  3, 567, 666

So yes, with a large proportion of mirrored pairs your DB will shrink
and, more importantly, will hold roughly half the number of rows (making
queries a bit less than twice as fast).

If your 800M rows contain, say, 20M pairs without a mirror, the new
schema would give you:

  20M orphan rows with flags in (1, 2)
  (800M - 20M) / 2 = 390M rows with flags = 3

All in all, that would be 410M rows. The initial 800M-row DB holds
800M * 2 = 1600M integers, while the new DB would contain only
410M * 3 = 1230M values, 410M of which (the flags) take only one byte
of storage each.

If your DB is "built once, insert/update rarely, query mostly", you will
certainly save a lot of both space and coffee ;-)

You can obviously apply the same schema to the alien DB as well.

--
j...@antichoc.net
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
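P.S. To make the collapse concrete, here is a minimal sketch using
Python's stdlib sqlite3 module (the table and column names `raw`,
`pairs`, `flags`, `n1`, `n2` are my own, hypothetical choices): each
pair is canonicalized to (min, max), and the flags column records which
orientations were seen, 1 = direct only, 2 = reversed only, 3 = both.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw (n1 INTEGER, n2 INTEGER)")
# The 10 example rows from above.
con.executemany("INSERT INTO raw VALUES (?, ?)", [
    (111, 222), (115, 227), (143, 333), (254, 555), (567, 666),
    (410, 189), (333, 143), (227, 115), (555, 254), (666, 567),
])

con.execute("CREATE TABLE pairs (flags INTEGER, n1 INTEGER, n2 INTEGER)")
# Group by the canonical (min, max) key; a direct orientation contributes
# 1, a reversed one 2, so SUM(DISTINCT ...) yields 3 when both exist.
con.execute("""
    INSERT INTO pairs
    SELECT SUM(DISTINCT CASE WHEN n1 <= n2 THEN 1 ELSE 2 END),
           MIN(n1, n2), MAX(n1, n2)
    FROM raw
    GROUP BY MIN(n1, n2), MAX(n1, n2)
""")

rows = con.execute(
    "SELECT flags, n1, n2 FROM pairs ORDER BY flags, n1").fetchall()
for r in rows:
    print(r)
# (1, 111, 222)
# (2, 189, 410)
# (3, 115, 227)
# (3, 143, 333)
# (3, 254, 555)
# (3, 567, 666)
```

Note that MIN(x, y) / MAX(x, y) with two arguments are SQLite's scalar
functions, not the aggregates, so they are fine inside GROUP BY.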
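P.P.S. The row and storage arithmetic can be checked mechanically; the
20M orphan count below is just the hypothetical figure from the example,
not a measurement of your data:

```python
total_rows = 800_000_000   # original (n1, n2) rows
orphans = 20_000_000       # pairs without a mirror (hypothetical figure)

merged = (total_rows - orphans) // 2   # each mirrored couple -> one flags=3 row
new_rows = orphans + merged            # rows in the new schema

old_values = total_rows * 2            # two integers per original row
new_values = new_rows * 3              # flags + two integers per new row

print(new_rows, old_values, new_values)
# 410000000 1600000000 1230000000
```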