A few observations:

1. My condolences; with those dimensions you are heading for "big data"/Hadoop land.
2. Worry about the number of rows; that's what feeds into the big-O: O(n). Assuming your 150 columns translate into a 1.5 KB to 2 KB record length, your 300 GB file must hold 150 million (300 GB / 2 KB) to 200 million (300 GB / 1.5 KB) records. That's a lot of n for O(n).

3. SSDs are faster than spinning disks in that they have essentially zero track-to-track seek time. On the other hand, once the read/write head is over the correct sector, spinning disks have highly evolved I/O for consecutive (unfragmented) sectors. By contrast, the controllers that interface to flash storage are not all as evolved, so a flash device may still have lower bandwidth; for example, a "Class 10" SD card designed for HD video is (unless otherwise marked) only rated for 10 MB per second:
http://en.wikipedia.org/wiki/Secure_Digital

A device speed of 10 MB per second would put a SCSI device near the bottom of the speed hierarchy:
http://en.wikipedia.org/wiki/SCSI

Serial ATA (SATA) is an even faster interface:
http://en.wikipedia.org/wiki/Serial_ATA

So I would recommend double-checking the specs of your SSD rather than making the reasonable, but not necessarily true, assumption that solid state has to be faster than mechanical.

One strategy that might work is to keep an entirely separate (not joined) 8-column table. Develop your queries (targets) against that table, write out the primary keys of the rows you are interested in to a separate table:

CREATE TABLE AS SELECT primarykey FROM 8columntable WHERE yourquery;

and then JOIN that row-reduced table to the main table. If the row-reduced table has millions fewer records (primary keys), that's millions of rows in the main 150-column table where the precompiled SQL query doesn't have to be executed.
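To make the row-reduction idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (main_table, narrow_table, pk, a, payload, hits) are hypothetical stand-ins, since the original schema isn't given; the point is just the two-step pattern of scanning the narrow table and joining the saved keys back to the wide one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical schema: a narrow table holding only the primary key plus
# a frequently-queried column, alongside the wide main table.
cur.executescript("""
CREATE TABLE main_table   (pk INTEGER PRIMARY KEY, a INTEGER, payload TEXT);
CREATE TABLE narrow_table (pk INTEGER PRIMARY KEY, a INTEGER);
""")
rows = [(i, i % 10, "x" * 50) for i in range(1000)]
cur.executemany("INSERT INTO main_table VALUES (?, ?, ?)", rows)
cur.executemany("INSERT INTO narrow_table VALUES (?, ?)",
                [(pk, a) for pk, a, _ in rows])

# Step 1: full-scan only the narrow table and save the matching keys.
cur.execute("CREATE TABLE hits AS SELECT pk FROM narrow_table WHERE a = 3")

# Step 2: join the row-reduced key table back to the wide main table,
# so the wide rows are touched only for the keys that matched.
cur.execute("SELECT m.payload FROM hits JOIN main_table m ON m.pk = hits.pk")
matched = cur.fetchall()
print(len(matched))
```

In a toy database the difference is invisible, but with 150-200 million wide records the step-1 scan reads perhaps 5% of the bytes a full scan of the main table would.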
HTH,
Jim Callahan
Orlando, FL

On Wed, Aug 6, 2014 at 11:51 PM, Paul Dillon <paul.dil...@gmail.com> wrote:
> Hello,
>
> I would like to know if splitting a big table into two smaller ones, and
> then using a join in my queries would speed up performance.
>
> My table is 100-300GB in size and has about 150 columns. There are 8
> fields that I frequently use in my queries, which require full table scans
> to complete. I usually query each field once per data load, so the time to
> index them is not worth it. (FYI I am data mining).
>
> So my questions are:
>
> 1. Will moving these 8 query fields to a smaller table improve query
> performance when joined to the larger table? My logic is that this small
> table would only be about 5% the size of the full table, so the full table
> scan to service the query might be faster.
>
> 2. Would it be worthwhile having that smaller table in a separate .db file,
> so that the data is closer together on my SSD drive?
>
> 3. Would the data loading performance be heavily impacted if I had to
> insert into two tables instead of one? I use "INSERT OR REPLACE" for my
> loading, with a single index.
>
> Many Thanks,
>
> Paul
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users