> Any threads or processes would still be in contention for the same resource: 
> access to that file on disk.


Midway through the run, `iostat` is showing me exactly 0 disk accesses, so I'm 
assuming it's a locking problem and not a physical I/O bottleneck. I'm hoping 
that you folks could give me an idea of what I'm doing wrong. (It could also be 
a problem with the Mac OS locking.)

> The best indexes to create will depend on whether this is a one-time job or 
> whether you're going to have to do it repeatedly with similar or completely 
> different data.


For each row, I'm making a query for similar rows. The full DB is 17 million 
rows. I've tried adding all the indexes I can think of, but the only one that 
helps this query is the R*tree index. They were all simple indices on one or 
more columns, so if you have an idea on a more complex index, I'd apply it and 
test it out.

~Seth


On Aug 4, 2011, at 9:54 AM, Simon Slavin wrote:

> 
> On 4 Aug 2011, at 4:44pm, Seth Price wrote:
> 
>> I was hoping that SQLite was doing that with the JOIN statement. It's 
>> narrowing down the results with the query from the R*table (col*min & 
>> col*max). A large query will have ~14k rows at this point. Then JOINs with 
>> the original table (data) and it's narrowing results further with the 
>> remaining part of the WHERE (col*). The large query will have ~11k rows. 
>> After this it groups by the 'class' column in 'data' and counts how many 
>> rows per class.
>> 
>> I'm really skeptical that I can chop this up into many queries to make it 
>> run faster. I've always assumed that if I can get "final" data back from a 
>> query, then it's exactly the query I'm looking for. I would think that 
>> running three queries would use three times the resources.
> 
> Threading is a red herring.  Any threads or processes would still be in 
> contention for the same resource: access to that file on disk.  This is what 
> caused the observation in your 'Subject' header.  All the threads are doing 
> is waiting for another thread to get out the way.
> 
> Speed increases can come from adding one or more indexes.  The best indexes 
> to create will depend on whether this is a one-time job or whether you're 
> going to have to do it repeatedly with similar or completely different data.
> 
> Simon.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to