On Tue, 12 Apr 2005, Maksim Yevmenkin wrote:

>Dear SQLite users,
>
>consider this
>
> [snip]
>
>it only took 4+ seconds to read, parse, perform the hash table lookups, and
>sum the data. note that for the 1417 unique keys it had to do a hash lookup
>and a hash insert.
>
>so, just with a plain ascii file i get four times the speed i get with
>sqlite. note that my c program will scale linearly with the size of the
>dataset (just like i see with sqlite).
>
>so,
>
>- what am i doing wrong here?


Not managing your expectations. Try the test with another SQL database,
and see what sort of speed you get.


>
>- any suggestions on how to speed things up with sqlite?


Buy a faster machine. No, I'm not being facetious.


>
>- is sqlite optimized to retrieve one row from the table?


When fetching that one row using an index, yes, that's what indexes are
for. But when an index scan touches a significant proportion of the
corresponding table, it becomes less than optimal, due to the reasons
given previously in the thread:
- Cache thrashing
- Index + Table access
- CPU overhead
- Non-linear file access
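
If you want to put a number on that overhead, one rough sketch (the unary
"+" trick for keeping the optimizer off an index is an assumption about
your SQLite version) is to run the same range query twice, once indexed
and once forced into a sequential scan:

  -- uses the data_by_a index, then seeks into the table for each row
  SELECT n1 FROM data WHERE a <= 18234721;

  -- "+a" disqualifies the column from index use, so the table is read
  -- once, in file order, and filtered row by row
  SELECT n1 FROM data WHERE +a <= 18234721;

If the second form finishes in roughly your plain full-scan time, the gap
between the two is exactly the index + table seeking listed above.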

>
>i'm starting to think that having index and data in the same file is
>not such a great idea when you want to perform range requests.


Unless the index and the data end up on different disks, you'll get no
benefit from splitting them into separate files.
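
What can help a range scan like yours is rebuilding the table with the
rows physically ordered by "a", so the index walks the table pages roughly
in file order instead of seeking all over the file. A minimal sketch (db1c
is just a placeholder name, and whether it pays off on your data is
something you'd have to measure):

  sqlite3 db1c
  sqlite> ATTACH 'db' AS s;
  sqlite> CREATE TABLE data AS SELECT * FROM s.data ORDER BY a;
  sqlite> CREATE INDEX data_by_a ON data (a);

The index still lives in the same file; it just stops hurting, because
consecutive keys now land on consecutive table pages.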


>
>thanks,
>max


Christian


>
>On Apr 12, 2005 11:33 AM, Maksim Yevmenkin <[EMAIL PROTECTED]> wrote:
>> Gé,
>>
>> thanks for the suggestion. unfortunately it did not make any
>> difference :( below are the results. as you can see it takes 7+ seconds
>> to "group by" 333,392 records, and i'm grouping by a column on which i
>> have an index. again, i'm not a database guy, but i think that is slow.
>> perhaps someone can comment on whether that is the best sqlite can do?
>>
>>
>> sqlite> .schema data
>> CREATE TABLE data(
>>   a INTEGER,
>>   b INTEGER,
>>   c CHAR,
>>   d INTEGER,
>>   e INTEGER,
>>   n1 FLOAT,
>>   n2 FLOAT
>> );
>> CREATE INDEX data_by_a on data (a);
>>
>>
>> sqlite> select count(*) from data;
>> 333392
>>
>> test-1.sql
>> ------------
>> PRAGMA cache_size = 300000;
>> PRAGMA cache_size;
>> PRAGMA page_size;
>> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);
>>
>> test-2.sql
>> ------------
>> PRAGMA cache_size;
>> PRAGMA page_size;
>> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);
>>
>> > time sqlite3 db < test-1.sql
>> 300000
>> 1024
>> 1417
>> 6.89u 0.33s 0:07.55 95.6%
>>
>> > time sqlite3 db < test-2.sql
>> 2000
>> 1024
>> 1417
>> 6.91u 0.19s 0:07.39 96.0%
>>
>> > time sqlite3 db2048 < test-1.sql
>> 300000
>> 2048
>> 1417
>> 6.80u 0.08s 0:07.32 93.9%
>>
>> > time sqlite3 db2048 < test-2.sql
>> 2000
>> 2048
>> 1417
>> 6.77u 0.12s 0:07.10 97.0%
>>
>> > time sqlite3 db4096 < test-1.sql
>> 300000
>> 4096
>> 1417
>> 6.80u 0.15s 0:07.21 96.3%
>>
>> > time sqlite3 db4096 < test-2.sql
>> 2000
>> 4096
>> 1417
>> 6.79u 0.15s 0:07.15 97.0%
>>
>> > time sqlite3 db8192 < test-1.sql
>> 300000
>> 8192
>> 1417
>> 6.70u 0.11s 0:07.01 97.1%
>>
>> > time sqlite3 db8192 < test-2.sql
>> 2000
>> 8192
>> 1417
>> 6.73u 0.09s 0:07.01 97.2%
>>
>> thanks,
>> max
>>
>>
>> On Apr 12, 2005 7:10 AM, Gé Weijers <[EMAIL PROTECTED]> wrote:
>> > Maksim,
>> >
>> > Some things you could try:
>> >
>> > 1) increase cache memory
>> >
>> > You may be causing a lot of cache misses if the size of the query result
>> > is very large compared to the size of the cache. Index-based searches
>> > can cause multiple reloads of the same page because of a lack of
>> > locality in the cache. An index-less search will just load each page once.
>> >
>> > as an experiment, try 'PRAGMA cache_size = 300000' before you run the
>> > query. 1 GB of RAM should be able to support 300 MB of cache.
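>> >
>> > (cache_size counts pages, not bytes. As a back-of-the-envelope sketch,
>> > assuming the default page size, the arithmetic is roughly:
>> >
>> >   PRAGMA page_size;            -- 1024 bytes per page by default
>> >   PRAGMA cache_size = 300000;  -- 300000 pages * 1024 bytes ~= 300 MB
>> >
>> > The real per-page cost is a bit higher than the raw page size.)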
>> >
>> > 2) use 8192-byte pages
>> >
>> > Larger pages seem to improve performance quite a bit, in my experience.
>> >
>> > Do 'PRAGMA page_size = 8192' before you create the database.
>> >
>> > Doing both may cause excessive memory use (200000 * 8K = ...). I've
>> > never tried that.
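>> >
>> > (An existing file keeps the page size it was created with, so to try this
>> > on data you already have you'd rebuild into a fresh file. A minimal
>> > sketch, with 'newdb' as a placeholder name:
>> >
>> >   sqlite3 newdb
>> >   sqlite> PRAGMA page_size = 8192;  -- must run before any table exists
>> >   sqlite> ATTACH 'db' AS s;
>> >   sqlite> CREATE TABLE data AS SELECT * FROM s.data;
>> >   sqlite> CREATE INDEX data_by_a ON data (a);
>> > )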
>> >
>> >
>> > Gé
>> >
>> >
>> > Maksim Yevmenkin wrote:
>> >
>> > >Robert,
>> > >
>> > >
>> > >
>> > >>[snip]
>> > >>
>> > >>
>> > >>
>> > >>>i said i print these rows to /dev/null too in my perl code. plus the
>> > >>>perl code does some other things such as joining these rows with other
>> > >>>hashes and summing the numbers.
>> > >>>
>> > >>>
>> > >>That's fine.  I was merely trying to account for the 50% speed difference
>> > >>between the two differing column tests, which has been accomplished.
>> > >>
>> > >>
>> > >>
>> > >>>>As for the temp table ... I haven't tried this, but isn't "temp" a
>> > >>>>reserved word in SQLite?  More importantly, you should be doing this
>> > >>>>statement inside
>> > >>>
>> > >>>
>> > >>>yes, it is. i really want to create a 'temporary table' in memory. i was
>> > >>>really hoping it would speed things up.
>> > >>>
>> > >>>
>> > >>I misread the statement, so ignore me on that part.  However, 339,000
>> > >>rows into a temporary in-memory table ... I tried some experiments
>> > >>locally here and none of them took more than 2 seconds to execute.  Are
>> > >>you sure you're not using up all available memory, which is causing the
>> > >>system to hit the swapfile?  What does this same query look like when
>> > >>you drop the "temp" from the query?
>> > >>
>> > >>
>> > >
>> > >the system has 1G of ram. i was "monitoring" sqlite3 memory usage with
>> > >'top'. the SIZE and RES did not exceed 30M. so i do not think the
>> > >memory is the issue here.
>> > >
>> > >
>> > >
>> > >>time sqlite3 db 'create table foo as select * from data where a <= 18234721' > /dev/null
>> > >>
>> > >>
>> > >22.06u 1.39s 0:27.75 84.5%
>> > >
>> > >so pretty much the same time without 'temp'.
>> > >
>> > >i'm starting to suspect the disk. here is what i did. i created a separate
>> > >database with only one table. this table contains a subset of 333392
>> > >rows from the original data table. it also has the same index on the "a"
>> > >column, i.e. i did
>> > >
>> > >
>> > >
>> > >>sqlite3 db1
>> > >>
>> > >>
>> > >sqlite> attach db as s;
>> > >sqlite> create table data as select * from s.data where a <= 18234721;
>> > >sqlite> create index data_by_a on data (a);
>> > >
>> > >full scan
>> > >
>> > >
>> > >
>> > >>time sqlite3 db1 'select n1 from data' > /dev/null
>> > >>
>> > >>
>> > >17.19u 0.55s 0:19.06 93.0%
>> > >
>> > >"bad" index scan, because it is guaranteed then the table only has
>> > >keys that match "where"
>> > >
>> > >
>> > >
>> > >>time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
>> > >>
>> > >>
>> > >25.73u 0.59s 0:28.37 92.7%
>> > >
>> > >+10 seconds! is this the overhead of an "indexed" scan? is this what it
>> > >really takes to seek back and forth between the index and the data? what
>> > >am i missing here?
>> > >
>> > >thanks,
>> > >max
>> > >
>> > >
>> >
>> >
>>
>

-- 
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \
