Dear SQLite users,

consider this

schema
---------
sqlite> .schema data
CREATE TABLE data(
  a INTEGER,
  b INTEGER,
  c CHAR,
  d INTEGER,
  e INTEGER,
  n1 FLOAT,
  n2 FLOAT
);
CREATE INDEX data_by_a on data (a);



data set
---------
sqlite> select count(*) from data where a <= 18234721;
333392



test-4.sql
-----------
PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT COUNT(*) FROM (SELECT a,sum(n1) FROM data WHERE a <= 18234721
GROUP BY a);



test-5.sql
-----------
PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT COUNT(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);


and finally


> time sqlite3 db < test-4.sql
300000
1024
1417
13.95u 0.99s 0:15.73 94.9%


> time sqlite3 db < test-5.sql
300000
1024
1417
6.88u 0.22s 0:07.35 96.5%



with sum(n1) added the query runs twice as slow. as i was told, that's
because sqlite has to fetch the data row to get n1. fine, but why is
it so slow?! and it scales linearly: if i give it twice as much data
it takes twice as much time to process it. i wrote a simple c program
that reads the data off the disk and sums the values for unique keys

> wc -l test.file 
 334412 test.file

> head test.file 
0 22891
1 22311
2 20040
3 31891
4 17681
5 16198
6 221
7 32377
8 14080
9 3212

> time x -r test.file 
number of unique keys = 1417, mallocs = 334413, frees = 332996
4.33u 0.06s 0:04.51 97.3%

it only took 4+ seconds to read, parse, perform the hash table lookups
and sum the data. note that it had to do a hash lookup per row and a
hash insert for each of the 1417 unique keys.

so, just with a plain ascii file i get four times the speed i get with
sqlite. note that my c program will also scale linearly with the size
of the dataset (just like i see with sqlite). a sketch of roughly what
it does is below.
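
for reference, the program is roughly along these lines (a from-memory
sketch, not the exact code; the real one also counts its mallocs and
frees):

x.c (sketch)
------------
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS 4096

struct node {
    long key;
    double sum;
    struct node *next;
};

static struct node *buckets[NBUCKETS];

int main(int argc, char **argv)
{
    FILE *fp;
    struct node *n;
    long key;
    double val;
    int unique = 0;

    if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL)
        return 1;

    /* each line of the input is "key value" */
    while (fscanf(fp, "%ld %lf", &key, &val) == 2) {
        unsigned h = (unsigned long)key % NBUCKETS;

        /* hash lookup for the key */
        for (n = buckets[h]; n != NULL; n = n->next)
            if (n->key == key)
                break;
        /* hash insert the first time a key is seen */
        if (n == NULL) {
            n = malloc(sizeof(*n));
            n->key = key;
            n->sum = 0.0;
            n->next = buckets[h];
            buckets[h] = n;
            unique++;
        }
        n->sum += val;
    }
    fclose(fp);
    printf("number of unique keys = %d\n", unique);
    return 0;
}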

so,

- what am i doing wrong here?

- any suggestions on how to speed things up with sqlite?

- is sqlite only optimized to retrieve one row at a time from the table?

i'm starting to think that having the index and the data in the same
file is not such a great idea when you want to perform range queries.
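
one thing i have not tried yet: a two-column index that carries n1
next to a, so that sqlite might be able to answer the query from the
index alone and skip the per-row seek into the table. just a sketch
(the test-6.sql name and the index name are made up, following my
existing naming), and i do not know whether sqlite will actually use
the index this way:

test-6.sql (untested)
---------------------
CREATE INDEX data_by_a_n1 ON data (a, n1);
SELECT COUNT(*) FROM (SELECT a,sum(n1) FROM data WHERE a <= 18234721
GROUP BY a);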

thanks,
max

On Apr 12, 2005 11:33 AM, Maksim Yevmenkin <[EMAIL PROTECTED]> wrote:
> Gé,
> 
> thanks for the suggestion. unfortunately it did not make any
> difference :( below are the results. as you can see it takes 7+
> seconds to "group by" 333,392 records, and i'm grouping by a column
> on which i have an index. again, i'm not a database guy, but i think
> that is slow. perhaps someone can comment on whether that is the best
> sqlite can do?
> 
> 
> sqlite> .schema data
> CREATE TABLE data(
>   a INTEGER,
>   b INTEGER,
>   c CHAR,
>   d INTEGER,
>   e INTEGER,
>   n1 FLOAT,
>   n2 FLOAT
> );
> CREATE INDEX data_by_a on data (a);
> 
> 
> sqlite> select count(*) from data;
> 333392
> 
> test-1.sql
> ------------
> PRAGMA cache_size = 300000;
> PRAGMA cache_size;
> PRAGMA page_size;
> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);
> 
> test-2.sql
> ------------
> PRAGMA cache_size;
> PRAGMA page_size;
> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);
> 
> > time sqlite3 db < test-1.sql
> 300000
> 1024
> 1417
> 6.89u 0.33s 0:07.55 95.6%
> 
> > time sqlite3 db < test-2.sql
> 2000
> 1024
> 1417
> 6.91u 0.19s 0:07.39 96.0%
> 
> > time sqlite3 db2048 < test-1.sql
> 300000
> 2048
> 1417
> 6.80u 0.08s 0:07.32 93.9%
> 
> > time sqlite3 db2048 < test-2.sql
> 2000
> 2048
> 1417
> 6.77u 0.12s 0:07.10 97.0%
> 
> > time sqlite3 db4096 < test-1.sql
> 300000
> 4096
> 1417
> 6.80u 0.15s 0:07.21 96.3%
> 
> > time sqlite3 db4096 < test-2.sql
> 2000
> 4096
> 1417
> 6.79u 0.15s 0:07.15 97.0%
> 
> > time sqlite3 db8192 < test-1.sql
> 300000
> 8192
> 1417
> 6.70u 0.11s 0:07.01 97.1%
> 
> > time sqlite3 db8192 < test-2.sql
> 2000
> 8192
> 1417
> 6.73u 0.09s 0:07.01 97.2%
> 
> thanks,
> max
> 
> 
> On Apr 12, 2005 7:10 AM, Gé Weijers <[EMAIL PROTECTED]> wrote:
> > Maksim,
> >
> > Some things you could try:
> >
> > 1) increase cache memory
> >
> > You may be causing a lot of cache misses if the size of the query result
> > is very large compared to the size of the cache. Index-based searches
> > can cause multiple reloads of the same page because of a lack of
> > locality in the cache. An index-less search will just load each page once.
> >
> > As an experiment, try 'PRAGMA cache_size = 300000' before you run the
> > query. 1 GB of RAM should be able to support 300 MB of cache.
> >
> > 2) use 8192-byte pages
> >
> > Larger pages seem to improve performance quite a bit, in my experience.
> >
> > Do 'PRAGMA page_size = 8192' before you create the database.
> >
> > Doing both may cause excessive memory use (300000 * 8K = ...). I've
> > never tried that.
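> >
> > To be concrete, since the page size is fixed once the database file
> > has been created, the order would be something like this (a sketch,
> > with a made-up database name):
> >
> >   > sqlite3 newdb
> >   sqlite> PRAGMA page_size = 8192;
> >   sqlite> CREATE TABLE data(a INTEGER, b INTEGER, c CHAR, d INTEGER,
> >      ...> e INTEGER, n1 FLOAT, n2 FLOAT);
> >
> > and only then import the data.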
> >
> >
> > Gé
> >
> >
> > Maksim Yevmenkin wrote:
> >
> > >Robert,
> > >
> > >
> > >
> > >>[snip]
> > >>
> > >>
> > >>
> > >>>as i said, i print these rows to /dev/null in my perl code too. plus
> > >>>the perl code does some other things, such as joining these rows with
> > >>>other hashes and summing the numbers.
> > >>>
> > >>>
> > >>That's fine.  I was merely trying to account for the 50% speed difference
> > >>between the two differing column tests, which has been accomplished.
> > >>
> > >>
> > >>
> > >>>>As for the temp table ... I haven't tried this, but isn't "temp" a
> > >>>>reserved word in SQLite?  More importantly, you should be doing this
> > >>>>statement inside
> > >>>
> > >>>yes, it is. i really want to create a 'temporary table' in memory. i
> > >>>was really hoping it would speed things up.
> > >>>
> > >>>
> > >>I misread the statement, so ignore me on that part.  However, 339,000 rows
> > >>into a temporary in-memory table ... I tried some experiments locally here
> > >>and none of them took more than 2 seconds to execute.  Are you sure you're
> > >>not using up all available memory, which is causing the system to hit the
> > >>swapfile?  What does this same query look like when you drop the
> > >>"temp" from the query?
> > >>
> > >>
> > >
> > >the system has 1G of ram. i was "monitoring" sqlite3 memory usage with
> > >'top'. the SIZE and RES did not exceed 30M. so i do not think the
> > >memory is the issue here.
> > >
> > >
> > >
> > >>time sqlite3 db 'create table foo as select * from data where a <= 
> > >>18234721' > /dev/null
> > >>
> > >>
> > >22.06u 1.39s 0:27.75 84.5%
> > >
> > >so pretty much the same time without 'temp'.
> > >
> > >i'm starting to suspect the disk. here is what i did. i created a
> > >separate database with only one table. this table contains the subset
> > >of 333392 rows from the original data table. it also has the same
> > >index on the "a" column, i.e. i did
> > >
> > >
> > >
> > >>sqlite3 db1
> > >>
> > >>
> > >sqlite> attach db as s;
> > >sqlite> create table data as select * from s.data where a <= 18234721;
> > >sqlite> create index data_by_a on data (a);
> > >
> > >full scan
> > >
> > >
> > >
> > >>time sqlite3 db1 'select n1 from data' > /dev/null
> > >>
> > >>
> > >17.19u 0.55s 0:19.06 93.0%
> > >
> > >"bad" index scan, because it is guaranteed then the table only has
> > >keys that match "where"
> > >
> > >
> > >
> > >>time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
> > >>
> > >>
> > >25.73u 0.59s 0:28.37 92.7%
> > >
> > >+10 seconds! is this the overhead of an "indexed" scan? is this what
> > >it really takes to seek back and forth between the index and the
> > >data? what am i missing here?
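> > >
> > >(one way i could double-check this: i believe sqlite will not use an
> > >index on a column that is wrapped in unary "+", so something like
> > >
> > >  SELECT n1 FROM data WHERE +a <= 18234721;
> > >
> > >should run the same "where" clause as a full scan on the same table.
> > >i have not actually tried it.)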
> > >
> > >thanks,
> > >max
> > >
> > >
> >
> >
>
