On Sun, Mar 22, 2009 at 6:14 AM, Stefan Evert <stefan.ev...@uos.de> wrote:
..
>
> However, Perl does introduce a lot of overhead, so it may not be the best
> choice for high-performance applications.  In my simplistic benchmarks, Perl
> did not achieve more than approx. 400,000 subroutine calls per second
> (MacBook Pro, 2.5 GHz).  If you're reading your 7,300 result rows in a loop
> with fetchrow_* method calls (or if similar subroutines are called
> internally when using fetchall_* or selectall_*), then you're already
> limited to less than 60 queries/s by the function call overhead.
>
> If you don't need to access the rows separately, you could try to pack the
> complete data for each met_id with Storable.pm and store it in a single
> database blob.  I haven't tried that myself, though, so I don't know whether
> that would be any faster.

Aha! If you read my original post that started this thread, that is
exactly the approach I have been considering... see my PerlMonks post
on the same topic -- http://www.perlmonks.org/?node_id=752247

The problems I am facing are the size of each BLOB and the combined
speed of the SELECT query plus the INSERT BLOB query.

Each BLOB, compressed (with Compress::Zlib -- I am using
Data::Serializer to serialize with Storable and then compress), is
approx. 430 KB. That translates into a bit over 400 GB for the entire
db, which isn't going to fit on my laptop, so I can't test it here.
I'll have to chop the db down to a smaller problem first so I can do
my portable testing. The production machines won't have the space
problem, but still, moving around a 400 GB file that has ballooned
from the original 430 MB file (my current SQLite db file is 430 MB
with all 1 million rows) is going to be less than convenient.

The second problem is the conversion from the current normalized db
structure to de-normalized BLOBs. With each SELECT query taking about
33 ms, the SELECTs alone for all 1 million rows will take about 9
hours (1,000,000 x 33 ms is roughly 9.2 hours). Add to that the time
to INSERT the BLOBs into the new table, and, if the INSERT (or
SELECT) rate does not stay linear as the SQLite file grows, I could
be in for a very long conversion. As I noted in my original post,
after an overnight conversion run, only a fraction of the rows had
been converted to BLOBs before I killed the process.
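
The conversion loop I have in mind is essentially the following --
again a sketch, untested at full scale, with hypothetical table
names. Batching the INSERTs inside transactions (the standard SQLite
advice) should at least keep the INSERT side from dominating:

    use DBI qw(:sql_types);
    use Data::Serializer;

    my $dbh = DBI->connect('dbi:SQLite:dbname=met.db', '', '',
        { RaiseError => 1, AutoCommit => 0 });  # batch in transactions

    my $ser = Data::Serializer->new(serializer => 'Storable',
                                    compress   => 1);

    my $get = $dbh->prepare('SELECT * FROM met_data WHERE met_id = ?');
    my $put = $dbh->prepare(
        'INSERT INTO met_blobs (met_id, data) VALUES (?, ?)');

    my $ids = $dbh->selectcol_arrayref(
        'SELECT DISTINCT met_id FROM met_data');
    my $n = 0;
    for my $met_id (@$ids) {
        my $rows = $dbh->selectall_arrayref($get, undef, $met_id);
        $put->bind_param(1, $met_id);
        $put->bind_param(2, $ser->serialize($rows), SQL_BLOB);
        $put->execute;
        $dbh->commit if ++$n % 10_000 == 0;   # commit every 10k blobs
    }
    $dbh->commit;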

Of course, I could test with 100 or 1,000 rows and do my
benchmarking, but those numbers will be misleading if the SELECTs
and/or INSERTs are non-linear with file size.
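
One cheap way to test that assumption: time a fixed sample of queries
at several db sizes and see whether the per-query cost stays flat.
Something like this, reusing $dbh and $get from the sketch above
(Time::HiRes is core Perl):

    use Time::HiRes qw(gettimeofday tv_interval);

    my @sample_ids = (1 .. 100);   # a fixed sample of met_ids
    my $t0 = [gettimeofday];
    for my $id (@sample_ids) {
        $get->execute($id);
        $get->fetchall_arrayref;
    }
    printf "%.1f ms/query at %d rows\n",
        1000 * tv_interval($t0) / @sample_ids,
        $dbh->selectrow_array('SELECT count(*) FROM met_data');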


>
>>> Another thing to keep in mind is that the SQLite version included in the
>>> DBD::SQLite distribution is fairly old (3.4.0 on my Mac -- I doubt
>>> there's a
>>> more recent version of DBD::SQLite around), and AFAIK there have been
>>> some
>>> speed improvements in SQLite recently.
>>
>> I am using DBD::SQLite::Amalgamation with SQLite 3.6.11. I am not sure
>> there is anything wrong with DBD::SQLite 1.14 at all (other than its
>> use of the older SQLite code, which is easily corrected). One might
>> get better bang by ensuring that the latest version of DBI is
>> installed, which Tim Bunce and company are probably always
>> fine-tuning.
>
> DBD::SQLite is fine and works very well for me, but hasn't seen any updates
> to a newer SQLite version in quite a long time.  My understanding is that
> it's not entirely trivial to replace the SQLite source code (without
> understanding all the XS interfacing code), and that it would be better and
> easier to use the amalgamation anyway.
>
> I'd love to switch to DBD::SQLite::Amalgamation, but I need to register my
> own callback functions, and this package fails the callback tests with a bus
> error.
>


Ok... first, it *is* trivial to replace the SQLite source, and if I
can do it, anyone can do it -- certainly you, who seem to know the
first C about programming (I don't).

Step 1. Download DBD::SQLite::Amalgamation.
Step 2. Replace the *.h headers in the distribution from Step 1 with
the headers from the new SQLite version, and replace
sqlite-amalgamation.c with the new sqlite3.c.
Step 3. perl Makefile.PL && make && make install

Second, I am using a few callbacks, and I have no problem at all. At
one point I do remember getting bus errors, but that was many, many
versions ago (with the version of sqlite3 that
DBD::SQLite::Amalgamation shipped). I replaced the SQLite code in the
package as shown in Steps 1-3 above, and the bus error went away.
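
For the record, the registration itself is just the stock
create_function interface that DBD::SQLite documents; the function
body here is a made-up example:

    # register a Perl sub as an SQL function
    $dbh->func('perl_uc', 1, sub { uc $_[0] }, 'create_function');

    # ...and call it from SQL:
    my ($s) = $dbh->selectrow_array(q{SELECT perl_uc('hello')});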

-- 
Puneet Kishor http://www.punkish.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Carbon Model http://carbonmodel.org/
Open Source Geospatial Foundation http://www.osgeo.org/
Sent from: Madison WI United States.