Not an expert, but I doubt that WAL will have much of anything to do with
your read performance after you've populated the database and committed the
last insert transaction.  WAL mostly affects writes
(http://www.sqlite.org/draft/wal.html).  Once the last transaction is
written and you are only reading, WAL is basically out of the picture.

For this same reason, I doubt that turning off journaling altogether will
have much effect on your read performance, because if all you're doing is
reading then there shouldn't be any journaling at all (read-only
transactions never write to the journal).

I was thinking that something like what was already suggested might work:
if you could hash values as you insert them, finding duplicates becomes a
matter of checking for hash collisions, especially if you also order the
rows with an index on the hash so that they end up on database pages close
to each other.  I'm not sure that will work, however, with something like
a Levenshtein distance, if you must compare words to words that already
exist.
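
Just to sketch the idea in code (the table name, column names, and schema
here are made up for illustration; the hashing itself would live in your
C++ code):

    #include <sqlite3.h>

    /* Sketch: keep a hash next to each word and index it, so rows with
       the same hash land on nearby index pages and duplicate lookups
       become simple equality probes on the hash column. */
    static int create_hashed_schema(sqlite3 *db)
    {
        return sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS words("
            "  id   INTEGER PRIMARY KEY,"
            "  word TEXT NOT NULL,"
            "  hash INTEGER NOT NULL);"
            "CREATE INDEX IF NOT EXISTS words_hash ON words(hash);",
            0, 0, 0);
    }

At insert time you would compute the hash yourself, probe with something
like SELECT word FROM words WHERE hash = ?, and then confirm the few
candidates (or run your Levenshtein comparison) in your own code.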

I'm also kind of wondering if the full-text search extension (FTS3 or FTS4)
might help you process words faster.
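
Purely as an example of what that might look like (whether FTS token
matching fits your dedup rules at all is an open question, and it assumes
your SQLite build has the FTS extension compiled in):

    #include <sqlite3.h>

    /* Sketch: an FTS4 table answers "which rows contain this token"
       via MATCH, which can be much faster than LIKE scans.  Requires
       a build with the FTS3/FTS4 extension enabled. */
    static int create_fts_table(sqlite3 *db)
    {
        return sqlite3_exec(db,
            "CREATE VIRTUAL TABLE words_fts USING fts4(word);",
            0, 0, 0);
    }

    /* Lookup would then be e.g.:
         SELECT word FROM words_fts WHERE word MATCH 'someword';     */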

As for your read performance, if you are doing or can do things
single-threaded at read time, either compiling SQLite as single-threaded
(SQLITE_THREADSAFE=0) or setting it single-threaded at start time
(SQLITE_CONFIG_SINGLETHREAD passed to sqlite3_config()) would probably
improve things noticeably because mutexes would not be used at all.
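
For example, something along these lines, called before any other SQLite
API (the function name is just illustrative):

    #include <sqlite3.h>

    /* Sketch: drop all mutexing when the app only ever touches SQLite
       from a single thread.  sqlite3_config() must be called before
       any other SQLite API, so do this first thing at startup. */
    static int go_single_threaded(void)
    {
        int rc = sqlite3_config(SQLITE_CONFIG_SINGLETHREAD);
        if (rc != SQLITE_OK)
            return rc;         /* library was probably already in use */
        return sqlite3_initialize();
    }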

Other than that, having as much cache in memory at once as possible may
make a big difference.  The default SQLite page size is 1KB, and the
default number of pages kept in the in-memory cache is 2000, so only about
2MB of cache.  If you have enough memory to prevent the OS from doing
excessive paging of virtual memory, you might adjust those values up
significantly.  A 4KB page size is often a good choice because it usually
matches the filesystem's block size.  If you do this plus bump up the
number of cache pages, you may be able to significantly reduce disk I/O
(which is the slowest operation).
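
For instance (the numbers are only placeholders to show the mechanics;
you'd want to benchmark real values):

    #include <sqlite3.h>

    /* Sketch: 4KB pages and a much larger page cache.  page_size only
       takes effect before the first table is created (or after a
       VACUUM), and cache_size is counted in pages, so 25000 x 4KB is
       roughly 100MB of cache. */
    static int tune_cache(sqlite3 *db)
    {
        return sqlite3_exec(db,
            "PRAGMA page_size = 4096;"
            "PRAGMA cache_size = 25000;",
            0, 0, 0);
    }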

You may be able to go a step further with a large page cache allocated from
a static buffer using sqlite3_config() with SQLITE_CONFIG_PAGECACHE.
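
Roughly like this (the slot count and sizes are placeholders; each slot
needs room for a page plus SQLite's per-page header, and the call must
happen before the library is initialized):

    #include <stdlib.h>
    #include <sqlite3.h>

    #define PAGE_SZ  4096            /* should match PRAGMA page_size  */
    #define SLOT_SZ  (PAGE_SZ + 100) /* page plus a guess at header    */
    #define N_SLOTS  25000           /* should match PRAGMA cache_size */

    static void *cache_buf;

    /* Sketch: give SQLite one big buffer to carve page-cache slots
       from.  Must be configured before sqlite3_initialize(). */
    static int setup_page_cache(void)
    {
        cache_buf = malloc((size_t)SLOT_SZ * N_SLOTS);
        if (cache_buf == 0)
            return SQLITE_NOMEM;
        return sqlite3_config(SQLITE_CONFIG_PAGECACHE,
                              cache_buf, SLOT_SZ, N_SLOTS);
    }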

If you have enough memory for your cache, you might eke out some extra
performance if you could allocate this static buffer and lock it in memory
via API calls such as VirtualLock, SetProcessWorkingSetSize(Ex), etc., to
try to ensure that the OS does not push the pages to disk.
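
On Windows that might look something like the following sketch (the sizes
are arbitrary placeholders, and real code should check privileges and
every return value):

    #include <windows.h>

    /* Sketch: enlarge the working set and pin the cache buffer in
       physical memory so the OS does not page it out.  VirtualLock
       fails unless the working set is big enough, hence the
       SetProcessWorkingSetSize call first. */
    static BOOL pin_cache_in_ram(void *buf, SIZE_T len)
    {
        SIZE_T slack = 64u * 1024u * 1024u;   /* 64MB headroom, arbitrary */
        if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                      len + slack, len + 2 * slack))
            return FALSE;
        return VirtualLock(buf, len);
    }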

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Frank Chang
Sent: Friday, May 27, 2011 11:06 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Is it possible to optimize the read performance of a
C++ app using sqlite pragma journal_mode = wal & pragma


  Roger Binns, thank you for suggesting that we run a benchmark that tests
our prototype deduper with and without WAL, using different page sizes and
different transactions.
 
>> You never answered the important bit - is your concern about initial 
>> population of the database or about runtime later on.
 
 I apologize for not answering your question.  Our primary concern is the
runtime later on rather than the initial population of the database.  Could
you recommend how we should use the latest sqlite distribution (i.e.,
pragmas, sqlite function parameters) if we are concerned about the runtime
later on?  Thank you for your help.