Re: [sqlite] Copying from a sqlite database in parallel.

2013-08-06 Thread Bo Peng
Hi, RSmith,

>> One of the problems we have is the performance of analyzing large
>> tables one by one. For example, we are puzzled that it takes from 40
>> seconds to 10 minutes to copy tables of similar sizes from one
>> database to another.
>
> You should be puzzled... that cannot be right

Re: [sqlite] Copying from a sqlite database in parallel.

2013-08-05 Thread Bo Peng
>> For example, if we create 1000 tables and insert records into them in
>> batches,
>
> Do you really have 1000 different schemas so you need 1000 different
> tables to store data? Can these not be merged into one table?

Unfortunately, the 1000 tables can have slightly different schemas and I do
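A minimal sketch of the batch-insert pattern under discussion, in python's
sqlite3 (the poster's stated stack); the table name and columns are
placeholders, and the point is wrapping the whole batch in one transaction:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # placeholder; the real db is on disk
    conn.execute("CREATE TABLE t1 (a INT, b INT)")
    rows = [(i, i * i) for i in range(1000)]
    with conn:  # one transaction around the whole batch, not one per row
        conn.executemany("INSERT INTO t1 VALUES (?, ?)", rows)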

Re: [sqlite] Copying from a sqlite database in parallel.

2013-08-05 Thread Bo Peng
> ...benefits the processor's ability to push more FLOPS through
> different pipelines; this helps calculations a lot, but is the least
> concern when copying data.
>
> On 2013/08/06 01:15, Bo Peng wrote:
>> Dear sqlite experts,
>>
>> I would like to copy

[sqlite] Copying from a sqlite database in parallel.

2013-08-05 Thread Bo Peng
Dear sqlite experts,

I would like to copy some tables from one sqlite database to another
(e.g. 'INSERT INTO to_db.table SELECT * FROM from_db.table', where to_db
is attached to from_db). To improve the performance of this process, I am
trying to run several processes, each reading from "from_db" and
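For reference, a minimal single-process sketch of the copy the post
describes; the file names and the table name "t1" are hypothetical, and
only the ATTACH plus INSERT ... SELECT shape comes from the post itself:

    import sqlite3

    # Assumes from_db.sqlite already exists and contains table t1.
    conn = sqlite3.connect("from_db.sqlite")
    conn.execute("ATTACH DATABASE 'to_db.sqlite' AS to_db")
    # Create an empty copy of the schema, then bulk-copy the rows.
    conn.execute("CREATE TABLE IF NOT EXISTS to_db.t1 AS SELECT * FROM t1 WHERE 0")
    conn.execute("INSERT INTO to_db.t1 SELECT * FROM t1")
    conn.commit()
    conn.close()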

Re: [sqlite] How to know the memory usage of an in-memory database.

2012-06-25 Thread Bo Peng
>> In any case, you set the cache size in pages by executing "PRAGMA
>> cache_size=" after opening the database connection.

I am using "PRAGMA cache_size=-300" to set the cache to 3G, but the
process is still slow, using only 23M of RAM, despite the fact that it is
reading small pieces of
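For what it's worth, a negative cache_size argument is interpreted as
abs(N)*1024 bytes rather than pages, so -300 requests only about 300K of
cache; roughly -3000000 would be needed for 3G. A minimal sketch, with a
hypothetical file name:

    import sqlite3

    conn = sqlite3.connect("data.db")              # hypothetical file name
    conn.execute("PRAGMA cache_size = -3000000")   # abs(N)*1024 bytes, ~3 GiB
    print(conn.execute("PRAGMA cache_size").fetchone())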

Re: [sqlite] How to know the memory usage of an in-memory database.

2012-06-25 Thread Bo Peng
> Are these multiple tables in a single database (file), or multiple
> databases (files)? Multiple connections or a single connection?

Right now there are multiple read-only processes reading the same file.
If I go with any RAM-based solution, I will have to use a single process to read
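One way to keep the single-process option open is a named in-memory
database shared across connections via a URI filename; a sketch, assuming
a build with URI filenames enabled (SQLite 3.7.13+ supports shared
in-memory databases this way, and Python 3.4+ has the uri flag):

    import sqlite3

    uri = "file:memdb1?mode=memory&cache=shared"
    a = sqlite3.connect(uri, uri=True)
    b = sqlite3.connect(uri, uri=True)  # second connection, same in-memory data
    a.execute("CREATE TABLE t (x INT)")
    a.execute("INSERT INTO t VALUES (1)")
    a.commit()
    print(b.execute("SELECT x FROM t").fetchall())  # [(1,)]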

Re: [sqlite] How to know the memory usage of an in-memory database.

2012-06-25 Thread Bo Peng
> Most of the problem is that by keeping the data in separate tables, you
> are ensuring that you almost never get hits from cache. Do these tables
> need to be separate or can you merge them on disk? Do they have the
> same columns?

Unfortunately these tables can have slightly different

Re: [sqlite] How to know the memory usage of an in-memory database.

2012-06-25 Thread Bo Peng
On Mon, Jun 25, 2012 at 8:29 PM, Keith Medcalf wrote:
> What OS is it that does not have a block disk cache?
>
> Have you tried simply allocating a large page cache?

The application is cross-platform (python / sqlite / C). I frankly do not
know how to create a large page cache

[sqlite] How to know the memory usage of an in-memory database.

2012-06-25 Thread Bo Peng
Dear sqlite experts,

My application needs to run a lot of queries against a large sqlite
database (>100G) with many small tables (>10,000). The performance of the
queries is acceptable if the database is on an SSD drive, but can be 50
times or more slower on a regular or network drive. Because some
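On the subject-line question itself, one rough way to gauge an in-memory
database's footprint is page_count * page_size; a sketch (this counts
database pages only, not page-cache or library overhead):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t AS SELECT 1 AS x")
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    print(pages * page_size, "bytes")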

Re: [sqlite] Optimizing a query with range comparison.

2011-12-28 Thread Bo Peng
significantly boost the performance. I am very happy that I finally found
a solution for my problem, and I appreciate all the responses from the
list. Thank you!

Bo

On Tue, Dec 27, 2011 at 11:57 AM, Bo Peng <ben@gmail.com> wrote:
> I will report back if I can use this module
> to

Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Bo Peng
On Tue, Dec 27, 2011 at 10:47 AM, Igor Tandetnik wrote:
> If you need to do this with any regularity, you should look at R-Tree
> module:
>
> http://www.sqlite.org/rtree.html

I do have a lot of range-based queries and rtree seems to be a perfect
solution for my problem. I am
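A minimal sketch of the suggested R-Tree approach for 1-D ranges, assuming
an SQLite build with the rtree module compiled in; the virtual-table name
is a placeholder modeled on this thread's (start, end) schema:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE tableB_rt USING rtree(id, start, end)")
    conn.execute("INSERT INTO tableB_rt VALUES (1, 10, 20)")
    # Find every stored range that contains pos = 15; the R-Tree index
    # answers this without scanning all rows.
    print(conn.execute(
        "SELECT id FROM tableB_rt WHERE start <= 15 AND end >= 15").fetchall())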

Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Bo Peng
>> Try using a JOIN instead. In fact, try both ways around:
>>
>> DELETE FROM ids;
>> INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN
>> start AND end;
>>
>> then try
>>
>> DELETE FROM ids;
>> INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN
>>

[sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Bo Peng
Dear Sqlite experts,

The schema of my test tables is as follows:

CREATE TABLE tableA (id INT, pos INT);
CREATE TABLE tableB (start INT, end INT);
CREATE INDEX tableA_idx on tableA (pos asc);
CREATE INDEX tableB_idx on tableB (start asc, end asc);
CREATE TABLE ids (id INT);

tableA saves
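The preview cuts off before the query, but from the schema and the later
replies the statement being optimized is presumably of this shape (a
sketch with toy data; the BETWEEN condition is the slow part under
discussion):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableA (id INT, pos INT);
        CREATE TABLE tableB (start INT, end INT);
        CREATE INDEX tableA_idx on tableA (pos asc);
        CREATE INDEX tableB_idx on tableB (start asc, end asc);
        CREATE TABLE ids (id INT);
        INSERT INTO tableA VALUES (1, 15), (2, 99);
        INSERT INTO tableB VALUES (10, 20);
        -- Collect ids whose pos falls inside any (start, end) range.
        INSERT INTO ids SELECT tableA.id FROM tableA, tableB
            WHERE pos BETWEEN start AND end;
    """)
    print(conn.execute("SELECT id FROM ids").fetchall())  # [(1,)]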

Re: [sqlite] EXT :Re: Concurrent readonly access to a large database.

2011-10-25 Thread Bo Peng
> Doing vacuum on a 288 Gb database is probably going to take some time.

I submitted the command last night and nothing seems to be happening
after 8 hours (sqlite3 is running and there is disk activity, but I do
not see a .journal file).

> I wonder if it would be better on just having the

Re: [sqlite] EXT :Re: Concurrent readonly access to a large database.

2011-10-24 Thread Bo Peng
> Can I ask which file-system you were using on the SSD drive when you
> obtained this result?

It is ext4 on a 512G SSD on an Ubuntu system.

Bo

Re: [sqlite] EXT :Re: Concurrent readonly access to a large database.

2011-10-24 Thread Bo Peng
>> Other than using an SSD to speed up random access, I hope a VACUUM
>> operation would copy tables one by one so the content of the tables
>> would not be scattered around the whole database. If this is the case,
>> disk caching should work much better after VACUUM... fingers crossed.
>
> VACUUM will

Re: [sqlite] EXT :Re: Concurrent readonly access to a large database.

2011-10-23 Thread Bo Peng
On Sun, Oct 23, 2011 at 8:57 AM, Simon Slavin wrote:
> It seems that this was the first problem he found with the way he
> arranged this database. But our solution to it would be different
> depending on whether he wanted to do this just the once, or it was a
> regular

Re: [sqlite] EXT :Re: Concurrent readonly access to a large database.

2011-10-23 Thread Bo Peng
On Sun, Oct 23, 2011 at 8:12 AM, Black, Michael (IS) wrote:
> #1 What's the size of your database?

288G, 5000 tables, each with ~1.4 million records

> #2 What's your cache_size setting?

default

> #3 How are you loading the data? Are your table inserts interleaved or by

Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
> It's not only speed in KB/sec that matters. It's also disk system
> usage as reported by iostat. If it's close to 100% then SQLite can't
> do any better.

A sad day. I copied the database to a faster drive with RAID 0, made
another copy of the database (referred to as DB1), and ran another set

Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
> You may create multiple threads, but your hard drive only has one set
> of heads.

I now realize this problem and am moving the data to a faster drive.
However, when copying the data out, the activity monitor reports a
40MB/sec read speed. I admit this is not a fast drive, but compared to the

Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
On Sat, Oct 22, 2011 at 4:18 PM, Pavel Ivanov wrote:
>> Using three tables (e.g. test.sh 111 112 113), the first command takes
>> 13m3s, the second command takes 12m45s. I am wondering if there is any
>> magic to make the second script finish in 5 minutes by executing the
>>

Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
> Multithreaded mode allows SQLite to be accessed via multiple threads as
> long as threads don't share connection handles. This is what's
> sometimes called the apartment model of multithreading.

Thank you very much for your quick reply. Is there a way to enable
multi-thread mode from

[sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
Dear all,

I have a database with about 5000 tables, each with more than 1 million
records. I needed to get some summary statistics for each table but found
that it would take days to run 'SELECT count(*) FROM table_XX'
(XX=1,...,5000) sequentially. I therefore created 10 threads, each having its own
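A minimal sketch of the setup described, with one private connection per
thread (connections must not be shared across threads in this mode); the
file and table names are placeholders:

    import sqlite3
    import threading

    def count_tables(path, names, out):
        conn = sqlite3.connect(path)  # each thread opens its own connection
        for n in names:
            # Table names cannot be bound as parameters, hence the format.
            out[n] = conn.execute("SELECT count(*) FROM %s" % n).fetchone()[0]
        conn.close()

    results = {}
    t = threading.Thread(target=count_tables,
                         args=("big.db", ["table_1", "table_2"], results))
    t.start()
    t.join()
    print(results)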