Maybe I'm confused, but perhaps you don't understand the pseudo-code, or I don't understand your problem. The idea is to process in batches instead of all at once. I don't see where you came up with 1,000,000 inserts, as though the pseudo-code were generating extra INSERT statements for you. If you made N=1 it would effectively be the same as running without BEGIN/COMMIT.

1 to 999 INSERTs with N=1000 would result in one BEGIN/COMMIT sequence.
1000 INSERTs would result in two BEGIN/COMMIT sequences (the 2nd BEGIN/COMMIT would actually have 0 records to commit, as you would be right on the boundary).
2000 INSERTs would have 3 BEGIN/COMMIT sequences.
3000 INSERTs would have 4 BEGIN/COMMIT sequences.

It doesn't matter how many inserts you do... the pseudo-code logic just does a new COMMIT/BEGIN on every 1000 inserts. You adjust N to make the system responsive to your users -- the smaller the N, the faster the response time for competing queries.

Am I missing something? Do you not know what BEGIN/COMMIT is? Let me rephrase using your pseudo-code --- this will commit every 1000 INSERTs:

    count=0
    batch=1000
    BEGIN
    while (have data) {
        process next piece of data
        count++
        if ((count % batch)==0) {
            COMMIT
            BEGIN
        }
    }
    COMMIT
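[Editor's note: the batching loop above can be sketched as a runnable Python program using the stdlib sqlite3 module. The table name `t` and the row source are hypothetical; `isolation_level=None` is used so BEGIN/COMMIT can be issued manually, matching the pseudo-code.]

```python
import sqlite3

def insert_batched(conn, rows, batch=1000):
    # Commit every `batch` INSERTs so the write lock is released
    # periodically and competing readers stay responsive.
    conn.execute("BEGIN")
    count = 0
    for row in rows:
        conn.execute("INSERT INTO t(x) VALUES (?)", (row,))
        count += 1
        if count % batch == 0:
            conn.execute("COMMIT")   # end of this batch
            conn.execute("BEGIN")    # start the next one
    conn.execute("COMMIT")           # flush the final (possibly empty) batch

# isolation_level=None disables the module's implicit transaction
# handling, giving us manual BEGIN/COMMIT control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t(x)")
insert_batched(conn, range(2500), batch=1000)
print(conn.execute("SELECT count(*) FROM t").fetchone()[0])  # 2500
```

With 2500 rows and batch=1000 this issues three COMMITs: two on the batch boundaries and one final flush, exactly as described above.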
Michael D. Black
Senior Scientist
Northrop Grumman Mission Systems

________________________________

From: sqlite-users-boun...@sqlite.org on behalf of Igor Sereda
Sent: Thu 5/27/2010 6:29 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Using cache stats to balance transaction size for optimal performance

Michael,

Thank you for your suggestion! The problem with this approach is that N would not be a constant that we could tune. As I mentioned, the amount of updates may vary depending on the data received. For example, one piece of data may lead to a single INSERT, so it would be safe and effective to have N=1000, for example. Another piece of data may lead to 1000 INSERTs. Now, if we still have N=1000, then we'll have 1,000,000 INSERTs in a single transaction. It's completely unpredictable. And when the amount of data changed in a single transaction is large enough, it would cause either a cache spill and an exclusive lock on the database, or growth of the cache and memory consumption.

Do you think this makes sense?

We could theoretically count the number of DML statements or steps, but this would imply changing the underlying architecture of the application, so that any plug-in or extension that accesses SQLite also reports how much data it changed. It's not very convenient.

Kind regards,
Igor

Black, Michael (IS) wrote:
>
> So only do N many records in one batch. That's the easiest thing. Forget
> about the cache and just use responsiveness to adjust how many records you
> allow at once.
>
> Pseudo-code:
> recnum=0
> BEGIN;
> while more records
>     INSERT....
>     recnum++
>     if (recnum % 1000 == 0)
>         COMMIT;
>         BEGIN;
> end
> COMMIT;
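[Editor's note: one way to bound transactions by actual row changes rather than by pieces of data -- without instrumenting every plug-in, which is Igor's objection above -- is SQLite's own change counter, sqlite3_total_changes(), exposed in Python as Connection.total_changes. This is a hedged sketch, not what the thread settled on; the table `t` and the "pieces" workload are hypothetical.]

```python
import sqlite3

def process_pieces(conn, pieces, max_changes=1000):
    # Commit whenever the connection has accumulated max_changes row
    # changes, no matter how many INSERTs each piece produced.
    # conn.total_changes is maintained by SQLite itself, so INSERTs
    # made by plug-in code sharing this connection are counted too.
    conn.execute("BEGIN")
    last = conn.total_changes
    for piece in pieces:
        for value in piece:          # a "piece" may yield 1 row or 1000
            conn.execute("INSERT INTO t(x) VALUES (?)", (value,))
        if conn.total_changes - last >= max_changes:
            conn.execute("COMMIT")
            conn.execute("BEGIN")
            last = conn.total_changes
    conn.execute("COMMIT")

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t(x)")
# Mixed workload: some pieces insert 1 row, others 1000 or 500.
pieces = [[1], list(range(1000)), [2], list(range(500))]
process_pieces(conn, pieces)
print(conn.execute("SELECT count(*) FROM t").fetchone()[0])  # 1502
```

The boundary check runs between pieces, so a single huge piece still lands in one transaction, but N=1000 pieces can no longer compound into a million-row transaction.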
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users