I'm trying to store simple incrementing integer counters, but I'm having trouble finding an efficient way to do it.
My database looks like this:

CREATE TABLE counters (
    k1,
    k2,
    count,      -- how many we've seen
    expires,
    PRIMARY KEY (k1, k2)
);
CREATE INDEX counters_expires_idx ON counters(expires);

It is about 1.9 GB and contains ~22 million of these rows. A given transaction updates or creates between 10k and 100k of them.

At first I was just doing something like this pseudocode:

    update_counter(k1, k2, count=count+1, expires=now+count*1day)
    if rows_updated != 1:
        insert_counter(k1, k2, count=1, expires=now+1day)

but I was having serious performance problems that seemed to be confined to those lines. So I converted it to INSERT OR REPLACE, which had no noticeable impact on performance. Convinced the problem was in my code, I decided to offload as much as possible to SQLite. Now my code looks like this:

======= cut here =========
PRAGMA synchronous=OFF;
PRAGMA temp_store=MEMORY;

CREATE TEMPORARY TABLE trans_counters(k1, k2);

-- (add all of the items to that temporary table)

CREATE TEMPORARY VIEW trans_counters_v AS
    SELECT k1 AS k1,
           k2 AS k2,
           COUNT(*) AS count
    FROM trans_counters
    GROUP BY k1, k2;

INSERT OR REPLACE INTO counters
    SELECT c.k1 AS k1,
           c.k2 AS k2,
           COALESCE((SELECT count FROM counters
                     WHERE k1 = c.k1 AND k2 = c.k2), 0) + c.count AS count,
           (COALESCE((SELECT count FROM counters
                      WHERE k1 = c.k1 AND k2 = c.k2), 0) + c.count) * 24*60*60 + ? AS expires
    FROM trans_counters_v AS c;
======= cut here =========

Now the code that inserts all of the rows into the in-memory temporary table executes nearly instantly, but the big INSERT takes 15+ minutes, and the journal (in either rollback or WAL mode) balloons to over 300 MB. The temporary table itself is only about 1.8 MB of data (102,603 rows, 94,064 unique), so where is all of the journal traffic coming from?

The process uses nearly no CPU during this time, the disk becomes very active (but with low throughput, reading and writing maybe 200 KB/s judging by the rate of growth of the journal), and sampling the process with OS X's Activity Monitor while it's busy gives:

100% 2869 _pysqlite_query_execute (in _sqlite3.so) + 1886 [0x101945e5e]
  100% 2869 pysqlite_step (in _sqlite3.so) + 47 [0x10194893f]
    100% 2869 sqlite3_step (in libsqlite3.dylib) + 1883 [0x7fff8d95ca5b]
      100% 2869 sqlite3VdbeExec (in libsqlite3.dylib) + 3327 [0x7fff8d95e3af]
        100% 2869 sqlite3BtreeMovetoUnpacked (in libsqlite3.dylib) + 761 [0x7fff8d97ab89]
          100% 2869 moveToChild (in libsqlite3.dylib) + 146 [0x7fff8d96c872]
            100% 2869 sqlite3PagerAcquire (in libsqlite3.dylib) + 194 [0x7fff8d93dc22]
              100% 2869 sqlite3PcacheFetch (in libsqlite3.dylib) + 475 [0x7fff8d93e02b]
                100% 2869 pagerStress (in libsqlite3.dylib) + 670 [0x7fff8d9c407e]
                  100% 2869 pager_write_pagelist (in libsqlite3.dylib) + 149 [0x7fff8d999a35]
                    100% 2869 unixWrite (in libsqlite3.dylib) + 83 [0x7fff8d98bd73]
                      100% 2869 pwrite (in libsystem_kernel.dylib) + 10 [0x7fff8130bab6]

That is, 2869 of 2869 samples (100% of the time) were spent inside sqlite3_step writing data to disk. Further samples look basically the same, with an occasional read path taking up to ~10% of the time. Running VACUUM and ANALYZE doesn't appear to have any effect.

I'm running SQLite 3.7.7 on Mac OS X 10.7.5 via the Python sqlite3 module.

So I feel like something about what I'm doing is fundamentally at odds with SQLite's performance model. All I want is a count of the number of times I've seen each (k1, k2) pair. Is there a better way to do this without storing every occurrence individually and grouping them later? (That would be prohibitively large.)
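For concreteness, the UPDATE-then-INSERT pseudocode above is driven from Python roughly like the sketch below. This is a simplified sketch, not my actual code: the function name bump_counter, the connection handling, and the assumption that "count" in the expiry formula means the new count (to match the batch SQL above) are all filled in for illustration.

import time
import sqlite3

ONE_DAY = 24 * 60 * 60  # seconds

def bump_counter(conn, k1, k2, now=None):
    """Increment one (k1, k2) counter, creating the row if it doesn't exist."""
    if now is None:
        now = int(time.time())
    # Try the UPDATE first; column references in SET see the old row, so
    # "count + 1" is the new count, matching (new count) * 1 day + now.
    cur = conn.execute(
        "UPDATE counters SET count = count + 1, expires = (count + 1) * ? + ? "
        "WHERE k1 = ? AND k2 = ?",
        (ONE_DAY, now, k1, k2))
    if cur.rowcount != 1:
        # No existing row was touched, so create a fresh one.
        conn.execute(
            "INSERT INTO counters (k1, k2, count, expires) VALUES (?, ?, 1, ?)",
            (k1, k2, now + ONE_DAY))

# Usage, e.g.:
#   conn = sqlite3.connect("counters.db")
#   for k1, k2 in pairs:
#       bump_counter(conn, k1, k2)
#   conn.commit()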
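And the batch version is driven roughly as follows. Again a sketch under simplifying assumptions: flush_batch is a hypothetical name, the real code accumulates the (k1, k2) pairs elsewhere and manages transaction boundaries differently, and the SQL is just the "cut here" block above restated inline.

import time
import sqlite3

# The merge statement from the block above; the single ? is bound to "now".
MERGE_SQL = """
INSERT OR REPLACE INTO counters
    SELECT c.k1, c.k2,
           COALESCE((SELECT count FROM counters
                     WHERE k1 = c.k1 AND k2 = c.k2), 0) + c.count,
           (COALESCE((SELECT count FROM counters
                      WHERE k1 = c.k1 AND k2 = c.k2), 0) + c.count) * 24*60*60 + ?
    FROM trans_counters_v AS c
"""

def flush_batch(conn, pairs, now=None):
    """Merge a batch of (k1, k2) pairs into the counters table in one statement."""
    if now is None:
        now = int(time.time())
    conn.execute("PRAGMA synchronous=OFF")
    conn.execute("PRAGMA temp_store=MEMORY")
    conn.execute("CREATE TEMPORARY TABLE IF NOT EXISTS trans_counters(k1, k2)")
    conn.execute("DELETE FROM trans_counters")
    # Filling the temporary table finishes nearly instantly.
    conn.executemany("INSERT INTO trans_counters (k1, k2) VALUES (?, ?)", pairs)
    conn.execute("""
        CREATE TEMPORARY VIEW IF NOT EXISTS trans_counters_v AS
            SELECT k1, k2, COUNT(*) AS count
            FROM trans_counters
            GROUP BY k1, k2
    """)
    # This is the statement that takes 15+ minutes and grows the journal.
    conn.execute(MERGE_SQL, (now,))
    conn.commit()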