On 19/06/2013 1:41 AM, jhnlmn wrote:
Thank you for your response
Simon Slavin <slavins@...> writes:
UPDATE T SET C1 = calculation(C2) WHERE C1 IS NULL AND rowid < 10000
This is the best solution when the table is freshly created
and max(rowid) == number of rows.
But after many deletes and inserts, many rowids will be unused;
max(rowid) may grow without bound, and the number of required updates
along with it.
So, no perfect solution so far.
Why not use a trigger on T to update C1 whenever it gets set to NULL? Is
it actually important for the NULL to persist until your next batch
update comes along?
If a trigger would work, that's by far the cleanest solution, and will
have almost no impact on concurrency. Otherwise...
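To make the trigger idea concrete, here is a minimal sketch. It assumes a
table T(C1, C2) and uses an application-defined SQL function as a stand-in
for calculation() (registered via sqlite3's create_function; your real
calculation may live elsewhere). It only covers INSERT; you would add a
matching UPDATE trigger if C1 can be set back to NULL later:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Stand-in for the real calculation(); registered as a SQL function so
# the trigger can call it.
conn.create_function('calculation', 1, lambda c2: c2 * 2)

conn.execute('create table T (C1, C2)')
conn.execute('''
    create trigger fill_c1 after insert on T
    when new.C1 is null
    begin
        update T set C1 = calculation(new.C2) where rowid = new.rowid;
    end
''')

# Insert with C1 unset; the trigger fills it in immediately.
conn.execute('insert into T (C2) values (21)')
print(conn.execute('select C1 from T').fetchone()[0])  # -> 42
```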
Assuming you want to process between 10k and 20k rows at a time, you
might try something like this (written in Python; it should be easy to
translate to the language you actually use):
import sqlite3

conn = sqlite3.connect('my-database.db')

# Count the rows in each 10k-wide slice of the rowid space,
# returning the slices in rowid order.
histo = list(conn.execute(
    'select min(rowid) lo, max(rowid) hi, count(*) n '
    'from T group by rowid/10000 order by lo'))

# Merge adjacent slices until each bucket holds between 10k and
# 20k-1 rows (each input slice contributes at most 10k).
buckets, i = [(0, 0, 0)], 0
while i < len(histo):
    a, b, n = buckets[-1]
    lo, hi, m = histo[i]
    buckets[-1] = (a if n else lo, hi, n + m)
    i += 1
    if n + m >= 10000:
        buckets.append((0, 0, 0))

# One transaction per bucket.
for lo, hi, n in buckets:
    if n:
        conn.execute(
            'update T set C1 = calculation(C2) '
            'where C1 is NULL and rowid between ? and ?',
            (lo, hi))
        conn.commit()
Translated into plain English: count the number of rows in each slice of
10k, being sure to return slices in order. That query will be fast
because it should read from your rowid index and has small output size
(even a billion-row input will only produce 100k rows). Merge too-small
slices so that each contains somewhere between 10k and 20k-1 rows, then
run your update query, passing the lower and upper bound of each slice
to limit where it looks. Again, the index on rowid will enforce the
range limit without a table scan (but double-check the output of
"explain query plan").
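That double-check can be scripted. A sketch (using a SELECT with the same
WHERE clause as a stand-in for the UPDATE, since calculation() is not
defined in a bare connection): the plan rows should report a SEARCH on the
integer primary key, not "SCAN T".

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table T (C1, C2)')

# EXPLAIN QUERY PLAN returns plan rows instead of query results; the
# detail column shows whether the rowid range avoids a table scan.
plan = conn.execute(
    'explain query plan '
    'select * from T where C1 is NULL and rowid between ? and ?',
    (0, 10000)).fetchall()
for row in plan:
    print(row)
```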
NOTE: by splitting the transaction, you risk the database changing out
from under you before the last slice is done. Somebody *could* delete a
whole chunk of the rowid space, for example, and throw off your
carefully computed slice sizes. Or they could add rows after you make
the slices, and those rows would be ignored. Or they could set C1=NULL
on rows you already looked at. Most likely, you can squint and claim
that all those kinds of things just happened "after" the batch update,
but whether that's allowed is application dependent.
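If the application cannot squint that hard, one hedged mitigation (a
sketch, again assuming the T(C1, C2) schema and a stand-in calculation())
is a final unbounded pass after the last bucket: if the batches did their
job it touches only stragglers, so the write lock is held briefly.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.create_function('calculation', 1, lambda c2: c2 * 2)  # stand-in
conn.execute('create table T (C1, C2)')
conn.executemany('insert into T (C1, C2) values (?, ?)',
                 [(None, 1), (4, 2), (None, 3)])

# Catch-up pass: only rows still NULL after the batched updates.
conn.execute('update T set C1 = calculation(C2) where C1 is NULL')
conn.commit()
print(conn.execute(
    'select count(*) from T where C1 is NULL').fetchone()[0])  # -> 0
```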
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users