On 19/06/2013 1:41 AM, jhnlmn wrote:
Thank you for your response

Simon Slavin <slavins@...> writes:
UPDATE T SET C1 = calculation(C2) WHERE C1 IS NULL AND rowid < 10000
This is the best solution when the table is freshly created
and max(rowid) == number of rows.
But after many deletes and inserts, many rowids will be unused;
max(rowid) may grow without bound, so the number of required updates is
also unbounded.

So, no perfect solution so far.
Why not use a trigger on T to update C1 whenever it gets set to NULL? Is it actually important for the NULL to persist until your next batch update comes along?

If a trigger would work, that's by far the cleanest solution, and will have almost no impact on concurrency. Otherwise...
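To make the trigger idea concrete, here is a minimal sketch in Python (matching the code below). `calculation` here is a stand-in for your application function, registered with create_function so SQL can call it; table and column names are taken from your example:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table T (C1, C2)')

# placeholder for your real calculation(); must be registered on every
# connection that writes to T, or the trigger will raise an error
conn.create_function('calculation', 1, lambda x: x * 2)

# fill C1 as soon as a row arrives with C1 NULL...
conn.execute('''
    create trigger T_c1_ins after insert on T
    when new.C1 is NULL
    begin
        update T set C1 = calculation(new.C2) where rowid = new.rowid;
    end''')
# ...and whenever C1 is later set back to NULL
conn.execute('''
    create trigger T_c1_upd after update of C1 on T
    when new.C1 is NULL
    begin
        update T set C1 = calculation(new.C2) where rowid = new.rowid;
    end''')

conn.execute('insert into T (C2) values (21)')
print(conn.execute('select C1 from T').fetchone()[0])  # 42
```

One caveat: because the trigger lives in the schema but `calculation` lives in the application, any other tool that writes to T without registering the function will get an error.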

Assuming you want to process between 10k and 20k rows at a time, you might try something like this (written in Python; you should be able to translate it easily to the language you actually use):

import sqlite3

conn = sqlite3.connect('my-database.db')
histo = list(conn.execute(
    'select min(rowid) lo, max(rowid) hi, count(*) n '
    'from T group by rowid/10000 order by lo'))

# merge adjacent 10k-wide rowid slices until each bucket holds >= 10000 rows
buckets = [(0, 0, 0)]
for lo, hi, n in histo:
    a, b, m = buckets[-1]
    # an empty bucket starts at the current slice's low bound
    buckets[-1] = (lo if m == 0 else a, hi, m + n)
    if m + n >= 10000:
        buckets.append((0, 0, 0))

for lo, hi, n in buckets:
    if n > 0:
        conn.execute('update T set C1=calculation(C2) '
                     'where C1 is NULL and rowid between ? and ?',
                     (lo, hi))
conn.commit()

Translated into plain English: count the number of rows in each slice of 10k, being sure to return slices in order. That query will be fast because it should read from your rowid index and has small output size (even a billion-row input will only produce 100k rows). Merge too-small slices so that each contains somewhere between 10k and 20k-1 rows, then run your update query, passing the lower and upper bound of each slice to limit where it looks. Again, the index on rowid will enforce the range limit without a table scan (but double-check the output of "explain query plan").
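Checking the plan is cheap to do from the same script. A minimal sketch (in-memory database and a throwaway table standing in for yours; the update's scalar function is omitted so the statement parses without it):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table T (C1, C2)')

# ask SQLite how it would execute the range-limited scan; the detail
# column should say SEARCH (rowid b-tree lookup), not SCAN (full table)
plan = list(conn.execute(
    'explain query plan '
    'select * from T where C1 is NULL and rowid between ? and ?',
    (1, 10000)))
for row in plan:
    print(row[-1])
```

If the last column reports a SCAN instead of a SEARCH on the rowid, the range constraint is not being used and each batch would walk the whole table.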

NOTE: by splitting the transaction, you risk the database changing out from under you before the last slice is done. Somebody *could* delete a whole chunk of the rowid space, for example, and throw off your carefully computed slice sizes. Or they could add rows after you make the slices, and those rows would be ignored. Or they could set C1=NULL on rows you already looked at. Most likely, you can squint and claim that all those kinds of things just happened "after" the batch update, but whether that's allowed is application dependent.

Ryan

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
