>> First of all in the statement above you don't gain benefit from
>> uniqueness and replace about 10k rows twice.
>
> Are you sure? The SELECT in the INSERT OR REPLACE selects "FROM
> trans_counters_v AS c", the grouped temporary view. So it should only see
> any given key pair once before it starts doing any inserting at all.
Sorry, you are right. I missed the GROUP BY part...

Pavel


On Tue, Mar 12, 2013 at 11:03 PM, David King <dk...@ketralnis.com> wrote:
>> > At first I was just doing something like this pseudocode:
>> > update_counter(k1, k2, count=count+1, expires=now+count*1day)
>> > if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
>>
>> Assuming these 2 statements constitute each of the 10k-100k steps you
>> mentioned above, and all of these steps are wrapped up in a BEGIN-COMMIT
>> block, this is probably the most efficient way of doing this. The only
>> improvement could be if you are doing creates more often than updates.
>> Then you can switch and do the INSERT first and then the UPDATE if
>> necessary. It could gain you a little time.
>
> Yeah. I even tried keeping track of how many hits/misses I had and
> re-ordering the attempted INSERT/UPDATE as appropriate. A batch of 100k of
> these is done in a single transaction.
>
>> > but was having serious performance problems that seem to be confined to
>> > those lines. So I converted it to INSERT OR REPLACE, which had no
>> > noticeable impact on performance.
>>
>> Actually my understanding would suggest that INSERT OR REPLACE should
>> execute slower than UPDATE + INSERT (or INSERT + UPDATE).
>
> […]
>
>> > Convinced the problem was in my code, I decided to offload as much as
>> > possible to sqlite. Now my code looks like:
>>
>> This should be much-much slower than UPDATE + INSERT.
>
> That's unfortunate, because the overall performance was about the same
> ±10% between all three approaches :(
>
>> First of all in the statement above you don't gain benefit from
>> uniqueness and replace about 10k rows twice.
>
> Are you sure? The SELECT in the INSERT OR REPLACE selects "FROM
> trans_counters_v AS c", the grouped temporary view. So it should only see
> any given key pair once before it starts doing any inserting at all.
>
>> Second, with such low repeatability you don't gain much from doing it
>> with such a complicated INSERT. And about journal size: imagine that
>> you've got "lucky" and all those 94k rows are each in their own page in
>> the counters table. SQLite will have to save each of those pages in the
>> journal, which will give a journal size of about 94k * 4096 ~ 400M.
>
> I hadn't thought about it that way, that's true. And it's probably wildly
> seeking all over the disk to do it. The reads are probably fine because
> the machine has plenty of RAM to devote to page cache; it's the random
> writes that are killing it.
>
>> I don't think there's anything better than what you did initially.
>
> As for the fundamental approach, I figured as much. The rearrangement into
> the giant INSERT OR REPLACE was just to prove to myself that the problem
> wasn't elsewhere in my code.
>
> For optimising it on the sqlite front, I've played with page sizes,
> journaling modes, and changing the transaction batch size without much
> luck. I don't have strong consistency requirements for e.g. power failures
> or OS crashes, but I do need an application crash to not take it out, so I
> can't just go without the journal altogether (which does help the problem,
> but isn't huge).
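A minimal sketch of the UPDATE-then-INSERT pattern discussed above, in
Python's sqlite3, assuming a counters(k1, k2, count, expires) schema with
seconds-since-epoch expiries (the thread never shows the actual DDL, so
both are assumptions):

    import sqlite3
    import time

    ONE_DAY = 86400  # expiry step in seconds (assumed unit)

    def apply_batch(conn, batch):
        """Apply one batch of (k1, k2) hits in a single transaction,
        using the UPDATE-then-fallback-INSERT pattern."""
        cur = conn.cursor()
        now = int(time.time())
        for k1, k2 in batch:
            # Common case: the key pair already exists, so bump it.
            # SQLite evaluates the right-hand sides against the old row,
            # hence (count + 1) to base the expiry on the new count.
            cur.execute(
                "UPDATE counters"
                "   SET count = count + 1, expires = ? + (count + 1) * ?"
                " WHERE k1 = ? AND k2 = ?",
                (now, ONE_DAY, k1, k2))
            if cur.rowcount != 1:
                # First sighting of this key pair: start it at count = 1.
                cur.execute(
                    "INSERT INTO counters (k1, k2, count, expires)"
                    " VALUES (?, ?, 1, ?)",
                    (k1, k2, now + ONE_DAY))
        # The first execute() implicitly opened a transaction (default
        # sqlite3 behaviour), so this commits the whole batch at once.
        conn.commit()

Pavel's suggested reordering for create-heavy workloads just means trying
the INSERT first and falling back to the UPDATE when the INSERT hits the
unique constraint.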
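David's actual INSERT OR REPLACE isn't shown in this excerpt; a
hypothetical reconstruction of its shape, assuming a staging table
trans_counters of raw hits (an invented name) feeding the trans_counters_v
view named in the thread:

    conn.executescript("""
        CREATE TEMP VIEW IF NOT EXISTS trans_counters_v AS
            SELECT k1, k2, COUNT(*) AS hits  -- the GROUP BY is why each
            FROM trans_counters              -- key pair is seen only once
            GROUP BY k1, k2;

        INSERT OR REPLACE INTO counters (k1, k2, count, expires)
        SELECT c.k1,
               c.k2,
               COALESCE((SELECT count FROM counters
                          WHERE k1 = c.k1 AND k2 = c.k2), 0) + c.hits,
               strftime('%s', 'now') + 86400  -- placeholder; the real
                                              -- expiry formula isn't shown
        FROM trans_counters_v AS c;
    """)

The correlated subquery needed to carry the old count forward is one
reason to expect this form to be slower than the plain UPDATE + INSERT,
as Pavel says above.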
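For the tuning knobs in the last paragraph, a hedged sketch of one
journal-mode experiment (the filename and the choice of WAL are
assumptions, not something David says he settled on):

    import sqlite3

    conn = sqlite3.connect("counters.db")  # hypothetical filename
    # WAL turns the per-commit random page writes into sequential log
    # appends and still survives an application crash; synchronous=OFF
    # additionally skips the fsyncs, giving up power-loss durability,
    # which matches the stated requirements.
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = OFF")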