On Monday, 12 December, 2016 18:24, Vince Scafaria <[email protected]> wrote:
> Our application does a lot of SQLite read/write in a background process.
> On some environments this hammers the disk I/O (e.g. Task Manager ->
> Disk). Do you have suggestions on how we might minimize this?
> Perhaps related, one operation that seems to use a lot of the Disk I/O is
> when we use INSERT OR REPLACE INTO. We use this as a workaround for
> performing an UPDATE together with JOIN, which is not allowed by SQLite. I
> feel as though one way we could minimize disk usage would be if we could
> use the MSSQL syntax of:
> UPDATE x
> SET ValueA = y.ValueA
> FROM TableOne x
> JOIN TableTwo y ON y.Id = x.TableTwoId
> WHERE COALESCE(x.ValueA,'') != COALESCE(y.ValueA,'');
> Instead we do:
> INSERT OR REPLACE INTO TableOne ( Id, TableTwoId, ValueA )
> SELECT x.Id, x.TableTwoId, y.ValueA
> FROM TableOne x
> JOIN TableTwo y ON y.Id = x.TableTwoId
> WHERE COALESCE(x.ValueA,'') != COALESCE(y.ValueA,'');
> Our tables all have a primary key on Id. By including Id in the statement,
> it does a REPLACE. As I understand it, replace includes a behind-the-
> scenes DELETE which triggers a re-indexing. Maybe that is part of the
> problem, but I don't see a workaround. Ideas?
I do not see any need for INSERT, you are doing a simple update. Since your
select is constrained by existing rows in TableOne (which you are updating),
you only need to update, not delete and insert (which you are doing for every
row). How about something like:
UPDATE TableOne
SET ValueA = (SELECT ValueA
FROM TableTwo
WHERE TableTwo.Id = TableOne.TableTwoID)
WHERE COALESCE(TableOne.ValueA, '') != COALESCE((SELECT ValueA
FROM TableTwo
WHERE TableTwo.Id =
TableOne.TableTwoID). '')
Note that if your database were properly normalized, you would not have to do
this (that is, ValueA is entirely dependant on TableTwoID, and TableTwoID is a
foreign key into TableTwo). If you were not duplicating data then this issue
would not arise. This is part of the problem with having multiple datastores
-- you never know which one is accurate. And if TableOne.ValueA is merely
duplicated data (and therefore always to be assumed to be incorrect), then you
may as well not have it in TableOne just do a lookup on the fly. If you think
that makes your queries look too complicated, then you can define a view which
does it for you and only select through the view.
If you really must de-normalize your data for some reason, then you should put
a trigger on TableOne (and perhaps TableTwo) that keep them in sync when
updated (or inserted into) so that you do not have to run updates such as the
above.
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users