Droedel,

Thanks for the comprehensive reply. We have actually done all of this.
The system has been running for 2-3 years, and we have taken the opportunity to try to prune the database from 60GB down to 600MB. Currently the live system is working OK with the 60GB database, but our test system (which is smaller) is struggling with the 600MB one.
The system has an IOPS restriction as it's a Virtual Private Server; technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec with a disk-to-disk copy, which wouldn't be brilliant if we had sustained disk traffic, but we don't.
We log approx 600-800 items of around 3-5K every five minutes. These 600-800 items are mainly inserts into a single table; there are other things happening as well, but this is the bulk of the work. We can see that the -wal files grow a small amount (4-5MB) just before the commit. It then takes 7 seconds to execute the commit, and this is the bit we're struggling with. We know we can get circa 70MB/sec data throughput, so writing the data itself should take a fraction of a second. SQLite also needs to work out which pages to commit, so that's a little longer, but we know SQLite is fast, so it shouldn't take 7 seconds on the small database when it doesn't take that long on the large 60GB database. That's the puzzling bit: the large database is quick, the small one slow.
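For context, each five-minute batch is essentially the pattern below, wrapped in a single transaction (the column names are taken from the Disruptions schema quoted at the bottom of this mail; the values are placeholders, not real data):

BEGIN;
INSERT INTO Disruptions
    (version, Disruption_id, status, severity, levelOfInterest,
     category, subCategory, startTime, endTime, location, corridor,
     comments, currentUpdate, remarkTime, lastModTime,
     CauseAreaPointX, CauseAreaPointY, Direction)
VALUES
    (1, 12345, 0, 3, 2, 1, 4, '2018-07-30T11:00:00',
     '2018-07-30T12:00:00', 42, 7, 1, 1, '2018-07-30T11:00:00',
     '2018-07-30T11:05:00', 51.5, -0.1, 'N');
-- ... repeated for each of the 600-800 items ...
COMMIT;   -- this is the statement that takes ~7 seconds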
We have no logging turned on. We can turn SQL logging on at the DBI level, but that turns a 20-second run into a 2-3 minute run as it captures everything :) Nothing in the log files gives us any concern (apart from the really long commit time). Simon Slavin suggested dropping the indexes, which we did, and that turned the commit into a fast commit, so it's something to do with the indexes, but we can't see what. A rough sketch of that experiment follows.
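For anyone who wants to reproduce the index experiment, it was roughly this (the index names are from the schema quoted at the end of this mail; the indexes are recreated afterwards from those same definitions):

DROP INDEX IF EXISTS Disruptions_Idx1;
DROP INDEX IF EXISTS Disruptions_Idx2;
DROP INDEX IF EXISTS Disruptions_Idx3;
DROP INDEX IF EXISTS Disruptions_Idx5;
-- run the normal five-minute insert batch and time the COMMIT,
-- then recreate the indexes from the CREATE INDEX statements below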
What we are now doing is going back to the very beginning:

1. We built a replacement system yesterday with 8GB memory, 8 cores and 150GB of disk space. It's virtualised (ESXi) but under our control.
2. We've installed a copy of the old 60GB database on the new system.

3. We're going to benchmark the new system over a couple of thousand runs to see what the average time is.

4. We'll then work our way through the deduping of the database step by step to see when the commit time blows up. This will take a few days, as working out the duplicates among 200,000,000 rows isn't that quick :) As we found out, dropping a very large table is really, really, really slow. A sketch of the kind of dedup step we mean follows this list.

5. We'll apply some of the ideas that people have suggested since yesterday to see if they work, but I'm keen that we have a repeatable problem that we solve rather than using a scattergun approach to fixing it. We think SQLite is well written, so we figure the problem is ours to solve rather than simply blaming the software.
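Purely as an illustration of the dedup step in point 4 (the real duplicate test is more involved than a simple GROUP BY, and the grouping columns here are only an example): keep the lowest id for each logical disruption/version and delete the rest.

DELETE FROM Disruptions
 WHERE id NOT IN (SELECT MIN(id)
                    FROM Disruptions
                   GROUP BY Disruption_id, version);
VACUUM;    -- reclaim the freed pages
ANALYZE;   -- refresh the planner statistics afterwards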
Thanks

Rob

On 30 Jul 2018, at 11:11, Droedel wrote:
Hi,

When having bad performance, I usually first try to find out if the slowness is due to disk throughput (sequential), slow random access or something else. In Linux, try "iostat -xtc 5". Do this with and without your application writing to disk.

If you see high CPU %iowait and high %util on your disk, then disk is the bottleneck. If not: start profiling / analyzing other bottlenecks (CPU / network / ...).

If the disk throughput (wMB/s) is close to your normal sequential throughput (69 MB/s): try to write less data or get a faster disk. If the disk throughput is low but there are high numbers of writes (w/s): there's too much seeking / too many small writes to your disk. Page cache too small? Checkpointing too often?

Sometimes this kind of problem is caused by other applications (logging / ...) causing too much baseload. %util should be low when your application isn't running.

Just my 2 cents.

Kind regards,

Droedel

On Sun, Jul 29, 2018, at 10:14, Rob Willett wrote:

Hi,

Background

We've been refactoring our database to reduce the size of it. Through some simple logic we've managed to pull out 99% of the data to reduce the size from 51GB down to approx 600MB. This logic has been to remove rows that are almost the same but not quite identical. As with all things, the thinking was the difficult bit, the execution somewhat easier.

As part of the testing we've been doing, we've now hit on an odd and weird problem to do with the COMMIT statement. A commit of a few hundred (circa 600-800) rows takes approx 7 seconds, whereas before we never even noticed it, though we now know it was two seconds before. Each row is probably 1-2K of data, so it's not very much at all.

Details of what we have tried:

1. We've turned synchronous on and off (PRAGMA synchronous=ON) and that's not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK; we get a consistent 69MB/sec. This is not as fast as we would like, but it's the same across all our virtual servers.

4. We've tested the commit on our existing 60GB database and it takes 2 seconds, which is longer than we thought it would be. The server for the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 14.04. The server we are testing on is a 2GB/2 core test server running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't expect it to take 3 times longer to do a commit.

5. The code is identical across the servers. We are running Perl and the DBI module. The code for doing a commit in Perl::DBI is

$dbh->do("COMMIT");

We are getting the expected performance elsewhere on the system and in the code. It's just the commit that is taking a long time.

6. The code we are committing is adding 600-800 lines to a table that used to be 200,000,000 rows in size. It's now 400,000 lines in size. We are wondering if the deletion of the lines has had an impact we didn't expect. We have vacuumed and analysed the database.
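For reference, the SQLite-side knobs that affect commit and checkpoint behaviour are roughly these; the values shown are examples only, not settings we are recommending or necessarily running:

PRAGMA journal_mode;                -- confirm we really are in WAL mode
PRAGMA synchronous = NORMAL;        -- NORMAL is usually enough in WAL mode
PRAGMA cache_size = -200000;        -- ~200MB page cache (negative = size in KiB)
PRAGMA wal_autocheckpoint = 10000;  -- checkpoint every 10000 pages instead of the default 1000
PRAGMA wal_checkpoint(TRUNCATE);    -- force a checkpoint and truncate the WAL file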
The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
  "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  "version" integer NOT NULL,
  "Disruption_id" INTEGER NOT NULL,
  "status" integer NOT NULL,
  "severity" integer NOT NULL,
  "levelOfInterest" integer NOT NULL,
  "category" integer NOT NULL,
  "subCategory" integer NOT NULL,
  "startTime" TEXT NOT NULL,
  "endTime" text NOT NULL,
  "location" integer NOT NULL,
  "corridor" integer NOT NULL,
  "comments" integer NOT NULL,
  "currentUpdate" integer NOT NULL,
  "remarkTime" TEXT NOT NULL,
  "lastModTime" TEXT NOT NULL,
  "CauseAreaPointX" real NOT NULL,
  "CauseAreaPointY" real NOT NULL,
  "Direction" TEXT
);

CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);

CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC);

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE NOCASE ASC);

CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id", "Severity", "levelOfInterest", "category", "subCategory", "version");

We have checked that this schema is consistent across the databases. We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
_______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users