On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote: > What happens if you create an index on uris(feed_history_id) > > >
Yeah, I noticed that lacking as well. sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; 0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|1|3|SCAN TABLE uris AS u (~46608 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) CPU Time: user 0.000079 sys 0.000014 sqlite> CREATE INDEX uris_feed_history_id ON uris (feed_history_id); CPU Time: user 12.766977 sys 82.766372 sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; 0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|1|1|SCAN TABLE feeds AS f (~7 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX (feed_id=?) (~5 rows) 0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id (feed_history_id=?) (~10 rows) CPU Time: user 0.000123 sys 0.000015 But, no joy. SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND u.u_downloaded_on >= p.u_project_start AND fts_uri MATCH 'education school' ORDER BY u.uri_id, u_downloaded_on DESC; .. CPU Time: user 28.599581 sys 108.518205 > > > > ________________________________ > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 2:00 PM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote: > >> strftime returns a text representation. So you didn't really change >> anything. >> > > > That's not true at all. I added u_downloaded_on (u_ for unixtime) > > CREATE TABLE uris ( > uri_id INTEGER PRIMARY KEY, > uri TEXT, > uri_html TEXT, > uri_content TEXT, > downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, > feed_history_id INTEGER, > u_downloaded_on INTEGER > ); > > sqlite> SELECT u_downloaded_on FROM uris LIMIT 5; > 1306450769 > 1306450769 > 1306450770 > 1306450774 > 1306450776 > > >> You need to use juliandays() as I said. >> >> >> >> And you want a REAL number...not integer...though SQLite doesn't really care >> what you call it. It's more for your own reference. >> >> >> >> You just added a bunch more strings increasing the size of your >> database...ergo it ran slower. >> >> > > > If I understand correctly, the *size* of the database should not matter. Or, > at least not matter as much. Imagine a database with one table with only one > row but with so much content in that row that it is 27 GB vs. a database with > a few million rows, each with small amount of content but together totaling > 27 GB. > > What should matter is using the indexes correctly. In this case, my query > plan shows that I am hitting all the indexes. But, I think I am getting > closer to understanding this. > > I started backward with > > sqlite> SELECT p.project_id > ...> FROM projects p > ...> WHERE p.project_id = 3; > 3 > CPU Time: user 0.000080 sys 0.000089 > > sqlite> SELECT f.feed_id, p.project_id > ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id > ...> WHERE p.project_id = 3; > .. > CPU Time: user 0.000239 sys 0.000170 > > SELECT fh.feed_history_id, f.feed_id, p.project_id > ...> FROM projects p > ...> JOIN feeds f ON f.project_id = p.project_id > ...> JOIN feed_history fh ON f.feed_id = fh.feed_id > ...> WHERE p.project_id = 3; > .. > CPU Time: user 0.008491 sys 0.008054 > > SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id > ...> FROM projects p > ...> JOIN feeds f ON f.project_id = p.project_id > ...> JOIN feed_history fh ON f.feed_id = fh.feed_id > ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id > ...> WHERE p.project_id = 3; > > > BOOM! Adding that last table makes my query way too slow. > > Taking out the uris table and querying only the fts table > > sqlite> SELECT uri_id > ...> FROM fts_uri > ...> WHERE fts_uri MATCH 'education school'; > > > starts producing the results immediately, but gives me back way too many > matches. I am now getting all the matches, but I want only the matches for > project id = 3 or matches throttled by u_downloaded_on (both of which are > achievable only via a JOIN with uris table). > > >> >> Michael D. Black >> >> Senior Scientist >> >> NG Information Systems >> >> Advanced Analytics Directorate >> >> >> >> ________________________________ >> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on >> behalf of Puneet Kishor [punk.k...@gmail.com] >> Sent: Wednesday, September 28, 2011 12:44 PM >> To: General Discussion of SQLite Database >> Subject: EXT :Re: [sqlite] speeding up FTS4 >> >> >> On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote: >> >>> Your change to numeric date/time may not take a long as you think. >>> >>> >>> >> >> >> Took an hour and a half. >> >> Step 1: Alter all tables with datetime columns, converting those columns to >> integer; >> >> Step 2: Update all tables setting new datetime columns to unixtime >> >> UPDATE table SET new_column = strftime('%s', old_column); >> >> Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on); >> >> Step 4: Run the following query >> >> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on >> FROM fts_uri f >> JOIN uris u ON f.uri_id = u.uri_id >> JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id >> JOIN feeds f ON fh.feed_id = f.feed_id >> JOIN projects p ON f.project_id = p.project_id >> WHERE p.project_id = 3 >> AND u.u_downloaded_on >= p.u_project_start >> AND fts_uri MATCH 'education school' >> ORDER BY u.uri_id, u_downloaded_on DESC; >> >> Terrible time. >>>> CPU Time: user 27.584849 sys 115.219293 >> >> Step 5: EXPLAIN QUERY PLAN (above SELECT query) >> >> 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >> 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 >> rows) >> 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY >> (rowid=?) (~1 rows) >> 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 >> rows) >> 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) >> (~1 rows) >> 0|0|0|USE TEMP B-TREE FOR ORDER BY >> CPU Time: user 0.000099 sys 0.000008 >> >> Step 6: ANALYZE; >> >> Step 7: Run the above SELECT query again >> >> Terrible time. >>>> CPU Time: user 27.703538 sys 116.684390 >> >> >> >> This train is going nowhere. The times are actually worse than they were >> when I was using a non-text column for date time. >> >> By the way, have tried this on two machines -- the previous one was a >> dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of >> the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 >> RPM SATA drive. >> >> .. > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users