What happens if you create an index on uris(feed_history_id)
Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate ________________________________ From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Wednesday, September 28, 2011 2:00 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] speeding up FTS4 On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote: > strftime returns a text representation. So you didn't really change anything. > That's not true at all. I added u_downloaded_on (u_ for unixtime) CREATE TABLE uris ( uri_id INTEGER PRIMARY KEY, uri TEXT, uri_html TEXT, uri_content TEXT, downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, feed_history_id INTEGER, u_downloaded_on INTEGER ); sqlite> SELECT u_downloaded_on FROM uris LIMIT 5; 1306450769 1306450769 1306450770 1306450774 1306450776 > You need to use juliandays() as I said. > > > > And you want a REAL number...not integer...though SQLite doesn't really care > what you call it. It's more for your own reference. > > > > You just added a bunch more strings increasing the size of your > database...ergo it ran slower. > > If I understand correctly, the *size* of the database should not matter. Or, at least not matter as much. Imagine a database with one table with only one row but with so much content in that row that it is 27 GB vs. a database with a few million rows, each with small amount of content but together totaling 27 GB. What should matter is using the indexes correctly. In this case, my query plan shows that I am hitting all the indexes. But, I think I am getting closer to understanding this. I started backward with sqlite> SELECT p.project_id ...> FROM projects p ...> WHERE p.project_id = 3; 3 CPU Time: user 0.000080 sys 0.000089 sqlite> SELECT f.feed_id, p.project_id ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id ...> WHERE p.project_id = 3; .. CPU Time: user 0.000239 sys 0.000170 SELECT fh.feed_history_id, f.feed_id, p.project_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> WHERE p.project_id = 3; .. CPU Time: user 0.008491 sys 0.008054 SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; BOOM! Adding that last table makes my query way too slow. Taking out the uris table and querying only the fts table sqlite> SELECT uri_id ...> FROM fts_uri ...> WHERE fts_uri MATCH 'education school'; starts producing the results immediately, but gives me back way too many matches. I am now getting all the matches, but I want only the matches for project id = 3 or matches throttled by u_downloaded_on (both of which are achievable only via a JOIN with uris table). > > Michael D. Black > > Senior Scientist > > NG Information Systems > > Advanced Analytics Directorate > > > > ________________________________ > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 12:44 PM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote: > >> Your change to numeric date/time may not take a long as you think. >> >> >> > > > Took an hour and a half. > > Step 1: Alter all tables with datetime columns, converting those columns to > integer; > > Step 2: Update all tables setting new datetime columns to unixtime > > UPDATE table SET new_column = strftime('%s', old_column); > > Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on); > > Step 4: Run the following query > > SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on > FROM fts_uri f > JOIN uris u ON f.uri_id = u.uri_id > JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id > JOIN feeds f ON fh.feed_id = f.feed_id > JOIN projects p ON f.project_id = p.project_id > WHERE p.project_id = 3 > AND u.u_downloaded_on >= p.u_project_start > AND fts_uri MATCH 'education school' > ORDER BY u.uri_id, u_downloaded_on DESC; > > Terrible time. >>> CPU Time: user 27.584849 sys 115.219293 > > Step 5: EXPLAIN QUERY PLAN (above SELECT query) > > 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) > 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) > 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY > (rowid=?) (~1 rows) > 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) > 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) > (~1 rows) > 0|0|0|USE TEMP B-TREE FOR ORDER BY > CPU Time: user 0.000099 sys 0.000008 > > Step 6: ANALYZE; > > Step 7: Run the above SELECT query again > > Terrible time. >>> CPU Time: user 27.703538 sys 116.684390 > > > > This train is going nowhere. The times are actually worse than they were when > I was using a non-text column for date time. > > By the way, have tried this on two machines -- the previous one was a > dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of > the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM > SATA drive. > > .. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users