On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote:
> What happens if you create an index on uris(feed_history_id)
>
>
>
Yeah, I noticed that lacking as well.
sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id
...> FROM projects p
...> JOIN feeds f ON f.project_id = p.project_id
...> JOIN feed_history fh ON f.feed_id = fh.feed_id
...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
...> WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|3|SCAN TABLE uris AS u (~46608 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1
rows)
0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
CPU Time: user 0.000079 sys 0.000014
sqlite> CREATE INDEX uris_feed_history_id ON uris (feed_history_id);
CPU Time: user 12.766977 sys 82.766372
sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id
...> FROM projects p
...> JOIN feeds f ON f.project_id = p.project_id
...> JOIN feed_history fh ON f.feed_id = fh.feed_id
...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
...> WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|1|SCAN TABLE feeds AS f (~7 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX
(feed_id=?) (~5 rows)
0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id
(feed_history_id=?) (~10 rows)
CPU Time: user 0.000123 sys 0.000015
But, no joy.
SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND u.u_downloaded_on >= p.u_project_start
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;
..
CPU Time: user 28.599581 sys 108.518205
>
>
>
> ________________________________
> From: [email protected] [[email protected]] on
> behalf of Puneet Kishor [[email protected]]
> Sent: Wednesday, September 28, 2011 2:00 PM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] speeding up FTS4
>
>
> On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:
>
>> strftime returns a text representation. So you didn't really change
>> anything.
>>
>
>
> That's not true at all. I added u_downloaded_on (u_ for unixtime)
>
> CREATE TABLE uris (
> uri_id INTEGER PRIMARY KEY,
> uri TEXT,
> uri_html TEXT,
> uri_content TEXT,
> downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
> feed_history_id INTEGER,
> u_downloaded_on INTEGER
> );
>
> sqlite> SELECT u_downloaded_on FROM uris LIMIT 5;
> 1306450769
> 1306450769
> 1306450770
> 1306450774
> 1306450776
>
>
>> You need to use juliandays() as I said.
>>
>>
>>
>> And you want a REAL number...not integer...though SQLite doesn't really care
>> what you call it. It's more for your own reference.
>>
>>
>>
>> You just added a bunch more strings increasing the size of your
>> database...ergo it ran slower.
>>
>>
>
>
> If I understand correctly, the *size* of the database should not matter. Or,
> at least not matter as much. Imagine a database with one table with only one
> row but with so much content in that row that it is 27 GB vs. a database with
> a few million rows, each with small amount of content but together totaling
> 27 GB.
>
> What should matter is using the indexes correctly. In this case, my query
> plan shows that I am hitting all the indexes. But, I think I am getting
> closer to understanding this.
>
> I started backward with
>
> sqlite> SELECT p.project_id
> ...> FROM projects p
> ...> WHERE p.project_id = 3;
> 3
> CPU Time: user 0.000080 sys 0.000089
>
> sqlite> SELECT f.feed_id, p.project_id
> ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id
> ...> WHERE p.project_id = 3;
> ..
> CPU Time: user 0.000239 sys 0.000170
>
> SELECT fh.feed_history_id, f.feed_id, p.project_id
> ...> FROM projects p
> ...> JOIN feeds f ON f.project_id = p.project_id
> ...> JOIN feed_history fh ON f.feed_id = fh.feed_id
> ...> WHERE p.project_id = 3;
> ..
> CPU Time: user 0.008491 sys 0.008054
>
> SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id
> ...> FROM projects p
> ...> JOIN feeds f ON f.project_id = p.project_id
> ...> JOIN feed_history fh ON f.feed_id = fh.feed_id
> ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
> ...> WHERE p.project_id = 3;
>
>
> BOOM! Adding that last table makes my query way too slow.
>
> Taking out the uris table and querying only the fts table
>
> sqlite> SELECT uri_id
> ...> FROM fts_uri
> ...> WHERE fts_uri MATCH 'education school';
>
>
> starts producing the results immediately, but gives me back way too many
> matches. I am now getting all the matches, but I want only the matches for
> project id = 3 or matches throttled by u_downloaded_on (both of which are
> achievable only via a JOIN with uris table).
>
>
>>
>> Michael D. Black
>>
>> Senior Scientist
>>
>> NG Information Systems
>>
>> Advanced Analytics Directorate
>>
>>
>>
>> ________________________________
>> From: [email protected] [[email protected]] on
>> behalf of Puneet Kishor [[email protected]]
>> Sent: Wednesday, September 28, 2011 12:44 PM
>> To: General Discussion of SQLite Database
>> Subject: EXT :Re: [sqlite] speeding up FTS4
>>
>>
>> On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:
>>
>>> Your change to numeric date/time may not take a long as you think.
>>>
>>>
>>>
>>
>>
>> Took an hour and a half.
>>
>> Step 1: Alter all tables with datetime columns, converting those columns to
>> integer;
>>
>> Step 2: Update all tables setting new datetime columns to unixtime
>>
>> UPDATE table SET new_column = strftime('%s', old_column);
>>
>> Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
>>
>> Step 4: Run the following query
>>
>> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
>> FROM fts_uri f
>> JOIN uris u ON f.uri_id = u.uri_id
>> JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>> JOIN feeds f ON fh.feed_id = f.feed_id
>> JOIN projects p ON f.project_id = p.project_id
>> WHERE p.project_id = 3
>> AND u.u_downloaded_on >= p.u_project_start
>> AND fts_uri MATCH 'education school'
>> ORDER BY u.uri_id, u_downloaded_on DESC;
>>
>> Terrible time.
>>>> CPU Time: user 27.584849 sys 115.219293
>>
>> Step 5: EXPLAIN QUERY PLAN (above SELECT query)
>>
>> 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>> 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1
>> rows)
>> 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY
>> (rowid=?) (~1 rows)
>> 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1
>> rows)
>> 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?)
>> (~1 rows)
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> CPU Time: user 0.000099 sys 0.000008
>>
>> Step 6: ANALYZE;
>>
>> Step 7: Run the above SELECT query again
>>
>> Terrible time.
>>>> CPU Time: user 27.703538 sys 116.684390
>>
>>
>>
>> This train is going nowhere. The times are actually worse than they were
>> when I was using a non-text column for date time.
>>
>> By the way, have tried this on two machines -- the previous one was a
>> dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of
>> the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200
>> RPM SATA drive.
>>
>> ..
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users