Re: [sqlite] speeding up FTS4

Puneet Kishor Wed, 28 Sep 2011 12:00:48 -0700

On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:

> strftime returns a text representation.  So you didn't really change anything.
>



That's not true at all. I added u_downloaded_on (u_ for unixtime)

CREATE TABLE uris (
    uri_id INTEGER PRIMARY KEY, 
    uri TEXT,
    uri_html TEXT,
    uri_content TEXT,
    downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
    feed_history_id INTEGER, 
    u_downloaded_on INTEGER
);

sqlite> SELECT u_downloaded_on FROM uris LIMIT 5;
1306450769
1306450769
1306450770
1306450774
1306450776


> You need to use juliandays() as I said.
> 
> 
> 
> And you want a REAL number...not integer...though SQLite doesn't really care 
> what you call it.  It's more for your own reference.
> 
> 
> 
> You just added a bunch more strings increasing the size of your 
> database...ergo it ran slower.
> 
> 


If I understand correctly, the *size* of the database should not matter. Or, at 
least not matter as much. Imagine a database with one table with only one row 
but with so much content in that row that it is 27 GB vs. a database with a few 
million rows, each with small amount of content but together totaling 27 GB.

What should matter is using the indexes correctly. In this case, my query plan 
shows that I am hitting all the indexes. But, I think I am getting closer to 
understanding this.

I started backward with

sqlite> SELECT p.project_id 
   ...> FROM projects p
   ...> WHERE p.project_id = 3;
3
CPU Time: user 0.000080 sys 0.000089

sqlite> SELECT f.feed_id, p.project_id 
   ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id 
   ...> WHERE p.project_id = 3; 
..
CPU Time: user 0.000239 sys 0.000170

SELECT fh.feed_history_id, f.feed_id, p.project_id 
   ...> FROM projects p 
   ...>   JOIN feeds f ON f.project_id = p.project_id 
   ...>   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ...> WHERE p.project_id = 3;
..
CPU Time: user 0.008491 sys 0.008054

SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id 
   ...> FROM projects p 
   ...>   JOIN feeds f ON f.project_id = p.project_id 
   ...>   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ...>   JOIN uris u ON fh.feed_history_id = u.feed_history_id 
   ...> WHERE p.project_id = 3;


BOOM! Adding that last table makes my query way too slow.

Taking out the uris table and querying only the fts table

sqlite> SELECT uri_id 
   ...> FROM fts_uri 
   ...> WHERE fts_uri MATCH 'education school';


starts producing the results immediately, but gives me back way too many 
matches. I am now getting all the matches, but I want only the matches for 
project id = 3 or matches throttled by u_downloaded_on (both of which are 
achievable only via a JOIN with uris table).


> 
> Michael D. Black
> 
> Senior Scientist
> 
> NG Information Systems
> 
> Advanced Analytics Directorate
> 
> 
> 
> ________________________________
> From: [email protected] [[email protected]] on 
> behalf of Puneet Kishor [[email protected]]
> Sent: Wednesday, September 28, 2011 12:44 PM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] speeding up FTS4
> 
> 
> On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:
> 
>> Your change to numeric date/time may not take a long as you think.
>> 
>> 
>> 
> 
> 
> Took an hour and a half.
> 
> Step 1: Alter all tables with datetime columns, converting those columns to 
> integer;
> 
> Step 2: Update all tables setting new datetime columns to unixtime
> 
> UPDATE table SET new_column = strftime('%s', old_column);
> 
> Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
> 
> Step 4: Run the following query
> 
>        SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
>        FROM fts_uri f
>                JOIN uris u ON f.uri_id = u.uri_id
>                JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>                JOIN feeds f ON fh.feed_id = f.feed_id
>                JOIN projects p ON f.project_id = p.project_id
>        WHERE p.project_id = 3
>                AND u.u_downloaded_on >= p.u_project_start
>                AND fts_uri MATCH 'education school'
>        ORDER BY u.uri_id, u_downloaded_on DESC;
> 
> Terrible time.
>>> CPU Time: user 27.584849 sys 115.219293
> 
> Step 5: EXPLAIN QUERY PLAN (above SELECT query)
> 
>        0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>        0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 
> rows)
>        0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY 
> (rowid=?) (~1 rows)
>        0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 
> rows)
>        0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) 
> (~1 rows)
>        0|0|0|USE TEMP B-TREE FOR ORDER BY
>        CPU Time: user 0.000099 sys 0.000008
> 
> Step 6: ANALYZE;
> 
> Step 7: Run the above SELECT query again
> 
> Terrible time.
>>> CPU Time: user 27.703538 sys 116.684390
> 
> 
> 
> This train is going nowhere. The times are actually worse than they were when 
> I was using a non-text column for date time.
> 
> By the way, have tried this on two machines -- the previous one was a 
> dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of 
> the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM 
> SATA drive.
> 
> ..

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] speeding up FTS4

Reply via email to