On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:
> strftime returns a text representation. So you didn't really change anything.
>
That's not true at all. I added u_downloaded_on (u_ for unixtime)
CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_html TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER,
u_downloaded_on INTEGER
);
sqlite> SELECT u_downloaded_on FROM uris LIMIT 5;
1306450769
1306450769
1306450770
1306450774
1306450776
> You need to use juliandays() as I said.
>
>
>
> And you want a REAL number...not integer...though SQLite doesn't really care
> what you call it. It's more for your own reference.
>
>
>
> You just added a bunch more strings increasing the size of your
> database...ergo it ran slower.
>
>
If I understand correctly, the *size* of the database should not matter. Or, at
least not matter as much. Imagine a database with one table with only one row
but with so much content in that row that it is 27 GB vs. a database with a few
million rows, each with small amount of content but together totaling 27 GB.
What should matter is using the indexes correctly. In this case, my query plan
shows that I am hitting all the indexes. But, I think I am getting closer to
understanding this.
I started backward with
sqlite> SELECT p.project_id
...> FROM projects p
...> WHERE p.project_id = 3;
3
CPU Time: user 0.000080 sys 0.000089
sqlite> SELECT f.feed_id, p.project_id
...> FROM projects p JOIN feeds f ON f.project_id = p.project_id
...> WHERE p.project_id = 3;
..
CPU Time: user 0.000239 sys 0.000170
SELECT fh.feed_history_id, f.feed_id, p.project_id
...> FROM projects p
...> JOIN feeds f ON f.project_id = p.project_id
...> JOIN feed_history fh ON f.feed_id = fh.feed_id
...> WHERE p.project_id = 3;
..
CPU Time: user 0.008491 sys 0.008054
SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id
...> FROM projects p
...> JOIN feeds f ON f.project_id = p.project_id
...> JOIN feed_history fh ON f.feed_id = fh.feed_id
...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
...> WHERE p.project_id = 3;
BOOM! Adding that last table makes my query way too slow.
Taking out the uris table and querying only the fts table
sqlite> SELECT uri_id
...> FROM fts_uri
...> WHERE fts_uri MATCH 'education school';
starts producing the results immediately, but gives me back way too many
matches. I am now getting all the matches, but I want only the matches for
project id = 3 or matches throttled by u_downloaded_on (both of which are
achievable only via a JOIN with uris table).
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> ________________________________
> From: [email protected] [[email protected]] on
> behalf of Puneet Kishor [[email protected]]
> Sent: Wednesday, September 28, 2011 12:44 PM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] speeding up FTS4
>
>
> On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:
>
>> Your change to numeric date/time may not take a long as you think.
>>
>>
>>
>
>
> Took an hour and a half.
>
> Step 1: Alter all tables with datetime columns, converting those columns to
> integer;
>
> Step 2: Update all tables setting new datetime columns to unixtime
>
> UPDATE table SET new_column = strftime('%s', old_column);
>
> Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
>
> Step 4: Run the following query
>
> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
> FROM fts_uri f
> JOIN uris u ON f.uri_id = u.uri_id
> JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
> JOIN feeds f ON fh.feed_id = f.feed_id
> JOIN projects p ON f.project_id = p.project_id
> WHERE p.project_id = 3
> AND u.u_downloaded_on >= p.u_project_start
> AND fts_uri MATCH 'education school'
> ORDER BY u.uri_id, u_downloaded_on DESC;
>
> Terrible time.
>>> CPU Time: user 27.584849 sys 115.219293
>
> Step 5: EXPLAIN QUERY PLAN (above SELECT query)
>
> 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
> 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1
> rows)
> 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY
> (rowid=?) (~1 rows)
> 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1
> rows)
> 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?)
> (~1 rows)
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> CPU Time: user 0.000099 sys 0.000008
>
> Step 6: ANALYZE;
>
> Step 7: Run the above SELECT query again
>
> Terrible time.
>>> CPU Time: user 27.703538 sys 116.684390
>
>
>
> This train is going nowhere. The times are actually worse than they were when
> I was using a non-text column for date time.
>
> By the way, have tried this on two machines -- the previous one was a
> dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of
> the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM
> SATA drive.
>
> ..
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users