P Kishor wrote:
On 2/23/07, Dennis Cote <[EMAIL PROTECTED]> wrote:
P Kishor wrote:
>
>
> Most of the time I am looking at one site, so there is a speed gain by
> not plowing through other sites' data. This is what is causing me to
> pause before I rush forward.
>
If you have an index on the readings table by site_id (and perhaps
timestamp) sqlite will use the index to go directly to that site's
records and will not plow through all the other site's data. Adding a
second field, like the timestamp, to the index will let you do fast
searches or selections based on time within a single site as well.

P.S. I would recommend keeping the integer primary key ID fields.
Because sqlite uses them for the rowid (which each record in each table
must have anyway) they cost you nothing and make joins more direct and
obvious.

HTH

yes, Dennis, this does help. Thanks. I liked John Stanton's suggestion
as well, very elegant, but I think your approach is more scalable, as
once the prototype is done and working well, it will be migrated to
PostGres. A question -- you say

If you have an index on the readings table by site_id (and perhaps
timestamp) sqlite will use the index to go directly to that site's
records and will not plow through all the other site's data. Adding a
second field, like the timestamp, to the index will let you do fast

well, site_id is a PRIMARY KEY, so it is automatically indexed. I am
not creating another index on it. I am indexing timestamp, but are you
suggesting a compound site,timestamp index?

Puneet,

The column site_id is a primary key in the sites table only. It is an unindexed column in the readings table. I was suggesting an index on site_id in readings table. This will provide an index so sqlite can start at the first site specific record when doing selects from the reading table with a where clause that includes the site. This sill eliminate the table scan of the readings table.

   create index reading_site on readings(site)

   select ... from readings where site = ? and ...

Yes, I was suggesting a compound index. If you use the following compound index

   create index reading_site_timestamp on reading(site, timestamp)

it will be used for site specific queries and will also let you order and/or condition your search by time.

select ... from readings where site = ? and timestamp > ? order by timestamp.

This will start at the first record that matches both the site and timestamp criteria and scan only records that match those criteria.

Also, if I have two cols a and b, and I create separate indexes on a
and b, would it hurt if I also have an additional compound index on
a,b? In other words, does SQLite have the smarts to know when to use
what?

Yes and no. It will take more space to store the additional index, and it also take longer to do inserts since sqlite must update the additional indexes. And, sqlite will only use one of the indexes in any particular query, regardless of how many might be useful. If you do an analyze after the tables are populated, sqlite should pick the best available index for a given query. It is not useful to have an index on b if you never search by that column by itself. It can't be used in conjunction with the index on a to satisfy a query like the second one above, you need a compound index.

HTH
Dennis Cote

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to