Re: [sqlite] designing a db to hold repeating data

P Kishor Sun, 15 Mar 2009 18:37:13 -0700

On Sun, Mar 15, 2009 at 8:28 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
> You could eliminate met_grid_id from the cells table and replace it
> with an expression cell_id/2500.  This expression will automatically
> truncate, giving you met_grid_id whenever you need it.  This will save
> around 5 MB for a 1M row cell table.
>
> Also, queries in the cells table by met_grid_id, if you do that, can
> be made much faster using the relationship above.  If you query
> directly on met_grid_id, sqlite will have to do a table scan.  But if
> you use the relationship, sqlite can use the index, for example:
>
>   select * from cells where met_grid_id = N  (table scan required)
>
> becomes:
>
>   select *,cell_id/2500 as met_grid_id from cells where cell_id
> between N*2500 and (N*5000)-1   (can use the cell_id index)
>
> I think by the same method you can eliminate met_grid_id from the met
> table, saving around 14.5MB



Interesting idea, and worth thinking about.

Of course, I can index met_grid_id and achieve the same speed boost, no?

I should have emphasized in my original post --

We are not constrained by db size, memory, and CPU power; we have
plenty of those to spare (we will be running our model on an Xgrid of
10 quad core Xserves). In fact, each machine is kitted with 32 GB of
RAM and 3 TB disk space, so I could load the entire db in memory and
work from there.

My only concern is query speed, data integrity (of course, that would
be helped by avoiding redundancy), and ease of querying. If I have a
300+ MB db, saving 5 or 14 MB doesn't gain me anything.

I want to do something like this --

"Give me all the land and weather data for cell number ?"

or

"Give me all the land and weather data for all the cells lying within the box ?"

Hence the R*Tree.


>
> Jim
>
>
> On 3/15/09, P Kishor <punk.k...@gmail.com> wrote:
>> I have a grid of 1000 x 1000 cells with their own data as well as 20
>> years of daily weather data (20 * 365 = 7300 rows) for each contiguous
>> group of 50 x 50 cell.
>>
>> CREATE TABLE cells (
>>       cell_id INTEGER PRIMARY KEY,
>>       other cell attributes,
>>       lat,
>>       lon,
>>       met_grid_id INTEGER
>> );
>>
>> cell_id met_grid_id
>> ------- ------------
>> 0       0
>> 1       0
>> ..
>> 2499    0
>> 2500    1
>> 2501    1
>> ..
>> 4999    1
>>
>> CREATE TABLE met (
>>       met_id INTEGER PRIMARY KEY,
>>       other met attributes,
>>       met_grid_id INTEGER
>> );
>>
>> met_id met_grid_id
>> ------ -----------
>> 0      0
>> 1      0
>> ..
>> 7299   0
>> 7300   1
>> 7301   1
>> ..
>>
>> CREATE VIRTUAL TABLE cell_index USING rtree (
>>       cell_id INTEGER,
>>       minx REAL,
>>       maxx REAL,
>>       miny REAL,
>>       maxy REAL
>> )
>>
>> The db is about 350 MB with the cell table with 1000,000 rows and the
>> met table with 2,920,000 rows and the R*Tree index.
>>
>> Is there any other better way that jumps out at any of you?
>>
>>
>> --
>> Puneet Kishor http://www.punkish.org/
>> Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
>> Carbon Model http://carbonmodel.org/
>> Open Source Geospatial Foundation http://www.osgeo.org/
>> Sent from: Madison WI United States.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] designing a db to hold repeating data

Reply via email to