On Feb 10, 2011, at 5:00 PM, Yuzem wrote:

> The only thing I can think of is to have additional tables for the ids of
> all directors, all writers, etc...
> Tables "movies", "people", "capacity" and then tables "directors",
> "writers", etc.. with only the IDs so I can count using count(*) which is
> super fast.

Right... looking at something like the casting information in IMdb (actor, 
director, writer, etc), there are about 25M movie + role + person combinations. 
Which, while not huge, starts to get taxing when queried on a lowly laptop.

For example, out of these 25M rows, about 1.2M represent directors (~217K) in 
movies (~1M). 

Assuming a movie_cast table [1] and a cast dimension [2], you could record the 
distinct count for each cast in the dimension once, and store it. That way you 
don't have to recompute it over and over.

Alternatively, you could indeed partition that bulky movie_cast table by its 
cast type, which will in effect reduce the search space to at most ~9M (movies 
by actors, ~1.3M movies, ~1.3M actors). 

That said, even partitioning will not help you much here as you still have to 
search through ~9M records to figure out how many distinct actors you have.

So... perhaps best to cheat :)

One way to cheat is to precompute the answer by, for example, adding a is_* 
flag on your person table:

update person set is_actor = ( select count( distinct person_id ) from 
movie_actor where movie_actor.person_id = person.id );

The person table is much smaller (~3M) and easier to query in bulk:

explain query plan
select count( * ) from person where is_actor = 1;
0|0|0|SEARCH TABLE person USING COVERING INDEX person_is_actor (is_actor=?) 
(~1596808 rows)

1310119
CPU Time: user 0.256286 sys 0.000257

There you go :)

This is where I wish SQLite could have bitmap indices:

http://en.wikipedia.org/wiki/Bitmap_index

Oh, well...

[1] http://dev.alt.textdrive.com/browser/IMDB/IMDB.ddl#L161
[2] http://dev.alt.textdrive.com/browser/IMDB/IMDB.ddl#L146




_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to