On Feb 10, 2011, at 5:00 PM, Yuzem wrote: > The only thing I can think of is to have additional tables for the ids of > all directors, all writers, etc... > Tables "movies", "people", "capacity" and then tables "directors", > "writers", etc.. with only the IDs so I can count using count(*) which is > super fast.
Right... looking at something like the casting information in IMdb (actor, director, writer, etc), there are about 25M movie + role + person combinations. Which, while not huge, starts to get taxing when queried on a lowly laptop. For example, out of these 25M rows, about 1.2M represent directors (~217K) in movies (~1M). Assuming a movie_cast table [1] and a cast dimension [2], you could record the distinct count for each cast in the dimension once, and store it. That way you don't have to recompute it over and over. Alternatively, you could indeed partition that bulky movie_cast table by its cast type, which will in effect reduce the search space to at most ~9M (movies by actors, ~1.3M movies, ~1.3M actors). That said, even partitioning will not help you much here as you still have to search through ~9M records to figure out how many distinct actors you have. So... perhaps best to cheat :) One way to cheat is to precompute the answer by, for example, adding a is_* flag on your person table: update person set is_actor = ( select count( distinct person_id ) from movie_actor where movie_actor.person_id = person.id ); The person table is much smaller (~3M) and easier to query in bulk: explain query plan select count( * ) from person where is_actor = 1; 0|0|0|SEARCH TABLE person USING COVERING INDEX person_is_actor (is_actor=?) (~1596808 rows) 1310119 CPU Time: user 0.256286 sys 0.000257 There you go :) This is where I wish SQLite could have bitmap indices: http://en.wikipedia.org/wiki/Bitmap_index Oh, well... [1] http://dev.alt.textdrive.com/browser/IMDB/IMDB.ddl#L161 [2] http://dev.alt.textdrive.com/browser/IMDB/IMDB.ddl#L146 _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users