On 13/02/2011, at 1:04 AM, Yuzem wrote:

> I am grabbing the data from the each movie imdb webpage.

Does IMDB allow use of their data this way? After my brief reading of their 
site, I thought they charge a $15k minimum per year for data.

> The script is written in bash and I can give you the code but I think it 
> would be very complicated to adapt it to your schema.

Complicated? Why? In what format is the grabbed data (ie what tables, columns, 
rows)? It usually just takes an insert select to move data from one table's 
schema to another.

>> You count directors like this:
>> 
>> select sum(Count) from "Capacity People Statistics"
>> where Capacity_ID = (select ID from Capacity where Name = 'director')
>> ;
>> 
> Are you sure that this count how many directors there are?

Well, yes, but it depends of the definition of "how many directors there are". 
The above counts how many directors there are in total, counting the same 
person for each movie they direct.

What you want, I think, however, is how many people there are who are directors 
(counting each person just once, even if they direct multiple movies), given by 
this:

select count(*) from "Capacity People Statistics"
where Capacity_ID = (select ID from Capacity where Name = 'director')
;

> If I understand it correctly (probably I don't) you have for example table
> "Movie People":
> 1|director|1
> 2|director|1
> 3|director|2
> 
> There are 2 directors and then in "Capacity People Statistics" you should
> have:
> director|1|2
> director|2|1
> 
> If you use the previous code:
> select sum(Count) from "Capacity People Statistics"
> where Capacity_ID = (select ID from Capacity where Name = 'director')
> ;
> 
> I think it will  return 3 but there are only 2 directors.

Correct. That query will tell you that there are three (3) directors in the 
database. But two (2) of those directors are the same person.

To instead get what you want, the number of people who are directors, do this 
(repeat of above SQL):

select count(*) from "Capacity People Statistics"
where Capacity_ID = (select ID from Capacity where Name = 'director')
;

which gives your desired answer of 2.

> Sorry, but I still don't understands it because I don't understand that you 
> can count directors that way.

Hopefully now that I've given you the query you actually needed, it now makes 
sense ;-)

>> I expect the above to be about the same speed or faster... than separate 
>> tables, but far more flexible (eg no need to add a table to accommodate a 
>> new capacity), and better normalized.
>> 

> But using different tables provides an instant result, you can try it with
> any table:
> SELECT count(*) from table;

Let me know if you see any noticeable difference in speed.

> In the other hand you are right, it is less flexible.

> Oh wait... I think I got it. I need two more tables:

No, I don't think you need more tables to achieve the above.

Thanks,
Tom
BareFeetWare

 --
Comparison of SQLite GUI tools:
http://www.barefeetware.com/sqlite/compare/?ml

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to