Re: [HACKERS] Netflix Prize data

2006-10-05 Thread Heikki Linnakangas
Mark Woodward wrote: I tried to cluster the data along a particular index but had to cancel it after 3 hours. If the data is in random order, it's faster to do SELECT * INTO foo_sorted FROM foo ORDER BY bar then CREATE INDEX, than to run CLUSTER. That's because CLUSTER does a full index sca

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
> "Mark Woodward" <[EMAIL PROTECTED]> writes: >> The rating, however, is one char 1~9. Would making it a char(1) buy >> anything? > > No, that would actually hurt because of the length word for the char > field. Even if you used the "char" type, which really is only one byte, > you wouldn't win an

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Tom Lane
"Mark Woodward" <[EMAIL PROTECTED]> writes: > The rating, however, is one char 1~9. Would making it a char(1) buy anything? No, that would actually hurt because of the length word for the char field. Even if you used the "char" type, which really is only one byte, you wouldn't win anything becaus

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
> > "Greg Sabino Mullane" <[EMAIL PROTECTED]> writes: > >> CREATE TABLE rating ( >> movie SMALLINT NOT NULL, >> person INTEGER NOT NULL, >> rating SMALLINT NOT NULL, >> viewed DATE NOT NULL >> ); > > You would probably be better off putting the two smallints first followed > by > the

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Gregory Stark
"Greg Sabino Mullane" <[EMAIL PROTECTED]> writes: > CREATE TABLE rating ( > movie SMALLINT NOT NULL, > person INTEGER NOT NULL, > rating SMALLINT NOT NULL, > viewed DATE NOT NULL > ); You would probably be better off putting the two smallints first followed by the integer and date.

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
> "Mark Woodward" <[EMAIL PROTECTED]> writes: >> The one thing I notice is that it is REAL slow. > > How fast is your disk? Counting on my fingers, I estimate you are > scanning the table at about 47MB/sec, which might or might not be > disk-limited... > >> I'm using 8.1.4. The "rdate" field looks

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
>> I signed up for the Netflix Prize. (www.netflixprize.com) >> and downloaded their data and have imported it into PostgreSQL. >> Here is how I created the table: > > I signed up as well, but have the table as follows: > > CREATE TABLE rating ( > movie SMALLINT NOT NULL, > person INTEGER NOT

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Greg Sabino Mullane
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 > I signed up for the Netflix Prize. (www.netflixprize.com) > and downloaded their data and have imported it into PostgreSQL. > Here is how I created the table: I signed up as well, but have the table as follows: CREATE TABLE rating ( movie SMAL

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Gregory Stark
"Mark Woodward" <[EMAIL PROTECTED]> writes: > I'm using 8.1.4. The "rdate" field looks something like: "2005-09-06" So, > the raw data is 23 bytes, the date string will probably be rounded up to > 12 bytes, that's 24 bytes per row of data. What is the overhead per > variable? per row? > > Is ther

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Tom Lane
"Mark Woodward" <[EMAIL PROTECTED]> writes: > The one thing I notice is that it is REAL slow. How fast is your disk? Counting on my fingers, I estimate you are scanning the table at about 47MB/sec, which might or might not be disk-limited... > I'm using 8.1.4. The "rdate" field looks something l

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Luke Lonergan
Mark, On 10/4/06 1:43 PM, "Mark Woodward" <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED]:~/netflix$ time psql netflix -c "select count(*) from > ratings" >count > --- > 100480507 > (1 row) > > > real2m6.270s > user0m0.004s > sys 0m0.005s I think you are getting about

[HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded their data and have imported it into PostgreSQL. Here is how I created the table: Table "public.ratings" Column | Type | Modifiers +-+--- item | integer | client | integer | rating | intege