Mark Woodward wrote:
I tried to cluster the data along a particular index but had to cancel it
after 3 hours.
If the data is in random order, it's faster to do
SELECT * INTO foo_sorted FROM foo ORDER BY bar
then CREATE INDEX, than to run CLUSTER.
That's because CLUSTER does a full index scan.
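Spelled out, the workaround looks like this (`foo`/`bar` are the placeholder names from the message above; the swap at the end is an assumed follow-up, not part of the original post):

```sql
-- Rewrite the table in index order, then build the index on the sorted copy.
SELECT * INTO foo_sorted FROM foo ORDER BY bar;
CREATE INDEX foo_sorted_bar_idx ON foo_sorted (bar);
ANALYZE foo_sorted;

-- Assumed follow-up: swap the sorted copy into place.
-- DROP TABLE foo;
-- ALTER TABLE foo_sorted RENAME TO foo;
```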
"Mark Woodward" <[EMAIL PROTECTED]> writes:
> The rating, however, is one char 1~9. Would making it a char(1) buy anything?
No, that would actually hurt because of the length word for the char
field. Even if you used the "char" type, which really is only one byte,
you wouldn't win anything because of alignment padding.
"Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:
> CREATE TABLE rating (
> movie SMALLINT NOT NULL,
> person INTEGER NOT NULL,
> rating SMALLINT NOT NULL,
> viewed DATE NOT NULL
> );
You would probably be better off putting the two smallints first followed by
the integer and date.
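A rough sketch of why the order matters: PostgreSQL pads each column out to its type's alignment (2 bytes for smallint, 4 for integer and date), so grouping the two smallints together avoids padding. A minimal model of that layout (the alignment values are standard PostgreSQL ones, but this ignores per-row header overhead):

```python
def struct_size(fields):
    """fields: list of (name, size, align). Returns the total column-data
    size with C-style alignment padding, as PostgreSQL lays out a tuple."""
    offset = 0
    for name, size, align in fields:
        offset += (-offset) % align   # pad up to the type's alignment boundary
        offset += size
    return offset

original  = [("movie", 2, 2), ("person", 4, 4), ("rating", 2, 2), ("viewed", 4, 4)]
reordered = [("movie", 2, 2), ("rating", 2, 2), ("person", 4, 4), ("viewed", 4, 4)]
print(struct_size(original), struct_size(reordered))  # → 16 12
```

Four bytes saved per row adds up quickly across 100 million ratings.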
> I signed up for the Netflix Prize. (www.netflixprize.com)
> and downloaded their data and have imported it into PostgreSQL.
> Here is how I created the table:
I signed up as well, but have the table as follows:
CREATE TABLE rating (
    movie SMALLINT NOT NULL,
    person INTEGER NOT NULL,
    rating SMALLINT NOT NULL,
    viewed DATE NOT NULL
);
"Mark Woodward" <[EMAIL PROTECTED]> writes:
> I'm using 8.1.4. The "rdate" field looks something like: "2005-09-06" So,
> the raw data is 23 bytes, the date string will probably be rounded up to
> 12 bytes, that's 24 bytes per row of data. What is the overhead per
> variable? per row?
>
> Is ther
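To Mark's question about per-row overhead: a back-of-envelope sketch, assuming roughly a 28-byte heap tuple header plus a 4-byte line pointer per row in 8.1 (approximate figures, not from the thread):

```python
# Hypothetical row-size estimate for the ratings table under PostgreSQL 8.1.
# The per-row overheads below are assumptions, not figures from the thread.
TUPLE_HEADER = 28   # approximate heap tuple header in 8.1, incl. alignment
LINE_POINTER = 4    # per-row item pointer in the page header

# Columns: three 4-byte integers plus a 4-byte date.
data_bytes = 4 + 4 + 4 + 4

row_bytes = TUPLE_HEADER + LINE_POINTER + data_bytes
rows = 100_480_507
table_gb = row_bytes * rows / 1024**3
print(f"{row_bytes} bytes/row, ~{table_gb:.1f} GB of tuple data")  # → 48 bytes/row, ~4.5 GB of tuple data
```

So the overhead dwarfs the 16 bytes of actual column data, which is why shaving a byte off one column buys so little.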
"Mark Woodward" <[EMAIL PROTECTED]> writes:
> The one thing I notice is that it is REAL slow.
How fast is your disk? Counting on my fingers, I estimate you are
scanning the table at about 47MB/sec, which might or might not be
disk-limited...
> I'm using 8.1.4. The "rdate" field looks something like: "2005-09-06"
Mark,
On 10/4/06 1:43 PM, "Mark Woodward" <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED]:~/netflix$ time psql netflix -c "select count(*) from
> ratings"
>count
> -----------
> 100480507
> (1 row)
>
>
> real    2m6.270s
> user    0m0.004s
> sys     0m0.005s
I think you are getting about
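The ~47 MB/sec figure quoted earlier can be reproduced from this timing, assuming about 60 bytes per row on disk (a guessed figure covering column data, tuple header, and page overhead):

```python
# Back-of-envelope scan rate from the count(*) timing above.
rows = 100_480_507
seconds = 2 * 60 + 6.270      # the "real" time from `time psql ...`
bytes_per_row = 60            # assumption: data + tuple header + page overhead

mb_per_sec = rows * bytes_per_row / seconds / 1e6
print(f"~{mb_per_sec:.0f} MB/sec")  # → ~48 MB/sec
```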
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded
their data and have imported it into PostgreSQL. Here is how I created the
table:
       Table "public.ratings"
 Column |  Type   | Modifiers
--------+---------+-----------
 item   | integer |
 client | integer |
 rating | integer |