[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 10:13 PM, Tomas Vondra wrote: > > Maybe. I still am not sure how fsync=off affects the eviction in your > opinion. I think it does not (or just very remotely) and you were saying > the opposite. IMHO the eviction of (dirty) buffers is either very fast > or slow, no matter wh

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Andrew Dunstan
On 03/01/2012 07:58 PM, Claudio Freire wrote: On Thu, Mar 1, 2012 at 9:28 PM, Peter van Hardenberg wrote: Setting work_mem to hundreds of MB in a 4G system is suicide. Tens even is dangerous. Why do you say that? We've had work_mem happily at 100MB for years. Is there a particular degenerat

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Tomas Vondra
On 28.2.2012 17:42, Claudio Freire wrote: > On Tue, Feb 28, 2012 at 1:05 PM, Tomas Vondra wrote: >> On 28 February 2012, 15:24, Claudio Freire wrote: >>> It speeds a lot more than the initial load of data. >>> >>> Assuming the database is read-only, but not the filesystem (ie: it's >>> not a slave, in

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 9:28 PM, Peter van Hardenberg wrote: >> Setting work_mem to hundreds of MB in a 4G system is suicide. Tens >> even is dangerous. >> > > Why do you say that? We've had work_mem happily at 100MB for years. Is > there a particular degenerate case you're concerned about? Me too

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Jeff Janes
On Wed, Feb 29, 2012 at 7:28 AM, Stefan Keller wrote: > 2012/2/29 Stefan Keller : >> 2012/2/29 Jeff Janes : It's quite possible the vacuum full is thrashing your disk cache due to maintenance_work_mem. You can overcome this issue with the tar trick, which is more easily performed a

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Peter van Hardenberg
On Thu, Mar 1, 2012 at 4:23 PM, Claudio Freire wrote: > For a read-only database, as was discussed, a lower shared_buffers > settings makes sense. And 128M is low enough, I'd guess. > > Setting work_mem to hundreds of MB in a 4G system is suicide. Tens > even is dangerous. > Why do you say that?

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 8:08 PM, Andrew Dunstan wrote: > These are extremely low settings on virtually any modern computer. I usually > look to set shared buffers in numbers of Gb and work_mem at least in tens if > not hundreds of Mb for any significantly sized database. For a read-only database,
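The worry in this subthread is that work_mem is a per-sort/per-hash limit, not a global cap, so many concurrent operations can each claim it at once. A minimal sketch of the safer pattern under discussion (the 256MB value and transaction are illustrative, not taken from the thread):

```sql
-- work_mem applies to each sort/hash node in each backend, so a busy
-- server can allocate it many times over; on a 4GB box a 100MB value
-- with dozens of connections can exhaust RAM.  One common compromise:
-- keep the server default modest and raise it only for a known-large query.
BEGIN;
SET LOCAL work_mem = '256MB';   -- in effect only until COMMIT/ROLLBACK
-- ... run the single big sort or hash join here ...
COMMIT;
```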

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Andrew Dunstan
On 03/01/2012 05:52 PM, Stefan Keller wrote: These are the current modified settings in postgresql.conf: shared_buffers = 128MB work_mem = 3MB These are extremely low settings on virtually any modern computer. I usually look to set shared buffers in numbers of Gb and work_mem at least in te

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Stefan Keller
2012/3/1 Jeff Janes : > On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller wrote: >> 2012/2/28 Claudio Freire : >>> >>> In the OP, you say "There is enough main memory to hold all table >>> contents.". I'm assuming, there you refer to your current system, with >>> 4GB memory. >> >> Sorry for the confu

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Tom Lane
Craig James writes: > On Thu, Mar 1, 2012 at 9:50 AM, Tom Lane wrote: >> Considering that ORDER BY in a subquery isn't even legal per spec, > That's surprising ... normally it won't affect the result, but with an > offset or limit it would. Does the offset or limit change the "not > even legal"

Re: [PERFORM] Bad estimation for "where field not in"

2012-03-01 Thread Tom Lane
Ants Aasma writes: > On Thu, Mar 1, 2012 at 6:40 PM, Daniele Varrazzo > wrote: >> Is this a known planner shortcoming or something unexpected, to be >> escalated to -bugs? Server version is 9.0.1. > The relevant code is in scalararraysel() function. It makes the > assumption that element wise co

Re: [PERFORM] Large insert and delete batches

2012-03-01 Thread Marti Raudsepp
On Thu, Mar 1, 2012 at 21:06, Kääriäinen Anssi wrote: > The queries are "select * from the_table where id = > ANY(ARRAY[list_of_numbers])" > and the similar delete, too. > [...] However, once you go into > millions of items in the list, the query will OOM my Postgres server. The problem with IN
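A sketch of the workaround usually suggested for multi-million-item lists (table and file names hypothetical): stage the IDs in a temporary table and join against it, instead of building one giant ANY(ARRAY[...]) literal that must be parsed and held in memory at once.

```sql
CREATE TEMP TABLE ids_to_delete (id bigint PRIMARY KEY);
COPY ids_to_delete FROM '/tmp/ids.txt';   -- or INSERT in modest batches
ANALYZE ids_to_delete;                    -- give the planner real row counts
DELETE FROM the_table t
USING ids_to_delete d
WHERE t.id = d.id;
```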

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Kevin Grittner
Alessandro Gagliardi wrote: > All of our servers run in UTC specifically to avoid this sort of > problem. It's kind of annoying actually, because we're a San > Francisco company and so whenever I have to do daily analytics, I > have to shift everything to Pacific. But in this case it's handy.

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Ah, yes, that makes sense. Thank you! On Thu, Mar 1, 2012 at 11:39 AM, Claudio Freire wrote: > On Thu, Mar 1, 2012 at 4:35 PM, Alessandro Gagliardi > wrote: > > Interesting solution. If I'm not mistaken, this does solve the problem of > > having two entries for the same user at the exact same ti

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 4:39 PM, Claudio Freire wrote: >> Interesting solution. If I'm not mistaken, this does solve the problem of >> having two entries for the same user at the exact same time (which violates >> my pk constraint) but it does so by leaving both of them out (since there is >> no au

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 4:35 PM, Alessandro Gagliardi wrote: > Interesting solution. If I'm not mistaken, this does solve the problem of > having two entries for the same user at the exact same time (which violates > my pk constraint) but it does so by leaving both of them out (since there is > no

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Interesting solution. If I'm not mistaken, this does solve the problem of having two entries for the same user at the exact same time (which violates my pk constraint) but it does so by leaving both of them out (since there is no au1.hr_timestamp > au2.hr_timestamp in that case). Is that right? On
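One way around the tie problem raised here is DISTINCT ON, which keeps exactly one row per group even when two rows share the same hr_timestamp. Column names are taken from the original post; the tiebreaker is an assumption, and this is a sketch rather than the solution actually posted in the thread:

```sql
INSERT INTO hourly_activity
SELECT DISTINCT ON (user_id, date_trunc('hour', hr_timestamp))
       date_trunc('hour', hr_timestamp) AS activity_hour,
       user_id, client_ip, hr_timestamp, locale, log_id, method,
       server_ip, uri, user_agent
FROM activity_unlogged
ORDER BY user_id, date_trunc('hour', hr_timestamp),
         hr_timestamp DESC, log_id;   -- log_id breaks exact-timestamp ties
```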

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Hah! Yeah, that might work. Except that I suck at grep. :( Perhaps that's a weakness I should remedy. On Thu, Mar 1, 2012 at 10:35 AM, Craig James wrote: > On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi > wrote: > > Hi folks, > > > > I have a system that racks up about 40M log line

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
All of our servers run in UTC specifically to avoid this sort of problem. It's kind of annoying actually, because we're a San Francisco company and so whenever I have to do daily analytics, I have to shift everything to Pacific. But in this case it's handy. Thanks for the keen eye though. On Thu,

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
I was thinking of adding an index, but thought it would be pointless since I would only be using the index once before dropping the table (after its loaded into hourly_activity). I assumed it would take longer to create the index and then use it than to just seq scan once or twice. Am I wrong in th

Re: [PERFORM] Large insert and delete batches

2012-03-01 Thread Kääriäinen Anssi
Quoting myself: """ So, is there some common wisdom about the batch sizes? Or is it better to do the inserts and deletes in just one batch? I think the case for performance problems needs to be strong before default limits are considered for PostgreSQL. """ I did a little test about this. My test

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Kevin Grittner
Alessandro Gagliardi wrote: > hr_timestamp timestamp without time zone, In addition to the responses which more directly answer your question, I feel I should point out that this will not represent a single moment in time. At the end of Daylight Saving Time, the value will jump backward and
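A sketch of the fix implied here (table name hypothetical): timestamptz stores an unambiguous instant and converts to the session time zone on output, so the repeated hour at the end of DST cannot collide.

```sql
-- 'timestamp without time zone' records wall-clock time, so the hour
-- repeated when Daylight Saving Time ends is ambiguous; 'timestamptz'
-- stores UTC internally and is not.
CREATE TABLE activity_example (
    user_id      character(24) NOT NULL,
    hr_timestamp timestamptz   NOT NULL
);
```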

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Peter van Hardenberg
On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi wrote: > Now, I want to reduce that data to get the last activity that was performed > by each user in any given hour. It should fit into a table like this: > How about: 1) Create an expression based index on date_trunc('hour', hr_timestamp)
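The suggested expression index might look like the following (index name hypothetical). Note that queries must use the identical date_trunc('hour', ...) expression for the planner to consider the index, and that date_trunc over a timestamp without time zone is immutable and therefore indexable:

```sql
CREATE INDEX activity_unlogged_hour_idx
    ON activity_unlogged (date_trunc('hour', hr_timestamp));
```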

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Craig James
On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi wrote: > Hi folks, > > I have a system that racks up about 40M log lines per day. I'm able to COPY > the log files into a PostgreSQL table that looks like this: Since you're using a COPY command and the table has a simple column with exactly t

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 3:27 PM, Alessandro Gagliardi wrote: > INSERT INTO hourly_activity >     SELECT DISTINCT date_trunc('hour', hr_timestamp) AS activity_hour, > activity_unlogged.user_id, >                     client_ip, hr_timestamp, locale, log_id, method, > server_ip, uri, user_agent >    

[PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Hi folks, I have a system that racks up about 40M log lines per day. I'm able to COPY the log files into a PostgreSQL table that looks like this: CREATE TABLE activity_unlogged ( user_id character(24) NOT NULL, client_ip inet, hr_timestamp timestamp without time zone, locale character var

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Craig James
On Thu, Mar 1, 2012 at 9:50 AM, Tom Lane wrote: > "Kevin Grittner" writes: >> Marcin Mirosław wrote: >>> SELECT count(*) >>> from (select * from users_profile order by id) u_p; > >>> "order by id" can be ignored by planner. > >> This has been discussed before.  Certainly not all ORDER BY clauses

Re: [PERFORM] Bad estimation for "where field not in"

2012-03-01 Thread Ants Aasma
On Thu, Mar 1, 2012 at 6:40 PM, Daniele Varrazzo wrote: > Is this a known planner shortcoming or something unexpected, to be > escalated to -bugs? Server version is 9.0.1. The relevant code is in scalararraysel() function. It makes the assumption that element wise comparisons are completely indep

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Tom Lane
"Kevin Grittner" writes: > Marcin Mirosław wrote: >> SELECT count(*) >> from (select * from users_profile order by id) u_p; >> "order by id" can be ignored by planner. > This has been discussed before. Certainly not all ORDER BY clauses > within query steps can be ignored, so there would need

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Dave Crooke
Just curious ... has anyone tried using a ram disk as the PG primary and DRBD as the means to make it persistent? On Mar 1, 2012 11:35 AM, "Scott Marlowe" wrote: > On Thu, Mar 1, 2012 at 9:57 AM, Jeff Janes wrote: > > On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller > wrote: > >> 2012/2/28 Claudi

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Scott Marlowe
On Thu, Mar 1, 2012 at 9:57 AM, Jeff Janes wrote: > On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller wrote: >> 2012/2/28 Claudio Freire : >>> >>> In the OP, you say "There is enough main memory to hold all table >>> contents.". I'm assuming, there you refer to your current system, with >>> 4GB memo

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Jeff Janes
On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller wrote: > 2012/2/28 Claudio Freire : >> >> In the OP, you say "There is enough main memory to hold all table >> contents.". I'm assuming, there you refer to your current system, with >> 4GB memory. > > Sorry for the confusion: I'm doing these tests on

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Kevin Grittner
Marcin Mirosław wrote: > SELECT count(*) > from (select * from users_profile order by id) u_p; > "order by id" can be ignored by planner. This has been discussed before. Certainly not all ORDER BY clauses within query steps can be ignored, so there would need to be code to determine whethe

[PERFORM] Bad estimation for "where field not in"

2012-03-01 Thread Daniele Varrazzo
Hello, We have a table with about 60M records, almost all of which in one of two statuses ('done', 'failed') and a few of them, usually < 1000, in different transient statuses. We also have a partial index indexing the transient items: where status not in ('done', 'failed'). Stats are about right
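The partial index described here reads roughly as follows (table and index names hypothetical); what the thread questions is the planner's selectivity estimate for the NOT IN predicate, not the index itself:

```sql
-- Indexes only the few (usually < 1000) transient rows out of ~60M,
-- so scans for transient items stay cheap.
CREATE INDEX items_transient_idx
    ON items (id)
    WHERE status NOT IN ('done', 'failed');
```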

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Marcin Mirosław
W dniu 01.03.2012 13:09, Szymon Guz pisze: > Could you provide the postgres version and the structure of > users_profile table (with indexes)? Argh, i forgot about version. It's postgresql-9.1.3. I don't think the structure of users_profile is important here. My idea is to let the planner ignore sorting compl

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Szymon Guz
On 1 March 2012 13:02, Marcin Mirosław wrote: > W dniu 01.03.2012 12:50, Szymon Guz pisze: > Hi Szymon, > > If you have only 2 rows in the table, then the plan really doesn't > > matter too much. Sorting two rows would be really fast :) > > > > Try to check it with 10k rows. > > It doesn't matter

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Marcin Mirosław
W dniu 01.03.2012 12:50, Szymon Guz pisze: Hi Szymon, > If you have only 2 rows in the table, then the plan really doesn't > matter too much. Sorting two rows would be really fast :) > > Try to check it with 10k rows. It doesn't matter (in this case) how many records is in user_profile table. Pla

Re: [PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Szymon Guz
On 1 March 2012 12:45, Marcin Mirosław wrote: > Hello, > my example query (and explain) is: > $ explain SELECT count(*) from (select * from users_profile order by id) > u_p; >QUERY PLAN > --- >

[PERFORM] [planner] Ignore "order by" in subselect if parrent do count(*)

2012-03-01 Thread Marcin Mirosław
Hello, my example query (and explain) is: $ explain SELECT count(*) from (select * from users_profile order by id) u_p; QUERY PLAN --- Aggregate (cost=1.06..1.07 rows=1 width=0) -> Sort (c