Re: [PERFORM] poor execution plan because column dependence

2011-04-13 Thread Václav Ovsík
Dear Bob, On Tue, Apr 12, 2011 at 05:14:29PM -0700, Bob Lunney wrote: Zito, Using psql, log in as the database owner and run ANALYZE VERBOSE. Happiness will ensue. Unfortunately not. I ran ANALYZE with different default_statistics_target values up to 1000 as first tries, always with the same
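A minimal sketch of the experiment being described, in SQL (the table and column names are hypothetical):

    -- session-wide target used by subsequent ANALYZE runs
    SET default_statistics_target = 1000;
    ANALYZE VERBOSE tickets;

    -- or persistently, per column, which is more targeted:
    ALTER TABLE tickets ALTER COLUMN effectiveid SET STATISTICS 1000;
    ANALYZE tickets;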

Re: [PERFORM] poor execution plan because column dependence

2011-04-13 Thread Václav Ovsík
Dear Tom, On Tue, Apr 12, 2011 at 08:52:15PM -0400, Tom Lane wrote: .. Yeah, that main.EffectiveId = main.id clause is going to be underestimated by a factor of about 200, which is most though not all of your rowcount error for that table. Not sure whether you can do much about it, if the
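To see the misestimate in question, compare estimated and actual row counts on the scan node. A hedged sketch (the Tickets table and alias follow the thread's RT schema):

    EXPLAIN ANALYZE
    SELECT count(*) FROM Tickets main
    WHERE main.EffectiveId = main.id;
    -- "rows=" on the scan node is the planner's estimate and
    -- "actual ... rows=" is reality; a roughly 200x gap between the two
    -- is the rowcount error described above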

Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-13 Thread Glyn Astill
--- On Tue, 12/4/11, Greg Smith g...@2ndquadrant.com wrote: From: Greg Smith g...@2ndquadrant.com Subject: Re: [PERFORM] Linux: more cores = less concurrency. To: Kevin Grittner kevin.gritt...@wicourts.gov Cc: da...@lang.hm, Steve Clark scl...@netwolves.com, Glyn Astill

Re: [PERFORM] Performance

2011-04-13 Thread Ogden
On Apr 12, 2011, at 5:36 PM, Tomas Vondra wrote: On 12.4.2011 23:19, Ogden wrote: On Apr 12, 2011, at 4:09 PM, Tomas Vondra wrote: On 12.4.2011 20:28, Ogden wrote: On Apr 12, 2011, at 1:16 PM, Tomas Vondra wrote: On 12.4.2011 19:23, Ogden wrote: On Apr 12, 2011, at

Re: [PERFORM] Performance

2011-04-13 Thread tv
Tomas, Thank you for your very detailed and well-written description. In conclusion, I should keep my random_page_cost (3.0) at a value higher than seq_page_cost (1.0)? Is this bad practice, or will this suffice for my setup (where the database is much bigger than the RAM in the system)? Or

Re: [PERFORM] Performance

2011-04-13 Thread Kevin Grittner
Ogden li...@darkstatic.com wrote: In conclusion, I should keep my random_page_cost (3.0) at a value higher than seq_page_cost (1.0)? Is this bad practice, or will this suffice for my setup (where the database is much bigger than the RAM in the system)? The idea is to adjust the costing
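The settings under discussion, as they would actually be set; the values shown are purely illustrative, not recommendations:

    SET seq_page_cost = 1.0;
    SET random_page_cost = 3.0;   -- keep higher when the DB greatly exceeds RAM

    -- per-tablespace overrides also exist (PostgreSQL 9.0+); the
    -- tablespace name here is hypothetical:
    ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.5);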

Re: [PERFORM] poor execution plan because column dependence

2011-04-13 Thread Tom Lane
Václav Ovsík vaclav.ov...@i.cz writes: On Tue, Apr 12, 2011 at 08:52:15PM -0400, Tom Lane wrote: ... If you can change it, try replacing main.EffectiveId = main.id with the underlying function, eg if they're integers use int4eq(main.EffectiveId, main.id). This will bypass
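Tom's suggested rewrite, spelled out as a sketch (assuming both columns are integers; the Tickets table follows the thread's RT schema):

    -- original form: the planner applies its operator-based
    -- equality-selectivity estimate, which is badly off here
    SELECT count(*) FROM Tickets main
    WHERE main.EffectiveId = main.id;

    -- workaround: calling the underlying function directly bypasses the
    -- operator's selectivity machinery and falls back to a default estimate
    SELECT count(*) FROM Tickets main
    WHERE int4eq(main.EffectiveId, main.id);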

Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-13 Thread Scott Carey
If postgres is memory bandwidth constrained, what can be done to reduce its bandwidth use? Huge Pages could help some, by reducing page table lookups and making overall access more efficient. Compressed pages (snappy / lzo) in memory can help trade CPU cycles for memory usage for certain memory

Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Scott Carey
Sorry for resurrecting this thread, but this has been in my outbox for months and I think it is important: On Oct 27, 2010, at 12:56 PM, Tom Lane wrote: Scott Carey writes: Why does hashjoin behave poorly when the inner relation is not uniformly distributed and the outer is? Because a

Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Scott Carey
New email-client nightmares! Fixed below. I think. - Sorry for resurrecting this thread, but this has been in my outbox for months and I think it is important: On Oct 27, 2010, at 12:56 PM, Tom Lane wrote: Scott Carey writes: Why does hashjoin behave poorly when the inner

Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes: On Oct 27, 2010, at 12:56 PM, Tom Lane wrote: Because a poorly distributed inner relation leads to long hash chains. In the very worst case, all the keys are on the same hash chain and it degenerates to a nested-loop join. A pathological skew case
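A sketch of how to observe which side gets hashed and why skew hurts (table names are hypothetical):

    EXPLAIN ANALYZE
    SELECT *
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- the relation under the Hash node in the plan is the one loaded into
    -- the hash table; if its join-key values are heavily skewed, many
    -- tuples share one bucket and every probe walks that long chain,
    -- which is the nested-loop-like degeneration described above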

Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-04-13 Thread Scott Carey
On 4/13/11 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: On Oct 27, 2010, at 12:56 PM, Tom Lane wrote: Because a poorly distributed inner relation leads to long hash chains. In the very worst case, all the keys are on the same hash chain and it

Re: [PERFORM] Slow query postgres 8.3

2011-04-13 Thread Tomas Vondra
On 12.4.2011 09:33, Claudio Freire wrote: I actually implemented a statistical system for measuring these kinds of correlations. It's complex, but it might be adaptable to pgsql. Furthermore, one of my latest projects was to replace the purely statistical approach with SVCs.

Re: [PERFORM] how explain works to Mr Nathan Boley

2011-04-13 Thread Tomas Vondra
On 12.4.2011 01:22, Radhya sahal wrote: Thanks Mr Nathan Boley, I want these equations to solve for the parameters and the total time, in order to get each parameter's formula. I need these formulas; in my experiments it is very important to know the rate for each parameter in the total cost
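For reference, the basic equation behind EXPLAIN's cost for a plain sequential scan (as given in the PostgreSQL documentation) is:

    total_cost = seq_page_cost * relpages + cpu_tuple_cost * reltuples

and its inputs can be read from the catalog (the table name is an example):

    SELECT relpages, reltuples FROM pg_class WHERE relname = 'tickets';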

Re: [PERFORM] Performance

2011-04-13 Thread Claudio Freire
On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: If you model the costing to reflect the reality on your server, good plans will be chosen. Wouldn't it be better to derive those costs from actual performance data measured at runtime? Say, pg could measure

Re: [PERFORM] Slow query postgres 8.3

2011-04-13 Thread Claudio Freire
On Wed, Apr 13, 2011 at 10:16 PM, Tomas Vondra t...@fuzzy.cz wrote: You mean Support Vector Classifiers? Interesting idea, although I don't see how to apply that to query planning, especially with non-numeric inputs. Could you share more details on that statistical system and how do you think

Re: [PERFORM] Performance

2011-04-13 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes: On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: If you model the costing to reflect the reality on your server, good plans will be chosen. Wouldn't it be better to derive those costs from actual performance

Re: [PERFORM] Performance

2011-04-13 Thread Claudio Freire
On Wed, Apr 13, 2011 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Getting numbers that mean much of anything is a slow, expensive process.  You really don't want the database trying to do that for you. Once you've got them, you *really* don't want the database editorializing on them. So

Re: [PERFORM] Performance

2011-04-13 Thread Kevin Grittner
Claudio Freire klaussfre...@gmail.com wrote: So it hasn't even been tried. If you want to do that, I would be interested in your benchmark numbers. Or if you're not up to that, there are a number of companies which I'd bet would be willing to spend the time if they had a sponsor to pay for

Re: [PERFORM] Performance

2011-04-13 Thread Nathan Boley
If you model the costing to reflect the reality on your server, good plans will be chosen. Wouldn't it be better to derive those costs from actual performance data measured at runtime? Say, pg could measure random/seq page cost, *per tablespace* even. Has that been tried? FWIW, a while
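A rough SQL-only version of that measurement, assuming a table much larger than RAM, an index on id, and a cold cache (all names are hypothetical):

    -- sequential read rate: a full scan
    EXPLAIN ANALYZE SELECT count(*) FROM big_table;

    -- random read rate: a batch of scattered index probes on the same table
    EXPLAIN ANALYZE
    SELECT count(*) FROM big_table
    WHERE id = ANY (ARRAY(SELECT (random() * 100000000)::int
                          FROM generate_series(1, 1000)));
    -- dividing per-page time between the two runs gives a crude estimate
    -- of the actual random/seq cost ratio on this hardware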

Re: [PERFORM] Performance

2011-04-13 Thread Kevin Grittner
Nathan Boley npbo...@gmail.com wrote: The problem is that caching has a large effect on the time it takes to access a random page, and caching effects are very workload-dependent. So anything automated would probably need to optimize the parameter values over a set of 'typical'

Re: [PERFORM] Performance

2011-04-13 Thread Tomas Vondra
On 14.4.2011 00:05, Nathan Boley wrote: If you model the costing to reflect the reality on your server, good plans will be chosen. Wouldn't it be better to derive those costs from actual performance data measured at runtime? Say, pg could measure random/seq page cost, *per tablespace*

Re: [PERFORM] Performance

2011-04-13 Thread Claudio Freire
On Thu, Apr 14, 2011 at 12:19 AM, Tomas Vondra t...@fuzzy.cz wrote: Another issue is that when measuring multiple values (processing of different requests), the decisions may be contradictory, so it really can't be fully automatic. I don't think it's so dependent on workload. It's

Re: [PERFORM] Performance

2011-04-13 Thread Tomas Vondra
On 14.4.2011 01:10, Claudio Freire wrote: On Thu, Apr 14, 2011 at 12:19 AM, Tomas Vondra t...@fuzzy.cz wrote: Another issue is that when measuring multiple values (processing of different requests), the decisions may be contradictory, so it really can't be fully automatic. I don't

Re: [PERFORM] Performance

2011-04-13 Thread Tom Lane
Nathan Boley npbo...@gmail.com writes: FWIW, a while ago I wrote a simple script to measure this and found that the *actual* random_page / seq_page cost ratio was much higher than 4/1. That 4:1 ratio is based on some rather extensive experimentation that I did back in 2000. In the interim,

Re: [PERFORM] Performance

2011-04-13 Thread Joshua D. Drake
On 04/13/2011 05:03 PM, Tom Lane wrote: That 4:1 ratio is based on some rather extensive experimentation that I did back in 2000. In the interim, disk transfer rates have improved quite a lot more than disk seek times have, and the CPU cost to process a page's worth of data has also improved

Re: [PERFORM] Performance

2011-04-13 Thread Scott Marlowe
On Wed, Apr 13, 2011 at 5:26 PM, Tomas Vondra t...@fuzzy.cz wrote: Workload A: Touches just a very small portion of the database, so the 'active' part actually fits into memory. In this case the cache hit ratio can easily be close to 99%. Workload B: Touches a large portion of the
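Back-of-the-envelope, with purely illustrative numbers: if an uncached random read really costs about 50x a sequential one, a 99% cache hit ratio gives an effective ratio of roughly 0.99 * 1 + 0.01 * 50 = 1.49, while a 50% hit ratio gives 0.5 * 1 + 0.5 * 50 = 25.5, which is why workloads A and B justify very different random_page_cost settings.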

Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-13 Thread Greg Smith
Scott Carey wrote: If postgres is memory bandwidth constrained, what can be done to reduce its bandwidth use? Huge Pages could help some, by reducing page table lookups and making overall access more efficient. Compressed pages (snappy / lzo) in memory can help trade CPU cycles for memory usage