Re: [PERFORM] hash join vs nested loop join

2012-12-20 Thread Huan Ruan
On 21 December 2012 01:06, Kevin Grittner wrote: > Huan Ruan wrote: > > Kevin Grittner wrote: > > >> Frankly, at 12 microseconds per matched pair of rows, I think > >> you're doing OK. > > > > This plan is the good one, I want the indexscan nested loop join and this > > is only achieved after mak

Re: [PERFORM] hash join vs nested loop join

2012-12-20 Thread Kevin Grittner
Huan Ruan wrote: > Kevin Grittner wrote: >> Frankly, at 12 microseconds per matched pair of rows, I think >> you're doing OK. > > This plan is the good one, I want the indexscan nested loop join and this > is only achieved after making all these costing factors change. Before > that, it was hash j

Re: [PERFORM] hash join vs nested loop join

2012-12-19 Thread Huan Ruan
Frankly, at 12 microseconds per matched pair of rows, I think > you're doing OK. > > This plan is the good one, I want the indexscan nested loop join and this is only achieved after making all these costing factors change. Before that, it was hash join and was very slow. However, I'm worried about

Re: [PERFORM] hash join vs nested loop join

2012-12-19 Thread Kevin Grittner
"Huan Ruan" wrote: > explain (analyze, buffers) > SELECT >  * > FROM IM_Match_Table smalltable >  inner join invtran bigtable on bigtable.invtranref = smalltable.invtranref Well, one table or the other will need to be read in full, and you would normally want that one to be the small table. When

Re: [PERFORM] hash join vs nested loop join

2012-12-18 Thread Huan Ruan
> Quite possibly, but it could be any of a number of other things, > like a type mismatch. It might be best to rule out other causes. If > you post the new query and EXPLAIN ANALYZE output, along with the > settings you have now adopted, someone may be able to spot > something. It wouldn't hurt to

Re: [PERFORM] hash join vs nested loop join

2012-12-14 Thread Kevin Grittner
Huan Ruan wrote: > Kevin Grittner wrote: >> With a low cache hit rate, that would generally be when the number >> of lookups into the table exceeds about 10% of the table's rows. > > So far, my main performance issue comes down to this pattern where > Postgres chooses hash join that's slower than

Re: [PERFORM] hash join vs nested loop join

2012-12-13 Thread Huan Ruan
> > With a low cache hit rate, that would generally be when the number > of lookups into the table exceeds about 10% of the table's rows. > > > So far, my main performance issue comes down to this pattern where Postgres chooses hash join that's slower than a nest loop indexed join. By changing thos

Re: [PERFORM] hash join vs nested loop join

2012-12-13 Thread Kevin Grittner
Huan Ruan wrote: > Interesting to see how you derived 100% cache hits. I assume by 'cache' you > mean the pg shared buffer plus the OS cache? Because the table is 23GB but > the shared buffer is only 6GB. Even then, I'm not completely convinced > because the total RAM is just 24GB, part of which w

Re: [PERFORM] hash join vs nested loop join

2012-12-13 Thread Huan Ruan
Hi Kevin Again, many thanks for your time and help. On 14 December 2012 02:26, Kevin Grittner wrote: > Huan Ruan wrote: > > > Hash 1st run > > > "Hash Join (cost=1681.87..6414169.04 rows=48261 width=171) > > (actual time=2182.450..88158.645 rows=48257 loops=1)" > > > " -> Seq Scan on invtran bi

Re: [PERFORM] hash join vs nested loop join

2012-12-13 Thread Huan Ruan
Hi Kevin On 13 December 2012 10:47, Kevin Grittner wrote: > Huan Ruan wrote: > > > is a lot slower than a nested loop join. > > Giving actual numbers is more useful than terms like "a lot". Even > better is to provide the output of EXPLAIN ANALYZZE rather than > just EXPLAIN. This shows estimate

Re: [PERFORM] hash join vs nested loop join

2012-12-13 Thread Huan Ruan
On 13 December 2012 03:28, Jeff Janes wrote: > > This looks like the same large-index over-penalty as discussed in the > recent thread "[PERFORM] Slow query: bitmap scan troubles". > > Back-patching the log(npages) change is starting to look like a good idea. > > Cheers, > > Jeff Thanks for the

Re: [PERFORM] hash join vs nested loop join

2012-12-13 Thread Kevin Grittner
Huan Ruan wrote: > Hash 1st run > "Hash Join (cost=1681.87..6414169.04 rows=48261 width=171) > (actual time=2182.450..88158.645 rows=48257 loops=1)" > " -> Seq Scan on invtran bigtable (cost=0.00..4730787.28 > rows=168121728 width=108) (actual time=0.051..32581.052 > rows=168121657 loops=1)" 19

Re: [PERFORM] hash join vs nested loop join

2012-12-12 Thread Kevin Grittner
Huan Ruan wrote: > is a lot slower than a nested loop join. Giving actual numbers is more useful than terms like "a lot". Even better is to provide the output of EXPLAIN ANALYZZE rather than just EXPLAIN. This shows estimates against actual numbers, and give timings. For more suggestions see this

Re: [PERFORM] hash join vs nested loop join

2012-12-12 Thread Jeff Janes
On Tue, Dec 11, 2012 at 8:25 PM, Huan Ruan wrote: > Hello All > > While investigating switching to Postgres, we come across a query plan that > uses hash join and is a lot slower than a nested loop join. > > I don't understand why the optimiser chooses the hash join in favor of the > nested loop.

Re: [PERFORM] hash join vs nested loop join

2012-12-11 Thread Evgeny Shishkin
On Dec 12, 2012, at 8:57 AM, Evgeny Shishkin wrote: > > On Dec 12, 2012, at 8:44 AM, Huan Ruan wrote: > >> >> On 12 December 2012 15:33, Evgeny Shishkin wrote: >> Optimiser thinks that nested loop is more expensive, because of point PK >> lookups, which a random io. >> Can you set random_p

Re: [PERFORM] hash join vs nested loop join

2012-12-11 Thread Evgeny Shishkin
On Dec 12, 2012, at 8:25 AM, Huan Ruan wrote: > Hello All > > While investigating switching to Postgres, we come across a query plan that > uses hash join and is a lot slower than a nested loop join. > > I don't understand why the optimiser chooses the hash join in favor of the > nested loop

[PERFORM] hash join vs nested loop join

2012-12-11 Thread Huan Ruan
Hello All While investigating switching to Postgres, we come across a query plan that uses hash join and is a lot slower than a nested loop join. I don't understand why the optimiser chooses the hash join in favor of the nested loop. What can I do to get the optimiser to make a better decision (n