[PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread SURANTYN Jean François
Hi I have discovered an issue on my Postgresql database recently installed : it seems that the optimizer can not, when possible, simplify and rewrite a simple query before running it. Here is a simple and reproducible example : my_db=# create table test (n numeric); CREATE my_db=# insert

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Richard Huxton
SURANTYN Jean François wrote: my_db=# explain select * from test where n = 1; my_db=# explain select * from test where n = 1 and n = 1; In the first SELECT query (with where n=1), the estimated number of returned rows is correct (10), whereas in the second SELECT query (with where n=1 and

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Richard Huxton
SURANTYN Jean François wrote: Many thanks for your quick reply In fact, that issue comes from a recent migration from Oracle to Postgresql, and even if some queries were not optimized by the past (example: where n=1 and n=1), Oracle was able to rewrite them and to hide the bad queries. But

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread SURANTYN Jean François
Many thanks for your quick reply In fact, that issue comes from a recent migration from Oracle to Postgresql, and even if some queries were not optimized by the past (example: where n=1 and n=1), Oracle was able to rewrite them and to hide the bad queries. But now that we have migrated to

Re: [PERFORM] Benchmark Data requested

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Greg Smith a écrit : pgloader is a great tool for a lot of things, particularly if there's any chance that some of your rows will get rejected. But the way things pass through the Python/psycopg layer made it uncompetative (more than 50% slowdown) against the

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 09:42 +0100, SURANTYN Jean François wrote: That issue is very annoying because with generated SQL queries (from Business Objects for example) on big tables, it is possible that some queries have several times the same where condition (where n=1 and n=1 for example), and

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 12:27 +0100, Dimitri Fontaine wrote: Multi-Threading behavior and CE support -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Now, pgloader will be able to run N threads, each one loading some data to a partitionned child-table target. N will certainly be configured depending on

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Hi, I've been thinking about this topic some more, and as I don't know when I'll be able to go and implement it I'd want to publish the ideas here. This way I'll be able to find them again :) Le mardi 05 février 2008, Dimitri Fontaine a écrit : Le mardi 05 février 2008, Simon Riggs a écrit :

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Theo Kramer
On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote: On Wed, 2008-02-06 at 09:42 +0100, SURANTYN Jean François wrote: That issue is very annoying because with generated SQL queries (from Business Objects for example) on big tables, it is possible that some queries have several times the

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Simon Riggs a écrit : For me, it would be good to see a --parallel=n parameter that would allow pg_loader to distribute rows in round-robin manner to n different concurrent COPY statements. i.e. a non-routing version. What happen when you want at most N parallel

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Roberts, Jon
Since the SQL is not your fault and difficult to control, it is an argument in favour of an optional planner mode that would perform additional checks for redundant clauses of various kinds. The default for that would be off since most people don't suffer from this problem. BO isn't the

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Erik Jones
On Feb 6, 2008, at 7:35 AM, Roberts, Jon wrote: Since the SQL is not your fault and difficult to control, it is an argument in favour of an optional planner mode that would perform additional checks for redundant clauses of various kinds. The default for that would be off since most people

Re: [PERFORM] Benchmark Data requested

2008-02-06 Thread Greg Smith
On Wed, 6 Feb 2008, Dimitri Fontaine wrote: Did you compare to COPY or \copy? COPY. If you're loading a TB, if you're smart it's going onto the server itself if it all possible and loading directly from there. Would probably get a closer comparision against psql \copy, but recognize

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Roberts, Jon
Since the SQL is not your fault and difficult to control, it is an argument in favour of an optional planner mode that would perform additional checks for redundant clauses of various kinds. The default for that would be off since most people don't suffer from this problem. BO isn't

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Greg Smith
On Wed, 6 Feb 2008, Simon Riggs wrote: For me, it would be good to see a --parallel=n parameter that would allow pg_loader to distribute rows in round-robin manner to n different concurrent COPY statements. i.e. a non-routing version. Let me expand on this. In many of these giant COPY

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Tom Lane
Theo Kramer [EMAIL PROTECTED] writes: On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote: Since the SQL is not your fault and difficult to control, it is an argument in favour of an optional planner mode that would perform additional checks for redundant clauses of various kinds. The default

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Luke Lonergan
Hi Greg, On 2/6/08 7:56 AM, Greg Smith [EMAIL PROTECTED] wrote: If I'm loading a TB file, odds are good I can split that into 4 or more vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders at once, and get way more than 1 disk worth of throughput reading. You have to

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Jignesh K. Shah
Greg Smith wrote: On Wed, 6 Feb 2008, Simon Riggs wrote: For me, it would be good to see a --parallel=n parameter that would allow pg_loader to distribute rows in round-robin manner to n different concurrent COPY statements. i.e. a non-routing version. Let me expand on this. In many of

Re: [PERFORM] Benchmark Data requested

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Greg Smith a écrit : COPY. If you're loading a TB, if you're smart it's going onto the server itself if it all possible and loading directly from there. Would probably get a closer comparision against psql \copy, but recognize you're always going to be compared

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Greg Smith a écrit : If I'm loading a TB file, odds are good I can split that into 4 or more vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders at once, and get way more than 1 disk worth of throughput reading. pgloader already supports

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Luke Lonergan
Improvements are welcome, but to compete in the industry, loading will need to speed up by a factor of 100. Note that Bizgres loader already does many of these ideas and it sounds like pgloader does too. - Luke Msg is shrt cuz m on ma treo -Original Message- From: Dimitri Fontaine

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le Wednesday 06 February 2008 18:37:41 Dimitri Fontaine, vous avez écrit : Le mercredi 06 février 2008, Greg Smith a écrit : If I'm loading a TB file, odds are good I can split that into 4 or more vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders at once, and get

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le Wednesday 06 February 2008 18:49:56 Luke Lonergan, vous avez écrit : Improvements are welcome, but to compete in the industry, loading will need to speed up by a factor of 100. Oh, I meant to compete with internal COPY command instead of \copy one, not with the competition. AIUI competing

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Greg Smith
On Wed, 6 Feb 2008, Dimitri Fontaine wrote: In fact, the -F option works by having pgloader read the given number of lines but skip processing them, which is not at all what Greg is talking about here I think. Yeah, that's not useful. Greg, what would you think of a pgloader which will