Thanks for pointing out the mistake in postgres.
Your Advice makes lots of sense.
We will recreate the benchmark and post the results

V Murali
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross,2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

> ----- Original Message -----
> From: Cees Hek <[EMAIL PROTECTED]>
> To: Murali V <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Friday, April 20, 2001 1:45 AM
> Subject: [OT] Re: Fast DB access
>
>
> >
> > On Thu, 19 Apr 2001, Murali V wrote:
> >
> > > Hi,
> > >
> > > If you read the code more deeply, you'll find that the timeit is only
> > > wrapped around select and not around insert.
> > > We've written the insert code so that in the first round you can
> populate
> > > the database.
> > > You comment out the insert code after the first round and run the
> benchmark
> > > several times. This would only do select and time select.
> > >
> >
> > Hi Murali,
> >
> > OK, to start off, I was not specifically aiming my rant at you, I was
> > replying to someone who had modified your code and was now comparing
MySQL
> > and PostgreSQL, and he was implying that the timings were for inserts
and
> > selects.  I took this at face value, and didn't check the code close
> > enough which I really should have done in the first place.
> >
> > > Connecting this error to an axiom that "Benchmarks are useless" is bad
> > > indeed. Shouldn't we be ironing out errors and runing benchmarks which
> are
> > > good.
> >
> > Perhaps I should have said published benchmarks.  In your case, you are
> > using benchmarks for exactly what they are intended for...  Creating a
> > system that closely resembles your application and putting it through
it's
> > paces.  What I find dangerous about publishing benchmarks, is that they
> > are almost always heavily swayed to a specific application, and most of
> > the time they show what the user wants them to show.
> >
> > In your original message, you clain to have a bias against Postgres, and
> > your benchmark shows that bias.  I however am a happy user of postgres,
> > and am therefore biased towards it.  I modified your benchmark script
> > slightly, and I got the following results (I have include a diff of my
> > changes at the bottom):
> >
> > postgres
> >  0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
> > postgres
> >  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
> >
> > Whereas if I run it with your version I get the following:
> >
> > postgres
> > 27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
> > postgres
> > 27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
> >
> > So what does that tell you about the benchmark?  that the postgres part
of
> > this benchmark is useless...  It may have given you the answer that you
> > wanted, but it is misleading to anyone else out there.
> >
> > This is why there are always flame wars about benchmarking databases (by
> > the way I think this whole thread has been very civilized and i hope is
> > stays that way).  Invariably the benchmark has missed some critical idea
> > or optimization which drastically skew the results.
> >
> > > Your recommendation is to pick a DB best suited to your app. But How
??
> > > a) Either by hiring a guru who has seen all kinds of apps with
different
> DBs
> > > who can give you the answer with which we can run
> > > b) Run a benchmark on critical programs which represent you app across
> > > databases and find what performs best.
> > > I've read too much literature on DB features. All DBs have all
features
> > > (except MySQL which does not have commit !!!!)
> > > You can't make a thing out of DB literature.
> >
> > What I would recommend is exactly what you have done in this case.  Get
> > access to any and all the systems that you feel may do the job for you ,
> > and try them out.  Browse the web for other users experiences, but don't
> > use other peoples benchmarks, because the odds are good that they are
> > wrong...  Create your own, or modify an existing one, and scrutinize
> > exactly what it is doing.  And if you want to share your results with
> > anyone else, tell them what you choose in the end, and why.  Tell them
you
> > choose database x because it did this and this for you.  Don't say
> > database y is a piece of crap, so we went with database x.
> >
> > But whatever you do, don't choose your database based on other peoples
> > benchmarks........ (that is all I'm trying to say, and I guess I didn't
> > say it clearly enough)
> >
> > When I first read your message, I tucked it away somewhere, so I could
> > reference it again in the future, because I was interested in the MLDBM
> > work that you had done, and I thank you for that.  But it also made me
> > think that maybe I shouldn't be using Postgres, because your results
were
> > so poor (only for a second or too though :).  But I'll bet that a lot of
> > people who have never used postgres before are now less likely to
download
> > it and try it out for themself, because a benchmark swayed them away
from
> > it.  That sounds like a good closer, so I'll stop it there :-)
> >
> >
> > Cees
> >
> >
> > Here is the diff of my changes and a quick comment on why your way kills
> > the performance of postgres
> >
> > ***************
> > *** 124,131 ****
> >                 $i_ip = int(rand(20));
> >
> >                 @row_ary = $dbh->selectrow_array("select crr from
> benchmark where
> > !                               rtrim(pub) = 'pub$i_pub' and rtrim(size)
=
> 'size$i_size' and
> > !                               rtrim(type) = 'type$i_type' and
rtrim(ip)
> = 'ip$i_ip'");
> >         }
> >   };
> >
> > --- 124,131 ----
> >                 $i_ip = int(rand(20));
> >
> >                 @row_ary = $dbh->selectrow_array("select crr from
> benchmark where
> > !                               pub = 'pub$i_pub' and size =
'size$i_size'
> and
> > !                               type = 'type$i_type' and ip =
'ip$i_ip'");
> >         }
> >   };
> >
> > All I did was remove the 'rtrim' called from the query.  I'm assuming
you
> > did this because Postgres pads all char fields with spaces (varchar
> > doesn't do this).  What your query ends up doing, is running rtrim on
> > every field in the column and then comparing it against your provided
> > value (indeces are not used since you are changing the value of the
> > primary key).  So since your table has 24000 rows and you call rtrim on
4
> > columns in the select, it has a worst case of 24,000 x 4 calls to rtrim
> > (of course postgres will short circuit the conditions if the first call
> > fails), but in the best case it still has to do 24,000 calls to rtrim.
> > I'm amazed that postgres can actually do this this fast :).
> >
> > In Postgres 7.0 you don't have to worry about the space padding issue,
so
> > I was able to remove the rtrims from the query.  However since you are
> > using 6.5.x, what you should have done, is either make your columns
> > varchar instead of char, or space pad your values in the query, so that
> > you are checking pub = 'pub1 ' instead of rtrim(pub) = 'pub1'.  i
> >
> > However, since you were looking at solutions for a new application, you
> > have no excuse for using a database that is several years old, when
> > several new stable revisions are available (for free)...
> >
> > Give it a try and see what happens...
> >
> >
> >
> >
>

Reply via email to