On Fri, Mar 12, 2004 at 04:21:56AM +1300, Sidney Markowitz wrote:
>
> This sounds like a "Don't worry your pretty little head about it, the
> experts know what they are doing" argument.
>

Don't let my passion for the SQL stuff stop you from questioning its
efficiency; questioning it is a good thing. Without questioning the code
as written, it won't get better. The AWL and Bayes code has been in
existence for just over a year now. In that time it's gone through five
or so rewrites, based on my own questioning of the code and the
assumptions made. I'm all for peer review and making it better.

> Ok, I'll wait to see what the performance is like when someone,
> presumably Kelsey, is able to try a large scale test using SQL Bayes. If
> it's good I'll rejoice.
>
> If it's bad I'll start asking what those fancy high paid DBAs would do to
> get some locality of reference so all bayes data of one user can be
> slurped up when it is needed during processing of one message. And I'll
> suggest that we reduce the size of the data fields. And I'll ask if the
> MySQL documentation is correct when it says that VARCHAR elements in
> MyISAM tables hurt performance. Etc.
>
> But I'll shut up now until we see some numbers.
>

I'd love to see the numbers as well. I just don't have the amount of data
that some folks have to test with, and I don't have the server hardware
to set up a serious large-scale test. I'm stuck with smaller datasets and
measuring relative performance on weaker hardware.

I think it's important to put our heads together and come up with a
reasonable set of benchmarks that everyone agrees exercises the Bayes
storage code in the proper way. Then we can run that benchmark against not
only the DBM/SQL modules, but also QDBM and TDB. We can tune disks,
database servers, etc. and see how that affects the results. We can change
the Bayes token implementation to use fixed-length hashes and measure
performance, etc., etc.

Michael
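To make the fixed-length hash idea concrete: if every Bayes token is stored as a hash of the same width, the token column no longer has to be VARCHAR, which ties into Sidney's MyISAM question (MyISAM can only use its fixed-length "static" row format when a table has no variable-length columns). Below is a minimal sketch in Python, assuming SHA-1 truncated to 5 bytes; the function name token_key and the 5-byte width are hypothetical choices for illustration, not anything the current Bayes code does.

```python
import hashlib

# Hypothetical key width: keep the first 5 bytes (40 bits) of a SHA-1 digest.
# This width is an assumption for illustration, not an existing setting.
TOKEN_KEY_BYTES = 5


def token_key(token: str, width: int = TOKEN_KEY_BYTES) -> bytes:
    """Map a variable-length Bayes token to a fixed-width binary key."""
    if not 1 <= width <= hashlib.sha1().digest_size:
        raise ValueError("width must be between 1 and 20 bytes")
    # Hash the UTF-8 bytes of the token and truncate to the chosen width,
    # so a 1-character token and a long header token take the same space.
    return hashlib.sha1(token.encode("utf-8")).digest()[:width]


if __name__ == "__main__":
    tokens = ["viagra", "HX-Mailer:Microsoft Outlook Express 6.00.2800.1158", "a"]
    for tok in tokens:
        key = token_key(tok)
        print(f"{len(tok):3d} chars -> {key.hex()} ({len(key)} bytes)")
```

The trade-off is collisions: with 40-bit keys, the birthday bound means at least one collision becomes plausible once a database holds on the order of 2^20 (about a million) distinct tokens. Whether that matters for per-user token counts is exactly the kind of thing the proposed benchmarks could measure alongside speed.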
