On Fri, Mar 12, 2004 at 02:05:05AM +1300, Sidney Markowitz wrote:
> Michael Parker wrote:
> >The index would have to be loaded into memory, not the entire table.
> >The index on bayes_token should be fairly efficient. Given enough
> >memory MySQL can hold a good bit of the index in memory. Practically
> >every single one of my queries is served out of memory. Recently I've
> >been peaking somewhere around 200 queries per second, with performance
> >about equal to, if not slightly better than, DB_File on the same hardware.
> 
> My concern is what happens when you scale up the number of users. You 
> are getting every one of your queries out of memory but:
> 
> 1. How many megabytes are being loaded into memory to get that? With 
> many users, each message that is processed will be for a different user. 
> That initial load of index data into memory will have to be done once 
> per message, not just once as in your tests. Is the index file for the 
> bayes_token table something like 4 or 5 megabytes of data for one user?
> 
> 2. Your entire bayes database may end up cached in memory by the time 
> you have processed one message. Everything you are doing would be from 
> memory after that. With many users, the bayes_token records for any one 
> user will be scattered all over the table. Each query will result in a 
> new seek and reading of an entire block. How many queries are there per 
> message? If it's 1000, that's something like another 4 megabytes read per 
> message.
> 
> Reducing the size of the index helps #1. Finding a way to get some 
> locality of reference for the bayes data for a single user, as well as 
> reducing the amount of data, will help #2.
> 
> I think performance will not be acceptable if each message requires 
> reading on the order of 7 or 8 megabytes from disk.
> 

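For what it's worth, you don't have to guess at #1: MySQL will report
how big the indexes actually are, and how much memory is set aside to
cache them (Index_length is the total on-disk size, in bytes, of all
indexes on the table; key_buffer_size is the MyISAM index cache):

    mysql> SHOW TABLE STATUS LIKE 'bayes_token';
    -- look at the Index_length column: bytes used by all indexes
    mysql> SHOW VARIABLES LIKE 'key_buffer_size';
    -- memory available for caching MyISAM index blocks
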
All of this is exactly the argument for using a modern RDBMS: you
trust that it has the smarts built in to manage your tables and
indexes.  In some cases you pay experts to help tune and manage your
databases.  I've had $day_jobs with 4-5 dedicated DBAs managing
hundreds of tables, millions of rows of data, and multi-terabyte disk
arrays.  It's certainly not something to take lightly.
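
As for the locality problem in #2, one option is to attack it at the
schema level.  A rough sketch, assuming InnoDB (which clusters rows on
the primary key) and illustrative column names: make (id, token) the
primary key, and each user's tokens stay physically adjacent on disk:

    CREATE TABLE bayes_token (
      id          INT     NOT NULL,   -- user id
      token       CHAR(5) NOT NULL,   -- token hash
      spam_count  INT     NOT NULL DEFAULT 0,
      ham_count   INT     NOT NULL DEFAULT 0,
      atime       INT     NOT NULL DEFAULT 0,
      PRIMARY KEY (id, token)         -- clustered: one user's rows sit together
    ) TYPE=InnoDB;                    -- ENGINE=InnoDB in later versions

With that layout the thousand-odd token lookups for one message mostly
hit the same few pages instead of seeking all over the table.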

Michael
