Kelsey Cummings wrote:
the problems that we see with SA+Bayes is I/O requirements
Kelsey, can you say how much I/O, in megabytes per second, you consider to be the point where it becomes a problem in an installation of your size? That would allow some very simple sanity checks on a design.
For example, if there are 20 megabytes of key and data information for each user, and each message requires that every token in the message is looked up in that 20 megabyte table, then we might assume that we have to read a 20 megabyte table from disk for every message that is processed, then process it from memory.
If the number of messages you process per second times 20 megabytes is greater than what you consider a feasible I/O requirement, then we know we have to change some significant things even before considering whether to use MySQL or how to optimize its use. We could see whether it would take 20 megabytes per user, or if less would do, for example.
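To make that check concrete, here is a minimal Python sketch of the arithmetic. The 20 megabyte figure is the one from my example above; the message rate and the I/O budget are made-up placeholders until we have real numbers, and the function names are only for illustration.

```python
# Back-of-the-envelope I/O check. Worst-case assumption: the whole
# per-user table is read from disk once for every message processed.

def read_mb_per_sec(messages_per_sec, table_mb=20.0):
    """Worst-case read I/O in MB/s for a given message rate."""
    return messages_per_sec * table_mb


def exceeds_budget(messages_per_sec, budget_mb_per_sec, table_mb=20.0):
    """True if the worst-case read I/O is above the feasible budget."""
    return read_mb_per_sec(messages_per_sec, table_mb) > budget_mb_per_sec


def max_table_mb(messages_per_sec, budget_mb_per_sec):
    """Largest per-user table size (MB) that still fits the budget."""
    return budget_mb_per_sec / messages_per_sec


if __name__ == "__main__":
    rate = 5.0     # messages per second (hypothetical placeholder)
    budget = 50.0  # feasible I/O in MB/s (hypothetical placeholder)

    print("worst-case read I/O (MB/s):", read_mb_per_sec(rate))
    print("exceeds budget:", exceeds_budget(rate, budget))
    print("largest table that fits (MB):", max_table_mb(rate, budget))
```

This only covers reads for scoring. It ignores caching and the writes done when learning, so it is an upper bound on read traffic, not a prediction.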
On the other hand, if that is ok but the I/O requirements of also doing the writing that is required when learning are too high, we could look at how to split off the learning from the processing.
-- sidney
