On Apr 30 2008, Matthew Toseland wrote:
> Keys to block number. Block numbers to keys is handled by the on disk
> structure. So we can actually pick a random block number to dump - but at
> the cost of having to keep a key index.

Cool, I see what you mean now - I'll simulate that too.

> I'm surprised that hashing works so well, it has some big disadvantages
> e.g. once the datastore is say half full, half of all new incoming keys
> will overwrite old data rather than being added to the end. So we end up
> storing less data: it takes a much longer time for the datastore to fill
> up.

Hmm, good point. On the other hand, filling the store (or 99% filling it)
would typically only take a few days, so maybe it's more important to
optimise the steady state behaviour than the startup behaviour?

> What is the approximate ratio of store filling rates for the same size
> store on LRU versus on a direct hashing implementation? Can you simulate
> this?

So far I've been allowing the simulations to reach a steady state before
making any measurements, but it shouldn't be a problem to simulate the
filling phase as well.

> IMHO most of it will be filesharing, just as a massive chunk of the total
> internet bandwidth is filesharing.

OK, I'll simulate filesharing with two popularity distributions, uniform
and Zipf. Each file will contain a lognormally distributed number of
blocks, and the downloader will randomly choose 2/3 of them to request. I
won't bother with splitfile healing, inserts, churn, congestion, swapping,
phase of the moon, etc.

> SSK polling for messages obviously
> will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are
> ~ 10x than CHKs). That should reduce a bit in future with some new
> measures such as RecentlyFailed ... but it will increase as FMS is more
> widely adopted... So no idea really... I do know that if we spend all our
> bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs
> are kept in a separate store from CHKs, this is not likely to change.

I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs will
affect the way SSKs are cached, but I don't have time to dig into the code to
find out how they work (and into Frost and FMS to find out what kind of
traffic patterns they produce).

Cheers,
Michael
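
A rough sketch of the store-filling comparison asked about above, under stated assumptions rather than as the actual simulator: every incoming key is new, the hash is uniform (a random slot stands in for hash(key) mod capacity), and the LRU-style store simply appends until it is full. The names fill_curve and inserts_to_reach are hypothetical. Under these assumptions the expected occupied fraction of a hashed store after n keys is 1 - (1 - 1/N)^n, roughly 1 - e^(-n/N), so reaching 99% should take on the order of N ln 100 (about 4.6 N) inserts, against 0.99 N for the append-until-full store.

```python
import random


def fill_curve(capacity, inserts, seed=0):
    """Slots occupied after each new key, for a hashed and an append-style store.

    Assumption: every incoming key is distinct and the hash is uniform, so a
    random slot index stands in for hash(key) % capacity.
    """
    rng = random.Random(seed)
    occupied = [False] * capacity
    hashed_used = 0
    hashed_fill = []
    append_fill = []
    for n in range(1, inserts + 1):
        slot = rng.randrange(capacity)
        if not occupied[slot]:
            # Empty slot: the store grows by one block.
            occupied[slot] = True
            hashed_used += 1
        # Otherwise the new key overwrites old data and the store does not grow.
        hashed_fill.append(hashed_used)
        # An LRU-style store adds each new key to the end until it is full.
        append_fill.append(min(n, capacity))
    return hashed_fill, append_fill


def inserts_to_reach(fill, target):
    """Number of inserts needed before at least `target` slots are occupied."""
    for n, used in enumerate(fill, start=1):
        if used >= target:
            return n
    return None


if __name__ == "__main__":
    capacity = 10_000
    hashed, appended = fill_curve(capacity, 10 * capacity)
    for fraction in (0.5, 0.9, 0.99):
        target = int(fraction * capacity)
        print(fraction,
              inserts_to_reach(appended, target),
              inserts_to_reach(hashed, target))
```

The ratio of the two columns printed for each fill level is one way to express the filling-rate ratio; repeated keys, which a real request stream would contain, are deliberately left out here.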

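
The filesharing workload described above (uniform or Zipf popularity, lognormal file sizes, 2/3 of each file's blocks requested at random) could be generated along these lines. This is only an illustrative sketch: the function names, the lognormal parameters mu and sigma, and the Zipf exponent are assumptions, since the message does not give them, and rounding 2/3 of the blocks up is likewise a choice made here.

```python
import math
import random
from itertools import accumulate


def make_files(num_files, rng, mu=3.0, sigma=1.0):
    """Block count per file, lognormally distributed, at least one block.

    mu and sigma are placeholder values, not taken from the original message.
    """
    return [max(1, round(rng.lognormvariate(mu, sigma)))
            for _ in range(num_files)]


def popularity_weights(num_files, zipf_exponent=None):
    """Uniform weights, or Zipf weights 1 / rank**exponent if one is given."""
    if zipf_exponent is None:
        return [1.0] * num_files
    return [1.0 / rank ** zipf_exponent for rank in range(1, num_files + 1)]


def one_download(files, cum_weights, rng):
    """Pick a file by popularity and request a random 2/3 of its blocks.

    Returns a list of (file_id, block_index) pairs, which stand in for keys.
    """
    file_id = rng.choices(range(len(files)), cum_weights=cum_weights)[0]
    blocks = files[file_id]
    wanted = math.ceil(2 * blocks / 3)
    return [(file_id, b) for b in rng.sample(range(blocks), wanted)]


if __name__ == "__main__":
    rng = random.Random(42)
    files = make_files(1000, rng)
    for label, exponent in (("uniform", None), ("zipf", 1.0)):
        cum = list(accumulate(popularity_weights(len(files), exponent)))
        requests = [key for _ in range(100)
                    for key in one_download(files, cum, rng)]
        print(label, len(requests), "requests,",
              len(set(requests)), "distinct keys")
```

Feeding the resulting keys into each store model in turn would give the request stream for the hit-rate and fill-rate measurements.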