On Apr 30 2008, Matthew Toseland wrote:
> Keys to block number. Block numbers to keys is handled by the on disk 
> structure. So we can actually pick a random block number to dump - but at 
> the cost of having to keep a key index.

Cool, I see what you mean now - I'll simulate that too.
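
For the record, here's roughly what I have in mind for the random 
replacement store: a rough sketch only, not the node's actual code. The 
class and method names are made up for illustration, a plain list stands 
in for the on-disk block-number-to-key structure, and a dict stands in for 
the key index.

```python
import random

class RandomReplacementStore:
    """Toy model: fixed number of blocks, random eviction when full."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.slot_to_key = []   # stands in for the on-disk structure
        self.key_to_slot = {}   # the extra key index we have to maintain
        self.rng = random.Random(seed)

    def __contains__(self, key):
        return key in self.key_to_slot

    def put(self, key):
        """Store key; return the evicted key, or None if nothing was evicted."""
        if key in self.key_to_slot:
            return None
        if len(self.slot_to_key) < self.capacity:
            # Store not yet full: append to the end.
            self.key_to_slot[key] = len(self.slot_to_key)
            self.slot_to_key.append(key)
            return None
        # Store full: pick a random block number and overwrite it.
        slot = self.rng.randrange(self.capacity)
        evicted = self.slot_to_key[slot]
        del self.key_to_slot[evicted]
        self.slot_to_key[slot] = key
        self.key_to_slot[key] = slot
        return evicted
```

The cost you mention shows up as key_to_slot: every stored key needs an 
index entry, which direct hashing avoids.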

> I'm surprised that hashing works so well, it has some big disadvantages 
> e.g. once the datastore is say half full, half of all new incoming keys 
> will overwrite old data rather than being added to the end. So we end up 
> storing less data: it takes a much longer time for the datastore to fill 
> up.

Hmm, good point. On the other hand, filling the store (or 99% filling it) 
would typically only take a few days, so maybe it's more important to 
optimise the steady state behaviour than the startup behaviour?

> What is the approximate ratio of store filling rates for the same size 
> store on LRU versus on a direct hashing implementation? Can you simulate 
> this?

So far I've been allowing the simulations to reach a steady state before 
making any measurements, but it shouldn't be a problem to simulate the 
filling phase as well.
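
A minimal sketch of the comparison, assuming every incoming key is new and 
the hash is uniform over the slots (both simplifications, and the function 
names are hypothetical). Under those assumptions an LRU store fills 
linearly, while a direct-hashed store of N slots is expected to be about 
1 - exp(-n/N) full after n insertions.

```python
import math
import random

def fill_fractions(slots, inserts, seed=0):
    """Return (n, lru_fraction, hashed_fraction) after each insertion."""
    rng = random.Random(seed)
    occupied = [False] * slots
    hashed_used = 0
    history = []
    for n in range(1, inserts + 1):
        # A uniform random slot stands in for hash(key) % slots.
        slot = rng.randrange(slots)
        if not occupied[slot]:
            occupied[slot] = True
            hashed_used += 1
        # LRU never overwrites until the store is full.
        lru_used = min(n, slots)
        history.append((n, lru_used / slots, hashed_used / slots))
    return history

def expected_hashed_fraction(slots, inserts):
    """Closed-form expectation for the direct-hashed store."""
    return 1.0 - (1.0 - 1.0 / slots) ** inserts
```

If the assumptions hold, reaching 99% full takes roughly ln(100), or about 
4.6, store-sizes' worth of new keys with direct hashing, against 0.99 
store-sizes with LRU.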

> IMHO most of it will be filesharing, just as a massive chunk of the total 
> internet bandwidth is filesharing.

OK, I'll simulate filesharing with two popularity distributions, uniform and 
Zipf. Each file will contain a lognormally distributed number of blocks, 
and the downloader will randomly choose 2/3 of them to request. I won't 
bother with splitfile healing, inserts, churn, congestion, swapping, phase 
of the moon, etc.
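
Something like the following would generate the request stream. It's only 
a sketch: the lognormal parameters (mu, sigma), the Zipf exponent and all 
the function names are placeholders I've picked for illustration, not 
measured values, and I round 2/3 of the blocks up.

```python
import math
import random

def make_files(num_files, mu=3.0, sigma=1.0, rng=None):
    """Lognormally distributed block count per file (at least 1 block)."""
    rng = rng or random.Random(0)
    return [max(1, round(rng.lognormvariate(mu, sigma)))
            for _ in range(num_files)]

def popularity_weights(num_files, zipf_exponent=None):
    """Uniform weights if zipf_exponent is None, else 1 / rank**exponent."""
    if zipf_exponent is None:
        return [1.0] * num_files
    return [1.0 / (rank ** zipf_exponent)
            for rank in range(1, num_files + 1)]

def request_stream(file_sizes, weights, num_downloads, rng):
    """Yield (file_id, block_index) keys for a series of downloads."""
    file_ids = range(len(file_sizes))
    for _ in range(num_downloads):
        f = rng.choices(file_ids, weights=weights)[0]
        n = file_sizes[f]
        k = max(1, math.ceil(2 * n / 3))
        # The downloader requests a random 2/3 of the file's blocks.
        for b in rng.sample(range(n), k):
            yield (f, b)
```

Each (file_id, block_index) pair plays the role of a CHK, so the same 
stream can be fed to each store implementation in turn.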

> SSK polling for messages obviously 
> will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are 
> ~ 10x smaller than CHKs). That should reduce a bit in future with some new 
> measures such as RecentlyFailed ... but it will increase as FMS is more 
> widely adopted... So no idea really... I do know that if we spend all our 
> bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs 
> are kept in a separate store from CHKs, this is not likely to change.

I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs 
will affect the way SSKs are cached, but I don't have time to dig into the 
code to find out how they work (and into Frost and FMS to find out what 
kind of traffic patterns they produce).

Cheers,
Michael

