[Tech] Re datastore simulations

Michael Rogers 30 Apr 2008 13:19:23 +0100

On Apr 30 2008, Matthew Toseland wrote:
> Keys to block number. Block numbers to keys is handled by the on disk 
> structure. So we can actually pick a random block number to dump - but at 
> the cost of having to keep a key index.


Cool, I see what you mean now - I'll simulate that too.
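A minimal sketch of the random-replacement policy Matthew describes. It assumes a fixed-capacity store where each key occupies one block. `blocks` stands in for the on-disk block-number-to-key mapping, and `index` is the extra key-to-block-number index that random eviction needs. The class and method names are hypothetical and are not Freenet's datastore API.

```python
import random


class RandomReplacementStore:
    """Hypothetical fixed-size store that evicts a uniformly random block.

    blocks[i] is the key held in block number i (the on-disk mapping);
    index maps key -> block number (the extra key index this policy costs).
    """

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.blocks = []           # block number -> key
        self.index = {}            # key -> block number
        self.rng = rng or random.Random()

    def contains(self, key):
        return key in self.index

    def put(self, key):
        if key in self.index:
            return                 # already stored, nothing to do
        if len(self.blocks) < self.capacity:
            # Store not yet full: append the key at the next block number.
            self.index[key] = len(self.blocks)
            self.blocks.append(key)
            return
        # Store full: pick a random block number and overwrite its key,
        # keeping the key index consistent with the block mapping.
        victim = self.rng.randrange(self.capacity)
        del self.index[self.blocks[victim]]
        self.blocks[victim] = key
        self.index[key] = victim
```

The only extra cost compared with hashing is the `index` dictionary. Without it you could pick a random block number, but you could not answer "is this key stored?" without scanning.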

> I'm surprised that hashing works so well, it has some big disadvantages 
> e.g. once the datastore is say half full, half of all new incoming keys 
> will overwrite old data rather than being added to the end. So we end up 
> storing less data: it takes a much longer time for the datastore to fill 
> up.

Hmm, good point. On the other hand, filling the store (or 99% filling it) 
would typically take only a few days, so maybe it's more important to 
optimise the steady-state behaviour than the startup behaviour?
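A back-of-the-envelope model of Matthew's point, not a measurement. Assume a directly hashed store with N slots, where each new key lands in a uniformly random slot and no key is inserted twice. If a fraction f of the slots is occupied, a new key overwrites old data with probability f. The expected occupancy after k inserts, and the number of inserts needed to reach occupancy f, are then:

```latex
\Pr[\text{overwrite} \mid \text{occupancy } f] = f,
\qquad
\mathbb{E}[f_k] = 1 - \left(1 - \frac{1}{N}\right)^{k} \approx 1 - e^{-k/N},
\qquad
k(f) \approx N \ln\frac{1}{1-f}
```

Under these assumptions, reaching 50% occupancy takes about 0.69N inserts and reaching 99% takes about 4.6N. A store that appends until full needs 0.5N and 0.99N respectively.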

> What is the approximate ratio of store filling rates for the same size 
> store on LRU versus on a direct hashing implementation? Can you simulate 
> this?

So far I've been letting the simulations reach a steady state before 
making any measurements, but it shouldn't be a problem to simulate the filling phase as well.
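A sketch of how the fill-rate ratio could be simulated. It assumes a stream of distinct new keys and a uniformly random slot standing in for a good hash function. LRU appends every new key until the store is full, so it needs one insert per block. The hashed store only gains occupancy when a key lands in an empty slot. The function names and the store size are hypothetical. The printed "predicted" value, ln(1/(1-f))/f, is what the uniform-hashing model gives.

```python
import math
import random


def lru_inserts_to_fill(capacity, target_fraction):
    """LRU appends each new key until the store is full, so reaching
    target_fraction occupancy takes exactly one insert per block."""
    return round(target_fraction * capacity)


def hash_inserts_to_fill(capacity, target_fraction, rng):
    """Count inserts of distinct new keys until a directly hashed store
    reaches target_fraction occupancy.  Each key lands in slot
    hash(key) % capacity; a uniformly random slot stands in for a good
    hash function (an assumption).  A key that hits an occupied slot
    overwrites it and does not increase occupancy."""
    target = round(target_fraction * capacity)
    occupied = set()
    inserts = 0
    while len(occupied) < target:
        occupied.add(rng.randrange(capacity))
        inserts += 1
    return inserts


if __name__ == "__main__":
    capacity = 10_000              # hypothetical store size, in blocks
    rng = random.Random(1)
    for f in (0.5, 0.9, 0.99):
        ratio = hash_inserts_to_fill(capacity, f, rng) / lru_inserts_to_fill(capacity, f)
        predicted = math.log(1 / (1 - f)) / f   # uniform-hashing estimate
        print(f"fill {f:.0%}: hash/LRU inserts = {ratio:.2f} (predicted {predicted:.2f})")
```

Real traffic repeats keys, and a repeated key never needs a new slot in either design. So this is a worst case for the hashed store's filling time, not an estimate of it.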

> IMHO most of it will be filesharing, just as a massive chunk of the total 
> internet bandwidth is filesharing.

OK, I'll simulate filesharing with two popularity distributions, uniform and 
Zipf. Each file will contain a lognormally distributed number of blocks, 
and the downloader will randomly choose 2/3 of them to request. I won't 
bother with splitfile healing, inserts, churn, congestion, swapping, phase 
of the moon, etc.
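A sketch of the request workload described above. It assumes each file has a lognormally distributed number of blocks (at least one) with unique integer block keys. Files are chosen by uniform or Zipf popularity, and each download requests a random two thirds of the file's blocks. The lognormal parameters `mu` and `sigma`, the Zipf exponent `zipf_s`, rounding up to ceil(2n/3), and all function names are assumptions for illustration.

```python
import random


def make_files(num_files, mu, sigma, rng):
    """Give each file a lognormally distributed number of blocks (at least 1).
    Block keys are consecutive integers, unique across all files.
    mu and sigma parameterise the underlying normal; their values are assumptions."""
    files = []
    next_key = 0
    for _ in range(num_files):
        n_blocks = max(1, round(rng.lognormvariate(mu, sigma)))
        files.append(list(range(next_key, next_key + n_blocks)))
        next_key += n_blocks
    return files


def popularity_weights(num_files, distribution, zipf_s=1.0):
    """Uniform: every file equally popular.  Zipf: the file of rank r gets
    weight 1 / r**zipf_s (zipf_s = 1.0 is an assumed exponent)."""
    if distribution == "uniform":
        return [1.0] * num_files
    if distribution == "zipf":
        return [1.0 / rank ** zipf_s for rank in range(1, num_files + 1)]
    raise ValueError(f"unknown distribution: {distribution}")


def download_requests(files, weights, num_downloads, rng):
    """Yield requested block keys: pick a file by popularity, then request
    a random two thirds of its blocks (rounded up, an assumption)."""
    for _ in range(num_downloads):
        blocks = rng.choices(files, weights=weights, k=1)[0]
        wanted = (2 * len(blocks) + 2) // 3     # ceil(2n/3) in integer arithmetic
        yield from rng.sample(blocks, wanted)


if __name__ == "__main__":
    rng = random.Random(0)
    files = make_files(1000, mu=3.0, sigma=1.0, rng=rng)
    for dist in ("uniform", "zipf"):
        weights = popularity_weights(len(files), dist)
        requests = list(download_requests(files, weights, 500, rng))
        print(dist, len(requests), "requests,", len(set(requests)), "distinct blocks")
```

The generated key stream can be fed straight into a store model such as the random-replacement one, so the same workload compares every replacement policy.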

> SSK polling for messages obviously 
> will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are 
> ~ 10x than CHKs). That should reduce a bit in future with some new 
> measures such as RecentlyFailed ... but it will increase as FMS is more 
> widely adopted... So no idea really... I do know that if we spend all our 
> bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs 
> are kept in a separate store from CHKs, this is not likely to change.

I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs 
will affect the way SSKs are cached, but I don't have time to dig into the 
code to find out how they work (and into Frost and FMS to find out what 
kind of traffic patterns they produce).

Cheers,
Michael
