Greg Ganger wrote:
> Yea, that would be the concern... perhaps it should be a config setting
> that one can set? [Where the DFS is fast, go there directly... where
> not, stage through the local disk.]
I think the original issue, the one Richard suspects prompted this,
may have gone away.
So if the data has to end up on DFS anyway, sending it there directly
eliminates the double copy. Right now (on SVN trunk), the data is written
completely to a local filesystem before being sent to DFS.
In this case, the bottleneck is the local disk, and the double copy
halves its effective throughput. Let's say the state file is 100 GB
and the disk has a throughput of 40 MB/s: writing the file and then
reading it back for the DFS upload means two full passes over the disk,
so storing it would theoretically take about 1.4 hours.
If we save directly to DFS, which has a throughput of about 70 MB/s,
it should theoretically finish in about 24 minutes.
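For concreteness, here is the back-of-the-envelope arithmetic behind those
two numbers (the sizes and throughputs are just the example figures above):

```python
# Rough transfer-time estimates for the two scenarios above.

STATE_SIZE_MB = 100 * 1000  # 100 GB state file, in MB


def transfer_minutes(size_mb, throughput_mb_s):
    """Minutes needed to move size_mb at the given throughput."""
    return size_mb / throughput_mb_s / 60


# Double copy: write to the 40 MB/s local disk, then read it back
# and push it to DFS -- two full passes over the slow disk.
double_copy = 2 * transfer_minutes(STATE_SIZE_MB, 40)

# Direct: a single pass straight to DFS at ~70 MB/s.
direct = transfer_minutes(STATE_SIZE_MB, 70)

print(f"double copy:   {double_copy:.0f} min")  # ~83 min, about 1.4 h
print(f"direct to DFS: {direct:.0f} min")       # ~24 min
```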
I am playing around with user-configurable suspend and resume handlers
which are free to stage the data however they wish. In my case, they
try to compress the VM state in different ways.
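As a rough sketch of what such a handler might do, the following pipes the
state stream through an external compressor while writing it out. The
function name and signature are hypothetical, not the project's actual
handler interface; plain gzip is used here, and swapping in pigz gives the
parallel-gzip variant:

```python
# Hypothetical suspend handler: stream the VM state through an external
# compressor into a destination file. This is a sketch under assumed
# interfaces, not the real handler API.
import shutil
import subprocess


def suspend_handler(state_stream, dest_path, compressor=("gzip", "-c")):
    """Compress state_stream with an external tool and write to dest_path.

    Passing compressor=("pigz", "-c", "-p", "4") would use parallel gzip
    with 4 threads instead (assuming pigz is installed).
    """
    with open(dest_path, "wb") as out:
        proc = subprocess.Popen(list(compressor),
                                stdin=subprocess.PIPE, stdout=out)
        # Stream the state into the compressor without buffering it all.
        shutil.copyfileobj(state_stream, proc.stdin)
        proc.stdin.close()
        return proc.wait()
```

Because the handler only sees a stream and a destination, the same shape
works whether the destination is a local staging file or a path on DFS.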
The workload is quite variable in nature; at times the hypervisor sends
more data than the CPU can compress, at other times the storage system
is the bottleneck.
For a VM with the above characteristics and uninitialized local storage,
actual suspend times ranged from 130 minutes using the default gzip and
double copying down to 45 minutes using parallel gzip and storing
directly onto DFS.
Greetings,
Michael.