On 10 May 2011, at 10:39, Nikolay Igotti wrote:

> Hi Bayard,
>
> On 06.05.2011 19:56, Bayard Bell wrote:
>> I've got 8 cores on my system, so I can hand it all over to the guests
>> without sweating it. I'm looking at top, and I don't see any indication
>> that other system load is contending. When I stop other apps running,
>> it's only the amount of CPU idle time in the host that goes down, while
>> the guest maintains the same level of CPU utilisation.
>
> CPU isn't as easily "given" to the VM as RAM pages are, for example.
> VirtualBox internally needs to run a few threads doing disk/network I/O,
> and the same goes for the host OS, so essentially some experimentation
> is the best way to figure out how many vCPUs it is reasonable to give
> the guest to get the best performance.

Any suggestions as to how to go about that methodically? What I know is
that the run queue seems to back up to the point of crushing the host if
I provide only two vCPUs, while with 4 vCPUs I only seem to get
consumption of 2 actual CPUs. There's a further wrinkle: by default the
build environment looks at the number of CPUs and the amount of memory
and decides for itself what the appropriate level of parallelism is,
although I can work around this by fixing the job count before
experimenting with the CPU count.

Just to give this a bottom line, in case I haven't mentioned it
previously: I've got a compile job that normally takes at most a few
hours on comparable bare metal, and it's taking several days under VBox.
Resolving this is the difference between getting acceptably slower
performance under VBox and needing to sort myself out with a separate
system.
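In case it helps to make the question concrete, the sort of harness I
had in mind on the host side is below. It's purely a sketch of my own:
"solaris-bld" and the CPU counts are placeholders, and the timed,
fixed-parallelism build is kicked off by hand inside the guest for each
run.

/*
 * cpusweep.c -- step a VM through a set of vCPU counts so that each
 * build run can be timed like for like.
 */
#include <stdio.h>
#include <stdlib.h>

static int run(const char *cmd)
{
    printf("+ %s\n", cmd);
    return system(cmd);
}

int main(void)
{
    const char *vm = "solaris-bld";           /* hypothetical VM name */
    int counts[] = { 2, 3, 4, 6, 8 };
    char cmd[256];

    for (int i = 0; i < (int)(sizeof(counts) / sizeof(counts[0])); i++) {
        /* The vCPU count can only be changed while the VM is off. */
        snprintf(cmd, sizeof(cmd),
                 "VBoxManage modifyvm %s --cpus %d", vm, counts[i]);
        if (run(cmd) != 0)
            return 1;

        snprintf(cmd, sizeof(cmd),
                 "VBoxManage startvm %s --type headless", vm);
        if (run(cmd) != 0)
            return 1;

        printf("cpus=%d: run the timed build in the guest, "
               "then press Enter\n", counts[i]);
        getchar();

        /* May need a short pause here for the session to unlock. */
        snprintf(cmd, sizeof(cmd),
                 "VBoxManage controlvm %s poweroff", vm);
        run(cmd);
    }
    return 0;
}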
>> The load I'm running is compilation. There shouldn't be a lot of
>> system time, but the build system I'm using schedules a higher level
>> of parallel jobs than there is CPU, using both CPU count and memory
>> size to determine the maximum number of jobs. What nevertheless seems
>> odd is that when the Solaris guest thinks it's got 3 or 4 threads on
>> CPU, utilisation is half what I'd expect.
>
> With compilation, especially if you compile a lot of small files, a
> significant part of the load is fork/exec performance (and so the VMM
> in the guest), and of course I/O matters too.

The I/O is trivial, but what I'm gathering is that the CPU overhead of
the system calls is increased considerably. I don't see a lot of fork
and exec load, but I'm wondering whether time spent in the kernel would
be relatively longer, such that system calls that are lightweight on a
normal host would add up to a considerably higher percentage of CPU time
in a virtual environment.
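To try to put a number on that, I may knock something like the following
together and run it both on bare metal and in the guest; if the
per-cycle figure in the guest comes out several times the bare-metal
one, that would point squarely at the overhead you describe. (A
throwaway sketch of my own, nothing from the VBox tree; /usr/bin/true is
just a cheap exec target.)

/*
 * forkbench.c -- time N fork/exec/wait round trips.
 * Build and run with e.g.: cc -std=c99 -o forkbench forkbench.c
 *                          ./forkbench 2000
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 1000;
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execl("/usr/bin/true", "true", (char *)NULL);
            _exit(127);                       /* exec failed */
        } else if (pid > 0) {
            waitpid(pid, NULL, 0);
        } else {
            perror("fork");
            return 1;
        }
    }
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%d fork/exec cycles: %.2f s total, %.3f ms each\n",
           n, secs, secs * 1000.0 / n);
    return 0;
}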
>> Now, I can imagine a variety of reasons for this, plenty of which I
>> don't understand properly or at all, but looking at CPUPalette.app
>> (I'm not aware of anything on OS X that approximates the functionality
>> of mpstat), it looks like the load on the system is being spread
>> evenly across CPUs.
>
> That's pretty much expected.
>
>> My very naive reaction to this is that this isn't quite right, that
>> VirtualBox should be trying to maintain processor affinity, pushing
>> the CPU flat-out rather than itself being subject to unnecessary
>> additional SMP overhead, which is cumulative with the overhead of the
>> guest.
>
> It's up to the host OS scheduler to maintain (soft) affinity of threads
> the way it thinks most reasonable. SMP overhead, such as the need for
> TLB shootdowns, can't be cured by forcing affinity; affinity would only
> help with the reuse of CPU cache entries, if some form of address space
> ID is used (or if switches happen inside the same address space).
>
>> (My understanding is that the ability to create CPU affinity in OS X
>> is a bit weak compared to Linux or Solaris [i.e. affinity is between
>> threads and is meant to be defined by applications based on
>> hw.cacheconfig and friends, whereas in Linux and Solaris it can be
>> defined more strictly in terms of processors and processes].)
>
> I don't think you really need that. As VBox doesn't do explicit gang
> scheduling, what would help is some assistance from the host scheduler,
> not explicit assignment of CPU affinity. In theory, a good scheduler
> should gang-schedule threads with the same address space even without
> additional hints, as this will likely increase performance. Not sure
> whether OS X does that, though.

Thanks for that info. I'll see if there's any documentation or source to
satisfy my curiosity on this point. It might also be useful to see what
DTrace can tell me. Does VBox have its own DTrace probes to help with
these kinds of problems?

Cheers,
Bayard
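P.S. For completeness, the thread-level mechanism I was alluding to on
OS X is the Mach affinity-tag hint (10.5 and later): threads sharing a
tag are hinted towards a shared cache rather than bound to a particular
processor, which is the contrast I meant with processor_bind(2) on
Solaris or sched_setaffinity(2) on Linux. A minimal sketch of my own,
with error handling mostly omitted:

/*
 * affinity.c -- tag two threads so the scheduler tries to keep them on
 * cores sharing a cache.  The tag value itself is arbitrary.
 */
#include <mach/mach.h>
#include <mach/thread_policy.h>
#include <pthread.h>
#include <stdio.h>

static void tag_self(integer_t tag)
{
    thread_affinity_policy_data_t policy = { tag };
    kern_return_t kr = thread_policy_set(
        pthread_mach_thread_np(pthread_self()),
        THREAD_AFFINITY_POLICY,
        (thread_policy_t)&policy,
        THREAD_AFFINITY_POLICY_COUNT);
    if (kr != KERN_SUCCESS)
        fprintf(stderr, "thread_policy_set failed: %d\n", kr);
}

static void *worker(void *arg)
{
    (void)arg;
    tag_self(1);              /* same tag => try to share a cache */
    /* ... work on shared data here ... */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}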
