On Wed, Apr 07, 2021 at 07:47:28PM -0400, Dave Voutila wrote:
>
> Thomas L. writes:
>
> >> > Thomas: I looked at your host dmesg and your provided vm.conf. It
> >> > looks like 11 vm's with the default 512M memory and one (minecraft)
> >> > with 8G. Your host seems to have only 16GB of memory, some of which
> >> > is probably unavailable as it's used by the integrated gpu. I'm
> >> > wondering if you are effectively oversubscribing your memory here.
> >> >
> >> > I know we currently don't support swapping guest memory out, but not
> >> > sure what happens if we don't have the physical memory to fault a
> >> > page in and wire it.
> >> >
> >>
> >> Something else gets swapped out.
> >
> > Wire == Can't swap out?
>
> Yes.
>
> > top shows 15G real memory available. That should be enough (8G + 11 *
> > 0.5G = 13.5G), or is this inherently risky with 6.8?
>
> With 6.8, the guests might have memory swapped out and worst case you'll
> see some performance issues. That shouldn't cause unexpected
> termination.
>

Depends on the exact content that got swapped out (as we didn't handle
TLB flushes correctly), so a crash was certainly a possibility. That's
why I wanted to see the VMM_DEBUG output. In any case, Thomas should try
-current and see if this problem is even reproducible.

-ml

>
> > I can try -current as suggested in the other mail. Is this a likely
> > cause or should I run with VMM_DEBUG for further investigation? Is
> > "somewhat slower" from VMM_DEBUG still usable? I don't need full
> > performance, but ~month downtime until the problem shows again would be
> > too much.
>
> A fix is more likely to land in -current if an issue can be
> identified. Since the issue doesn't sound like it's easily reproducible
> yet, VMM_DEBUG is the best bet for having the information you'd need to
> share when the issue occurs.
>
> >> > Even without a custom kernel with VMM_DEBUG, if it's a uvm_fault
> >> > issue you should see a message in the kernel buffer. Something like:
> >> >
> >> > vmx_fault_page: uvm_fault returns N, GPA=0x...., rip=0x....
> >> >
> >> > mlarkin: thoughts on my hypothesis? Am I wildly off course?
> >> >
> >> > -dv
> >> >
> >>
> >> Yeah I was trying to catch the big dump when a VM resets. That would
> >> tell us if the vm caused the reset or if vmd(8) crashed for some
> >> reason.
> >
> > But if vmd crashed it wouldn't restart automatically or does it?
> > All VMs down from vmd crashing would have been noticed.
> > That kernel message would have shown in the dmesg too, wouldn't it?
> >
> There are multiple factors. First is vmd(8) is multi-process and a vm's
> process can die without impacting others. Second is the vcpu could be
> reset making the guest "reboot." There are numerous reasons these things
> could happen, hence needing debug logging.
>
> -dv
>