Thanks for your input, I do really appreciate it, even if it wasn’t
quite what I hoped to hear.
// Ric
From: Michal Necasek [mailto:michal.neca...@oracle.com]
Sent: Thursday, April 07, 2016 14:56
To: Vilbig, Ric
Cc: vbox-dev@virtualbox.org; klaus.espenl...@oracle.com
Subject: Re: [vbox-dev] VM crash, NS_ERROR_FAILURE
The thing you're doing "wrong" is that you're trying to implement
your own bridge device. No one else does that, and the PCI/device code
in VirtualBox was simply never architected to support it. Likewise,
it was never anticipated that someone would want to implement devices
without a 1:1 PDM device : PCI device relationship.
This is just something that has never come up before. For all devices
VirtualBox currently emulates, bridges don't matter and guest software
doesn't care. Bridges in VirtualBox only serve to overcome the limit
on the number of devices per PCI bus.
You either need to restructure your approach to fit what VirtualBox
actually supports, or modify VirtualBox.
Michal
----- Original Message -----
From: ric_vil...@mentor.com
To: klaus.espenl...@oracle.com, vbox-dev@virtualbox.org
Sent: Thursday, April 7, 2016 10:47:46 PM GMT +01:00 Amsterdam /
Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [vbox-dev] VM crash, NS_ERROR_FAILURE
Hi Klaus,
I am blocked on my MSI issues for the moment, so I went back to look
at this SecBus issue. I think I have figured out what is going wrong,
and I would not be surprised if it’s also a factor in some of the
other troubles I am having (discussed in other threads). But I am not
sure what to do about it and would really appreciate some advice.
My device constructor prepares the PCIDevice structure and then calls
PDMDevHlpPCIRegister() to register the device. That in turn calls
ich9pciRegisterInternal(), which adds my device to the set of bridges,
here:
    pBus->papBridgesR3[pBus->cBridges] = pPciDev;
    pBus->cBridges++;
The problem comes later when VBox calls ich9pciInitBridgeTopology()
recursively to enumerate the busses. As it’s walking through the list
of bridges, it makes the assumption that achInstanceData in the device
instance is of type ICH9PCIBUS, but for my device it instead contains
my own device property structure. This confuses the next deeper
recursion layer, and as you correctly guessed, it enters a recursion
loop that breaks the stack.
    for (uint32_t iBridge = 0; iBridge < pBus->cBridges; iBridge++)
    {
        PPCIDEVICE pBridge = pBus->papBridgesR3[iBridge];
        AssertMsg(pBridge && pciDevIsPci2PciBridge(pBridge),
                  ("Device is not a PCI bridge but on the list of PCI bridges\n"));
        PICH9PCIBUS pChildBus = PDMINS_2_DATA(pBridge->pDevIns, PICH9PCIBUS);
        pGlobals->uBus++;
        ich9pciInitBridgeTopology(pGlobals, pChildBus, uBusSecondary,
                                  pGlobals->uBus);
    }
It seems that if I want to register my device as a bridge, then I need
my achInstanceData buffer to contain an ICH9PCIBUS struct. I thought
I could just add this to the top of my own device properties
structure, but it’s declared in DevPciIch9.cpp and therefore not
available to me.
Clearly I am doing something wrong in the way I register my bridge, so
what is the right way to register a bridge such that its
achInstanceData buffer contains an ICH9PCIBUS struct and also my own
device properties as well?
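The layout idea described above (a bus structure at the top of the device's own instance data) can be illustrated in standalone C. This is a sketch under stated assumptions, not a supported VirtualBox pattern: ToyPciBus, ToySwitchState, and toy_bridge_count() are hypothetical stand-ins, and the real ICH9PCIBUS definition remains private to DevPciIch9.cpp. It only shows why code that casts opaque instance data to the bus type works when that struct is the first member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the private bus struct; NOT the real ICH9PCIBUS. */
typedef struct ToyPciBus {
    uint32_t iBus;
    uint32_t cBridges;
} ToyPciBus;

/* Hypothetical device state with the bus struct embedded as first member. */
typedef struct ToySwitchState {
    ToyPciBus Bus;          /* must stay first so a cast to ToyPciBus* is valid */
    uint32_t  uMyProperty;  /* device-specific data follows the bus struct */
} ToySwitchState;

/* Compile-time check that nobody moves the bus struct away from offset 0. */
static_assert(offsetof(ToySwitchState, Bus) == 0,
              "bus struct must be the first member");

/* Models bus code that knows only ToyPciBus and receives opaque instance data. */
static uint32_t toy_bridge_count(void *pvInstanceData)
{
    ToyPciBus *pBus = (ToyPciBus *)pvInstanceData;
    return pBus->cBridges;
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, so the cast is well defined in this toy. Whether the same trick is workable inside VirtualBox depends on the private struct being available, which the message above notes it is not.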
// RicV
From: vbox-dev-boun...@virtualbox.org
[mailto:vbox-dev-boun...@virtualbox.org] On Behalf Of Vilbig, Ric
Sent: Saturday, April 02, 2016 10:11
To: Klaus Espenlaub; vbox-dev@virtualbox.org
Subject: Re: [vbox-dev] VM crash, NS_ERROR_FAILURE
Thank you Klaus. I will return to this SecBus issue after I get past
the MSI issue that I am having, and this information will certainly be
helpful. At the moment I have a workaround (hack) for the SecBus
problem which seems to be working, whereas the MSI problem has me
blocked. Once I get past that, I should have time to figure out a
proper solution for the SecBus problem.
// RicV
From: Klaus Espenlaub [mailto:klaus.espenl...@oracle.com]
Sent: Friday, April 01, 2016 08:51
To: vbox-dev@virtualbox.org
Subject: Re: [vbox-dev] VM crash, NS_ERROR_FAILURE
Hi Ric,
On 30.03.2016 02:02, Vilbig, Ric wrote:
Hi,
I obviously carried on with my investigation after sending the
original email, and have figured out what is triggering this abort
(not really fair to call it a crash).
VBox.log actually shows that the VM was never fully powered up,
so the crash happens before the CPU starts executing instructions.
See below, I know that this doesn't make much sense to you.
When the BIOS starts initializing the PCI Configuration space for my
PCIe switch, it reads the secondary bus register (PCI CFG 0x19) before
it’s been initialized, so the device model is returning 0. This puts
the BIOS into a loop, repeating the following over 5000 times before
aborting the VM session.
    PCI CFG Root Rd 0x0a L 2 = 0x0604   // Class
    PCI CFG Root Rd 0x00 L 2 = 0x14ab   // VendID
    PCI CFG Root Rd 0x02 L 2 = 0x1000   // DevID
    PCI CFG Root Wr 0x1c L 1 = 0xd0     // IOBase
    PCI CFG Root Wr 0x20 L 2 = 0xf000   // MemBase
    PCI CFG Root Rd 0x19 L 1 = *0x00*   // SecBus
If I intercept the secondary bus register read and return 3 instead
of the 0 read from RTL, then it carries on with root configuration and
my VM boots and runs correctly. It’s not detecting the downstream
endpoint, but that is a separate issue.
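The workaround described above amounts to special-casing one byte of the bridge (Type 1) configuration header. A rough standalone sketch, with the assumptions labelled: toy_cfg_read(), toy_cfg_read_raw(), the 256-byte backing array, and the fallback parameter are hypothetical and not part of the PDM API. The offsets 0x18, 0x19, and 0x1a for the primary, secondary, and subordinate bus numbers are the standard PCI-to-PCI bridge header layout.

```c
#include <assert.h>
#include <stdint.h>

/* Bus number registers in a PCI-to-PCI bridge (Type 1) config header. */
enum {
    TOY_CFG_PRIMARY_BUS     = 0x18,
    TOY_CFG_SECONDARY_BUS   = 0x19,
    TOY_CFG_SUBORDINATE_BUS = 0x1a
};

/* Read cb (1, 2 or 4) bytes, little-endian, from a 256-byte config image. */
static uint32_t toy_cfg_read_raw(const uint8_t *pabCfg, uint32_t off, unsigned cb)
{
    assert(cb == 1 || cb == 2 || cb == 4);
    assert(off + cb <= 256);
    uint32_t u = 0;
    for (unsigned i = 0; i < cb; i++)
        u |= (uint32_t)pabCfg[off + i] << (8 * i);
    return u;
}

/* Hypothetical intercept: a byte read of the secondary bus register that
 * would return 0 (not yet initialised) yields a fallback bus number instead.
 * All other reads pass through unchanged. */
static uint32_t toy_cfg_read(const uint8_t *pabCfg, uint32_t off, unsigned cb,
                             uint8_t uFallbackSecBus)
{
    uint32_t u = toy_cfg_read_raw(pabCfg, off, cb);
    if (off == TOY_CFG_SECONDARY_BUS && cb == 1 && u == 0)
        return uFallbackSecBus;
    return u;
}
```

Hard-coding a fallback hides the missing initialisation but does not fix it, which is why the message calls it a separate matter from the endpoint not being detected.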
Meanwhile, does it make sense for the BIOS to read the secondary bus
register before it’s been initialized? It seems like that register
should be set up as the BIOS proceeds through the enumeration. That
is what the VM with PIIX3 chipset does.
It does, but for a non-obvious reason. VirtualBox pre-configures its
PCI devices before it starts the BIOS, especially the bus numbers.
It looks like for some reason this isn't done properly (or isn't making
it correctly to your PCIe switch). This confuses the code, most likely
causing endless recursion and thus a stack overflow. You should be
able to use a debugger on the VM process to find out the details,
because this is all normal userland code on the host, which wouldn't
work if it were BIOS code running inside the VM.
The motivation for moving the PCI bus configuration out of the BIOS is
to some extent historic (in the old days we always fought with the
BIOS size restriction, owing to the extremely poor code generated by
the BCC compiler) and to some extent an optimization (it's far easier
and more efficient to do the hairy stuff in 32-bit code on the host,
and not in the actual BIOS, which is annoying 16-bit real mode code).
Klaus
_____________________________________________
*Ric Vilbig*
Mentor Graphics, Emulation Division
46871 Bayside Parkway, Fremont CA, 94538
Phone: 510-354-7360
Mobile: 408-529-2365
email: ric_vil...@mentor.com
From: Vilbig, Ric
Sent: Tuesday, March 29, 2016 11:40
To: vbox-dev@virtualbox.org
Cc: Vilbig, Ric
Subject: VM crash, NS_ERROR_FAILURE
Hi experts,
I would like to ask for some help to figure out why a certain VM
crashes on start-up. Although the problem is evidently induced by my
PDM plug-in, the crash does not appear to be happening therein. I
need some help to root cause where VBox is aborting the VM session.
> VBoxManage startvm "U14_ICH9_2"
Waiting for VM "U14_ICH9_2" to power on...
VBoxManage: error: The VM session was aborted
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005),
component SessionMachine, interface ISession
I created this VM from the VirtualBox GUI, v5.0.16, which I built from
the tarball at https://www.virtualbox.org/wiki/Downloads and am
running on an Ubuntu 14 host. I then switched the chipset to ICH9
and installed Ubuntu 14 as the guest. The VM runs well until I
plug my virtual device model into PDM (it’s a PCIe switch with a
downstream endpoint). After plugging in my virtual device, the VM
crashes as shown above.
I tracked down everywhere NS_ERROR_FAILURE is mentioned in the
sources. I found that DirectoryServiceProvider::GetFile() returns
that error twice, right away, but that is also true in the working
case when my device is unplugged. In no other place is that specific
error ever returned or asserted. However, I found that E_FAIL is
#defined to NS_ERROR_FAILURE, and there are hundreds of references to
E_FAIL, so I gave up trying to instrument them all.
I need some help to root cause this problem. Log files show that it
is getting as far as BIOS starting to initialize the switch,
apparently stuck in a loop doing that, but then lights out with no
trail that I can follow.
Log files are attached. Lines bearing the “RicV” prefix were
instrumented by me to investigate this problem. Lines bearing the
“RemDev” prefix are coming from my PDM plug-in.
Thanks,
_____________________________________________
*Ric Vilbig*
Mentor Graphics, Emulation Division
46871 Bayside Parkway, Fremont CA, 94538
Phone: 510-354-7360
Mobile: 408-529-2365
email: ric_vil...@mentor.com