Re: possible kernel/libxl race with xl network-attach

2022-01-24 Thread James Dingwall
On Mon, Jan 24, 2022 at 10:07:54AM +0100, Roger Pau Monné wrote:
> On Fri, Jan 21, 2022 at 03:05:07PM +, James Dingwall wrote:
> > On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote:
> > > On Fri, Jan 21, 2022 at 01:34:54PM +, James Dingwall wrote:
> > > > On 2022-01-13 16:11, Roger Pau Monné wrote:
> > > > > On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote:
> > > > > > 
> > > > > > I have been trying to debug a problem where a vif with the backend
> > > > > > in a
> > > > > > driver domain is added to dom0.  Intermittently the hotplug script 
> > > > > > is
> > > > > > not invoked by libxl (running as xl devd) in the driver domain.  By
> > > > > > enabling some debug for the driver domain kernel and libxl I have
> > > > > > these
> > > > > > messages:
> > > > > > 
> > > > > > driver domain kernel (Ubuntu 5.4.0-92-generic):
> > > > > > 
> > > > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0:
> > > > > > Successfully created xenvif
> > > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > > > /local/domain/0/device/vif/0 -> Initialising
> > > > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
> > > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > > > /local/domain/0/device/vif/0 -> Connected
> > > > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> > > > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected
> > > > > > 
> > > > > > xl devd (Xen 4.14.3):
> > > > > > 
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > > > wpath=/local/domain/2/backend token=3/0: event
> > > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700:
> > > > > > nested ao, parent 0x5633ac567f90
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:1055:devstate_callback: backend
> > > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > > > state 4
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > > > wpath=/local/domain/2/backend token=3/0: event
> > > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220:
> > > > > > nested ao, parent 0x5633ac567f90
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > > libxl_event.c:1055:devstate_callback: backend
> > > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > > > state 4
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_aoutils.c:88:xswait_timeout_callback: backend
> > > > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to
> > > > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch
> > > > > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state
> > > > > > token=2/1: deregister slotnum=2
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_event.c:1039:devstate_callback: backend
> > > > > > /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > > w=0x5633ac569180: deregister unregistered
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_device.c:1092:device_backend_callback: calling
> > > > > > device_backend_cleanup
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > > w=0x5633ac569180: deregister unregistered
> > > > > > 2022-01-13 01:39:51 UTC libxl: error:
> > > > > > libxl_device.c:1105:device_backend_callback: unable to add device
> > > > > > with path /local/domain/2/backend/vif/0/0
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > > w=0x5633ac569280: deregister unregistered
> > > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > > libxl_device.c:1470:device_complete: de

Re: possible kernel/libxl race with xl network-attach

2022-01-24 Thread Roger Pau Monné
On Fri, Jan 21, 2022 at 03:05:07PM +, James Dingwall wrote:
> On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote:
> > On Fri, Jan 21, 2022 at 01:34:54PM +, James Dingwall wrote:
> > > On 2022-01-13 16:11, Roger Pau Monné wrote:
> > > > On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote:
> > > > > 
> > > > > I have been trying to debug a problem where a vif with the backend
> > > > > in a
> > > > > driver domain is added to dom0.  Intermittently the hotplug script is
> > > > > not invoked by libxl (running as xl devd) in the driver domain.  By
> > > > > enabling some debug for the driver domain kernel and libxl I have
> > > > > these
> > > > > messages:
> > > > > 
> > > > > driver domain kernel (Ubuntu 5.4.0-92-generic):
> > > > > 
> > > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0:
> > > > > Successfully created xenvif
> > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > > /local/domain/0/device/vif/0 -> Initialising
> > > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
> > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > > /local/domain/0/device/vif/0 -> Connected
> > > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> > > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected
> > > > > 
> > > > > xl devd (Xen 4.14.3):
> > > > > 
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > > wpath=/local/domain/2/backend token=3/0: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700:
> > > > > nested ao, parent 0x5633ac567f90
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:1055:devstate_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > > state 4
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > > wpath=/local/domain/2/backend token=3/0: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220:
> > > > > nested ao, parent 0x5633ac567f90
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > > libxl_event.c:1055:devstate_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > > state 4
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_aoutils.c:88:xswait_timeout_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to
> > > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state
> > > > > token=2/1: deregister slotnum=2
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:1039:devstate_callback: backend
> > > > > /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569180: deregister unregistered
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_device.c:1092:device_backend_callback: calling
> > > > > device_backend_cleanup
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569180: deregister unregistered
> > > > > 2022-01-13 01:39:51 UTC libxl: error:
> > > > > libxl_device.c:1105:device_backend_callback: unable to add device
> > > > > with path /local/domain/2/backend/vif/0/0
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > > w=0x5633ac569280: deregister unregistered
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_device.c:1470:device_complete: device
> > > > > /local/domain/2/backend/vif/0/0 add failed
> > > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > > libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy
> > > > > 
> > > > > the xenstore content for the backend:
> > > > > 
> 

Re: possible kernel/libxl race with xl network-attach

2022-01-21 Thread James Dingwall
On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote:
> On Fri, Jan 21, 2022 at 01:34:54PM +, James Dingwall wrote:
> > On 2022-01-13 16:11, Roger Pau Monné wrote:
> > > On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote:
> > > > 
> > > > I have been trying to debug a problem where a vif with the backend
> > > > in a
> > > > driver domain is added to dom0.  Intermittently the hotplug script is
> > > > not invoked by libxl (running as xl devd) in the driver domain.  By
> > > > enabling some debug for the driver domain kernel and libxl I have
> > > > these
> > > > messages:
> > > > 
> > > > driver domain kernel (Ubuntu 5.4.0-92-generic):
> > > > 
> > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0:
> > > > Successfully created xenvif
> > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > /local/domain/0/device/vif/0 -> Initialising
> > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
> > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > > /local/domain/0/device/vif/0 -> Connected
> > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> > > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected
> > > > 
> > > > xl devd (Xen 4.14.3):
> > > > 
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > wpath=/local/domain/2/backend token=3/0: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700:
> > > > nested ao, parent 0x5633ac567f90
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:1055:devstate_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > state 4
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > > wpath=/local/domain/2/backend token=3/0: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220:
> > > > nested ao, parent 0x5633ac567f90
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > > epath=/local/domain/2/backend/vif/0/0/state
> > > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > > libxl_event.c:1055:devstate_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > > state 4
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_aoutils.c:88:xswait_timeout_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to
> > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state
> > > > token=2/1: deregister slotnum=2
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:1039:devstate_callback: backend
> > > > /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569180: deregister unregistered
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_device.c:1092:device_backend_callback: calling
> > > > device_backend_cleanup
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569180: deregister unregistered
> > > > 2022-01-13 01:39:51 UTC libxl: error:
> > > > libxl_device.c:1105:device_backend_callback: unable to add device
> > > > with path /local/domain/2/backend/vif/0/0
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > > w=0x5633ac569280: deregister unregistered
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_device.c:1470:device_complete: device
> > > > /local/domain/2/backend/vif/0/0 add failed
> > > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > > libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy
> > > > 
> > > > the xenstore content for the backend:
> > > > 
> > > > # xenstore-ls /local/domain/2/backend/vif/0
> > > > 0 = ""
> > > >  frontend = "/local/domain/0/device/vif/0"
> > > >  frontend-id = "0"
> > > >  online = "1"
> > > >  state = "4"
> > > >  script = "/etc/xen/scripts/vif-zynstra"
> > > >  vifnam

Re: possible kernel/libxl race with xl network-attach

2022-01-21 Thread Roger Pau Monné
On Fri, Jan 21, 2022 at 01:34:54PM +, James Dingwall wrote:
> On 2022-01-13 16:11, Roger Pau Monné wrote:
> > On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote:
> > > 
> > > I have been trying to debug a problem where a vif with the backend
> > > in a
> > > driver domain is added to dom0.  Intermittently the hotplug script is
> > > not invoked by libxl (running as xl devd) in the driver domain.  By
> > > enabling some debug for the driver domain kernel and libxl I have
> > > these
> > > messages:
> > > 
> > > driver domain kernel (Ubuntu 5.4.0-92-generic):
> > > 
> > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0:
> > > Successfully created xenvif
> > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > /local/domain/0/device/vif/0 -> Initialising
> > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
> > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed:
> > > /local/domain/0/device/vif/0 -> Connected
> > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> > > [Thu Jan 13 01:39:31 2022] [26] 470:
> > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected
> > > 
> > > xl devd (Xen 4.14.3):
> > > 
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > wpath=/local/domain/2/backend token=3/0: event
> > > epath=/local/domain/2/backend/vif/0/0/state
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700:
> > > nested ao, parent 0x5633ac567f90
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > epath=/local/domain/2/backend/vif/0/0/state
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:1055:devstate_callback: backend
> > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > state 4
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528
> > > wpath=/local/domain/2/backend token=3/0: event
> > > epath=/local/domain/2/backend/vif/0/0/state
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220:
> > > nested ao, parent 0x5633ac567f90
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180
> > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event
> > > epath=/local/domain/2/backend/vif/0/0/state
> > > 2022-01-13 01:39:31 UTC libxl: debug:
> > > libxl_event.c:1055:devstate_callback: backend
> > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting
> > > state 4
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_aoutils.c:88:xswait_timeout_callback: backend
> > > /local/domain/2/backend/vif/0/0/state (hoping for state change to
> > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch
> > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state
> > > token=2/1: deregister slotnum=2
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_event.c:1039:devstate_callback: backend
> > > /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > w=0x5633ac569180: deregister unregistered
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_device.c:1092:device_backend_callback: calling
> > > device_backend_cleanup
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > w=0x5633ac569180: deregister unregistered
> > > 2022-01-13 01:39:51 UTC libxl: error:
> > > libxl_device.c:1105:device_backend_callback: unable to add device
> > > with path /local/domain/2/backend/vif/0/0
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch
> > > w=0x5633ac569280: deregister unregistered
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_device.c:1470:device_complete: device
> > > /local/domain/2/backend/vif/0/0 add failed
> > > 2022-01-13 01:39:51 UTC libxl: debug:
> > > libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy
> > > 
> > > the xenstore content for the backend:
> > > 
> > > # xenstore-ls /local/domain/2/backend/vif/0
> > > 0 = ""
> > >  frontend = "/local/domain/0/device/vif/0"
> > >  frontend-id = "0"
> > >  online = "1"
> > >  state = "4"
> > >  script = "/etc/xen/scripts/vif-zynstra"
> > >  vifname = "dom0.0"
> > >  mac = "00:16:3e:6c:de:82"
> > >  bridge = "cluster"
> > >  handle = "0"
> > >  type = "vif"
> > >  feature-sg = "1"
> > >  feature-gso-tcpv4 = "1"
> > >  feature-gso-tcpv6 = "1"
> > >  feature-ipv6-csum-offload = "1"
> > >  feature-rx-copy = "1"

Re: possible kernel/libxl race with xl network-attach

2022-01-21 Thread James Dingwall

On 2022-01-13 16:11, Roger Pau Monné wrote:

On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote:


I have been trying to debug a problem where a vif with the backend in 
a

driver domain is added to dom0.  Intermittently the hotplug script is
not invoked by libxl (running as xl devd) in the driver domain.  By
enabling some debug for the driver domain kernel and libxl I have 
these

messages:

driver domain kernel (Ubuntu 5.4.0-92-generic):

[Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: 
Successfully created xenvif
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
/local/domain/0/device/vif/0 -> Initialising
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
backend/vif/0/0 -> InitWait
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
/local/domain/0/device/vif/0 -> Connected

[Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
backend/vif/0/0 -> Connected


xl devd (Xen 4.14.3):

2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 
wpath=/local/domain/2/backend token=3/0: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested 
ao, parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 
wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:1055:devstate_callback: backend 
/local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
state 4
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 
wpath=/local/domain/2/backend token=3/0: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested 
ao, parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 
wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:1055:devstate_callback: backend 
/local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
state 4
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_aoutils.c:88:xswait_timeout_callback: backend 
/local/domain/2/backend/vif/0/0/state (hoping for state change to 2): 
xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 
wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister 
slotnum=2
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:1039:devstate_callback: backend 
/local/domain/2/backend/vif/0/0/state wanted state 2  timed out
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch 
w=0x5633ac569180: deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_device.c:1092:device_backend_callback: calling 
device_backend_cleanup
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch 
w=0x5633ac569180: deregister unregistered
2022-01-13 01:39:51 UTC libxl: error: 
libxl_device.c:1105:device_backend_callback: unable to add device with 
path /local/domain/2/backend/vif/0/0
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch 
w=0x5633ac569280: deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_device.c:1470:device_complete: device 
/local/domain/2/backend/vif/0/0 add failed
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy


the xenstore content for the backend:

# xenstore-ls /local/domain/2/backend/vif/0
0 = ""
 frontend = "/local/domain/0/device/vif/0"
 frontend-id = "0"
 online = "1"
 state = "4"
 script = "/etc/xen/scripts/vif-zynstra"
 vifname = "dom0.0"
 mac = "00:16:3e:6c:de:82"
 bridge = "cluster"
 handle = "0"
 type = "vif"
 feature-sg = "1"
 feature-gso-tcpv4 = "1"
 feature-gso-tcpv6 = "1"
 feature-ipv6-csum-offload = "1"
 feature-rx-copy = "1"
 feature-rx-flip = "0"
 feature-multicast-control = "1"
 feature-dynamic-multicast-control = "1"
 feature-split-event-channels = "1"
 multi-queue-max-queues = "2"
 feature-ctrl-ring = "1"
 hotplug-status = "connected"

My guess is that the libxl callback is started waiting for the backend
state key to be set to XenbusStateInitWait (2) but the frontend in 
dom0
has already triggered the backend to transition to 
XenbusStateConnected

(4) and therefore it does not successfully complete.


I think your analysis is correct. The question is who sets
hotplug-status to "connected" in xenstore if the hotplug script in the
driver domain hasn't been

Re: possible kernel/libxl race with xl network-attach

2022-01-13 Thread Roger Pau Monné
On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote:
> Hi,
> 
> I have been trying to debug a problem where a vif with the backend in a 
> driver domain is added to dom0.  Intermittently the hotplug script is 
> not invoked by libxl (running as xl devd) in the driver domain.  By 
> enabling some debug for the driver domain kernel and libxl I have these 
> messages:
> 
> driver domain kernel (Ubuntu 5.4.0-92-generic):
> 
> [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully 
> created xenvif
> [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
> /local/domain/0/device/vif/0 -> Initialising
> [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
> backend/vif/0/0 -> InitWait
> [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
> /local/domain/0/device/vif/0 -> Connected
> [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
> backend/vif/0/0 -> Connected
> 
> xl devd (Xen 4.14.3):
> 
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
> epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: 
> libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, 
> parent 0x5633ac567f90
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: 
> event epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
> state 4
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
> epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: 
> libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, 
> parent 0x5633ac567f90
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: 
> event epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
> state 4
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_aoutils.c:88:xswait_timeout_callback: backend 
> /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait 
> timeout (path=/local/domain/2/backend/vif/0/0/state)
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 
> wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: error: 
> libxl_device.c:1105:device_backend_callback: unable to add device with path 
> /local/domain/2/backend/vif/0/0
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: 
> device /local/domain/2/backend/vif/0/0 add failed
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: 
> ao 0x5633ac568f30: destroy
> 
> the xenstore content for the backend:
> 
> # xenstore-ls /local/domain/2/backend/vif/0
> 0 = ""
>  frontend = "/local/domain/0/device/vif/0"
>  frontend-id = "0"
>  online = "1"
>  state = "4"
>  script = "/etc/xen/scripts/vif-zynstra"
>  vifname = "dom0.0"
>  mac = "00:16:3e:6c:de:82"
>  bridge = "cluster"
>  handle = "0"
>  type = "vif"
>  feature-sg = "1"
>  feature-gso-tcpv4 = "1"
>  feature-gso-tcpv6 = "1"
>  feature-ipv6-csum-offload = "1"
>  feature-rx-copy = "1"
>  feature-rx-flip = "0"
>  feature-multicast-control = "1"
>  feature-dynamic-multicast-control = "1"
>  feature-split-event-channels = "1"
>  multi-queue-max-queues = "2"
>  feature-ctrl-ring = "1"
>  hotplug-status = "connected"
> 
> My guess is that the libxl callback is started waiting for the backend 
> state key to be set to XenbusStateInitWait (2) but the frontend in dom0 
> has already triggered the backend to transition to XenbusStateConnected 
> (4) and therefore it does not successfully complete.

possible kernel/libxl race with xl network-attach

2022-01-13 Thread James Dingwall
Hi,

I have been trying to debug a problem where a vif with the backend in a 
driver domain is added to dom0.  Intermittently the hotplug script is 
not invoked by libxl (running as xl devd) in the driver domain.  By 
enabling some debug for the driver domain kernel and libxl I have these 
messages:

driver domain kernel (Ubuntu 5.4.0-92-generic):

[Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully created 
xenvif
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
/local/domain/0/device/vif/0 -> Initialising
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
backend/vif/0/0 -> InitWait
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
/local/domain/0/device/vif/0 -> Connected
[Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
backend/vif/0/0 -> Connected

xl devd (Xen 4.14.3):

2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch 
w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, 
parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch 
w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
state 4
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch 
w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: 
libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, 
parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch 
w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event 
epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
state 4
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_aoutils.c:88:xswait_timeout_callback: backend 
/local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait 
timeout (path=/local/domain/2/backend/vif/0/0/state)
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 
wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: 
backend /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
deregister unregistered
2022-01-13 01:39:51 UTC libxl: error: 
libxl_device.c:1105:device_backend_callback: unable to add device with path 
/local/domain/2/backend/vif/0/0
2022-01-13 01:39:51 UTC libxl: debug: 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: 
deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: 
device /local/domain/2/backend/vif/0/0 add failed
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 
0x5633ac568f30: destroy

the xenstore content for the backend:

# xenstore-ls /local/domain/2/backend/vif/0
0 = ""
 frontend = "/local/domain/0/device/vif/0"
 frontend-id = "0"
 online = "1"
 state = "4"
 script = "/etc/xen/scripts/vif-zynstra"
 vifname = "dom0.0"
 mac = "00:16:3e:6c:de:82"
 bridge = "cluster"
 handle = "0"
 type = "vif"
 feature-sg = "1"
 feature-gso-tcpv4 = "1"
 feature-gso-tcpv6 = "1"
 feature-ipv6-csum-offload = "1"
 feature-rx-copy = "1"
 feature-rx-flip = "0"
 feature-multicast-control = "1"
 feature-dynamic-multicast-control = "1"
 feature-split-event-channels = "1"
 multi-queue-max-queues = "2"
 feature-ctrl-ring = "1"
 hotplug-status = "connected"

My guess is that the libxl callback is started waiting for the backend 
state key to be set to XenbusStateInitWait (2) but the frontend in dom0 
has already triggered the backend to transition to XenbusStateConnected 
(4) and therefore it does not successfully complete.

Does this seem a reasonable explanation for the problem and what would 
the best approach to try and solve it?

Thanks,
James