blkfront

Dimitry Andric Tue, 13 Dec 2016 12:29:30 -0800

On 13 Dec 2016, at 07:54, Colin Percival <cperc...@freebsd.org> wrote:
> 
> Author: cperciva
> Date: Tue Dec 13 06:54:13 2016
> New Revision: 310013
> URL: https://svnweb.freebsd.org/changeset/base/310013
> 
> Log:
>  Check that blkfront devices have a non-zero number of sectors and a
>  non-zero sector size.  Such a device would be a virtual disk of zero
>  bytes; clearly not useful, and not something we should try to attach.
> 
>  As a fortuitous side effect, checking that these values are non-zero
>  here results in them not *becoming* zero later on the function.  This
>  odd behaviour began with r309124 (clang 3.9.0) but is challenging to
>  debug; making any changes to this function whatsoever seems to affect
>  the llvm optimizer behaviour enough to make the unexpected zeroing of
>  the sector_size variable cease.


I've been having some fun debugging a kernel under Xen, and what I found
is that sector_size changes from 512 to 0 because of an xs_gather() call
in xbd_connect().  The function starts as follows:

  1222  static void
  1223  xbd_connect(struct xbd_softc *sc)
  1224  {
  1225          device_t dev = sc->xbd_dev;
  1226          unsigned long sectors, sector_size, phys_sector_size;
  1227          unsigned int binfo;
  1228          int err, feature_barrier, feature_flush;
...
  1237          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
  1238              "sectors", "%lu", &sectors,
  1239              "info", "%u", &binfo,
  1240              "sector-size", "%lu", &sector_size,
  1241              NULL);

After this first call to xs_gather(), sectors is 44040322 (in my case of
a ~21G disk), binfo is zero, and sector_size is 512.  During execution
of the following few lines, sector_size stays 512, until just after the
fourth call to xs_gather():

  1259          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
  1260               "feature-flush-cache", "%lu", &feature_flush,
  1261               NULL);

and then it becomes zero!  This is because the %lu format scans a 64 bit
unsigned integer, while the feature_flush variable is a 32 bit signed
integer, thus overwriting an adjacent location on the stack, which
happens to contain sector_size: in the assembly output, sector_size is
at -88(%rbp), while feature_flush is at -92(%rbp).

There is another instance of such an incorrect format, a few lines
before this, but it doesn't seem to scribble over important data (or we
didn't panic because of it, yet):

  1253          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
  1254               "feature-barrier", "%lu", &feature_barrier,
  1255               NULL);

Here, feature_barrier is a 32 bit signed integer, while %lu again scans
a 64 bit unsigned integer.

In short, I think the following change would also fix the problem:

Index: sys/dev/xen/blkfront/blkfront.c
===================================================================
--- sys/dev/xen/blkfront/blkfront.c     (revision 309817)
+++ sys/dev/xen/blkfront/blkfront.c     (working copy)
@@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc)
        if (err || phys_sector_size <= sector_size)
                phys_sector_size = 0;
        err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
-            "feature-barrier", "%lu", &feature_barrier,
+            "feature-barrier", "%d", &feature_barrier,
             NULL);
        if (err == 0 && feature_barrier != 0)
                sc->xbd_flags |= XBDF_BARRIER;

        err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
-            "feature-flush-cache", "%lu", &feature_flush,
+            "feature-flush-cache", "%d", &feature_flush,
             NULL);
        if (err == 0 && feature_flush != 0)
                sc->xbd_flags |= XBDF_FLUSH;

However, I have some difficulty getting a custom-built kernel booting on
my very rudimentary Xen setup.  I am using a snapshot raw image, with no
idea how to insert a kernel into it.

E.g, can you please try this out, instead of the zero-check fix?  I am
very curious whether that fixes the panics. :)

-Dimitry

P.S.: We really need AddressSanitizer support in the kernel...

signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: svn commit: r310013 - head/sys/dev/xen/blkfront

Reply via email to