On 13 Dec 2016, at 07:54, Colin Percival <cperc...@freebsd.org> wrote: > > Author: cperciva > Date: Tue Dec 13 06:54:13 2016 > New Revision: 310013 > URL: https://svnweb.freebsd.org/changeset/base/310013 > > Log: > Check that blkfront devices have a non-zero number of sectors and a > non-zero sector size. Such a device would be a virtual disk of zero > bytes; clearly not useful, and not something we should try to attach. > > As a fortuitous side effect, checking that these values are non-zero > here results in them not *becoming* zero later on the function. This > odd behaviour began with r309124 (clang 3.9.0) but is challenging to > debug; making any changes to this function whatsoever seems to affect > the llvm optimizer behaviour enough to make the unexpected zeroing of > the sector_size variable cease.
I've been having some fun debugging a kernel under Xen, and what I found is that sector_size changes from 512 to 0 because of an xs_gather() call in xbd_connect(). The function starts as follows: 1222 static void 1223 xbd_connect(struct xbd_softc *sc) 1224 { 1225 device_t dev = sc->xbd_dev; 1226 unsigned long sectors, sector_size, phys_sector_size; 1227 unsigned int binfo; 1228 int err, feature_barrier, feature_flush; ... 1237 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), 1238 "sectors", "%lu", §ors, 1239 "info", "%u", &binfo, 1240 "sector-size", "%lu", §or_size, 1241 NULL); After this first call to xs_gather(), sectors is 44040322 (in my case of a ~21G disk), binfo is zero, and sector_size is 512. During execution of the following few lines, sector_size stays 512, until just after the fourth call to xs_gather(): 1259 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), 1260 "feature-flush-cache", "%lu", &feature_flush, 1261 NULL); and then it becomes zero! This is because the %lu format scans a 64 bit unsigned integer, while the feature_flush variable is a 32 bit signed integer, thus overwriting an adjacent location on the stack, which happens to contain sector_size: in the assembly output, sector_size is at -88(%rbp), while feature_flush is at -92(%rbp). There is another instance of such an incorrect format, a few lines before this, but it doesn't seem to scribble over important data (or we didn't panic because of it, yet): 1253 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), 1254 "feature-barrier", "%lu", &feature_barrier, 1255 NULL); Here, feature_barrier is a 32 bit signed integer, while %lu again scans a 64 bit unsigned integer. In short, I think the following change would also fix the problem: Index: sys/dev/xen/blkfront/blkfront.c =================================================================== --- sys/dev/xen/blkfront/blkfront.c (revision 309817) +++ sys/dev/xen/blkfront/blkfront.c (working copy) @@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc) if (err || phys_sector_size <= sector_size) phys_sector_size = 0; err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), - "feature-barrier", "%lu", &feature_barrier, + "feature-barrier", "%d", &feature_barrier, NULL); if (err == 0 && feature_barrier != 0) sc->xbd_flags |= XBDF_BARRIER; err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), - "feature-flush-cache", "%lu", &feature_flush, + "feature-flush-cache", "%d", &feature_flush, NULL); if (err == 0 && feature_flush != 0) sc->xbd_flags |= XBDF_FLUSH; However, I have some difficulty getting a custom-built kernel booting on my very rudimentary Xen setup. I am using a snapshot raw image, with no idea how to insert a kernel into it. E.g, can you please try this out, instead of the zero-check fix? I am very curious whether that fixes the panics. :) -Dimitry P.S.: We really need AddressSanitizer support in the kernel...
signature.asc
Description: Message signed with OpenPGP using GPGMail