On Wed, Jul 13, 2022 at 02:18:53PM +0200, Otto Moerbeek wrote: > Hi, > > after a prompt from stsp@ and florian@, reporting that newfs_msdos > fails when given an $duid.i argument, I set down to see what could be > going on. My test using an USB stick failed to reprdouce the problem. > Then I started using a vnd, and that shows the issue (once in a > while). The feeling is that any disk devcied created on the fly might > show this issue. > > What I have found out so far: sometimes, disk_subr.c:disk_map() > fails, because the device in question does not have the DKF_LABELVALID > flag set, so it is skipped in the duid search. > > The only place where DKF_LABELVALID is set is in > disk_attach_callback(). > > Often the disk_readlabel() call in this function succeeds and the flag > is set, but also often it does not succeed. > > I have observed it failing setting the message to "cannot read disk > label, 0xe32/0x2932, error 25", which seems to come from the > DIOCGPDINFO NOTTY case in dev/vnc.c > > This is how far I got.
Looking a bit more, I see that the attachment is done from a task. I suppose that if the open happens before the task is run, we end up with a flag not set. > > I am using the test script below. In my case, instrumenting > subr_disk.c with printf made the problem occur less often, so I > suspect a race between the disk attachment and the label becoming > available. > > The vnconfig triggers the disk attachment message (but only the first time a > vnd is created, it seems, I am not seeing any messages after a vnconfig > -u and re-running the script). > > > === script === > set -e > dd if=/dev/random of=/tmp/image$1 bs=1m count=100 > vnconfig vnd$1 /tmp/image$1 > fdisk -ig vnd$1 > echo "a\ni\n\n\n\nw\nl\nq\n" | disklabel -E vnd$1 > ============== > > Try to run newfs_msdos on the duid.i reported. Note that you need to > reboot to see a change a behaviour on a specific vnd. If it starts out > OK, it keeps on being OK, if it fails, it keeps on failing. This > corresponds to my observation that the disk_attach_callback() is > called only once for a specifc vnd, even after unconfig and reconfig. > > Hope somebody who knows how this all shoudl work can see where the > issue is. > > -Otto > > Index: subr_disk.c > =================================================================== > RCS file: /cvs/src/sys/kern/subr_disk.c,v > retrieving revision 1.248 > diff -u -p -r1.248 subr_disk.c > --- subr_disk.c 2 Jan 2022 17:26:14 -0000 1.248 > +++ subr_disk.c 13 Jul 2022 12:02:05 -0000 > @@ -1118,8 +1118,10 @@ disk_attach_callback(void *xdat) > /* Read disklabel. */ > if (disk_readlabel(&dl, dk->dk_devno, errbuf, sizeof(errbuf)) == NULL) { > enqueue_randomness(dl.d_checksum); > + printf("disk_attach_callback %s: seting DKF_LABELVALID\n", > dk->dk_name); > dk->dk_flags |= DKF_LABELVALID; > } > + else printf("disk_attach_callback %s: NOT seting DKF_LABELVALID > (%s)\n", dk->dk_name, errbuf); > > done: > dk->dk_flags |= DKF_OPENED; > @@ -1785,6 +1787,7 @@ disk_map(char *path, char *mappath, int > > mdk = NULL; > TAILQ_FOREACH(dk, &disklist, dk_link) { > + printf("YYY %p %s %x %p\n", dk, dk->dk_name, dk->dk_flags, > dk->dk_label); > if ((dk->dk_flags & DKF_LABELVALID) && dk->dk_label && > memcmp(dk->dk_label->d_uid, uid, > sizeof(dk->dk_label->d_uid)) == 0) { > @@ -1795,8 +1798,10 @@ disk_map(char *path, char *mappath, int > } > } > > - if (mdk == NULL || mdk->dk_name == NULL) > + if (mdk == NULL || mdk->dk_name == NULL) { > + printf("XXX %p\n", mdk); > return -1; > + } > > snprintf(mappath, size, "/dev/%s%s%c", > (flags & DM_OPENBLCK) ? "" : "r", mdk->dk_name, part); >