On Mon, Oct 23, 2023 at 06:36:21PM -0300, Crystal Kolipe wrote:
> On Mon, Oct 23, 2023 at 11:04:07AM +0000, Klemens Nanni wrote:
> > 10/16/23 04:02, Klemens Nanni ??????????:
> > > The current check implies one could use, e.g. SWAP or MSDOS partitions
> > > as softraid(4) chunks, but sys/dev/softraid.c always expects FS_RAID,
> > > thus using chunks with different partition types is not possible:
> > > 
> > >   # vmctl create -s100M disk.img
> > >   # vnd=`vnconfig disk.img`
> > >   # echo 'swap *' | disklabel -wAT- vnd0
> > > 
> > >   # disklabel $vnd | grep swap
> > >     a:           204800                0    swap
> > >   # bioctl -c c -l ${vnd}a softraid0
> > >   softraid0: invalid metadata format
> > > 
> > > Correct the check.
> > > I don't expect this to break anything.
> > > amd64 biosboot boots off standard RAID 'a' as before.
> > > 
> > > Feedback? Objection? OK?
> > 
> > Ping.
> 
> This breaks booting off of a RAID that is not on partition 'a', on amd64.
> 
> Was this intentional?

No.

> 
> For example, if you have a RAID on 'd', with no 'a' partition at all, then
> with your patch the machine becomes unbootable.
> 
> The second stage bootloader doesn't automatically find the softraid volume.
> Manually booting the kernel from it results in a kernel panic when the
> kernel can't find the root filesystem.
> 
> Although booting from a RAID on a non-'a' partition is not supported on all
> archs, it has worked fine on amd64 for a long time, so it's quite possible
> that people have machines deployed that boot from other RAID partitions.

We don't recommend it, but this sentence in softraid(4) is the only thing I
can find actually talking about boot requirements wrt. chunks and labels:

        On sparc64, bootable chunks must be RAID partitions using the letter 
‘a’.

It does imply that other architectures can boot from non-'a' partitions,
I've done so myself, you example shows it and I don't intent to change that.

> 
> This change would unexpectedly break them, and it would potentially be quite
> painful for any users who upgrade to 7.5 and find out afterwards that their
> machine doesn't boot, because the work-around would likely be to boot the
> ramdisk kernel, and unpack mdec/boot from the base package of the previous
> release then re-run installboot specifying the old mdec/boot.
> 
> That wouldn't be at all obvious to users without a lot of OpenBSD experience.

Rereading the code, I now question why it checks the 'a' label type at all.

Taking your sd0d example through devboot():

|#ifdef SOFTRAID
|       /*
|        * Determine the partition type for the 'a' partition of the
|        * boot device.
|        */
|       TAILQ_FOREACH(dip, &disklist, list)
|               if (dip->bios_info.bios_number == bootdev &&
|                   (dip->bios_info.flags & BDI_BADLABEL) == 0)
|                       part_type = dip->disklabel.d_partitions[0].p_fstype;

Whatever sd0a's type is...

|
|       /*
|        * See if we booted from a disk that is a member of a bootable
|        * softraid volume.
|        */
|       SLIST_FOREACH(bv, &sr_volumes, sbv_link) {
|               if (bv->sbv_flags & BIOC_SCBOOTABLE)
|                       SLIST_FOREACH(bc, &bv->sbv_chunks, sbc_link)
|                               if (bc->sbc_disk == bootdev)

... the boot loader sees we booted from a bootable softraid chunk,
matching disk sd0 by number and not partition.

|                                       sr_boot_vol = bv->sbv_unit;
|               if (sr_boot_vol != -1)
|                       break;
|       }
|#endif
|
|       if (sr_boot_vol != -1 && part_type != FS_BSDFFS) {

With softraid chunk on sd0d, sd0a happens to be not FFS (likely unused),
but that's still enough for the boot loader to stick to softraid on sd0!

|               *p++ = 's';
|               *p++ = 'r';
|               *p++ = '0' + sr_boot_vol;
|       } else if (bootdev & 0x100) {

So why check the 'a' label type if that's not relevant?
It is confusing and 
Doubling down on the assumption that bootable chunks are always on 'a'
like my diff did shows that's a mistake and dropping the check actually
makes more sense to me now.

This boots with root on softraid on sd0a and sd0d.

Thoughts?
Am I missing something?

Index: sys/arch/amd64//stand/libsa/dev_i386.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/dev_i386.c,v
retrieving revision 1.23
diff -u -p -r1.23 dev_i386.c
--- sys/arch/amd64//stand/libsa/dev_i386.c      10 May 2019 21:20:43 -0000      
1.23
+++ sys/arch/amd64//stand/libsa/dev_i386.c      24 Oct 2023 01:43:06 -0000
@@ -103,22 +103,11 @@ devboot(dev_t bootdev, char *p)
 #ifdef SOFTRAID
        struct sr_boot_volume *bv;
        struct sr_boot_chunk *bc;
-       struct diskinfo *dip = NULL;
 #endif
        int sr_boot_vol = -1;
-       int part_type = FS_UNUSED;
 
 #ifdef SOFTRAID
        /*
-        * Determine the partition type for the 'a' partition of the
-        * boot device.
-        */
-       TAILQ_FOREACH(dip, &disklist, list)
-               if (dip->bios_info.bios_number == bootdev &&
-                   (dip->bios_info.flags & BDI_BADLABEL) == 0)
-                       part_type = dip->disklabel.d_partitions[0].p_fstype;
-
-       /*
         * See if we booted from a disk that is a member of a bootable
         * softraid volume.
         */
@@ -132,7 +121,7 @@ devboot(dev_t bootdev, char *p)
        }
 #endif
 
-       if (sr_boot_vol != -1 && part_type != FS_BSDFFS) {
+       if (sr_boot_vol != -1) {
                *p++ = 's';
                *p++ = 'r';
                *p++ = '0' + sr_boot_vol;

Reply via email to