On 2011-Dec-20 00:29:50 +1100, Jim Klimov <jimkli...@cos.ru> wrote:
>2011-12-19 16:58, Pawel Jakub Dawidek wrote:
>> On Mon, Dec 19, 2011 at 10:18:05AM +0000, Darren J Moffat wrote:
>>> For those of us not familiar with how FreeBSD is installed and boots can
>>> you explain how boot works (ie do you use GRUB at all and if so which
>>> version and where the early boot ZFS code is).
>>
>> We don't use GRUB, no. We use three stages for booting. Stage 0 is
>> basically a 512-byte, very simple MBR boot loader installed at the
>> beginning of the disk that is used to launch the stage 1 boot loader.
>> Stage 1 is where we interpret the full ZFS (or UFS) structure and
>> read real files.
...
>Hmm... and is the freebsd-boot partition redundant somehow?

In the GPT case, each boot device would have a copy of both the boot0
MBR and a freebsd-boot partition containing gptzfsboot.  Both zfsboot
(used with traditional MBR/fdisk partitioning) and gptzfsboot
incorporate standard ZFS code and so should be able to boot off any
supported zpool type (though note that there was a bug in the handling
of gang blocks that was only fixed very recently).
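
For reference, installing that boot code onto a GPT disk is done with
gpart(8).  A minimal sketch, assuming a disk named ad0 with the
freebsd-boot partition at index 1 (the MBR image used here is pmbr;
whether a given setup uses boot0 or pmbr varies):

# Write the MBR image and put gptzfsboot into partition index 1.
# "ad0" is a placeholder device name, not a prescription.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0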

>Is it mirrored or can be striped over several disks?

Effectively, the boot code is mirrored on each boot disk.  FreeBSD does
not have the partitioned-versus-whole-disk issues that Solaris does, so
there is no downside to using partitioned disks with ZFS on FreeBSD.
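
Keeping those copies in sync is just a matter of repeating the install
step on every boot disk; a sketch, assuming two disks ad0 and ad1:

# Re-run the boot-code install on each disk of the mirror
# (device names are placeholders).
for d in ad0 ad1; do
	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d
done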

>I was taught that the core problem lies in the system's
>required ability to boot off any single device (including
>volumes of several disks presented as single units by HW RAIDs).
>This "BIOS boot device" should hold everything that is
>required and sufficient to continue booting the OS and to use
>disk sets with more sophisticated redundancy.

Normally, firmware boot code (BIOS, EFI, OFW, etc.) has no RAID
capability and needs to load bootstrap code off a single (physical or
HW RAID) boot device.  The exception is the primitive software RAID
found in consumer PC hardware, which is best ignored.

Effectively, all the code needed prior to the point where a software
RAID device can be built must be replicated in full across all boot
devices.  For RAID-1, everything is already replicated, so it is
sufficient to treat one mirror as the boot device and let the kernel
build the RAID device.  For anything more complex, one of the
bootstrap stages has to build enough of the RAID device to allow the
kernel (etc.) to be read out of the RAID device.

>I gather that in FreeBSD's case this "self-sufficient"
>bootloader is small and incurs a small storage overhead,
>even if cloned to a dozen disks in your array?

gptzfsboot is currently ~34KB (about 20KB larger than the equivalent
UFS bootstrap).  GPT itself has a 34-sector overhead, and the
freebsd-boot partition is typically 128 sectors to allow for future
growth (though I've shrunk it at home to 94 sectors so that the
following partition starts on a 64KB boundary, which better suits
future 4KB-sector disks).  My mirrored ZFS system at work is
partitioned as follows:
$ gpart show -p
=>      34  78124933    ad0  GPT  (37G)
        34       128  ad0p1  freebsd-boot  (64k)
       162   5242880  ad0p2  freebsd-swap  (2.5G)
   5243042  72881925  ad0p3  freebsd-zfs  (34G)

=>      34  78124933    ad1  GPT  (37G)
        34       128  ad1p1  freebsd-boot  (64k)
       162   5242880  ad1p2  freebsd-swap  (2.5G)
   5243042  72881925  ad1p3  freebsd-zfs  (34G)
(The first two columns are the absolute offset and size in sectors.)
My root pool is a mirror of ad0p3 and ad1p3.
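
A layout like the one above could be recreated along these lines (a
sketch only; sizes are in sectors, as in the listing, and the pool
name "rpool" is a placeholder):

# Partition one disk: 128-sector boot area, 2.5GB swap, rest for ZFS.
gpart create -s gpt ad0
gpart add -t freebsd-boot -s 128 ad0
gpart add -t freebsd-swap -s 5242880 ad0
gpart add -t freebsd-zfs ad0
# ...repeat for ad1, then mirror the two ZFS partitions:
zpool create rpool mirror ad0p3 ad1p3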

>In this case Solaris's problem with only-mirrored ZFS
>on root pools is that the "self-sufficient" quantum
>of required data is much larger; but otherwise the
>situation is the same?

If you have enough data and disk space, the overhead of combining a
mirrored root with RAIDZ data isn't that great.  At home, I have six
1TB disks; I've carved 8GB out of the front of each (3GB for swap and
5GB for root) and put the remainder in a RAIDZ2 pool - that's less
than 1% overhead.  5GB is big enough to hold the complete source tree
and compile it, as well as the base OS.  I have a 3-way mirrored root
across half the disks and use the other "root" partitions as
"temporary" roots when upgrading.

-- 
Peter Jeremy
