Hi Kent,
I'm one of the team that works on Solaris' mpt driver, which we
recently enhanced to deliver mpxio support with SAS. I have a bit
of knowledge about your issue :-)

Kent Watsen wrote:
> Based on recommendations from this list, I asked the company that built 
> my box to use an LSI SAS3081E controller.
> 
> The first problem I noticed was that the drive-numbers were ordered 
> incorrectly.  That is, given that my system has 24 bays (6 rows, 4 
> bays/row), the drive numbers from top-to-bottom & left-to-right were 6, 
> 1, 0, 2, 4, 5 - even though when the system boots, each drive is scanned 
> in perfect order (I can tell by watching the LEDs blink).
> 
> I contacted LSI tech support and they explained:
> 
> <start response>
> SAS treats device IDs differently than SCSI.  LSI SAS controllers
> "remember" devices in the order they were discovered by the controller.
> This memory is persistent across power cycles.  It is based on the world
> wide name (WWN) given uniquely to every SAS device.  This allows your
> boot device to remain your boot device no matter where it migrates in
> the SAS topology.
> 
> In order to clear the memory of existing devices you need at least one
> device that will not be present in your final configuration.  Re-boot
> the machine and enter the LSI configuration utility (CTRL-C).  Then find
> your way to SAS Topology.  To see "more" options, press CTRL-M.  Choose
> the option to clear all non-present device IDs.  This clears the
> persistent memory of all devices not present at that time.  Exchange the
> drives.  The system will now remember the order it finds the drives
> after the next boot cycle. 
> <end response> 


Firstly, yes, the LSI SAS HBAs do use persistent mapping,
with a "logical target id" by default. This is where the
HBA translates between the physical disk device's
SAS address (which you'll see in "prtconf -v" as the devid)
and an essentially arbitrary target number which gets passed
up to the OS - in this case Solaris.

The support person @ LSI was correct about deleting all those
mappings.

Yes, the controller is being smart and tracking the actual
device rather than a particular bay/slot mapping. This isn't
so bad, mostly. The effect for you is that you can't assume
that the replaced device is going to have the same target
number as the old one (in fact, I'd call that quite unlikely)
so you'll have to see what the new device name is by checking
your dmesg or iostat -En output.




> Sure enough, I was able to physical reorder my drives so they were 0, 1, 
> 2, 4, 5, 6 - so, appearantly, the company that put my system together 
> moved the drives around after they were initially scanned.  But where is 
> 3?  (answer below).  Then I tried another test:
> 
>   1. make first disk blink
> 
>         # run dd if=/dev/dsk/c2t0d0p0 of=/dev/null count=10
>         10+0 records in
>         10+0 records out
> 
>   2. pull disk '0' out and replace it with a brand new disk
> 
>         # run dd if=/dev/dsk/c2t0d0p0 of=/dev/null count=10
>         dd: /dev/dsk/c2t0d0p0: open: No such file or directory
> 
>   3. scratch head and try again with '3'  (I had previously cleared the 
> LSI's controllers memory)
> 
>         # run dd if=/dev/dsk/c2t3d0p0 of=/dev/null count=10
>         10+0 records in
>         10+0 records out
> 
> So, it seems my SAS controller is being too smart for its own good - it 
> tracks the drives themselves, not the drive-bays.  If I hot-swap a brand 
> new drive into a bay, Solaris will see it as a new disk, not a 
> replacement for the old disk.  How can ZFS support this?  I asked the 
> LSI tech support again and got:
> 
> <start quote>
> I don't have the knowledge to answer that, so I'll just say
> this:  most vendors, including Sun, set up the SAS HBA to use
> "enclosure/slot" naming, which means that if a drive is
> swapped, it does NOT get a new name (after all, the enclosure
> and slot did not change).
> <end quote>

Now here's where things get murky.

At this point in time at least (it may change!) Solaris' mpt
driver uses LSI's logical target id mapping method. This is
*NOT* an enclosure/slot naming method - at least, not from the
OS' point of view. Additionally, unless you're using an actual
SCSI Enclosure Services (ses) device, there's no enclosure
to provide an enclosure/slot mapping anyway.

Since mpt uses logical target ids, the target id which
Solaris sees _will definitely change_ if you swap a disk.

(I'm a tad annoyed that the LSI support person appears to have
made an assumption based on a total lack of understanding about
how Solaris' mpt driver works).

(My assumption here is that you're using Solaris' mpt(7D) driver
rather than LSI's itmpt driver.)


So how do you use your system and its up-to-24 drives with ZFS?

(a) ensure that you note what Solaris' idea of the target id
     is when you replace a drive, then

(b) use "zpool replace" to tell ZFS what to do with the new
     device in your enclosure.
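
As a rough sketch of (a) and (b), something like the following
should do. It only prints the commands so you can check them before
running anything. The pool name "tank" is made up for illustration;
c2t0d0 and c2t3d0 are the old and new device names from your dd
test, and yours may well differ.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# ASSUMPTIONS: "tank" is a hypothetical pool name; c2t0d0 (old disk)
# and c2t3d0 (new disk) are taken from the dd test quoted above.

# Print the zpool command that replaces OLD with NEW in POOL.
replace_cmd() {
    pool=$1
    old=$2
    new=$3
    echo "zpool replace $pool $old $new"
}

# (a) list the disks so you can spot the new target id
echo "iostat -En"

# (b) tell ZFS to use the new device in place of the old one
replace_cmd tank c2t0d0 c2t3d0

# afterwards, watch the resilver
echo "zpool status tank"
```

Once the printed commands look right for your pool and devices,
run them by hand as root.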


I hope the above helps you along the way... but I'm sure
you'll have followup questions, so please don't hesitate
to ask either directly or to the list.



best regards,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp       http://www.jmcp.homeunix.com/blog
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss