John Sonnenschein wrote:

> Look, yanking the drives like that can seriously damage the drives or
> your motherboard. Solaris doesn't let you do it and assumes that
> something's gone seriously wrong if you try it. That Linux ignores
> the behavior and lets you do it sounds more like a bug in linux than
> anything else.

OK, so far we've had a lot of knee-jerk defense of Solaris. Sorry, but 
that isn't helping. Let's get back to science here, shall we?

What happens when you remove a disk?

A) The driver detects the removal and informs the OS. Solaris appears to 
behave reasonably well in this case.

B) The driver does not detect the removal. Commands must time out before 
a problem is detected. Due to driver layering, timeouts increase 
rapidly, causing the OS to "hang" for unreasonable periods of time.
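
To put rough numbers on the layering effect in (B), here is a minimal sketch. It assumes each layer simply re-issues the request to the layer below a fixed number of times, so the attempts multiply. The layer names and figures are hypothetical placeholders, not measured Solaris defaults.

```python
from math import prod


def worst_case_stall(base_timeout_s, attempts_per_layer):
    """Upper bound on the stall when every layer retries the layer below it.

    Each layer makes `attempts` tries, and every try waits for the full
    stack beneath it, so the attempt counts multiply rather than add.
    """
    return base_timeout_s * prod(attempts_per_layer)


# Hypothetical figures for illustration only.
base_timeout_s = 60              # per-command timeout at the lowest layer
attempts_per_layer = [3, 3, 2]   # e.g. HBA driver, target driver, upper layer

stall = worst_case_stall(base_timeout_s, attempts_per_layer)
print(f"worst-case stall: {stall} s ({stall / 60:.0f} min)")
# prints "worst-case stall: 1080 s (18 min)"
```

Under these assumed values a single missing disk stalls I/O for 18 minutes before anything above the drivers hears about it, which is why trimming either the base timeout or the per-layer retries pays off so quickly.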

We really need to fix (B). It seems the "easy" fixes are:

- Configure faster timeouts and fewer retries on redundant devices, 
similar to drive manufacturers' RAID edition firmware. This could be via a 
driver config file, or (better) automatically via ZFS, similar to write 
cache behaviour.

- Propagate timeouts quickly between layers (immediate soft fail without 
retry) or perhaps just to the fault management system.
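
A rough sketch of how the two fixes above might fit together, purely as illustration. `IoPolicy`, `policy_for`, `issue_io`, `SoftFail` and the timeout values are all hypothetical names and numbers, not anything in the Solaris drivers or ZFS. The `send` callable stands in for whatever the layer below does and is assumed to raise `TimeoutError` when the device does not answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoPolicy:
    timeout_s: float
    retries: int


# Hypothetical values: a lone disk is worth waiting for, a mirrored one is not.
STANDALONE = IoPolicy(timeout_s=60.0, retries=5)
REDUNDANT = IoPolicy(timeout_s=7.0, retries=1)


def policy_for(is_redundant):
    """Fix 1: shorter timeout and fewer retries when a redundant copy exists."""
    return REDUNDANT if is_redundant else STANDALONE


class SoftFail(Exception):
    """Raised to the layer above (or a fault manager) once attempts run out."""


def issue_io(send, policy, fail_fast=False):
    """Fix 2: with fail_fast set, report the first timeout without retrying.

    `send(timeout_s)` is a hypothetical callable for the lower layer.
    """
    attempts = 1 if fail_fast else policy.retries + 1
    for _ in range(attempts):
        try:
            return send(policy.timeout_s)
        except TimeoutError:
            continue
    raise SoftFail(f"no response after {attempts} attempt(s)")
```

The idea is that only the bottom layer would run with `fail_fast=False`; every layer above it would pass the soft failure straight up, so the retry counts stop multiplying.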

-- 
Carson
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
