On 2012-10-01 17:07, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
Well, now I know why it's stupid.  Because it doesn't work right.  It turns out,
iSCSI devices (and I presume SAS devices) are not removable storage.  That
means, if the device goes offline and comes back online again, it doesn't just
gracefully resilver and move on without any problems; it's in a perpetual state
of IO errors, device unreadable.  If there were simply cksum errors, or
something like that, I could handle it.  But it's bus errors, device errors,
the system can't operate, and I have to remove the device permanently.

The really odd thing is that it doesn't always show as faulted in zpool status.
Even when it does show as faulted, I can zpool online, or zpool clear, to make
the pool look healthy again.  But when an app tries to use something in that
zpool, the system grinds, I can see scsi errors spewing into
/var/adm/messages, and sometimes the system will halt.

This was all caused by my disconnecting / rebooting either the iSCSI
initiator or target.

Lesson learned:  If you create an iSCSI target, make *damn* sure it's an
always-on system.  And don't use just one.  And don't do maintenance on both
of them anywhere near the same week.


And would some sort of clusterware help in this case?
I.e., when the target goes down, it informs the initiator to
"offline" the disk component gracefully (if that is possible).
When the target comes back up, the automation would online the
pool components, or replace them in-place, and *properly*
resilver and clear the pool.
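
Roughly, what I imagine such automation running on the initiator side
(just a sketch; the pool name "tank" and the device name are made-up
examples, and the part that detects the target going down/up is left
out):

    # Target went away: take the backing vdev offline cleanly,
    # before ZFS starts logging bus/device errors against it.
    zpool offline tank c0t600144F0DEADBEEF0000d0

    # Target is reachable again: bring the vdev back online;
    # ZFS should resilver just the data written while it was away.
    zpool online tank c0t600144F0DEADBEEF0000d0

    # After the resilver completes, clear the accumulated error
    # counters so the pool reports healthy again.
    zpool clear tank c0t600144F0DEADBEEF0000d0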

Wonder if that's possible and if that would help your case?

//Jim