>>>>> "c" == Miles Nordin <[EMAIL PROTECTED]> writes:
>>>>> "tn" == Thomas Nau <[EMAIL PROTECTED]> writes:

     c> 'zpool status' should not be touching the disk at all.

I found this on some old worklog:

http://web.Ivy.NET/~carton/oneNightOfWork/20061119-carton.html

-----8<-----
Also, zpool status takes forEVer. I found out why:

ezln:~$ sudo tcpdump -n -p -i tlp2 host fishstick
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tlp2, link-type EN10MB (Ethernet), capture size 96 bytes
17:44:43.916373 IP 10.100.100.140.42569 > 10.100.100.135.3260: S 582435419:582435419(0) win 49640
17:44:43.916679 IP 10.100.100.135.3260 > 10.100.100.140.42569: R 0:0(0) ack 582435420 win 0
17:44:52.611549 IP 10.100.100.140.48474 > 10.100.100.135.3260: S 584903933:584903933(0) win 49640
17:44:52.611858 IP 10.100.100.135.3260 > 10.100.100.140.48474: R 0:0(0) ack 584903934 win 0
17:44:58.766525 IP 10.100.100.140.58767 > 10.100.100.135.3260: S 586435093:586435093(0) win 49640
17:44:58.766831 IP 10.100.100.135.3260 > 10.100.100.140.58767: R 0:0(0) ack 586435094 win 0

10.100.100.135 is the iSCSI target. When it's down, connect() from the
Solaris initiator takes a while to time out. I added [the target's]
address as an alias on some other box's interface, so Solaris gets a
TCP reset immediately. Now zpool status is fast again, and every time
I type zpool status I get one of those SYN/RST pairs (one pair per
invocation, not three; the capture above has three pairs because I
typed zpool status three times). They also appear on their own over
time.
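
For illustration, the alias was just something like this (the spare
box and interface name here are made up; BSD-style syntax shown, on a
Solaris box you'd plumb a logical interface with ``ifconfig <if>
addif ...'' instead):

sparebox:~$ sudo ifconfig tlp0 inet 10.100.100.135 netmask 255.255.255.255 alias

Nothing on that box listens on TCP port 3260, so the initiator's SYN
gets answered with an RST right away instead of sitting in connect()
until it times out.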

How would I fix this? I'd have iSCSI keep track of whether targets are
``up'' or ``down''. If an upper layer tries to access a target that's
``down'', iSCSI will immediately return an error, then try to reopen
the target in the background. There will be no unprompted background
reopen attempts; the initiator only retries when something actually
touches the target. So, if an iSCSI target goes away and then comes
back, your software may need to touch the device inode twice before
you see the target available again.

If targets close their TCP circuits on inactivity, or go into
power-save or some such flaky nonsense, we're still OK, because after
that happens iSCSI will still have the target marked ``up.'' It will
thus keep the upper layers waiting through one connection attempt, and
return no error if that first attempt succeeds. If it doesn't, the
initiator will then mark the target ``down'' and start returning
errors immediately.
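
In shell terms, the behaviour I'm describing is roughly the toy sketch
below.  This is illustration only: nc stands in for the initiator's
connect(), and the state file stands in for a flag the initiator would
keep in memory.

#!/bin/sh
# toy model of the proposed ``up''/``down'' tracking (illustration only)
TARGET=10.100.100.135
PORT=3260
STATE=/var/tmp/target.state         # stands in for an in-kernel flag

if [ "`cat $STATE 2>/dev/null`" = "down" ]; then
    # already marked down: fail immediately, and fire exactly one
    # background reconnect attempt (nothing retries on a timer)
    ( nc -z -w 5 $TARGET $PORT >/dev/null 2>&1 && echo up > $STATE ) &
    echo "target down" >&2
    exit 1
fi

# believed up: the caller waits through a single connection attempt
if nc -z -w 5 $TARGET $PORT >/dev/null 2>&1; then
    echo up > $STATE                # fine, no error returned
else
    echo down > $STATE              # first failure marks it down;
    echo "target down" >&2          # later calls fail instantly
    exit 1
fi

Run against a target that has just come back, the first invocation
still fails (but flips the flag back to ``up'' in the background) and
the second succeeds: that's the touch-the-device-twice effect.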

As I said before, error handling is the most important part of any
RAID implementation. In this case, among the more obvious and
immediately inconvenient problems, we have a fundamentally serious
one: iSCSI not returning errors fast enough pushes us up against a
timeout in the svc subsystem, so one broken disk can potentially
cascade into breaking a huge swath of the SVM subsystem.
-----8<-----

I would add: I'd fix 'zpool status' first, and start being careful
throughout ZFS to do certain things in parallel rather than serially.
But the iSCSI initiator could be smarter, too.
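
(A toy illustration of the parallel-versus-serial point, with made-up
addresses and nc standing in for whatever ZFS does per device: if each
dead target costs one connect() timeout, probing serially costs N
timeouts, probing in parallel costs about one.)

# serial: total wait grows with the number of dead targets
for t in 10.100.100.135 10.100.100.136 10.100.100.137; do
    nc -z -w 5 $t 3260
done

# parallel: total wait stays around one timeout, however many targets
for t in 10.100.100.135 10.100.100.136 10.100.100.137; do
    nc -z -w 5 $t 3260 &
done
wait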

    tn> we usually don't touch any of the iSCSI settings as long as a
    tn> device is offline.

So the above is another reason you may want to remove a
discovery-address before taking that IP off the network.  If the
discovery-address returns an immediate TCP RST, then 'zpool status'
will work okay, but if the address is completely gone, so that
connect() times out, 'zpool status' will make you wait quite a while,
potentially multiplied by the number of devices or pools you have,
which could make it practically equivalent to broken.  Scalability
applies to failure scenarios, too, not just to normal operation.
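
Concretely, on the initiator that means something like this before you
pull the target's address (hostname is illustrative; check
iscsiadm(1M) for the exact form on your release):

initiator# iscsiadm remove discovery-address 10.100.100.135:3260
initiator# iscsiadm list discovery-address

Once the address no longer shows up in the list, it should be safe to
take 10.100.100.135 off the network without leaving 'zpool status'
hanging in connect().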

Don't worry: iSCSI won't move your /dev/dsk/... links around or
forget your CHAP passwords when you remove the discovery-address.
It's super-convenient.  But in fact, even if you WANT the iSCSI
initiator to forget this stuff, there seems to be no documented way to
do it!  It's sort of like the Windows Registry keeping track of your
30-day shareware trial. :(
