My goal is to have a big, fast, HA filer that holds nearly everything
for a bunch of development services, each running in its own Solaris
zone. So when I need a new service, test box, etc., I provision a new
zone and hand it to the dev requesters and they load their stuff on it
and go.
Each zone has zonepath on its own zpool, which is an iSCSI-backed
device pointing to a unique sparse zvol on the filer.
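For context, the per-zone provisioning goes roughly like this (names
and sizes here are illustrative, not our real ones):

```shell
# On the filer: carve out a sparse (thin-provisioned) zvol per zone
zfs create -s -V 32G tank/luns/devzone1

# Export the zvol as an iSCSI LUN (the shareiscsi shortcut; newer
# setups would go through COMSTAR instead)
zfs set shareiscsi=on tank/luns/devzone1

# On the zone host: once the initiator sees the LUN, build a
# dedicated pool on it and put the zonepath there
zpool create devzone1pool c2t600144F0DEADBEEFd0
zfs create devzone1pool/zoneroot
```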
If things slow down, we buy more 1U boxes with lots of CPU and RAM,
don't care about the disk, and simply provision more LUNs on the filer.
Works great. Cheap, good performance, nice and scalable. They smiled
on me for a while.
Until the filer dropped a few packets.
I know it shouldn't happen and I'm addressing that, but the failure
mode for this eventuality is too drastic. If the filer isn't responding
nicely to the zone's I/O requests, the zone pretty much completely
hangs, perhaps responding to pings, but not allowing any real
connections.
Kind of, not surprisingly, like a machine whose root disk got yanked
during normal operations.
To make it worse, the whole global zone seems unable to do anything
about the issue. I can't take the affected zone down; zoneadm commands
just put the zone in a shutting_down state forever, and zpool commands
just hang. The only thing I've found that recovers (from far away in
the middle of the night) is to uadmin 1 1 the global zone. Even reboot
didn't work.
So all the zones on the box get hard-reset and that makes all the dev
guys pretty unhappy.
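For reference, the hammer I'm swinging is the raw kernel reboot call,
which skips the graceful shutdown path that blocks on the hung pool I/O:

```shell
# A_REBOOT (1) with AD_BOOT (1): reboot the box immediately, with no
# clean shutdown. This bypasses the init/svc machinery that otherwise
# sits forever waiting on the stuck zpool.
uadmin 1 1
```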
I thought about setting failmode to continue on these individual zone
pools, since it's set to wait right now. How do you folks predict that
change will play out?
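The knob in question, with my understanding of the semantics from the
zpool man page (pool name is hypothetical):

```shell
# failmode governs what ZFS does on catastrophic pool I/O failure:
#   wait     - block all I/O until the device returns (current setting)
#   continue - return EIO to new writes; reads of still-healthy or
#              cached data keep working
#   panic    - panic the host
zpool get failmode devzone1pool            # check the current value
zpool set failmode=continue devzone1pool   # the change I'm considering
```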
thx
jake
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss