I currently have a traditional NFS cluster hardware setup in the lab (two hosts with FC-attached JBOD storage) but no cluster software yet. I've been wanting to try out a separate ZIL to see what it might do to boost performance. My problem is that I don't have any cool SSD devices, much less ones that I could share between two hosts. Commercial arrays have custom hardware with mirrored cache, which got me thinking about a way to do this with regular hardware.
So I tried this experiment this week... On each host (OpenSolaris 2008.05), I created an 8GB ramdisk with ramdiskadm. I shared each host's ramdisk to the other via the iSCSI target and initiator over a 1Gb cross-connect cable (jumbo frames enabled). I added these as mirrored slog devices in a zpool. The end result was a pool that I could import and export between hosts, and it can survive one of the hosts dying. I also copied a dd image of my ramdisk device to stable storage with the pool exported (thus flushed), which allowed me to shut the entire cluster down, power one node up, recreate the ramdisk, dd the image back, and re-import the pool. I'm not sure I could survive a crash of both nodes; I'm going to test that some more.

The big thing here is that I got a MASSIVE boost in performance even with the overhead of the 1Gb link and iSCSI. The iorate test I was using went from 3073 IOPS on 90% sequential writes to 23953 IOPS with the RAM slog added. The service time was also significantly better than on the physical disks. It also boosted reads significantly; I'm guessing that's because updating the access time on the files was completely cached.

So what are the downsides to this? If both nodes were to crash and I used the same technique to recreate the ramdisk, I would lose any transactions that were in the slog at the time of the crash, but the physical disks would still be in a consistent state, right (just not from my app's point of view)? Does anyone have any idea what difference InfiniBand might make for the cross-connect? In some tests I completely saturated the 1Gb link between the boxes. So is this idea completely crazy?

It also raises the question of correctly sizing your slog relative to the physical disks on the back end. It looks like if the ZIL can handle significantly more I/O than the physical disks, the effect will be short-lived, because the system has to slow things down as it spends more time flushing out to the physical disks. The 8GB looked like overkill in my case: in a lot of the tests it drove the individual disks in the system to 100% busy and caused service times on the physical disks in the 900-1000ms range (although my app never saw that because of the slog).
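Roughly, the command sequence looks like this. I'm sketching it from memory, so the pool name (tank), the ramdisk/target names, device names, paths, and addresses are all placeholders for my setup, and the exact option syntax may differ slightly on other builds:

  # On each node: create an 8GB ramdisk (block device appears under /dev/ramdisk/)
  ramdiskadm -a slogdisk 8g

  # On each node: export the ramdisk with the legacy iSCSI target (iscsitgtd)
  # (assumes the target daemon's base directory is already set, e.g.
  #  iscsitadm modify admin -d /etc/iscsi)
  iscsitadm create target -b /dev/ramdisk/slogdisk slogtarget

  # On each node: point the initiator at the *other* node over the cross-connect
  iscsiadm add discovery-address 192.168.1.2:3260
  iscsiadm modify discovery -t enable
  devfsadm -i iscsi

  # On whichever node has the pool imported: add the two ramdisks (one local,
  # one seen over iSCSI) as a mirrored slog; cXtYdZ names are placeholders
  zpool add tank log mirror c2t1d0 c3t600144F04A8B0001d0

  # Planned full shutdown: export the pool (flushes the slog), then save an
  # image of the raw ramdisk device to stable storage
  zpool export tank
  dd if=/dev/rramdisk/slogdisk of=/backup/slogdisk.img bs=1024k

  # After bringing one node back up: recreate the ramdisk, restore the image,
  # and re-import the pool
  ramdiskadm -a slogdisk 8g
  dd if=/backup/slogdisk.img of=/dev/rramdisk/slogdisk bs=1024k
  zpool import tank

With the initiators crossed like this, the node that has the pool imported sees one half of the slog mirror in its own RAM and the other half in the remote node's RAM, which is what lets the pool survive one host dying.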