On Fri, Sep 25, 2009 at 1:39 PM, Richard Elling <richard.ell...@gmail.com> wrote:
> On Sep 25, 2009, at 9:14 AM, Ross Walker wrote:
>
>> On Fri, Sep 25, 2009 at 11:34 AM, Bob Friesenhahn
>> <bfrie...@simple.dallas.tx.us> wrote:
>>>
>>> On Fri, 25 Sep 2009, Ross Walker wrote:
>>>>
>>>> As an aside, a slog device will not be too beneficial for large
>>>> sequential writes, because it will be throughput bound, not latency
>>>> bound. slog devices really help when you have lots of small sync
>>>> writes. A RAIDZ2 with the ZIL spread across it will provide much
>>>
>>> Surely this depends on the origin of the large sequential writes. If the
>>> origin is NFS and the SSD has considerably more sustained write bandwidth
>>> than the Ethernet transfer bandwidth, then using the SSD is a win. If
>>> the SSD accepts data slower than the Ethernet can deliver it (which seems
>>> to be this particular case) then the SSD is not helping.
>>>
>>> If the Ethernet can pass 100 MB/second, then the sustained write
>>> specification for the SSD needs to be at least 100 MB/second. Since data
>>> is buffered in the Ethernet/TCP/IP/NFS stack prior to sending it to ZFS,
>>> the SSD should support write bursts of at least double that or else it
>>> will not be helping bulk-write performance.
>>
>> Specifically, I was talking NFS, as that was what the OP was talking
>> about. Yes, it does depend on the origin, but you also assume that
>> NFS I/O goes over only a single 1GbE interface, when it could be over
>> multiple 1GbE interfaces, a 10GbE interface, or even multiple 10GbE
>> interfaces. You also assume the I/O recorded in the ZIL is just the raw
>> I/O, when there is also metadata or multiple transaction copies as
>> well.
>>
>> Personally, I still prefer to spread the ZIL across the pool and have
>> a large NVRAM-backed HBA, as opposed to a slog, which really puts all
>> my I/O in one basket. If I had a pure NVRAM device I might consider
>> using that as a slog device, but SSDs are too variable for my taste.
>
> Back of the envelope math says:
> 10 GbE = ~1 GByte/sec of I/O capacity
>
> If the SSD can only sink 70 MByte/s, then you will need:
> int(1000/70) + 1 = 15 SSDs for the slog
>
> For capacity, you need:
> 1 GByte/sec * 30 sec = 30 GBytes
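(A quick sanity check of the arithmetic above, sketched in Python; the
70 MByte/s SSD rate and the 30-second window are Richard's figures, and
1 GByte/s is the usual 10GbE approximation:)

    # Back-of-the-envelope slog sizing for a 10GbE NFS server,
    # using the figures quoted above.
    link_mb_per_s = 1000     # 10GbE ~= 1 GByte/s of I/O capacity
    ssd_mb_per_s = 70        # sustained write rate of one slog SSD

    # Enough SSDs to sink the link rate: int(1000/70) + 1 = 15
    ssds_needed = link_mb_per_s // ssd_mb_per_s + 1

    # Capacity to absorb one full txg window of writes:
    # 1 GByte/s * 30 s = 30 GBytes
    txg_window_s = 30
    slog_gbytes = link_mb_per_s * txg_window_s / 1000

    print(ssds_needed, slog_gbytes)   # 15 30.0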
Where did the 30 seconds come in here? The amount of time to hold cache
depends on how fast you can fill it.

> Ross' idea has merit, if the size of the NVRAM in the array is 30 GBytes
> or so.

I'm thinking you can do less if you don't need to hold it for 30 seconds.

> Both of the above assume there is lots of memory in the server.
> This is becoming increasingly easy to do as memory costs come
> down and you can physically fit 512 GBytes in a 4U server.
> By default, the txg commit will occur when 1/8 of memory is used
> for writes. For 30 GBytes, that would mean a main memory of only
> 240 GBytes... feasible for modern servers.
>
> However, most folks won't stomach 15 SSDs for slog or 30 GBytes of
> NVRAM in their arrays. So Bob's recommendation of reducing the
> txg commit interval below 30 seconds also has merit. Or, to put it
> another way, the dynamic sizing of the txg commit interval isn't
> quite perfect yet. [Cue for Neil to chime in... :-)]

I'm sorry, did I miss something Bob said about the txg commit interval?
I looked back and didn't see it; maybe it was off-list?

-Ross
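(To put numbers on "you can do less if you don't need to hold it for 30
seconds": a rough sketch assuming the 1/8-of-memory txg trigger Richard
describes; intervals other than 30 s are hypothetical what-ifs, not
defaults:)

    # How the txg commit interval drives slog/NVRAM and RAM sizing
    # at 10GbE line rate (~1 GByte/s). Only 30 s is the default;
    # the shorter intervals are hypothetical.
    link_gb_per_s = 1.0

    for interval_s in (30, 10, 5):
        capacity_gb = link_gb_per_s * interval_s  # writes per txg window
        memory_gb = capacity_gb * 8               # 1/8-of-memory trigger
        print(f"{interval_s:2d} s -> {capacity_gb:4.0f} GB slog/NVRAM, "
              f"{memory_gb:4.0f} GB RAM")
    # 30 s ->   30 GB slog/NVRAM,  240 GB RAM
    # 10 s ->   10 GB slog/NVRAM,   80 GB RAM
    #  5 s ->    5 GB slog/NVRAM,   40 GB RAM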