Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Garrett D'Amore
On Fri, 2010-06-11 at 13:58 +0400, Andrey Kuzmin wrote: > # dd if=/dev/zero of=/dev/rdsk/cXtYdZs0 bs=512 > > I did a test on my workstation a moment ago and got about 21k > IOPS from my sata drive (iostat). > The trick here of course is that this is sequent
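
A minimal sketch of the raw-device test described above; the device name is a placeholder and must point at a scratch disk, since dd will overwrite it:

    # sequential 512-byte writes straight to the raw device
    dd if=/dev/zero of=/dev/rdsk/cXtYdZs0 bs=512 &
    # in a second terminal, watch per-device write IOPS (the w/s column)
    iostat -xn 1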

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Robert Milkowski
On 11/06/2010 10:58, Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:26 PM, Robert Milkowski > wrote: On 11/06/2010 09:22, sensille wrote: Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling mailto:richard.ell...@gma

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Andrey Kuzmin
On Fri, Jun 11, 2010 at 1:26 PM, Robert Milkowski wrote: > On 11/06/2010 09:22, sensille wrote: > >> Andrey Kuzmin wrote: >> >> >>> On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling >>> mailto:richard.ell...@gmail.com>> wrote: >>> >>> On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: >>> >>>

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Robert Milkowski
On 11/06/2010 09:22, sensille wrote: Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling mailto:richard.ell...@gmail.com>> wrote: On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: > Andrey Kuzmin wrote: >> Well, I'm more accustomed to "sequential vs. rando

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Robert Milkowski
On 10/06/2010 20:43, Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it can scale 100x. Fantastic. But it actually can do over 100k. Also several thousand IOPS on a single FC po

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread sensille
Andrey Kuzmin wrote: > On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling > mailto:richard.ell...@gmail.com>> wrote: > > On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: > > > Andrey Kuzmin wrote: > >> Well, I'm more accustomed to "sequential vs. random", but YMMV. > >> As to 67000 5

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Andrey Kuzmin
On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling wrote: > On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: > > > Andrey Kuzmin wrote: > >> Well, I'm more accustomed to "sequential vs. random", but YMMV. > >> As to 67000 512 byte writes (this sounds suspiciously close to 32MB > fitting into cache),

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Richard Elling
On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: > Andrey Kuzmin wrote: >> Well, I'm more accustomed to "sequential vs. random", but YMMV. >> As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting >> into cache), did you have write-back enabled? > > It's a sustained number,

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
Well, I'm more accustomed to "sequential vs. random", but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into cache), did you have write-back enabled? Regards, Andrey On Fri, Jun 11, 2010 at 12:03 AM, Arne Jansen wrote: > Andrey Kuzmin wrote: > > On Thu,

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen
Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it can scale 100x. Fantastic. Hundreds of IOPS is not quite true, even with hard drives. I just tested a Hitachi 15k drive and it ha

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen
Andrey Kuzmin wrote: Well, I'm more accustomed to "sequential vs. random", but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into cache), did you have write-back enabled? It's a sustained number, so it shouldn't matter. Regards, Andrey On Fri, Jun 1

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen
Andrey Kuzmin wrote: On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen > wrote: Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe i

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen wrote: > Andrey Kuzmin wrote: > >> As to your results, it sounds almost too good to be true. As Bob has >> pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe >> it can scale 100x. Fantastic. >> > > Hundreds of IOPS is not quite tr

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Garrett D'Amore
For the record, with my driver (which is not the same as the one shipped by the vendor), I was getting over 150K IOPS with a single DDRdrive X1. It is possible to get very high IOPS with Solaris. However, it might be difficult to get such high numbers with systems based on SCSI/SCSA. (SCSA

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it can scale 100x. Fantastic. Regards, Andrey On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote: > On 21/10/2009 03:54, Bob Friesenhahn wrote:

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Ross Walker
On Jun 10, 2010, at 5:54 PM, Richard Elling wrote: On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: Andrey Kuzmin wrote: Well, I'm more accustomed to "sequential vs. random", but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into cache), did you have w

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Mike Gerdts
On Thu, Jun 10, 2010 at 9:39 AM, Andrey Kuzmin wrote: > On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote: >> >> On 21/10/2009 03:54, Bob Friesenhahn wrote: >>> >>> I would be interested to know how many IOPS an OS like Solaris is able to >>> push through a single device interface.  The nor

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
Sorry, my bad. _Reading_ from /dev/null may be an issue, but not writing to it, of course. Regards, Andrey On Thu, Jun 10, 2010 at 6:46 PM, Robert Milkowski wrote: > On 10/06/2010 15:39, Andrey Kuzmin wrote: > > On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote: > >> On 21/10/2009 03:5

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Robert Milkowski
On 10/06/2010 15:39, Andrey Kuzmin wrote: On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski > wrote: On 21/10/2009 03:54, Bob Friesenhahn wrote: I would be interested to know how many IOPS an OS like Solaris is able to push through a single device

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote: > On 21/10/2009 03:54, Bob Friesenhahn wrote: > >> >> I would be interested to know how many IOPS an OS like Solaris is able to >> push through a single device interface. The normal driver stack is likely >> limited as to how many IOPS it

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Robert Milkowski
On 21/10/2009 03:54, Bob Friesenhahn wrote: I would be interested to know how many IOPS an OS like Solaris is able to push through a single device interface. The normal driver stack is likely limited as to how many IOPS it can sustain for a given LUN since the driver stack is optimized for h

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad
On 7 apr 2010, at 18.13, Edward Ned Harvey wrote: >> From: Ragnar Sundblad [mailto:ra...@csc.kth.se] >> >> Rather: ... >=19 would be ... if you don't mind losing data written >> the ~30 seconds before the crash, you don't have to mirror your log >> device. > > If you have a system crash, *and*

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Miles Nordin
> "jr" == Jeroen Roodhart writes: jr> Running OSOL nv130. Power off the machine, removed the F20 and jr> power back on. Machines boots OK and comes up "normally" with jr> the following message in 'zpool status': yeah, but try it again and this time put rpool on the F20 as well an

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Richard Elling
On Apr 7, 2010, at 10:19 AM, Bob Friesenhahn wrote: > On Wed, 7 Apr 2010, Edward Ned Harvey wrote: >>> From: Ragnar Sundblad [mailto:ra...@csc.kth.se] >>> >>> Rather: ... >=19 would be ... if you don't mind losing data written >>> the ~30 seconds before the crash, you don't have to mirror your lo

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: BTW, does the system *ever* read from the log device during normal operation? Such as perhaps during a scrub? It really would be nice to detect failure of log devices in advance, that are claiming to write correctly, but which are really unreadable.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device a

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Mark J Musante
On Wed, 7 Apr 2010, Neil Perrin wrote: There have previously been suggestions to read slogs periodically. I don't know if there's a CR raised for this though. Roch wrote up CR 6938883 "Need to exercise read from slog dynamically" Regards, markm ___

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin
On 04/07/10 10:18, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. Th

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Bob Friesenhahn > > It is also worth pointing out that in normal operation the slog is > essentially a write-only device which is only read at boot time. The > writes are assumed to work if th

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
> From: Ragnar Sundblad [mailto:ra...@csc.kth.se] > > Rather: ... >=19 would be ... if you don't mind losing data written > the ~30 seconds before the crash, you don't have to mirror your log > device. If you have a system crash, *and* a failed log device at the same time, this is an important c

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin
On 04/07/10 09:19, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there are uncommitted data on it - during normal reboots zfs won't read data from slog. How does zfs know if there is uncommitted data on the slog device without reading it? The m

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there are uncommitted data on it - during normal reboots zfs won't read data from slog. How does zfs know if there is uncommitted data on the slog device without reading it? The minimal read would be quite small, but it s

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski
On 07/04/2010 15:35, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool <19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >=19 would be ... don't mirror your log dev

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool <19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >=19 would be ... don't mirror your log device. If you have more than one, just add them bo
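
For reference, a hedged sketch of the log-device layouts being discussed; pool and device names are invented:

    zpool upgrade -v                            # list the pool versions this system supports
    zpool add tank log mirror c2t0d0 c2t1d0     # add a mirrored slog pair (recommended below v19)
    zpool attach tank c2t0d0 c2t1d0             # or mirror an existing single slog after the fact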

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski
On 07/04/2010 13:58, Ragnar Sundblad wrote: Rather: ...>=19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. For a file server, mail server, etc etc, where things are stored and supposed to be available later, you al

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad
On 7 apr 2010, at 14.28, Edward Ned Harvey wrote: >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >> boun...@opensolaris.org] On Behalf Of Jeroen Roodhart >> >>> If you're running solaris proper, you better mirror >>> your ZIL log device. >> ... >>> I plan to get to test t

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jeroen Roodhart > > > If you're running solaris proper, you better mirror > > your > > > ZIL log device. > ... > > I plan to get to test this as well, won't be until > > late next week though.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Jeroen Roodhart
Hi list, > If you're running solaris proper, you better mirror > your > > ZIL log device. ... > I plan to get to test this as well, won't be until > late next week though. Running OSOL nv130. Power off the machine, removed the F20 and power back on. Machine boots OK and comes up "normally" wi
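
A rough sketch of the check being described (pool name is made up):

    zpool status tank    # after pulling the F20 slog, the pool should list the missing log device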

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Edward Ned Harvey
> > We ran into something similar with these drives in an X4170 that > turned > > out to > > be an issue of the preconfigured logical volumes on the drives. Once > > we made > > sure all of our Sun PCI HBAs were running the exact same version of > > firmware > > and recreated the volumes on new d

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Jeroen Roodhart
Hi Roch, > Can you try 4 concurrent tar to four different ZFS > filesystems (same pool). Hmmm, you're on to something here: http://www.science.uva.nl/~jeroen/zil_compared_e1000_iostat_iops_svc_t_10sec_interval.pdf In short: when using two exported file systems total time goes down to around
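
A hypothetical way to reproduce that test; pool, filesystem, and tarball names are invented, and zilstat refers to the DTrace-based zilstat script mentioned in this thread:

    for fs in fs1 fs2 fs3 fs4; do
        ( cd /tank/$fs && tar xf /var/tmp/testdata.tar ) &
    done
    wait
    # in another session, report ZIL activity every 10 seconds
    ./zilstat.ksh 10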

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Edward Ned Harvey
> From: Kyle McDonald [mailto:kmcdon...@egenera.com] > > So does your HBA have newer firmware now than it did when the first > disk > was connected? > Maybe it's the HBA that is handling the new disks differently now, than > it did when the first one was plugged in? > > Can you down rev the HBA FW

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Kyle McDonald
On 4/4/2010 11:04 PM, Edward Ned Harvey wrote: >> Actually, It's my experience that Sun (and other vendors) do exactly >> that for you when you buy their parts - at least for rotating drives, I >> have no experience with SSD's. >> >> The Sun disk label shipped on all the drives is setup to make the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Edward Ned Harvey
> Actually, It's my experience that Sun (and other vendors) do exactly > that for you when you buy their parts - at least for rotating drives, I > have no experience with SSD's. > > The Sun disk label shipped on all the drives is setup to make the drive > the standard size for that sun part number

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Edward Ned Harvey
> Hmm, when you did the write-back test was the ZIL SSD included in the > write-back? > > What I was proposing was write-back only on the disks, and ZIL SSD > with no write-back. The tests I did were: All disks write-through All disks write-back With/without SSD for ZIL All the permutations of th

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Ragnar Sundblad
On 4 apr 2010, at 06.01, Richard Elling wrote: Thank you for your reply! Just wanted to make sure. > Do not assume that power outages are the only cause of unclean shutdowns. > -- richard Thanks, I have seen that mistake several times with other (file)systems, and hope I'll never ever make it m

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Richard Elling
On Apr 3, 2010, at 5:47 PM, Ragnar Sundblad wrote: > On 2 apr 2010, at 22.47, Neil Perrin wrote: > >>> Suppose there is an application which sometimes does sync writes, and >>> sometimes async writes. In fact, to make it easier, suppose two processes >>> open two files, one of which always writes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Ragnar Sundblad
On 2 apr 2010, at 22.47, Neil Perrin wrote: >> Suppose there is an application which sometimes does sync writes, and >> sometimes async writes. In fact, to make it easier, suppose two processes >> open two files, one of which always writes asynchronously, and one of which >> always writes synchr

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Ragnar Sundblad
On 1 apr 2010, at 06.15, Stuart Anderson wrote: > Assuming you are also using a PCI LSI HBA from Sun that is managed with > a utility called /opt/StorMan/arcconf and reports itself as the amazingly > informative model number "Sun STK RAID INT" what worked for me was to run, > arcconf delete (to d
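
A hedged sketch of the arcconf step referred to above; controller and logical-drive numbers are placeholders:

    /opt/StorMan/arcconf getconfig 1 ld            # list the preconfigured logical drives
    /opt/StorMan/arcconf delete 1 logicaldrive 0   # delete the volume covering the new disk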

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Christopher George
> Well, I did look at it but at that time there was no Solaris support yet. > Right now it > seems there is only a beta driver? Correct, we just completed functional validation of the OpenSolaris driver. Our focus has now turned to performance tuning and benchmarking. We expect to formally

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Jeroen Roodhart
Hi Al, > Have you tried the DDRdrive from Christopher George > ? > Looks to me like a much better fit for your application than the F20? > > It would not hurt to check it out. Looks to me like > you need a product with low *latency* - and a RAM based cache > would be a much better performer than

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Neil Perrin
On 04/02/10 08:24, Edward Ned Harvey wrote: The purpose of the ZIL is to act like a fast "log" for synchronous writes. It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly know a lot here. But I'm he

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Casper . Dik
>The only way to guarantee consistency in the snapshot is to always >(regardless of ZIL enabled/disabled) give priority for sync writes to get >into the TXG before async writes. > >If the OS does give priority for sync writes going into TXG's before async >writes (even with ZIL disabled), then af

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Al Hopper
Hi Jeroen, Have you tried the DDRdrive from Christopher George ? Looks to me like a much better fit for your application than the F20? It would not hurt to check it out. Looks to me like you need a product with low *latency* - and a RAM based cache would be a much better performer than any solut

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Eric D. Mudama
On Fri, Apr 2 at 11:14, Tirso Alonso wrote: If my new replacement SSD with identical part number and firmware is 0.001 Gb smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? There is a standard for sizes t

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Tim Cook
On Fri, Apr 2, 2010 at 10:08 AM, Kyle McDonald wrote: > On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: > >> I know it is way after the fact, but I find it best to coerce each > >> drive down to the whole GB boundary using format (create Solaris > >> partition just up to the boundary). Then if you e

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Miles Nordin
> "enh" == Edward Ned Harvey writes: enh> If you have zpool less than version 19 (when ability to remove enh> log device was introduced) and you have a non-mirrored log enh> device that failed, you had better treat the situation as an enh> emergency. Ed the log device removal sup

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Tirso Alonso
> If my new replacement SSD with identical part number and firmware is 0.001 > Gb smaller than the original and hence unable to mirror, what's to prevent > the same thing from happening to one of my 1TB spindle disk mirrors? There is a standard for sizes that many manufacturers use (IDEMA LBA1-02):
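
If memory serves, the IDEMA LBA1-02 convention fixes the sector count as a function of advertised capacity; for example, for a nominal 1000 GB drive:

    # LBA count = 97696368 + 1953504 * (capacity_in_GB - 50), in 512-byte sectors
    echo $(( 97696368 + 1953504 * (1000 - 50) ))   # -> 1953525168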

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Robert Milkowski
On 02/04/2010 16:04, casper@sun.com wrote: sync() is actually *async* and returning from sync() says nothing about to clarify - in case of ZFS sync() is actually synchronous. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss ma

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Ross Walker
On Fri, Apr 2, 2010 at 8:03 AM, Edward Ned Harvey wrote: >> > Seriously, all disks configured WriteThrough (spindle and SSD disks >> > alike) >> > using the dedicated ZIL SSD device, very noticeably faster than >> > enabling the >> > WriteBack. >> >> What do you get with both SSD ZIL and WriteBack

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Stuart Anderson
On Apr 2, 2010, at 5:08 AM, Edward Ned Harvey wrote: >> I know it is way after the fact, but I find it best to coerce each >> drive down to the whole GB boundary using format (create Solaris >> partition just up to the boundary). Then if you ever get a drive a >> little smaller it still should fi

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Bob Friesenhahn
On Fri, 2 Apr 2010, Edward Ned Harvey wrote: were taking place at the same time. That is, if two processes both complete a write operation at the same time, one in sync mode and the other in async mode, then it is guaranteed the data on disk will never have the async data committed before the sy

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Bob Friesenhahn
On Fri, 2 Apr 2010, Edward Ned Harvey wrote: So you're saying that while the OS is building txg's to write to disk, the OS will never reorder the sequence in which individual write operations get ordered into the txg's. That is, an application performing a small sync write, followed by a large

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Mattias Pantzare
On Fri, Apr 2, 2010 at 16:24, Edward Ned Harvey wrote: >> The purpose of the ZIL is to act like a fast "log" for synchronous >> writes.  It allows the system to quickly confirm a synchronous write >> request with the minimum amount of work. > > Bob and Casper and some others clearly know a lot her

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Kyle McDonald
On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: >> I know it is way after the fact, but I find it best to coerce each >> drive down to the whole GB boundary using format (create Solaris >> partition just up to the boundary). Then if you ever get a drive a >> little smaller it still should fit. >>

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
>Questions to answer would be: > >Is a ZIL log device used only by sync() and fsync() system calls? Is it >ever used to accelerate async writes? There are quite a few of "sync" writes, specifically when you mix in the NFS server. >Suppose there is an application which sometimes does sync writ

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> The purpose of the ZIL is to act like a fast "log" for synchronous > writes. It allows the system to quickly confirm a synchronous write > request with the minimum amount of work. Bob and Casper and some others clearly know a lot here. But I'm hearing conflicting information, and don't know

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> Only a broken application uses sync writes > sometimes, and async writes at other times. Suppose there is a virtual machine, with virtual processes inside it. Some virtual process issues a sync write to the virtual OS, meanwhile another virtual process issues an async write. Then the virtual O

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> >Dude, don't be so arrogant. Acting like you know what I'm talking > about > >better than I do. Face it that you have something to learn here. > > You may say that, but then you post this: Acknowledged. I read something arrogant, and I replied even more arrogant. That was dumb of me. __

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
>So you're saying that while the OS is building txg's to write to disk, the >OS will never reorder the sequence in which individual write operations get >ordered into the txg's. That is, an application performing a small sync >write, followed by a large async write, will never have the second op

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
>> > http://nfs.sourceforge.net/ >> >> I think B4 is the answer to Casper's question: > >We were talking about ZFS, and under what circumstances data is flushed to >disk, in what way "sync" and "async" writes are handled by the OS, and what >happens if you disable ZIL and lose power to your syste

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> If you have zpool less than version 19 (when ability to remove log > device > was introduced) and you have a non-mirrored log device that failed, you > had > better treat the situation as an emergency. > Instead, do "man zpool" and look for "zpool > remove." > If it says "supports removing log
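
A short illustration of that check and of the removal itself; pool and device names are placeholders:

    man zpool                   # look for "zpool remove" and whether it mentions log devices
    zpool remove tank c2t0d0    # removing a slog works on pool version 19 or later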

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> ZFS recovers to a crash-consistent state, even without the slog, > meaning it recovers to some state through which the filesystem passed > in the seconds leading up to the crash. This isn't what UFS or XFS > do. > > The on-disk log (slog or otherwise), if I understand right, can > actually make

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> hello > > i have had this problem this week. our zil ssd died (apt slc ssd 16gb). > because we had no spare drive in stock, we ignored it. > > then we decided to update our nexenta 3 alpha to beta, exported the > pool and made a fresh install to have a clean system and tried to > import the poo

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> > I am envisioning a database, which issues a small sync write, > followed by a > > larger async write. Since the sync write is small, the OS would > prefer to > > defer the write and aggregate into a larger block. So the > possibility of > > the later async write being committed to disk before

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> > http://nfs.sourceforge.net/ > > I think B4 is the answer to Casper's question: We were talking about ZFS, and under what circumstances data is flushed to disk, in what way "sync" and "async" writes are handled by the OS, and what happens if you disable ZIL and lose power to your system. We w

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Roch
When we use one vmod, both machines are finished in about 6min45, zilstat maxes out at about 4200 IOPS. Using four vmods it takes about 6min55, zilstat maxes out at 2200 IOPS. Can you try 4 concurrent tar to four different ZFS filesystems (same pool). -r _

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> I know it is way after the fact, but I find it best to coerce each > drive down to the whole GB boundary using format (create Solaris > partition just up to the boundary). Then if you ever get a drive a > little smaller it still should fit. It seems like it should be unnecessary. It seems like
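
For what it's worth, a sketch of the coercion being debated, with invented sizes and device names:

    format -e c2t0d0              # partition -> size slice 0 just below the nominal capacity, then label
    zpool add tank log c2t0d0s0   # hand ZFS the slice rather than the whole disk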

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
> > Seriously, all disks configured WriteThrough (spindle and SSD disks > > alike) > > using the dedicated ZIL SSD device, very noticeably faster than > > enabling the > > WriteBack. > > What do you get with both SSD ZIL and WriteBack disks enabled? > > I mean if you have both why not use both? T

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Roch
Robert Milkowski writes: > On 01/04/2010 20:58, Jeroen Roodhart wrote: > > > >> I'm happy to see that it is now the default and I hope this will cause the > >> Linux NFS client implementation to be faster for conforming NFS servers. > >> > > Interesting thing is that apparently default

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
>On 01/04/2010 20:58, Jeroen Roodhart wrote: >> >>> I'm happy to see that it is now the default and I hope this will cause the >>> Linux NFS client implementation to be faster for conforming NFS servers. >>> >> Interesting thing is that apparently defaults on Solaris and Linux are chosen >>

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Robert Milkowski
On 01/04/2010 20:58, Jeroen Roodhart wrote: I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris and Linux are chosen such that one can'

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Miles Nordin
> "enh" == Edward Ned Harvey writes: enh> Dude, don't be so arrogant. Acting like you know what I'm enh> talking about better than I do. Face it that you have enh> something to learn here. funny! AIUI you are wrong and Casper is right. ZFS recovers to a crash-consistent state, e

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Jeroen Roodhart
> It doesn't have to be F20. You could use the Intel > X25 for example. The mlc-based disks are bound to be too slow (we tested with an OCZ Vertex Turbo). So you're stuck with the X25-E (which Sun stopped supporting for some reason). I believe most "normal" SSDs do have some sort of cache and

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Carson Gaspar
Jeroen Roodhart wrote: The thread was started to get insight in behaviour of the F20 as ZIL. _My_ particular interest would be to be able to answer why perfomance doesn't seem to scale up when adding vmod-s... My best guess would be latency. If you are latency bound, adding additional paralle

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Jeroen Roodhart
Hi Casper, > :-) Nice to see your stream still reaches just as far :-) >I'm happy to see that it is now the default and I hope this will cause the >Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris and Linux are ch

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Günther
hello i have had this problem this week. our zil ssd died (apt slc ssd 16gb). because we had no spare drive in stock, we ignored it. then we decided to update our nexenta 3 alpha to beta, exported the pool and made a fresh install to have a clean system and tried to import the pool. we only got

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, casper@sun.com wrote: It does seem like rollback to a snapshot does help here (to assure that sync & async data is consistent), but it certainly does not help any NFS clients. Only a broken application uses sync writes sometimes, and async writes at other times. But do

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
>It does seem like rollback to a snapshot does help here (to assure >that sync & async data is consistent), but it certainly does not help >any NFS clients. Only a broken application uses sync writes >sometimes, and async writes at other times. But doesn't that snapshot possibly have the sam

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, Edward Ned Harvey wrote: Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. Geez! Yes, all the transactions in a transaction group are either committed entirely to disk, or not at all.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
>On 01/04/2010 13:01, Edward Ned Harvey wrote: >>> Is that what "sync" means in Linux? >>> >> A sync write is one in which the application blocks until the OS acks that >> the write has been committed to disk. An async write is given to the OS, >> and the OS is permitted to buffer the write

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Robert Milkowski
On 01/04/2010 13:01, Edward Ned Harvey wrote: Is that what "sync" means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, Edward Ned Harvey wrote: If I'm wrong about this, please explain. I am envisioning a database, which issues a small sync write, followed by a larger async write. Since the sync write is small, the OS would prefer to defer the write and aggregate into a larger block. So the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Thu, Apr 1, 2010 at 10:03 AM, Darren J Moffat wrote: > On 01/04/2010 14:49, Ross Walker wrote: >>> >>> We're talking about the "sync" for NFS exports in Linux; what do they >>> mean >>> with "sync" NFS exports? >> >> See section A1 in the FAQ: >> >> http://nfs.sourceforge.net/ > > I think B4 is

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Darren J Moffat
On 01/04/2010 14:49, Ross Walker wrote: We're talking about the "sync" for NFS exports in Linux; what do they mean with "sync" NFS exports? See section A1 in the FAQ: http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: BEGIN QUOTE Linux servers (although not
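
For context, a Linux export using the "sync" option under discussion looks roughly like this (path and network are made up):

    # /etc/exports
    /export/data  192.168.1.0/24(rw,sync,no_subtree_check)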

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Apr 1, 2010, at 8:42 AM, casper@sun.com wrote: Is that what "sync" means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to di

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Mar 31, 2010, at 11:58 PM, Edward Ned Harvey wrote: We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware a

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Mar 31, 2010, at 11:51 PM, Edward Ned Harvey wrote: A MegaRAID card with write-back cache? It should also be cheaper than the F20. I haven't posted results yet, but I just finished a few weeks of extensive benchmarking various configurations. I can say this: WriteBack cache is much

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
>> This approach does not solve the problem. When you do a snapshot, >> the txg is committed. If you wish to reduce the exposure to loss of >> sync data and run with ZIL disabled, then you can change the txg commit >> interval -- however changing the txg commit interval will not eliminate >> the
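
The txg commit interval mentioned here is, as far as I know, the zfs_txg_timeout tunable (in seconds); an illustrative, temporary way to change it on a live system:

    echo zfs_txg_timeout/W0t5 | mdb -kw    # or set zfs:zfs_txg_timeout in /etc/system and reboot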

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
>> Is that what "sync" means in Linux? > >A sync write is one in which the application blocks until the OS acks that >the write has been committed to disk. An async write is given to the OS, >and the OS is permitted to buffer the write to disk at its own discretion. >Meaning the async write fun
