Re: [zfs-discuss] slow zfs send/recv speed
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Anatoly
>
> I've just made clean test for sequential data read. System has 45 mirror
> vdevs. 90 disks in the system...

I bet you have a lot of RAM?

> 2. Read file normally:
> # time dd if=./big_file bs=128k of=/dev/null
> 161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s

I wonder how much of that is being read back from cache. Would it be
possible to reboot, or otherwise invalidate the cache, before reading the
file back?

With 90 disks, in theory, you should be able to read something like
90 Gbit = 11 GB/sec. But of course various bus speed bottlenecks come into
play, so I don't think the 1.6 GB/s is unrealistically high in any way.

> 3. Snapshot & send:
> # zfs snapshot volume/test@A
> # time zfs send volume/test@A > /dev/null
> real 7m20.635s
> user 0m0.004s
> sys 0m52.760s

This doesn't surprise me. Based on gut feel, I don't think zfs send
performs optimally in general. I think your results are probably correct,
and even if you revisit all this, doing the reboots (or cache
invalidation) and/or using a newly created pool, as anyone here might
suggest, I think you'll still see the same results, somewhat
unpredictably. Even so, I always find zfs send performance still beats the
pants off any alternative... rsync and whatnot.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
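Whatever the cache contribution, the quoted dd figures are at least
internally consistent. A quick sketch recomputing the rate from the byte
and second counts dd printed (awk is used only for the floating-point
division):

```shell
# Recompute the read throughput from the dd output quoted above:
# 161118683136 bytes in 103.455 seconds.
bytes=161118683136
secs=103.455

# dd reports decimal GB (10^9 bytes), so divide by 1e9.
rate=$(awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.1f", b / s / 1e9 }')
echo "${rate} GB/s"
```

This prints "1.6 GB/s", matching dd's own summary line.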
Re: [zfs-discuss] how to set up solaris os and cache within one SSD
On 11/10/2011 7:42 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of darkblue
>>
>> 1 * XEON 5606
>> 1 * supermicro X8DT3-LN4F
>> 6 * 4G RECC RAM
>> 22 * WD RE3 1T harddisk
>> 4 * intel 320 (160G) SSD
>> 1 * supermicro 846E1-900B chassis
>
> I just want to say, this isn't supported hardware, and although many
> people will say they do this without problem, I've heard just as many
> people (including myself) saying it's unstable that way. I recommend
> buying either the Oracle hardware, or Nexenta on whatever they recommend
> for hardware. Definitely DO NOT run the free version of Solaris without
> updates and expect it to be reliable. But that's a separate issue. I'm
> also emphasizing that even if you pay for Solaris support on non-Oracle
> hardware, don't expect it to be great. But maybe it will be.

I think the key issue here is whether this hardware will corrupt a pool or
not. Ultimately, the promise of ZFS, for me anyway, is that I can take
disks to new hardware if/when needed. I am not dependent on a controller
or motherboard which provides some feature key to access the data on the
disks.

Companies which sell key software that you depend on working have
generally proven that software to work reliably on hardware which they
might sell to make use of said software. Apple's business model and
success, for example, are based on this fact, because they have a much
smaller bug pool to consider. Oracle hardware works out the same way.

I think supporting the development of ZFS is key to the next generation of
storage solutions... But I don't need the class of hardware that Oracle
wants me to pay for. I need disks with 24/7 reliability. I can wait till
tomorrow to store something onto my server from my laptop/desktop.
Consumer/non-enterprise needs are quite different, and I don't think
Oracle understands how to deal in the 1,000,000,000-potential-customer
marketplace.
They've had a hard enough time just working in the 100,000-customer
marketplace.

Gregg
Re: [zfs-discuss] slow zfs send/recv speed
On Wed, Nov 16 at 9:35, David Dyer-Bennet wrote:
> On Tue, November 15, 2011 17:05, Anatoly wrote:
>> Good day,
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>> to 100+ disks in pool. But the speed doesn't vary to any degree. As I
>> understand, 'zfs send' is the limiting factor. I did tests by sending
>> to /dev/null. It worked out too slow and absolutely not scalable. None
>> of cpu/memory/disk activity were at peak load, so there is room for
>> improvement.
>
> What you're probably seeing with incremental sends is that the disks
> being read are hitting their IOPS limits. Zfs send does random reads all
> over the place -- every block that's changed since the last incremental
> send is read, in TXG order. So that's essentially random reads all over
> the disk.

Anatoly didn't state whether his 160GB file test was done on a virgin
pool, or whether it was allocated out of an existing pool. If the latter,
your comment is the likely explanation. If the former, your comment
wouldn't explain the slow performance.

--eric

--
Eric D. Mudama
edmud...@bounceswoosh.org
Re: [zfs-discuss] slow zfs send/recv speed
On Wed, Nov 16, 2011 at 11:07 AM, Anatoly wrote:
> I've just made clean test for sequential data read. System has 45 mirror
> vdevs.
>
> 1. Create 160GB random file.
> 2. Read it to /dev/null.
> 3. Do snapshot and send it to /dev/null.
> 4. Compare results.

What OS? The following is under Solaris 10U9 with CPU_2010-10 + an IDR for
a SAS/SATA drive bug.

I just had to replicate over 20TB of small files, `zfs send -R | zfs recv
-e`, and I got an AVERAGE throughput of over 77MB/sec (over 6TB/day). The
entire replication took just over 3 days.

The source zpool is on J4400 750GB SATA drives, 110 of them in a RAIDz2
configuration (22 vdevs of 5 disks each). The target was a pair of old h/w
RAID boxes (one without any NVRAM cache) and a zpool configuration of 6
striped vdevs (a total of 72 drives behind the h/w RAID controller doing
RAID5; this is temporary and only for moving data physically around, so
the lack of ZFS redundancy is not an issue). There are over 2300 snapshots
on the source side and we were replicating close to 2000 of them.

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company
   ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
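The figures in this report hang together. A quick sketch checking 77 MB/s
sustained against the "over 6TB/day" and "just over 3 days for 20TB"
claims (decimal TB assumed):

```shell
# 77 MB/s sustained, converted to TB/day and to days-per-20TB.
mb_per_sec=77

tb_per_day=$(awk -v m="$mb_per_sec" 'BEGIN { printf "%.1f", m * 86400 / 1e6 }')
echo "${tb_per_day} TB/day"        # consistent with "over 6TB/day"

days=$(awk -v m="$mb_per_sec" 'BEGIN { printf "%.1f", 20e6 / (m * 86400) }')
echo "${days} days for 20TB"       # consistent with "just over 3 days"
```

This prints roughly 6.7 TB/day and 3.0 days, matching the reported totals.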
Re: [zfs-discuss] slow zfs send/recv speed
Good day,

I've just made clean test for sequential data read. System has 45 mirror
vdevs.

1. Create 160GB random file.
2. Read it to /dev/null.
3. Do snapshot and send it to /dev/null.
4. Compare results.

1. Write speed is slow due to 'urandom':
# dd if=/dev/urandom bs=128k | pv > big_file
161118683136 bytes (161 GB) copied, 3962.15 seconds, 40.7 MB/s

2. Read file normally:
# time dd if=./big_file bs=128k of=/dev/null
161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s
real 1m43.459s
user 0m0.899s
sys 1m25.078s

3. Snapshot & send:
# zfs snapshot volume/test@A
# time zfs send volume/test@A > /dev/null
real 7m20.635s
user 0m0.004s
sys 0m52.760s

4. As you see, there is a 4x difference on a pure sequential read, under
greenhouse conditions. I repeated the tests a couple of times to check ARC
influence -- not much difference. Real send speed on this system is around
60 MBytes/s, with peaks near 100. File reading scales well with a large
number of disks, but 'zfs send' is lame. In normal conditions, moving
large portions of data may take days to weeks. It can't fill a 10G
Ethernet connection, sometimes not even 1G.

Best regards,
Anatoly Legkodymov.

On 16.11.2011 06:08, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Anatoly
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>
> I suggest watching zpool iostat before, during, and after the send to
> /dev/null. Actually, I take that back - zpool iostat seems to measure
> virtual IOPS. I just did this on my laptop a minute ago and saw 1.2k
> ops, which is at least 5-6x higher than my hard drive can handle, which
> can only mean it's reading a lot of previously aggregated small blocks
> from disk, which are now sequentially organized on disk. How do you
> measure physical iops? Is it just regular iostat? I have seriously put
> zero effort into answering this question (sorry.)
> I have certainly noticed a delay in the beginning, while the system
> thinks about stuff for a little while to kick off an incremental... And
> it's acknowledged and normal that incrementals are likely fragmented all
> over the place, so you could be IOPS limited (hence watching the
> iostat).
>
> Also, whenever I sit and watch it for long times, I see that it varies
> enormously. For 5 minutes it will be (some speed), and for 5 minutes it
> will be 5x higher...
>
> Whatever it is, it's something we likely are all seeing, but probably
> just ignoring. If you can find it in your heart to just ignore it too,
> then great, no problem. ;-) Otherwise, it's a matter of digging in and
> characterizing to learn more about it.
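On the "how do you measure physical iops" question: on Solaris, per-device
(physical) figures come from plain `iostat -xn`, while `zpool iostat`
reports pool-level operations. A minimal sketch summing the per-device
read ops column from an `iostat -xn`-style sample; the device names and
numbers below are fabricated for illustration, and in practice you would
pipe live `iostat -xn 5` output in instead:

```shell
# Sum physical reads/s across devices from iostat -xn style output.
# The sample here is made up; replace with real `iostat -xn 5` output.
sample='    r/s    w/s   kr/s   kw/s device
  210.0   12.0 26880.0  96.0 c0t0d0
  198.0   15.0 25344.0 120.0 c0t1d0'

# Skip the header line, accumulate column 1 (r/s) over all devices.
echo "$sample" | awk 'NR > 1 { total += $1 } END { printf "%.0f physical reads/s\n", total }'
```

Comparing that total against the `zpool iostat` figure for the same
interval shows how much read aggregation the pool is doing.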
Re: [zfs-discuss] slow zfs send/recv speed
On Tue, November 15, 2011 17:05, Anatoly wrote:
> Good day,
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
> to 100+ disks in pool. But the speed doesn't vary to any degree. As I
> understand, 'zfs send' is the limiting factor. I did tests by sending to
> /dev/null. It worked out too slow and absolutely not scalable.
> None of cpu/memory/disk activity were at peak load, so there is room
> for improvement.

What you're probably seeing with incremental sends is that the disks being
read are hitting their IOPS limits. Zfs send does random reads all over
the place -- every block that's changed since the last incremental send is
read, in TXG order. So that's essentially random reads all over the disk.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
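A back-of-envelope sketch of why an IOPS-bound send lands in the observed
range. If the stream is read with effectively one outstanding random read
at a time, throughput collapses to one spindle's seek rate times the block
size (the 100 IOPS figure is an assumption for a 7200rpm SATA drive, not a
number from this thread):

```shell
# Seek-bound ceiling for a serialized zfs send:
# random-read IOPS of one spindle times the record size.
iops=100                    # assumed random-read IOPS, 7200rpm SATA drive
recordsize=$((128 * 1024))  # default 128 KiB ZFS recordsize, in bytes

echo "$((iops * recordsize / 1024 / 1024)) MB/s with a single outstanding read"
```

That comes out around 12 MB/s, which is in the same ballpark as the
17-25 MBytes/s incremental figure reported above; a fully parallel reader
across many vdevs would have a far higher ceiling.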
Re: [zfs-discuss] Remove corrupt files from snapshot
On Tue, November 15, 2011 10:07, sbre...@hotmail.com wrote:
> Would it make sense to do "zfs scrub" regularly and have a report sent,
> i.e. once a day, so discrepancy would be noticed beforehand? Is there
> anything readily available in the FreeBSD ZFS package for this?

If you're not scrubbing regularly, you're losing out on one of the key
benefits of ZFS. In nearly all fileserver situations, a good amount of the
content is essentially archival: infrequently accessed, but important now
and then. (In my case it's my collection of digital and digitized photos.)
A weekly scrub combined with a decent backup plan will detect bit-rot
before the backups with the correct data cycle into the trash (and, with
redundant storage like mirroring or RAID, the scrub will probably be able
to fix the error without resorting to restoring files from backup).

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
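As for something readily available on FreeBSD: the base system's
periodic(8) framework ships a scrub hook, and the daily periodic mail then
doubles as the report. A sketch of the relevant /etc/periodic.conf lines
(option names as used by the base-system 800.scrub-zfs script; check them
against your release before relying on this):

```shell
# /etc/periodic.conf -- enable the daily ZFS scrub check.
daily_scrub_zfs_enable="YES"                 # run /etc/periodic/daily/800.scrub-zfs
daily_scrub_zfs_default_threshold_days="7"   # scrub each pool roughly weekly
# Scrub status and errors show up in the nightly periodic mail,
# so the "report sent once a day" part comes for free.
```

On other platforms a cron job running `zpool scrub` plus a mailed
`zpool status -x` achieves the same effect.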
Re: [zfs-discuss] slow zfs send/recv speed
On Tue, November 15, 2011 20:08, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Anatoly
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>
> I suggest watching zpool iostat before, during, and after the send to
> /dev/null. Actually, I take that back - zpool iostat seems to measure
> virtual IOPS. I just did this on my laptop a minute ago and saw 1.2k
> ops, which is at least 5-6x higher than my hard drive can handle, which
> can only mean it's reading a lot of previously aggregated small blocks
> from disk, which are now sequentially organized on disk. How do you
> measure physical iops? Is it just regular iostat? I have seriously put
> zero effort into answering this question (sorry.)
>
> I have certainly noticed a delay in the beginning, while the system
> thinks about stuff for a little while to kick off an incremental... And
> it's acknowledged and normal that incrementals are likely fragmented all
> over the place, so you could be IOPS limited (hence watching the
> iostat).
>
> Also, whenever I sit and watch it for long times, I see that it varies
> enormously. For 5 minutes it will be (some speed), and for 5 minutes it
> will be 5x higher...
>
> Whatever it is, it's something we likely are all seeing, but probably
> just ignoring. If you can find it in your heart to just ignore it too,
> then great, no problem. ;-) Otherwise, it's a matter of digging in and
> characterizing to learn more about it.

I see rather variable io stats while sending incremental backups. The
receiver is a USB disk, so fairly slow, but I get 30MB/s in a good
stretch.

I'm compressing the ZFS filesystem on the receiving end, but much of my
content is already-compressed photo files, so it doesn't make a huge
difference.
Helps some, though, and at 30MB/s there's no shortage of CPU horsepower to
handle the compression.

The raw files are around 12MB each, probably not fragmented much (they're
just copied over from memory cards). For a small number of the files,
there's a Photoshop file that's much bigger (sometimes more than 1GB, if
it's a stitched panorama with layers of changes). And then there are
sidecar XMP files, mostly two per image, and, for most of them,
web-resolution images of around 100kB.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info