Re: [zfs-discuss] Removing Cloned Snapshot
On Fri, Feb 12, 2010 at 1:08 PM, Daniel Carosone wrote:
> With dedup and bp-rewrite, a new operation could be created that takes
> the shared data and makes it uniquely-referenced but deduplicated data.
> This could be a lot more efficient and less disruptive because of the
> advance knowledge that the data must already be the same.

That's essentially what a send/recv does when dedup is enabled.

-B

--
Brandon High : bh...@freaks.com
There is absolutely no substitute for a genuine lack of preparation.
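A minimal sketch of that path, for illustration only (pool, dataset and snapshot names are hypothetical, and this assumes a build where deduplicated send streams are available via zfs send -D):

  # zfs send -D pool/fs@snap | zfs recv pool/fscopy   # stream is deduplicated on the wire
  # zfs destroy pool/fs@snap                          # once the copy exists, the snapshot can go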
Re: [zfs-discuss] zfs promote
Hello,

# /usr/sbin/zfs list -r rgd3
NAME                  USED   AVAIL  REFER  MOUNTPOINT
rgd3                  16.5G  23.4G    20K  /rgd3
rgd3/fs1                19K  23.4G    21K  /app/fs1
rgd3/fs1-patch        16.4G  23.4G  16.4G  /app/fs1-patch
rgd3/fs1-pa...@snap1  34.8M      -  16.4G  -

# /usr/sbin/zfs promote rgd3/fs1

snap is 16.4G in USED.

# /usr/sbin/zfs list -r rgd3
NAME                  USED   AVAIL  REFER  MOUNTPOINT
rgd3                  16.5G  23.4G    20K  /rgd3
rgd3/fs1              16.4G  23.4G    21K  /app/fs1
rgd3/f...@snap1       16.4G      -  16.4G  -
rgd3/fs1-patch        33.9M  23.4G  16.4G  /app/fs1-patch

5.10 Generic_141414-10

I tried to line up the numbers, but it did not work. Sorry for the format.
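For context, a sketch of the lifecycle that produces listings like the above, assuming rgd3/fs1 was originally created as a clone of rgd3/fs1-patch@snap1 (an assumption consistent with the output, not stated in the post):

  # zfs snapshot rgd3/fs1-patch@snap1         # snapshot the original filesystem
  # zfs clone rgd3/fs1-patch@snap1 rgd3/fs1   # clone shares the snapshot's blocks
  # zfs promote rgd3/fs1                      # snapshot, and the space it holds, moves under rgd3/fs1

Note that promote only swaps the parent/clone relationship: the pool's total USED stays at 16.5G in both listings, and the 16.4G is simply re-attributed from rgd3/fs1-patch to rgd3/f...@snap1.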
Re: [zfs-discuss] SSD and ZFS
On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
> I have a similar question. I put together a cheapo RAID with four 1TB WD
> Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
> slice 0 (5GB) for ZIL and the rest of the SSD for cache:
> # zpool status dpool
>   pool: dpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         dpool         ONLINE       0     0     0
>           raidz1      ONLINE       0     0     0
>             c0t0d0    ONLINE       0     0     0
>             c0t0d1    ONLINE       0     0     0
>             c0t0d2    ONLINE       0     0     0
>             c0t0d3    ONLINE       0     0     0
>         logs
>           c0t0d4s0    ONLINE       0     0     0
>         cache
>           c0t0d4s1    ONLINE       0     0     0
>         spares
>           c0t0d6      AVAIL
>           c0t0d7      AVAIL
>
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> dpool       72.1G  3.55T    237     12  29.7M   597K
>   raidz1    72.1G  3.55T    237      9  29.7M   469K
>     c0t0d0      -      -    166      3  7.39M   157K
>     c0t0d1      -      -    166      3  7.44M   157K
>     c0t0d2      -      -    166      3  7.39M   157K
>     c0t0d3      -      -    167      3  7.45M   157K
>   c0t0d4s0    20K  4.97G      0      3      0   127K
> cache           -      -      -      -      -      -
>   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
> ----------  -----  -----  -----  -----  -----  -----
>
> I just don't seem to be getting any bang for the buck I should be. This was
> taken while rebuilding an Oracle index, all files stored in this pool. The
> WD disks are at 100%, and nothing is coming from the cache. The cache does
> have the entire DB cached (17.6G used), but hardly reads anything from it. I
> also am not seeing the spike of data flowing into the ZIL either, although
> iostat shows there is just write traffic hitting the SSD:
>
>                  extended device statistics                   cpu
> device     r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
> sd0      170.0    0.4 7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
> sd1      168.4    0.4 7680.2    0.0   0.0  34.6  205.1   0 100
> sd2      172.0    0.4 7761.7    0.0   0.0  35.0  202.9   0 100
> sd3        0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
> sd4      170.0    0.4 7727.1    0.0   0.0  35.0  205.3   0 100
> sd5        1.6    2.6  182.4  104.8   0.0   0.5  117.8   0  31
>
> Since this SSD is in a RAID array, and just presents as a regular disk LUN,
> is there a special incantation required to turn on the Turbo mode?
>
> Doesn't it seem that all this traffic should be maxing out the SSD? Reads
> from the cache, and writes to the ZIL? I have a second identical SSD I
> wanted to add as a mirror, but it seems pointless if there's no zip to be
> had.

The most likely reason is that this workload has been identified as streaming by ZFS, which is prefetching from disk instead of the L2ARC (l2arc_noprefetch=1).

It also looks like you've used a 128 Kbyte ZFS record size. Is Oracle doing 128 Kbyte random I/O? We usually tune that down before creating the database, which will use the L2ARC device more efficiently.

Brendan

--
Brendan Gregg, Fishworks            http://blogs.sun.com/brendan
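For illustration, the usual shape of that tuning (the dataset name is hypothetical, and 8k is only an example: recordsize is normally matched to the database block size and only affects files written after it is set, so existing data generally has to be copied back in):

  # zfs create -o recordsize=8k dpool/oradata      # new dataset for the Oracle data files
  # zfs set recordsize=8k dpool/existing-dataset   # or change an existing one before reloading the data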
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
G'Day,

On Sat, Feb 13, 2010 at 09:02:58AM +1100, Daniel Carosone wrote:
> On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> > Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
> > size (GB)             300
> > size (sectors)        585937500
> > labels (sectors)      9232
> > available sectors     585928268
> > bytes/L2ARC header    200
> >
> > recordsize (sectors)   recordsize (kBytes)   L2ARC capacity (records)   Header size (MBytes)
> >   1                      0.5                   585928268                  111,760
> >   2                      1                     292964134                   55,880
> >   4                      2                     146482067                   27,940
> >   8                      4                      73241033                   13,970
> >  16                      8                      36620516                    6,980
> >  32                     16                      18310258                    3,490
> >  64                     32                       9155129                    1,750
> > 128                     64                       4577564                      870
> > 256                    128                       2288782                      440
> >
> > So, depending on the data, you need somewhere between 440 MBytes and 111 GBytes
> > to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40%
> > of the total used size. Ok, that rule really isn't very useful...
>
> All that precision up-front for such a broad conclusion.. bummer :)
>
> I'm interested in a better rule of thumb, for rough planning
> purposes. As previously noted, I'm especially interested in the

I use 2.5% for an 8 Kbyte record size, i.e., for every 1 Gbyte of L2ARC, about 25 Mbytes of ARC is consumed.

I don't recommend other record sizes since:

- the L2ARC is currently intended for random I/O workloads. Such workloads usually have small record sizes, such as 8 Kbytes. Larger record sizes (such as the 128 Kbyte default) are better for streaming workloads. The L2ARC doesn't currently touch streaming workloads (l2arc_noprefetch=1).

- the best performance from SSDs is with smaller I/O sizes, not larger. I get about 3200 x 8 Kbyte read I/O from my current L2ARC devices, yet only about 750 x 128 Kbyte read I/O from the same devices.

- smaller than 4 Kbyte record sizes lead to a lot of ARC headers and worse streaming performance. I wouldn't tune it smaller unless I had to for some reason.

So, from the table above I'd only really consider the 4 to 32 Kbyte size range: 4 Kbytes if you really wanted a smaller record size, and 32 Kbytes if you had limited DRAM you wanted to conserve (at the trade-off of SSD performance).

Brendan

> combination with dedup, where DDT entries need to be cached. What's
> the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the
> overhead %age above?
>
> I'm also interested in a more precise answer to a different question,
> later on. Let's say I already have an L2ARC, running and warm. How do
> I tell how much is being used? Presumably, if it's not full, RAM
> to manage it is the constraint - how can I confirm that and how can I
> tell how much RAM is currently used?
>
> If I can observe these figures, I can tell if I'm wasting ssd space
> that can't be used. Either I can reallocate that space or know that
> adding RAM will have an even bigger benefit (increasing both primary
> and secondary cache sizes). Maybe I can even decide that L2ARC is not
> worth it for this box (especially if it can't fit any more RAM).
>
> Finally, how smart is L2ARC at optimising this usage? If it's under
> memory pressure, does it prefer to throw out smaller records in favour
> of larger more efficient ones?
>
> My current rule of thumb for all this, absent better information, is
> that you should just have gobs of RAM (no surprise there) but that if
> you can't, then dedup seems to be most worthwhile when the pool itself
> is on ssd, no l2arc. Say, a laptop. Here, you care most about saving
> space and the IO overhead costs least.
>
> We need some thumbs in between these extremes. :-(
>
> --
> Dan.

--
Brendan Gregg, Fishworks            http://blogs.sun.com/brendan
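A worked check of that rule of thumb (illustrative arithmetic, not from the original post): each cached record costs roughly 200 bytes of ARC header, so at an 8 Kbyte record size

  200 bytes / 8192 bytes      ~ 2.4% of the L2ARC size
  300 GBytes x 0.025          ~ 7.5 GBytes of RAM for headers

which lines up with the 8 Kbyte row of the table above (~6,980 MBytes).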
Re: [zfs-discuss] SSD and ZFS
I have a similar question. I put together a cheapo RAID with four 1TB WD Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with slice 0 (5GB) for ZIL and the rest of the SSD for cache:

# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c0t0d1    ONLINE       0     0     0
            c0t0d2    ONLINE       0     0     0
            c0t0d3    ONLINE       0     0     0
        logs
          c0t0d4s0    ONLINE       0     0     0
        cache
          c0t0d4s1    ONLINE       0     0     0
        spares
          c0t0d6      AVAIL
          c0t0d7      AVAIL

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
dpool       72.1G  3.55T    237     12  29.7M   597K
  raidz1    72.1G  3.55T    237      9  29.7M   469K
    c0t0d0      -      -    166      3  7.39M   157K
    c0t0d1      -      -    166      3  7.44M   157K
    c0t0d2      -      -    166      3  7.39M   157K
    c0t0d3      -      -    167      3  7.45M   157K
  c0t0d4s0    20K  4.97G      0      3      0   127K
cache           -      -      -      -      -      -
  c0t0d4s1  17.6G  36.4G      3      1   249K   119K
----------  -----  -----  -----  -----  -----  -----

I just don't seem to be getting any bang for the buck I should be. This was taken while rebuilding an Oracle index, all files stored in this pool. The WD disks are at 100%, and nothing is coming from the cache. The cache does have the entire DB cached (17.6G used), but hardly reads anything from it. I also am not seeing the spike of data flowing into the ZIL either, although iostat shows there is just write traffic hitting the SSD:

                 extended device statistics                   cpu
device     r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
sd0      170.0    0.4 7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
sd1      168.4    0.4 7680.2    0.0   0.0  34.6  205.1   0 100
sd2      172.0    0.4 7761.7    0.0   0.0  35.0  202.9   0 100
sd3        0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd4      170.0    0.4 7727.1    0.0   0.0  35.0  205.3   0 100
sd5        1.6    2.6  182.4  104.8   0.0   0.5  117.8   0  31

Since this SSD is in a RAID array, and just presents as a regular disk LUN, is there a special incantation required to turn on the Turbo mode?

Doesn't it seem that all this traffic should be maxing out the SSD? Reads from the cache, and writes to the ZIL? I have a second identical SSD I wanted to add as a mirror, but it seems pointless if there's no zip to be had.

Help?

Thanks,
Tracey
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
> size (GB)             300
> size (sectors)        585937500
> labels (sectors)      9232
> available sectors     585928268
> bytes/L2ARC header    200
>
> recordsize (sectors)   recordsize (kBytes)   L2ARC capacity (records)   Header size (MBytes)
>   1                      0.5                   585928268                  111,760
>   2                      1                     292964134                   55,880
>   4                      2                     146482067                   27,940
>   8                      4                      73241033                   13,970
>  16                      8                      36620516                    6,980
>  32                     16                      18310258                    3,490
>  64                     32                       9155129                    1,750
> 128                     64                       4577564                      870
> 256                    128                       2288782                      440
>
> So, depending on the data, you need somewhere between 440 MBytes and 111 GBytes
> to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40%
> of the total used size. Ok, that rule really isn't very useful...

All that precision up-front for such a broad conclusion.. bummer :)

I'm interested in a better rule of thumb, for rough planning purposes. As previously noted, I'm especially interested in the combination with dedup, where DDT entries need to be cached. What's the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the overhead %age above?

I'm also interested in a more precise answer to a different question, later on. Let's say I already have an L2ARC, running and warm. How do I tell how much is being used? Presumably, if it's not full, RAM to manage it is the constraint - how can I confirm that and how can I tell how much RAM is currently used?

If I can observe these figures, I can tell if I'm wasting ssd space that can't be used. Either I can reallocate that space or know that adding RAM will have an even bigger benefit (increasing both primary and secondary cache sizes). Maybe I can even decide that L2ARC is not worth it for this box (especially if it can't fit any more RAM).

Finally, how smart is L2ARC at optimising this usage? If it's under memory pressure, does it prefer to throw out smaller records in favour of larger more efficient ones?

My current rule of thumb for all this, absent better information, is that you should just have gobs of RAM (no surprise there) but that if you can't, then dedup seems to be most worthwhile when the pool itself is on ssd, no l2arc. Say, a laptop. Here, you care most about saving space and the IO overhead costs least.

We need some thumbs in between these extremes. :-(

--
Dan.
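A rough way to observe both figures, sketched under the assumption that the arcstats kstat on builds of this era exposes the l2_* counters (check what is available with kstat -p zfs:0:arcstats first):

  # kstat -p zfs:0:arcstats:l2_size       # bytes of data currently held in the L2ARC
  # kstat -p zfs:0:arcstats:l2_hdr_size   # ARC memory consumed by L2ARC headers
  # kstat -p zfs:0:arcstats:size          # total ARC size, for comparison

Comparing l2_size against the cache device's capacity (zpool iostat -v) shows whether the L2ARC is actually filling; l2_hdr_size shows how much RAM it is costing.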
Re: [zfs-discuss] Removing Cloned Snapshot
On Fri, Feb 12, 2010 at 09:50:32AM -0500, Mark J Musante wrote:
> The other option is to zfs send the snapshot to create a copy
> instead of a clone.

One day, in the future, I hope there might be a third option, somewhat as an optimisation.

With dedup and bp-rewrite, a new operation could be created that takes the shared data and makes it uniquely-referenced but deduplicated data. This could be a lot more efficient and less disruptive because of the advance knowledge that the data must already be the same.

Whether it's worth the implementation effort is another issue, but in the meantime we have plenty of time to try and come up with a sensible name for it. "unclone" is too boring :)

--
Dan.
Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced
On Fri, Feb 12, 2010 at 12:11 PM, Al Hopper wrote:
> There's your first mistake. You're probably eligible for a very nice
> Federal Systems discount. My *guess* would be about 40%.

Promise JBOD and similar systems are often the only affordable choice for those of us who can't get sweetheart discounts, don't work at billion dollar corporations, or aren't bankrolled by the Federal Leviathan.

Daniel Bakken
Systems Administrator
Economic Modeling Specialists Inc
1187 Alturas Drive
Moscow, Idaho 83843
Re: [zfs-discuss] SSD and ZFS
I don't think adding an SSD mirror to an existing pool will do much for performance. Some of your data will surely go to those SSDs, but I don't think Solaris will know they are SSDs and move blocks in and out according to usage patterns to give you an all-around boost. They will just be used to store data, nothing more.

Perhaps it will be more useful to add the SSDs as either an L2ARC or a SLOG for the ZIL, but that will depend upon your workload. If you do NFS or iSCSI access, then putting the ZIL onto the SSD drive(s) will speed up writes. Adding them as L2ARC will speed up reads.

Here is the ZFS best practices guide, which should help with this decision:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott
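For reference, a minimal sketch of the two options (device names are taken from the original poster's layout and may not match your system; a log device can be mirrored, a cache device cannot):

  # zpool add tank log mirror c0t6d0 c0t7d0   # separate log (slog) for the ZIL, mirrored
  # zpool add tank cache c0t6d0 c0t7d0        # or: both SSDs as L2ARC read cache

A given device goes to one role or the other, not both, unless it is sliced as in the earlier posts in this thread.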
Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced
On Tue, Feb 9, 2010 at 12:55 PM, matthew patton wrote:

[... snip ...]

> Enter the J4500, 48 drives in 4U, what looks to be solid engineering, and
> redundancy in all the right places. An empty chassis at $3000 is totally
> justifiable. Maybe as high as $4000. In comparison a naked Dell MD1000 is
> $2000. If you do the subtraction from SUN's claimed "breakthru" pricing of
> $1/GB, the chassis cost works out to $4000. I can live with that.
>
> Now look up the price for 24TB and it's 28 freaking thousand!

There's your first mistake. You're probably eligible for a very nice Federal Systems discount. My *guess* would be about 40%.

[... snip ...]

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX  a...@logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
On Feb 12, 2010, at 9:36 AM, Felix Buenemann wrote:
> Am 12.02.10 18:17, schrieb Richard Elling:
>> On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:
>>
>>> Hi Mickaël,
>>>
>>> Am 12.02.10 13:49, schrieb Mickaël Maillot:
>>>> Intel X-25 M are MLC not SLC, they are very good for L2ARC.
>>>
>>> Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro
>>> 7500 16GB SLC SSDs for ZIL.
>>>
>>>> and next, you need more RAM:
>>>> ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
>>>> uses memory to allocate and manage L2ARC.
>>>
>>> Is there a guideline in which relation L2ARC size should be to RAM?
>>
>> Approximately 200 bytes per record. I use the following example:
>>   Suppose we use a Seagate LP 2 TByte disk for the L2ARC
>>   + Disk has 3,907,029,168 512 byte sectors, guaranteed
>>   + Workload uses 8 kByte fixed record size
>>   RAM needed for arc_buf_hdr entries
>>   + Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes
>>
>> Don't underestimate the RAM needed for large L2ARCs
>
> I'm not sure how your workload record size plays into above formula (where
> does - 9232 come from?), but given I've got ~300GB L2ARC, I'd need about
> 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the L2ARC.

recordsize = 8 kB = 16 sectors @ 512 bytes/sector
9,232 is the number of sectors reserved for labels, around 4.75 MBytes

Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):

size (GB)             300
size (sectors)        585937500
labels (sectors)      9232
available sectors     585928268
bytes/L2ARC header    200

recordsize (sectors)   recordsize (kBytes)   L2ARC capacity (records)   Header size (MBytes)
  1                      0.5                   585928268                  111,760
  2                      1                     292964134                   55,880
  4                      2                     146482067                   27,940
  8                      4                      73241033                   13,970
 16                      8                      36620516                    6,980
 32                     16                      18310258                    3,490
 64                     32                       9155129                    1,750
128                     64                       4577564                      870
256                    128                       2288782                      440

So, depending on the data, you need somewhere between 440 MBytes and 111 GBytes to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40% of the total used size. Ok, that rule really isn't very useful...

The next question is, what does my data look like? The answer is that there will most likely be a distribution of various sized records. But the distribution isn't as interesting for this calculation as the actual number of records. I'm not sure there is an easy way to get that information, but I'll look around...
 -- richard
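The rows follow directly from the figures above the table; as a worked example for the 8 Kbyte row (illustrative arithmetic only):

  585,928,268 sectors / 16 sectors per record   ~ 36,620,516 records
  36,620,516 records x 200 bytes per header     ~ 7.32e9 bytes ~ 6,980 MBytes (binary MBytes)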
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
On 02/12/10 09:36, Felix Buenemann wrote:
> given I've got ~300GB L2ARC, I'd need about 7.2GB RAM, so upgrading to 8GB
> would be enough to satisfy the L2ARC.

But that would only leave ~800MB free for everything else the server needs to do.

- Bill
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
Am 12.02.10 18:17, schrieb Richard Elling:
> On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:
>
>> Hi Mickaël,
>>
>> Am 12.02.10 13:49, schrieb Mickaël Maillot:
>>> Intel X-25 M are MLC not SLC, they are very good for L2ARC.
>>
>> Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro
>> 7500 16GB SLC SSDs for ZIL.
>>
>>> and next, you need more RAM:
>>> ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
>>> uses memory to allocate and manage L2ARC.
>>
>> Is there a guideline in which relation L2ARC size should be to RAM?
>
> Approximately 200 bytes per record. I use the following example:
>   Suppose we use a Seagate LP 2 TByte disk for the L2ARC
>   + Disk has 3,907,029,168 512 byte sectors, guaranteed
>   + Workload uses 8 kByte fixed record size
>   RAM needed for arc_buf_hdr entries
>   + Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes
>
> Don't underestimate the RAM needed for large L2ARCs

I'm not sure how your workload record size plays into the above formula (where does - 9232 come from?), but given I've got ~300GB L2ARC, I'd need about 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the L2ARC.

> -- richard

>> I could upgrade the server to 8GB, but that's the maximum the i975X chipset
>> can handle.
>>
>> Best Regards,
>>    Felix Buenemann

- Felix
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:
> Hi Mickaël,
>
> Am 12.02.10 13:49, schrieb Mickaël Maillot:
>> Intel X-25 M are MLC not SLC, they are very good for L2ARC.
>
> Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro
> 7500 16GB SLC SSDs for ZIL.
>
>> and next, you need more RAM:
>> ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
>> uses memory to allocate and manage L2ARC.
>
> Is there a guideline in which relation L2ARC size should be to RAM?

Approximately 200 bytes per record. I use the following example:
  Suppose we use a Seagate LP 2 TByte disk for the L2ARC
  + Disk has 3,907,029,168 512 byte sectors, guaranteed
  + Workload uses 8 kByte fixed record size
  RAM needed for arc_buf_hdr entries
  + Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes

Don't underestimate the RAM needed for large L2ARCs
 -- richard

> I could upgrade the server to 8GB, but that's the maximum the i975X chipset
> can handle.
>
> Best Regards,
>    Felix Buenemann
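Unpacking the arithmetic, for the curious (a restatement of the formula above, not new data): an 8 kByte record spans 8192 / 512 = 16 sectors, so the division by 16 converts sectors into records; (3,907,029,168 - 9,232) / 16 is about 244 million records, and at ~200 bytes of arc_buf_hdr each that comes to about 48.8e9 bytes, i.e. the ~48 GBytes quoted.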
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
Hi Mickaël,

Am 12.02.10 13:49, schrieb Mickaël Maillot:
> Intel X-25 M are MLC not SLC, they are very good for L2ARC.

Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro 7500 16GB SLC SSDs for ZIL.

> and next, you need more RAM:
> ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
> uses memory to allocate and manage L2ARC.

Is there a guideline in which relation L2ARC size should be to RAM?

I could upgrade the server to 8GB, but that's the maximum the i975X chipset can handle.

Best Regards,
   Felix Buenemann
Re: [zfs-discuss] Removing Cloned Snapshot
On Fri, 12 Feb 2010, Daniel Carosone wrote:
> You can use zfs promote to change around which dataset owns the base
> snapshot, and which is the dependent clone with a parent, so you can
> delete the other - but if you want both datasets you will need to keep
> the snapshot they share.

Right. The other option is to zfs send the snapshot to create a copy instead of a clone. Once the zfs recv completes, the snapshot can be destroyed. Of course, it takes much longer to do this, as zfs is going to create a full copy of the snapshot.

The appeal of clones is that they, at least initially, take no extra space, and also that they're nearly instantaneous. But they require the snapshot to remain for the lifetime of the clone.

Regards,
markm
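A minimal sketch of the copy-instead-of-clone route (pool, dataset and snapshot names are hypothetical):

  # zfs send pool/fs@snap | zfs recv pool/fscopy   # full, independent copy of the snapshot
  # zfs destroy pool/fs@snap                       # original snapshot no longer needs to be kept

Note that the receive also creates pool/fscopy@snap on the new dataset, which can likewise be destroyed if it isn't wanted.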
[zfs-discuss] SSD and ZFS
Hi all,

just after sending a message to sunmanagers I realized that my question should rather have gone here. So sunmanagers please excuse the double post:

I have inherited an X4140 (8 SAS slots) and have just set up the system with Solaris 10 09. I first set up the system on a mirrored pool over the first two disks:

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

and then tried to add the second pair of disks to this pool, which did not work (famous error message regarding label, root pool BIOS issue). I therefore simply created an additional pool tank.

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

So far so good. I have now replaced the last two SAS disks with 32GB SSDs and am wondering how to add these to the system. I googled a lot for best practice but found nothing so far that made me any wiser.

My current approach still is to simply do

  zpool add tank mirror c0t6d0 c0t7d0

as I would do with normal disks, but I am wondering whether that's the right approach to significantly increase system performance. Will ZFS automatically use these SSDs and optimize accesses to tank? Probably! But it won't optimize accesses to rpool of course. Not sure whether I need that or should look for that. Should I try to get all disks into rpool in spite of the BIOS label issue so that SSDs are used for all accesses to the disk system?

Hints (best practices) are greatly appreciated.

Thanks a lot,
Andreas
Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup
Hi,

Intel X-25 M are MLC not SLC, they are very good for L2ARC.

And next, you need more RAM: ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM, because ZFS uses memory to allocate and manage the L2ARC.

2010/2/10 Felix Buenemann :
> Am 09.02.10 09:58, schrieb Felix Buenemann:
>>
>> Am 09.02.10 02:30, schrieb Bob Friesenhahn:
>>>
>>> On Tue, 9 Feb 2010, Felix Buenemann wrote:
>>>>
>>>> Well to make things short: Using JBOD + ZFS Striped Mirrors vs.
>>>> controller's RAID10, dropped the max. sequential read I/O from over 400
>>>> MByte/s to below 300 MByte/s. However random I/O and sequential writes
>>>> seemed to perform
>>>
>>> Much of the difference is likely that your controller implements true
>>> RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors.
>>> Since zfs does not use true striping across vdevs, it relies on
>>> sequential prefetch requests to get the sequential read rate up.
>>> Sometimes zfs's prefetch is not aggressive enough.
>>>
>>> I have observed that there may still be considerably more read
>>> performance available (to another program/thread) even while a benchmark
>>> program is reading sequentially as fast as it can.
>>>
>>> Try running two copies of your benchmark program at once and see what
>>> happens.
>>
>> Yes, JBOD + ZFS load-balanced mirrors does seem to work better under
>> heavy load. I tried rebooting a Windows VM from NFS, which took about 43
>> sec with hot cache in both cases. But when doing this during a bonnie++
>> benchmark run, the ZFS mirrors would win big time, taking just 2:47sec
>> instead of over 4min to reboot the VM.
>> So I think in a real world scenario, the ZFS mirrors will win.
>>
>> On a sidenote, however, I noticed that for small sequential I/O (copying a
>> 150MB sourcetree to NFS), the ZFS mirrors were 50% slower than the
>> controller's RAID10.
>
> I had a hunch that the controller's volume read ahead would interfere with
> the ZFS load-shared mirrors and voilà: sequential reads jumped from 270
> MByte/s to 420 MByte/s, which checks out nicely, because writes are about
> 200 MByte/s.
>
>>> Bob
>>
>> - Felix
>
> - Felix
Re: [zfs-discuss] cannot receive new filesystem stream: invalid backup stream
Darren J Moffat wrote:
> On 12/02/2010 09:55, Andrew Gabriel wrote:
>> Can anyone suggest how I can get around the above error when
>> sending/receiving a ZFS filesystem? It seems to fail when about 2/3rds of
>> the data have been passed from send to recv. Is it possible to get more
>> diagnostics out?
>
> You could try using /usr/bin/zstreamdump on recent builds.

Ah, thanks Darren, didn't know about that.

As far as I can see, it runs without a problem right to the end of the stream. Not sure what I should be looking for in the way of errors, but there are no occurrences in the output of any of the error strings revealed by running strings(1) on zstreamdump.

Without -v ...

zfs send export/h...@20100211 | zstreamdump
BEGIN record
        version = 1
        magic = 2f5bacbac
        creation_time = 4b7499c5
        type = 2
        flags = 0x0
        toguid = 435481b4a8c20fbd
        fromguid = 0
        toname = export/h...@20100211
END checksum = 6797d43d709150c1/e1ec581cbaf0cfa4/7f6d80fa0f23c741/2c2cb821b4a2e639
SUMMARY:
        Total DRR_BEGIN records = 1
        Total DRR_END records = 1
        Total DRR_OBJECT records = 90963
        Total DRR_FREEOBJECTS records = 23389
        Total DRR_WRITE records = 212381
        Total DRR_FREE records = 520856
        Total records = 847591
        Total write size = 17395359232 (0x40cd81e00)
        Total stream length = 17683854968 (0x41e0a3678)

>> This filesystem has failed in this way for a long time, and I've ignored it
>> thinking something might get fixed in the future, but this hasn't happened
>> yet. It's a home directory which has been in existence and used for about 3
>> years. One thing is that the pool version (3) and zfs version (1) are old -
>> could that be the problem? The sending system is currently running build
>> 125 and receiving system something approximating to 133, but I've had the
>> same problem with this filesystem for all builds I've used over the last 2
>> years.

--
Andrew
Re: [zfs-discuss] cannot receive new filesystem stream: invalid backup stream
On 12/02/2010 09:55, Andrew Gabriel wrote:
> Can anyone suggest how I can get around the above error when
> sending/receiving a ZFS filesystem? It seems to fail when about 2/3rds of
> the data have been passed from send to recv. Is it possible to get more
> diagnostics out?

You could try using /usr/bin/zstreamdump on recent builds.

> This filesystem has failed in this way for a long time, and I've ignored it
> thinking something might get fixed in the future, but this hasn't happened
> yet. It's a home directory which has been in existence and used for about 3
> years. One thing is that the pool version (3) and zfs version (1) are old -
> could that be the problem? The sending system is currently running build
> 125 and receiving system something approximating to 133, but I've had the
> same problem with this filesystem for all builds I've used over the last 2
> years.

--
Darren J Moffat
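For reference, a typical invocation, sketched with a hypothetical dataset name (zstreamdump reads a send stream on stdin; Andrew's follow-up earlier in this digest shows real output):

  # zfs send export/home@snap | zstreamdump      # record counts and stream summary
  # zfs send export/home@snap | zstreamdump -v   # verbose, per-record detail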
[zfs-discuss] cannot receive new filesystem stream: invalid backup stream
Can anyone suggest how I can get around the above error when sending/receiving a ZFS filesystem? It seems to fail when about 2/3rds of the data have been passed from send to recv. Is it possible to get more diagnostics out?

This filesystem has failed in this way for a long time, and I've ignored it thinking something might get fixed in the future, but this hasn't happened yet. It's a home directory which has been in existence and used for about 3 years. One thing is that the pool version (3) and zfs version (1) are old - could that be the problem? The sending system is currently running build 125 and receiving system something approximating to 133, but I've had the same problem with this filesystem for all builds I've used over the last 2 years.

--
Cheers
Andrew Gabriel
Re: [zfs-discuss] zfs import fails even though all disks are online
I did some more digging through forum posts and found this:
http://opensolaris.org/jive/thread.jspa?threadID=104654

I ran zdb -l again and saw that even though zpool references /dev/dsk/c4d0, zdb shows no labels at that point... the labels are on /dev/dsk/c4d0s0. However, the log label entries match what zpool states, i.e. c12t0d0p0. I have attached the output again.

I'm not sure if any of this is pertinent, but I thought it was strange enough to mention.

Cheers,
Marc.

LABEL 0
failed to unpack label 0
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
failed to unpack label 3

LABEL 0
    version=14
    name='zedpool'
    state=0
    txg=1149194
    pool_guid=10232199590840258590
    hostid=8400371
    hostname='vault'
    top_guid=9120924193762564033
    guid=9165505680810820027
    vdev_tree
        type='raidz'
        id=0
        guid=9120924193762564033
        nparity=1
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=6001127325696
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=9165505680810820027
                path='/dev/dsk/c4d0s0'
                devid='id1,c...@ast31500341as=9vs09v48/a'
                phys_path='/p...@0,0/pci-...@e/i...@0/c...@0,0:a'
                whole_disk=1
                DTL=28
        children[1]
                type='disk'
                id=1
                guid=13787734166139205988
                path='/dev/dsk/c5d0s0'
                devid='id1,c...@ast31500341as=9vs0lv3g/a'
                phys_path='/p...@0,0/pci-...@e/i...@1/c...@0,0:a'
                whole_disk=1
                DTL=31
        children[2]
                type='disk'
                id=2
                guid=9920804058323031478
                path='/dev/dsk/c6d0s0'
                devid='id1,c...@ast31500341as=9vs21egh/a'
                phys_path='/p...@0,0/pci-...@f/i...@0/c...@0,0:a'
                whole_disk=1
                DTL=153
        children[3]
                type='disk'
                id=3
                guid=15591554387677897236
                path='/dev/dsk/c7d0s0'
                devid='id1,c...@ast31500341as=9vs21d04/a'
                phys_path='/p...@0,0/pci-...@f/i...@1/c...@0,0:a'
                whole_disk=1
                DTL=27

LABEL 1
    version=14
    name='zedpool'
    state=0
    txg=1149194
    pool_guid=10232199590840258590
    hostid=8400371
    hostname='vault'
    top_guid=9120924193762564033
    guid=9165505680810820027
    vdev_tree
        type='raidz'
        id=0
        guid=9120924193762564033
        nparity=1
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=6001127325696
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=9165505680810820027
                path='/dev/dsk/c4d0s0'
                devid='id1,c...@ast31500341as=9vs09v48/a'
                phys_path='/p...@0,0/pci-...@e/i...@0/c...@0,0:a'
                whole_disk=1
                DTL=28
        children[1]
                type='disk'
                id=1
                guid=13787734166139205988
                path='/dev/dsk/c5d0s0'
                devid='id1,c...@ast31500341as=9vs0lv3g/a'
                phys_path='/p...@0,0/pci-...@e/i...@1/c...@0,0:a'
                whole_disk=1
                DTL=31
        children[2]
                type='disk'
                id=2
                guid=9920804058323031478
                path='/dev/dsk/c6d0s0'
                devid='id1,c...@ast31500341as=9vs21egh/a'
                phys_path='/p...@0,0/pci-...@f/i...@0/c...@0,0:a'
                whole_disk=1
                DTL=153
        children[3]
                type='disk'
                id=3
                guid=15591554387677897236
                path='/dev/dsk/c7d0s0'
                devid='id1,c...@ast31500341as=9vs21d04/a'
                phys_path='/p...@0,0/pci-...@f/i...@1/c...@0,0:a'
                whole_disk=1
                DTL=27

LABEL 2
    version=14
    name='zedpool'
    state=0
    txg=1149194
    pool_guid=10232199590840258590
    hostid=8400371
    hostname='vault'
    top_guid=91209241937625
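For reference, a sketch of the two checks described above (the device paths follow the label output shown):

  # zdb -l /dev/dsk/c4d0     # the device name zpool reports: no labels found here
  # zdb -l /dev/dsk/c4d0s0   # slice 0: the ZFS labels live here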