Re: [zfs-discuss] How many disk in one pool

2012-10-08 Thread Brad Stone
Here's an example of a ZFS-based product you can buy with a large
number of disks in the volume:

http://www.aberdeeninc.com/abcatg/petarack.htm
360 3TB drives
A full petabyte of storage (1080TB) in a single rack, under a single
namespace or volume


On Sat, Oct 6, 2012 at 11:48 AM, Richard Elling
 wrote:
> On Oct 5, 2012, at 1:57 PM, Albert Shih  wrote:
>
>> Hi all,
>>
>> I'm actually running ZFS under FreeBSD. I have a question about how many
>> disks I «can» have in one pool.
>>
>> At the moment I'm running one server (FreeBSD 9.0) with 4 MD1200
>> (Dell) enclosures, meaning 48 disks. I've configured the pool with 4 raidz2
>> vdevs (one on each MD1200).
>>
>> From what I understand I can add more MD1200s. But if I lose one MD1200
>> for any reason I lose the entire pool.
>>
>> In your experience, what's the «limit»? 100 disks?
>
> I can't speak for current FreeBSD, but I've seen more than 400
> disks (HDDs) in a single pool.
>
>  -- richard
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-29 Thread Brad Diggs
Reducing the record size would negatively impact performance.  For the rationale, see the section titled "Match Average I/O Block Sizes" in my blog post on filesystem caching:
http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html

Brad

Brad Diggs | Principal Sales Consultant | 972.814.3698
eMail: brad.di...@oracle.com
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

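For reference, the recordsize change Robert suggests below would be applied before any data is loaded; a minimal sketch (the dataset name tank/ldap is illustrative, not from the original setup):

  # only affects blocks written after the change, hence "before you put any data"
  zfs set recordsize=8k tank/ldap
  zfs get recordsize,dedup tank/ldap
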
On Dec 29, 2011, at 8:08 AM, Robert Milkowski wrote:

Try reducing recordsize to 8K or even less *before* you put any data. This can potentially improve your dedup ratio and keep it higher after you start modifying data.

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs
Sent: 28 December 2011 21:15
To: zfs-discuss discussion list
Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

As promised, here are the findings from my testing.  I created 6 directory server instances where the first instance has roughly 8.5GB of data.  Then I initialized the remaining 5 instances from a binary backup of the first instance.  Then, I rebooted the server to start off with an empty ZFS cache.  The following table shows the increased L1ARC size, increased search rate performance, and increased CPU% busy with starting and applying load to each successive directory server instance.  The L1ARC cache grew a little bit with each additional instance but largely stayed the same size.  Likewise, the ZFS dedup ratio remained the same because no data on the directory server instances was changing.

However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished.  The following table shows that the delta for each instance increased by roughly 2GB after only 300k of changes.

I suspect the divergence in data as seen by ZFS deduplication most likely occurs because deduplication occurs at the block level rather than at the byte level.  When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate.  However this was not the case.  There could be other reasons for the divergence as well.

The two key takeaways from this exercise were as follows.  There is tremendous caching potential through the use of ZFS deduplication.  However, the current block level deduplication does not benefit directory as much as it perhaps could if deduplication occurred at the byte level rather than the block level.  It very well could be that even byte level deduplication doesn't work as well either.  Until that option is available, we won't know for sure.

Regards,
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On Dec 12, 2011, at 10:05 AM, Brad Diggs wrote:

Thanks everyone for your input on this thread.  It sounds like there is sufficient weight behind the affirmative that I will include this methodology in my performance analysis test plan.  If the performance goes well, I will share some of the results when we conclude in the January/February timeframe.

Regarding the great dd use case provided earlier in this thread, the L1 and L2 ARC detect and prevent streaming reads such as from dd from populating the cache.  See my previous blog post at the link below for a way around this protective caching control of ZFS.
http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html

Thanks again!
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On Dec 8, 2011, at 4:22 PM, Mark Musante wrote:

You can see the original ARC case here:
http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt

On 8 Dec 2011, at 16:41, Ian Collins wrote:

On 12/ 9/11 12:39 AM, Darren J Moffat wrote:

On 12/07/11 20:48, Mertol Ozyoney wrote:

Unfortunately the answer is no.  Neither L1 nor L2 cache is dedup aware.  The only vendor I know that can do this is NetApp.  In fact, most of our functions, like replication, are not dedup aware.  For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists in the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

  -D  Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature.

Is there any more published information on how this feature works?

--Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-29 Thread Brad Diggs
S11 FCS

Brad

Brad Diggs | Principal Sales Consultant | 972.814.3698
eMail: brad.di...@oracle.com
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On Dec 29, 2011, at 8:11 AM, Robert Milkowski wrote:

And these results are from S11 FCS, I assume.  On older builds or Illumos based distros I would expect the L1 ARC to grow much bigger.

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs
Sent: 28 December 2011 21:15
To: zfs-discuss discussion list
Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

As promised, here are the findings from my testing.  I created 6 directory server instances where the first instance has roughly 8.5GB of data.  Then I initialized the remaining 5 instances from a binary backup of the first instance.  Then, I rebooted the server to start off with an empty ZFS cache.  The following table shows the increased L1ARC size, increased search rate performance, and increased CPU% busy with starting and applying load to each successive directory server instance.  The L1ARC cache grew a little bit with each additional instance but largely stayed the same size.  Likewise, the ZFS dedup ratio remained the same because no data on the directory server instances was changing.

However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished.  The following table shows that the delta for each instance increased by roughly 2GB after only 300k of changes.

I suspect the divergence in data as seen by ZFS deduplication most likely occurs because deduplication occurs at the block level rather than at the byte level.  When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate.  However this was not the case.  There could be other reasons for the divergence as well.

The two key takeaways from this exercise were as follows.  There is tremendous caching potential through the use of ZFS deduplication.  However, the current block level deduplication does not benefit directory as much as it perhaps could if deduplication occurred at the byte level rather than the block level.  It very well could be that even byte level deduplication doesn't work as well either.  Until that option is available, we won't know for sure.

Regards,
Brad

On Dec 12, 2011, at 10:05 AM, Brad Diggs wrote:

Thanks everyone for your input on this thread.  It sounds like there is sufficient weight behind the affirmative that I will include this methodology in my performance analysis test plan.  If the performance goes well, I will share some of the results when we conclude in the January/February timeframe.

Regarding the great dd use case provided earlier in this thread, the L1 and L2 ARC detect and prevent streaming reads such as from dd from populating the cache.  See my previous blog post at the link below for a way around this protective caching control of ZFS.
http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html

Thanks again!
Brad

On Dec 8, 2011, at 4:22 PM, Mark Musante wrote:

You can see the original ARC case here:
http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt

On 8 Dec 2011, at 16:41, Ian Collins wrote:

On 12/ 9/11 12:39 AM, Darren J Moffat wrote:

On 12/07/11 20:48, Mertol Ozyoney wrote:

Unfortunately the answer is no.  Neither L1 nor L2 cache is dedup aware.  The only vendor I know that can do this is NetApp.  In fact, most of our functions, like replication, are not dedup aware.  For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists in the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

  -D  Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature.

Is there any more published information on how this feature works?

--Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-29 Thread Brad Diggs
Jim,

You are spot on.  I was hoping that the writes would be close enough to identical that there would be a high ratio of duplicate data since I use the same record size, page size, compression algorithm, etc.  However, that was not the case.  The main thing that I wanted to prove though was that if the data was the same, the L1 ARC only caches the data that was actually written to storage.  That is a really cool thing!  I am sure there will be future study on this topic as it applies to other scenarios.

With regards to directory engineering investing any energy into optimizing ODSEE DS to more effectively leverage this caching potential, that won't happen.  OUD far outperforms ODSEE.  That said, OUD may get some focus in this area.  However, time will tell on that one.

For now, I hope everyone benefits from the little that I did validate.

Have a great day!
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs


On Dec 29, 2011, at 4:45 AM, Jim Klimov wrote:

Thanks for running and publishing the tests :)
A comment on your testing technique follows, though.

2011-12-29 1:14, Brad Diggs wrote:

As promised, here are the findings from my testing. I created 6 directory server instances ...

However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k of changes.

I suspect the divergence in data as seen by ZFS deduplication most likely occurs because deduplication occurs at the block level rather than at the byte level. When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate. However this was not the case. There could be other reasons for the divergence as well.

Hello, Brad,

If you tested with Sun DSEE (and I have no reason to believe other descendants of iPlanet Directory Server would work differently under the hood), then there are two factors hindering your block-dedup gains:

1) The data is stored in the backend BerkeleyDB binary file. In Sun DSEE7 and/or in ZFS this could also be compressed data. Since for ZFS you dedup unique blocks, including same data at same offsets, it is quite unlikely you'd get the same data often enough. For example, each database might position the same userdata blocks at different offsets due to garbage collection or whatever other optimisation the DB might think of, making on-disk blocks different and undedupable.

You might look at whether it is possible to tune the database to write in sector-sized -> min. block-sized (512b/4096b) records and consistently use the same DSEE compression (or lack thereof) - in this case you might get more identical blocks and win with dedup. But you'll likely lose with compression, especially of the empty sparse structure which a database initially is.

2) During replication each database actually becomes unique. There are hidden records with an "ns" prefix which mark when the record was created and replicated, who initiated it, etc. Timestamps in the data already warrant uniqueness ;)

This might be an RFE for the DSEE team though - to keep such volatile metadata separately from userdata. Then your DS instances would more likely dedup well after replication, and unique metadata would be stored separately and stay unique. You might even keep it in a different dataset with no dedup, then... :)

---

So, at the moment, this expectation does not hold true:
  "When a write is sent to one directory server instance,
  the exact same write is propagated to the other five
  instances and therefore should be considered a duplicate."
These writes are not exact.

HTH,
//Jim Klimov

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Brad Diggs
Thanks everyone for your input on this thread.  It sounds like there is sufficient weight behind the affirmative that I will include this methodology in my performance analysis test plan.  If the performance goes well, I will share some of the results when we conclude in the January/February timeframe.

Regarding the great dd use case provided earlier in this thread, the L1 and L2 ARC detect and prevent streaming reads such as from dd from populating the cache.  See my previous blog post at the link below for a way around this protective caching control of ZFS.
http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html

Thanks again!
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs


On Dec 8, 2011, at 4:22 PM, Mark Musante wrote:

You can see the original ARC case here:
http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt

On 8 Dec 2011, at 16:41, Ian Collins wrote:

On 12/ 9/11 12:39 AM, Darren J Moffat wrote:

On 12/07/11 20:48, Mertol Ozyoney wrote:

Unfortunately the answer is no.  Neither L1 nor L2 cache is dedup aware.  The only vendor I know that can do this is NetApp.  In fact, most of our functions, like replication, are not dedup aware.  For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists in the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

  -D  Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature.

Is there any more published information on how this feature works?

-- Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-07 Thread Brad Diggs
Hello,

I have a hypothetical question regarding ZFS deduplication.  Does the L1ARC cache benefit from deduplication in the sense that the L1ARC will only need to cache one copy of the deduplicated data versus many copies?  Here is an example:

Imagine that I have a server with 2TB of RAM and a PB of disk storage.  On this server I create a single 1TB data file that is full of unique data.  Then I make 9 copies of that file, giving each file a unique name and location within the same ZFS zpool.  If I start up 10 application instances where each application reads all of its own unique copy of the data, will the L1ARC contain only the deduplicated data or will it cache separate copies of the data from each file?  In simpler terms, will the L1ARC require 10TB of RAM or just 1TB of RAM to cache all 10 1TB files worth of data?

My hope is that since the data only physically occupies 1TB of storage via deduplication, the L1ARC will also only require 1TB of RAM for the data.

Note that I know the deduplication table will use the L1ARC as well.  However, the focus of my question is on how the L1ARC would benefit from a data caching standpoint.

Thanks in advance!
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs


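One way to observe this in practice is to enable dedup before writing the file copies, then compare the pool's dedup ratio against the ARC size; a rough sketch (the pool and dataset names are illustrative):

  zfs set dedup=on tank/data                      # dedup only applies to newly written blocks
  zpool list -o name,size,alloc,dedupratio tank   # physical allocation and dedup ratio
  kstat -p zfs:0:arcstats:size                    # current ARC size in bytes
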
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenIndiana | ZFS | scrub | network | awful slow

2011-06-15 Thread Brad Stone
About 3 GB of RAM per TB of deduped data would be a better ballpark estimate.

On Wed, Jun 15, 2011 at 8:17 PM, Daniel Carosone  wrote:
> On Wed, Jun 15, 2011 at 07:19:05PM +0200, Roy Sigurd Karlsbakk wrote:
>>
>> Dedup is known to require a LOT of memory and/or L2ARC, and 24GB isn't 
>> really much with 34TBs of data.
>
> The fact that your second system lacks the l2arc cache device is absolutely 
> your prime suspect.
>
> --
> Dan.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disk space size, used, available mismatch

2011-05-12 Thread Brad Kroeger
Thank you for your insight.  This is a system that was handed down to me when 
another sysadmin went to greener pastures.  There were no quotas set on the 
system.  I used zfs destroy to free up some space and did put a quota on it.  I 
still have 0 freespace available.  I think this is due to the quota limit.  
Before I rebooted I had about a 68GB bootpool.  After the zfs destroy I had 
about 1.7GB free.  I put a 66.5 GB quota on it which I am hitting so services 
will not start up.

I don't want to saw off the tree branch I am sitting on so I am reluctant to 
increase the quota too much.  Here are some questions I have:

1) zfs destroy did free up a snapshot but it is still showing up in lustatus.
How do I correct this?

2) This system is installed with everything under / so the ETL team can fill up 
root without bounds.  What are the best practices for separating filesystems 
in ZFS so I can bound the ETL team without affecting the OS?

3) I have captured all the critical data on to SAN disk and am thinking about 
jumpstarting the host cleanly.  That way I will have a known baseline to start 
with.  Does anyone have any suggestions here?  

4) We deal with very large data sets.  These usually exist just in Oracle, but 
this host is for ETL and Informatica processing.  What would be a good quota to 
set so I have a back door onto the system to take care of problems?

Thanks for your feedback.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-09 Thread Brad Stone
> As for certified systems, it's my understanding that Nexenta themselves don't 
> "certify" anything.  They have systems which are recommended and supported by 
> their network of VARs.

The certified solutions listed on Nexenta's website were certified by Nexenta.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Administration Console

2010-11-13 Thread Brad Henderson
I am new to OpenSolaris and I have been reading about and seeing screenshots of 
the ZFS Administration Console. I have been looking at the dates on it and 
every post is from about two years ago. I am just wondering: is this option no longer 
available on OpenSolaris, and if it is, how do I set it up and use it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-24 Thread Brad Stone
For de-duplication to perform well you need to be able to fit the de-dup table 
in memory. Is a good rule of thumb for the needed RAM: Size = (pool capacity / avg 
block size) * 270 bytes? Or perhaps it's Size / expected_dedup_ratio?

And if you limit de-dup to certain datasets in the pool, how would this
calculation change?
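
A rough worked example of that rule of thumb, assuming ~270 bytes per DDT entry and a 64K average block size (illustrative numbers only, not measured figures):

  # 34 TB of pool capacity / 64 KB average block size * 270 bytes per entry, in GB
  echo "scale=1; 34 * 2^40 / (64 * 2^10) * 270 / 2^30" | bc
  # => ~143 GB of DDT if every block is unique; dividing by the expected dedup
  #    ratio gives the "Size/expected_dedup_ratio" variant asked about above
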
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs compression with Oracle - anyone implemented?

2010-09-15 Thread Brad
Ed,

See my answers inline:

"I don't think your question is clear. What do you mean "on oracle backed by
storage luns?""

We'll be using LUNs from a storage array vs ZFS controller disks.  The LUNs are 
mapped to the DB server and from there initialized under ZFS.

" Do you mean "on oracle hardware?" " 
On Sun/Oracle x86 hardware 

"Do you mean you plan to run oracle database on the server, with ZFS under
the database? "

Yes
Generally speaking, you can enable compression on any zfs filesystem, and
the cpu overhead is not very big, and the compression level is not very
strong by default. However, if the data you have is generally
uncompressible, any overhead is a waste.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on solid state as disk rather than L2ARC...

2010-09-15 Thread Brad Diggs
Has anyone done much testing of just using the solid state devices (F20 or F5100) as devices for ZFS pools?  Are there any concerns with running in this mode versus using solid state devices for L2ARC cache?

Second, has anyone done this sort of testing with MLC based solid state drives?  What has your experience been?

Thanks in advance!
Brad

Brad Diggs | Principal Security Sales Consultant
Oracle North America Technology Organization
16000 Dallas Parkway, Dallas, TX 75248
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
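
The two modes being compared look roughly like this on the command line (pool and device names are illustrative, not from an actual F20/F5100 configuration):

  # flash devices as the pool itself
  zpool create fastpool mirror c3t0d0 c3t1d0
  # versus flash devices as L2ARC in front of a pool of spinning disks
  zpool add datapool cache c3t0d0 c3t1d0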


[zfs-discuss] zfs compression with Oracle - anyone implemented?

2010-09-13 Thread Brad
Hi!  I'd been scouring the forums and web for admins/users who deployed zfs 
with compression enabled on Oracle backed by storage array luns.
Any problems with cpu/memory overhead?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup - Does "on" imply "sha256?"

2010-08-24 Thread Brad Stone
Correct, but presumably "for a limited time only". I would think that over time,
as the technology improves, the default would change.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] OpenStorage Summit

2010-08-21 Thread Brad Stone
Just wanted to make a quick announcement that there will be an OpenStorage 
Summit in Palo Alto, CA in late October. The conference should have a lot of
good OpenSolaris talks, with ZFS experts such as Bill Moore, Adam Leventhal, 
and Ben Rockwood already planning to give presentations. The conference is open 
to other storage solutions, and we also expect participation from FreeNAS, 
OpenFiler, and Lustre for example. There will be presentations on SSDs, ZFS 
basics, performance tuning, etc.

The agenda is still being formed, as we are hoping to get more presentation 
proposals from the community. To submit a proposal, send an email to 
summit2...@nexenta.com.

For additional details or to take advantage of early bird registration, go to 
http://nexenta-summit2010.eventbrite.com.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Possible to save custom properties on a zfs file system?

2010-08-02 Thread Brad Stone
Peter -
Here is an example, where the company myco wants to add a property "myprop"
to a file system "myfs" contained
within the pool "mypool".

zfs set myco:myprop=11 mypool/myfs
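
Reading the property back, or clearing it again later, works the same way:

  zfs get myco:myprop mypool/myfs
  zfs inherit myco:myprop mypool/myfs    # removes the local user property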

On Mon, Aug 2, 2010 at 1:45 PM, Peter Taps  wrote:

> Folks,
>
> I need to store some application-specific settings for a ZFS filesystem. Is
> it possible to extend a ZFS filesystem and add additional properties?
>
> Thank you in advance for your help.
>
> Regards,
> Peter
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hybrid drive: flash and platters

2010-05-25 Thread Brad Diggs
Hello,

As an avid fan of the application of flash technologies to the storage stratum, I researched the DmCache project (maintained here).  It appears that the DmCache project is quite a bit behind L2ARC but headed in the right direction.

I found the lwn article very interesting as it is effectively a Linux application of L2ARC to improve MySQL performance.  I had proposed the same idea in my blog post titled Filesystem Cache Optimization Strategies.  The net there is that if you can cache the data in the filesystem cache, you can improve overall performance by reducing the I/O to disk.  I had hoped to have someone do some benchmarking of MySQL on a cache-optimized server with F20 PCIe flash cards but never got around to it.

So, if you want to get all of the caching benefits of DmCache, just run your app on Solaris 10 today. ;-)

Have a great day!
Brad

Brad Diggs | Principal Security Sales Consultant | +1.972.814.3698
Oracle North America Technology Organization
16000 Dallas Parkway, Dallas, TX 75248
eMail: brad.di...@oracle.com
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On May 21, 2010, at 8:00 PM, David Magda wrote:

Seagate is planning on releasing a disk that's part spinning rust and part flash:
	http://www.theregister.co.uk/2010/05/21/seagate_momentus_xt/

The design will have the flash be transparent to the operating system, but I wish they would have some way to access the two components separately. ZFS could certainly make use of it, and Linux is also working on a capability:
	http://kernelnewbies.org/KernelProjects/DmCache
	http://lwn.net/Articles/385442/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replaced disk...copy back completed but spare is in use

2010-05-04 Thread Brad
Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] replaced disk...copy back completed but spare is in use

2010-05-04 Thread Brad
I yanked a disk to simulate a failure in the test pool to test hot spare failover 
- everything seemed fine until the copy back completed.  The hot spare is still 
showing as in use... do we need to remove the spare from the pool to get it to 
detach?


# zpool status
  pool: ZPOOL.TEST
 state: ONLINE
 scrub: resilver completed after 7h55m with 0 errors on Tue May  4 16:33:33 2010
config:

NAME STATE READ WRITE CKSUM
ZPOOL.TEST   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A3B6695d0ONLINE   0 0 0
c10t5000C5001A3CED7Fd0ONLINE   0 0 0
c10t5000C5001A5A45C1d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A6B2300d0ONLINE   0 0 0
c10t5000C5001A6BC6C6d0ONLINE   0 0 0
c10t5000C5001A6C3439d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A6F177Bd0ONLINE   0 0 0
c10t5000C5001A6FDB0Bd0ONLINE   0 0 0
c10t5000C5001A6FFF86d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A39D7BEd0ONLINE   0 0 0
c10t5000C5001A60BED0d0ONLINE   0 0 0
c10t5000C5001A70D8AAd0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A70D9B0d0ONLINE   0 0 0
c10t5000C5001A70D89Ed0ONLINE   0 0 0
c10t5000C5001A70D719d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A700E07d0ONLINE   0 0 0
c10t5000C5001A701A12d0ONLINE   0 0 0
c10t5000C5001A701CD0d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A702c10Ed0ONLINE   0 0 0
c10t5000C5001A702C8Ed0ONLINE   0 0 0
c10t5000C5001A703D23d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A703FADd0ONLINE   0 0 0
c10t5000C5001A707D86d0ONLINE   0 0 0
c10t5000C5001A707EDCd0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A7013D4d0ONLINE   0 0 0
c10t5000C5001A7013E6d0ONLINE   0 0 0
c10t5000C5001A7013FDd0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A7021ADd0ONLINE   0 0 0
c10t5000C5001A7028B6d0ONLINE   0 0 0
c10t5000C5001A7029A2d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A7036F4d0ONLINE   0 0 0
c10t5000C5001A7053ADd0ONLINE   0 0 0
spareONLINE   6.05M 0 0
  c10t5000C5001A7069CAd0  ONLINE   0 0 0  171G 
resilvered
  c10t5000C5001A703651d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A70104Dd0ONLINE   0 0 0
c10t5000C5001A70126Fd0ONLINE   0 0 0
c10t5000C5001A70183Cd0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A70296Cd0ONLINE   0 0 0
c10t5000C5001A70395Ed0ONLINE   0 0 0
c10t5000C5001A70587Dd0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A70704Ad0ONLINE   0 0 0
c10t5000C5001A70830Ed0ONLINE   0 0 0
c10t5000C5001A701563d0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c10t5000C5001A702542d0ONLINE   0 0 0
c10t5000C5001A702625d0ONLINE   0 0 0
c10t5000C5001A703374d0ONLINE   0 0 0
logs
  mirror ONLINE   0 0 0
c1t3d0   ONLINE   0 0 0
c1t4d0   ONLINE   0 0 0
cache
  c1t1d0 ONLINE   0 0 0
  c1t2d0 ONLINE   0 0 0
spares
  c10t5000C5001A703651d0  INUSE currently in use
  c10t5000C50
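
Once the copy back has completed, a hot spare is normally returned to the spares list by detaching it from the mirror it joined; a sketch against the pool shown above (double-check the device name before running it):

  zpool detach ZPOOL.TEST c10t5000C5001A703651d0
  zpool status ZPOOL.TEST    # the spare should now show as AVAIL again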

Re: [zfs-discuss] Solaris 10 default caching segmap/vpm size

2010-04-27 Thread Brad
The reason I asked was just to understand how those attributes play with 
ufs/vxfs...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris 10 default caching segmap/vpm size

2010-04-27 Thread Brad
What's the default size of the file system cache for Solaris 10 x86, and can it 
be tuned?
I read various posts on the subject and it's confusing.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] not showing data in L2ARC or ZIL

2010-04-24 Thread Brad
thanks - :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] not showing data in L2ARC or ZIL

2010-04-24 Thread Brad
Hmm, so that means read requests are being fulfilled by the ARC cache?

Am I correct in assuming that because the ARC cache is fulfilling read 
requests, the zpool and L2ARC are barely touched?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] not showing data in L2ARC or ZIL

2010-04-24 Thread Brad
I'm not showing any data being populated in the L2ARC or ZIL SSDs with a J4500 
(48 - 500GB SATA drives).


# zpool iostat -v
  capacity operationsbandwidth
poolused  avail   read  write   read  write
-  -  -  -  -  -  -
POOL   2.71T  4.08T 35492  1.06M  5.67M
  mirror185G   279G  2 30  72.5K   327K
c10t5000C5001A3B6695d0  -  -  0  4  24.5K   327K
c10t5000C5001A3CED7Fd0  -  -  0  4  24.5K   327K
c10t5000C5001A5A45C1d0  -  -  0  5  24.5K   327K
  mirror185G   279G  2 30  72.8K   327K
c10t5000C5001A6B2300d0  -  -  0  5  24.6K   327K
c10t5000C5001A6BC6C6d0  -  -  0  5  24.6K   327K
c10t5000C5001A6C3439d0  -  -  0  5  24.6K   327K
  mirror185G   279G  2 30  72.6K   327K
c10t5000C5001A6F177Bd0  -  -  0  4  24.4K   327K
c10t5000C5001A6FDB0Bd0  -  -  0  4  24.7K   327K
c10t5000C5001A6FFF86d0  -  -  0  5  24.5K   327K
  mirror185G   279G  2 30  72.4K   327K
c10t5000C5001A39D7BEd0  -  -  0  4  24.6K   327K
c10t5000C5001A60BED0d0  -  -  0  4  24.6K   327K
c10t5000C5001A70D8AAd0  -  -  0  4  24.4K   327K
  mirror185G   279G  2 30  72.5K   327K
c10t5000C5001A70D9B0d0  -  -  0  5  24.6K   327K
c10t5000C5001A70D89Ed0  -  -  0  5  24.6K   327K
c10t5000C5001A70D719d0  -  -  0  5  24.5K   327K
  mirror185G   279G  2 30  72.5K   327K
c10t5000C5001A700E07d0  -  -  0  4  24.7K   327K
c10t5000C5001A701A12d0  -  -  0  5  24.5K   327K
c10t5000C5001A701CD0d0  -  -  0  5  24.4K   327K
  mirror185G   279G  2 30  72.4K   327K
c10t5000C5001A702c10Ed0  -  -  0  4  24.4K   327K
c10t5000C5001A702C8Ed0  -  -  0  4  24.5K   327K
c10t5000C5001A703D23d0  -  -  0  4  24.6K   327K
  mirror185G   279G  2 30  72.4K   327K
c10t5000C5001A703FADd0  -  -  0  4  24.4K   327K
c10t5000C5001A707D86d0  -  -  0  4  24.5K   327K
c10t5000C5001A707EDCd0  -  -  0  4  24.5K   327K
  mirror185G   279G  2 30  72.7K   327K
c10t5000C5001A7013D4d0  -  -  0  4  24.5K   327K
c10t5000C5001A7013E6d0  -  -  0  4  24.6K   327K
c10t5000C5001A7013FDd0  -  -  0  4  24.5K   327K
  mirror185G   279G  2 30  72.6K   327K
c10t5000C5001A7021ADd0  -  -  0  4  24.6K   327K
c10t5000C5001A7028B6d0  -  -  0  4  24.5K   327K
c10t5000C5001A7029A2d0  -  -  0  4  24.5K   327K
  mirror185G   279G  2 30  72.6K   327K
c10t5000C5001A7036F4d0  -  -  0  4  24.5K   327K
c10t5000C5001A7053ADd0  -  -  0  5  24.5K   327K
c10t5000C5001A7069CAd0  -  -  0  5  24.6K   327K
  mirror185G   279G  2 30  72.5K   327K
c10t5000C5001A70104Dd0  -  -  0  4  24.6K   327K
c10t5000C5001A70126Fd0  -  -  0  4  24.5K   327K
c10t5000C5001A70183Cd0  -  -  0  5  24.5K   327K
  mirror185G   279G  2 30  72.7K   327K
c10t5000C5001A70296Cd0  -  -  0  4  24.6K   327K
c10t5000C5001A70395Ed0  -  -  0  5  24.5K   327K
c10t5000C5001A70587Dd0  -  -  0  5  24.7K   327K
  mirror186G   278G  2 30  72.2K   327K
c10t5000C5001A70704Ad0  -  -  0  4  24.4K   327K
c10t5000C5001A70830Ed0  -  -  0  4  24.5K   327K
c10t5000C5001A701563d0  -  -  0  5  24.3K   327K
  mirror185G   279G  2 30  72.2K   327K
c10t5000C5001A702542d0  -  -  0  4  24.5K   327K
c10t5000C5001A702625d0  -  -  0  4  24.4K   327K
c10t5000C5001A703374d0  -  -  0  4  24.4K   327K
  mirror236K  29.5G  0 37  0   909K
c1t3d0 -  -  0 37  0   909K
c1t4d0 -  -  0 37  0   909K
cache  -  -  -  -  -  -
  c1t1d0   29.7G 8M  6 21   175K  1.13M
  c1t2d0   29.7G 8M  6 21   175K  1.13M
-  -  -  -  -  -  -

Re: [zfs-discuss] zpool import -F hangs system

2010-04-21 Thread Brad Stone
What build are you on?
zpool import hangs for me on b134.

On Wed, Apr 21, 2010 at 9:21 AM, John Balestrini wrote:

> Howdy All,
>
> I have a raidz pool that hangs the system when importing. I attempted a
> pfexec zpool import -F pool1 (which has been importing for two days with no
> result), but doesn't seem to get anywhere and makes the system mostly
> non-responsive -- existing logins continue to work, new logins never
> complete and running any zpool or zfs commands will hang the session. Zdb
> commands seem function ok. Apparently the pool has some corruption that
> causes havoc. I'd like to attempt to roll back to an older txg, but the
> descriptions of how to do it only detail it when working with a single vdev
> -- this one has three.
>
> Any ideas, pointers or help would be greatly appreciated.
>
> Thanks,
>
> -- John
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpxio load-balancing...it doesn't work??

2010-04-05 Thread Brad
I'm wondering if the author is talking about "cache mirroring" where the cache 
is mirrored between both controllers.  If that is the case, is he saying that 
for every write to the active controller, a second write is issued on the passive 
controller to keep the cache mirrored?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] mpxio load-balancing...it doesn't work??

2010-04-04 Thread Brad
I had always thought that mpxio load-balances IO requests across your 
storage ports, but this article 
http://christianbilien.wordpress.com/2007/03/23/storage-array-bottlenecks/ has 
got me thinking it's not true.

"The available bandwidth is 2 or 4Gb/s (200 or 400MB/s – FC frames are 10 bytes 
long -) per port. As load balancing software (Powerpath, MPXIO, DMP, etc.) are 
most of the times used both for redundancy and load balancing, I/Os coming from 
a host can take advantage of an aggregated bandwidth of two ports. However, 
reads can use only one path, but writes are duplicated, i.e. a host write ends 
up as one write on each host port. "

Is this true?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] j4500 cache flush

2010-03-05 Thread Brad
Marion - Do you happen to know which SAS HBA it applies to?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] j4500 cache flush

2010-03-04 Thread Brad
Since the J4500 doesn't have an internal SAS controller, would it be safe to say 
that ZFS cache flushes would be handled by the host's SAS HBA?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] naming zfs disks

2010-02-17 Thread Brad
Is there any way to assign a unique name or ID to a disk that is part of a zpool?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS

2010-02-13 Thread Brad
Don't use raidz for the raid type - go with a striped set
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Instructions for ignoring ZFS write cache flushing on intelligent arrays

2010-01-27 Thread Brad
We're running 10/09 on the dev box but 11/06 is prodqa.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Instructions for ignoring ZFS write cache flushing on intelligent arrays

2010-01-27 Thread Brad
Cindy,

It does not list our SAN (LSI/STK/NetApp)...I'm confused about disabling cache 
from the wiki entries.

Should we disable it by turning off zfs cache syncs via "echo 
zfs_nocacheflush/W0t1 | mdb -kw " or specify it by storage device via the 
sd.conf method where the array ignores cache flushes from zfs?

Brad
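
For comparison, the two global approaches mentioned above look like this; note that both affect every pool on the host, which is why the per-device sd.conf route (whose exact syntax varies by Solaris release, so take it from the wiki entries) is usually preferred when only the array cache is non-volatile:

  # persistent, in /etc/system (takes effect after reboot)
  set zfs:zfs_nocacheflush=1
  # or live on a running system
  echo zfs_nocacheflush/W0t1 | mdb -kw
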
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] compression ratio

2010-01-26 Thread Brad
With the default compression scheme (LZJB), how does one calculate the ratio 
or amount compressed ahead of time when allocating storage?
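
There is no reliable way to predict the LZJB ratio up front, since it depends entirely on the data; a common approach is to copy a representative sample into a compressed dataset and read the measured ratio back (the dataset name and path are illustrative):

  zfs create -o compression=on tank/comptest
  cp -r /path/to/sample/data /tank/comptest/
  zfs get compressratio,used tank/comptest
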
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Instructions for ignoring ZFS write cache flushing on intelligent arrays

2010-01-25 Thread Brad
Hi!  So after reading through this thread and checking the bug report...do we 
still need to tell zfs to disable cache flush?

set zfs:zfs_nocacheflush=1
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-21 Thread Brad
Did you buy the SSDs directly from Sun?  I've heard there could possibly be 
firmware that's vendor specific for the X25-E.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
I was reading your old posts about load-shares:
http://opensolaris.org/jive/thread.jspa?messageID=294580

So between raidz and load-share "striping": raidz stripes a file system block 
evenly across each vdev, but with load sharing the file system block is written 
to a vdev that's not filled up (slab??), and then each subsequent file system block 
continues filling up the 1MB slab until it's full before moving on to the next 
one?

Richard can you comment? :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
"Zfs does not do striping across vdevs, but its load share approach
will write based on (roughly) a round-robin basis, but will also
prefer a less loaded vdev when under a heavy write load, or will
prefer to write to an empty vdev rather than write to an almost full
one."

I'm trying to visualize this... can you elaborate or give an ASCII example?

So with the syntax below, load sharing is implemented?

zpool create testpool disk1 disk2 disk3
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
@hortnon - ASM is not within the scope of this project.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
Can anyone recommend an optimal and redundant striped configuration for an X4500? 
 We'll be using it for an OLTP (Oracle) database and will need the best performance. 
 Is it also true that reads will be load-balanced across the mirrors?

Is this considered a raid 1+0 configuration?  
zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 
 mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 mirror 
c0t2d0 c1t2d0 
 mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0 mirror 
c4t3d0 c5t3d0 
 mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0 mirror 
c0t5d0 c1t5d0 
 mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0 mirror 
c4t6d0 c5t6d0 
 mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror 
c6t7d0 c7t7d0 
 mirror c7t0d0 c7t4d0

Is it even possible to do a raid 0+1?
zpool create -f testpool c0t0d0 c4t0d0 c0t1d0 c4t1d0 c6t1d0 c0t2d0 c4t2d0 
c6t2d0 c0t3d0 c4t3d0 c6t3d0 c0t4d0 c4t4d0 c0t5d0 c4t5d0 c6t5d0 c0t6d0 c4t6d0  
c6t6d0 c0t7d0 c4t7d0 c6t7d0 c7t0d0 mirror c1t0d0 c6t0d0 c1t1d0 c5t1d0 c7t1d0 
c1t2d0 c5t2d0 c7t2d0 c1t3d0 c5t3d0 c7t3d0 c1t4d0 c6t4d0 c1t5d0 c5t5d0 c7t5d0 
c1t6d0 c5t6d0 c7t6d0 c1t7d0 c5t7d0 c7t7d0 c7t4d0
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500/x4540 does the internal controllers have a bbu?

2010-01-12 Thread Brad
Richard,

"Yes, write cache is enabled by default, depending on the pool configuration."
Is it enabled for a striped (mirrored configuration) zpool?  I'm asking because 
of a concern I've read on this forum about a problem with SSDs (and disks) 
where if a power outage occurs any data in cache would be lost if it hasn't 
been flushed to disk.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500/x4540 does the internal controllers have a bbu?

2010-01-12 Thread Brad
"(Caching isn't the problem; ordering is.)"

Weird - I was reading about a problem where, with SSDs (Intel X25-E), if the power 
goes out and the data in cache is not flushed, you would have loss of data.

Could you elaborate on "ordering"?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500/x4540 does the internal controllers have a bbu?

2010-01-12 Thread Brad
Has anyone worked with an x4500/x4540 and know if the internal RAID controllers 
have a BBU?  I'm concerned that we won't be able to turn off the write cache on 
the internal HDDs and SSDs to prevent data corruption in case of a power failure.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz stripe size (not stripe width)

2010-01-04 Thread Brad
Hi Adam,

From your picture, it looks like the data is distributed evenly (with the 
exception of parity) across each spindle, then wrapping around again (final 4K) 
- is this one single write operation or two?

| P | D00 | D01 | D02 | D03 | D04 | D05 | D06 | D07 | <-one write 
op??
| P | D08 | D09 | D10 | D11 | D12 | D13 | D14 | D15 | <-one write 
op??

For a stripe configuration, is this what it would look like for 8K?

| D00 D01 D02 D03 D04 D05 D06 D07 D08 |
| D09 D10 D11 D12 D13 D14 D15 D16 D17 |
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] raidz stripe size (not stripe width)

2010-01-04 Thread Brad
If an 8K file system block is written on a 9-disk raidz vdev, how is the data 
distributed (written) between all devices in the vdev, since a zfs write is 
one continuous IO operation?

Is it distributed evenly (1.125KB) per device?
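
Back-of-the-envelope arithmetic, assuming 512-byte sectors and single parity on the 9-disk raidz described above (so 8 data columns per stripe row):

  echo "8 * 1024 / 512 / 8" | bc    # => 2 sectors, i.e. about 1 KB of data per data disk
  # plus one 512-byte parity sector per stripe row on the parity column
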
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz vs raid5 clarity needed

2009-12-29 Thread Brad
@ross

"If the write doesn't span the whole stripe width then there is a read
of the parity chunk, write of the block and a write of the parity
chunk which is the write hole penalty/vulnerability, and is 3
operations (if the data spans more then 1 chunk then it is written in
parallel so you can think of it as one operation, if the data doesn't
fill any given chunk then a read of the existing data chunk is
necessary to fill in the missing data making it 4 operations). No
other operation on the array can execute while this is happening."

I thought with raid5, for a new FS block write, the previous block is read in,
then the parity is read, the parity is updated/written, and then the new block is
written (2 reads, 2 writes)??
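
That read-modify-write sequence is the classic RAID5 small-write penalty; the parity update itself is plain XOR arithmetic, sketched here with made-up byte values:

  # new_parity = old_parity XOR old_data XOR new_data
  echo $(( 165 ^ 60 ^ 126 ))
  # per small write: read old data + read old parity (2 reads),
  # write new data + write new parity (2 writes)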

"Yes, reads are exactly like writes on the raidz vdev, no other
operation, read or write, can execute while this is happening. This is
where the problem lies, and is felt hardest with random IOs."

Ah - so with a random read workload on raidz, a read IO cannot be
executed in multiple streams or simultaneously until the current IO has
completed.  Was the thought process behind this to mitigate the
write hole issue or for performance (a write is a single IO instead of  3 or 4 
IOs with raid5)?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] raidz vs raid5 clarity needed

2009-12-29 Thread Brad
Hi!  I'm attempting to understand the pros/cons  between raid5 and raidz after 
running into a performance issue with Oracle on zfs  
(http://opensolaris.org/jive/thread.jspa?threadID=120703&tstart=0).

I would appreciate some feedback on what I've understood so far:

WRITES

raid5 - A FS block is written on a single disk (or multiple disks depending on 
data size???)
raidz - A FS block is written in a dynamic stripe (depending on size of 
data?) across n number of vdevs (minus parity).

READS

raid5 - IO count depends on how many disks the FS block is written to. (data crosses 
two disks = 2 IOs??)
raidz - A single read will span across n number of vdevs (minus parity).  
(1 single IO??)

NEGATIVES

raid5 - Write hole penalty, where if system crashes in the middle of a write 
block update before or after updating parity - data is corrupt.
 - Overhead (read previous block, read parity, update parity and write 
block)
- No checksumming of data!
- Slow read sequential performance.

raidz - Bound by x number of IOPS from slowest vdev since blocks are striped.
  Bad for small random reads

POSITIVES

raid5 - Good for random reads (between raid5 and raidz!) since blocks are not 
striped across sum of disks.
raidz - Good for sequential reads and writes since data is striped across sum 
of vdevs.
- No write hole penalty!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-29 Thread Brad
@relling
"For small, random read IOPS the performance of a single, top-level
vdev is
performance = performance of a disk * (N / (N - P))  
  133 * 12/(12-1)=
  133 * 12/11

where,
N = number of disks in the vdev
P = number of parity devices in the vdev"

performance of a disk => Is this a rough estimate of the disk's IOPS?


"For example, using 5 disks @ 100 IOPS we get something like:
2-disk mirror: 200 IOPS
4+1 raidz: 125 IOPS
3+2 raidz2: 167 IOPS
2+3 raidz3: 250 IOPS"

So if the rated iops on our disks is @133 iops
133 * 12/(12-1) = 145

11+1 raidz: 145 IOPS?
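
The same arithmetic with bc, just to sanity-check the quoted formula (disk IOPS and vdev width taken from the numbers above):

  # performance ~= disk_IOPS * N / (N - P), with 133 IOPS disks and an 11+1 raidz (N=12, P=1)
  echo "scale=0; 133 * 12 / (12 - 1)" | bc    # => 145 IOPS for the whole vdev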

If that's the rate for a 11+1 raidz vdev, then why is iostat showing
about 700 combined IOPS (reads/writes) per disk?

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
1402.2 7805.3 2.7 36.2 0.2 54.9 0.0 6.0 0 940 c1
10.8 1.0 0.1 0.0 0.0 0.1 0.0 7.0 0 7 c1t0d0
117.1 640.7 0.2 1.8 0.0 4.5 0.0 5.9 1 76 c1t1d0
116.9 638.2 0.2 1.7 0.0 4.6 0.0 6.1 1 78 c1t2d0
116.4 639.1 0.2 1.8 0.0 4.6 0.0 6.0 1 78 c1t3d0
116.6 638.1 0.2 1.7 0.0 4.6 0.0 6.1 1 77 c1t4d0
113.2 638.0 0.2 1.8 0.0 4.6 0.0 6.1 1 77 c1t5d0
116.6 635.3 0.2 1.7 0.0 4.5 0.0 6.0 1 76 c1t6d0
116.2 637.8 0.2 1.8 0.0 4.7 0.0 6.2 1 79 c1t7d0
115.3 636.7 0.2 1.8 0.0 4.4 0.0 5.8 1 77 c1t8d0
115.4 637.8 0.2 1.8 0.0 4.5 0.0 5.9 1 77 c1t9d0
114.8 635.0 0.2 1.8 0.0 4.3 0.0 5.7 1 76 c1t10d0
114.9 639.9 0.2 1.8 0.0 4.7 0.0 6.2 1 78 c1t11d0
115.1 638.7 0.2 1.8 0.0 4.4 0.0 5.9 1 77 c1t12d0
1.6 140.0 0.0 15.1 0.0 0.6 0.0 4.4 0 8 c1t13d0
1.3 9.1 0.0 0.1 0.0 0.0 0.0 1.0 0 0 c1t14d0
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-29 Thread Brad
@eric

"As a general rule of thumb, each vdev has the random performance
roughly the same as a single member of that vdev. Having six RAIDZ
vdevs in a pool should give roughly the performance as a stripe of six
bare drives, for random IO."

It sounds like we'll need 16 vdevs striped in a pool to at least get the 
performance of 15 drives plus another 16 mirrored for redundancy.

If we are bounded in iops by the vdev, would it make sense to go with the bare 
minimum of drives (3) per vdev?

"This winds up looking similar to RAID10 in layout, in that you're
striping across a lot of disks that each consists of a mirror, though
the checksumming rules are different. Performance should also be
similar, though it's possible RAID10 may give slightly better random
read performance at the expense of some data quality guarantees, since
I don't believe RAID10 normally validates checksums on returned data
if the device didn't return an error. In normal practice, RAID10 and
a pool of mirrored vdevs should benchmark against each other within
your margin of error."

That's interesting to know that with ZFS's implementation of raid10 it doesn't 
have checksumming built-in.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-29 Thread Brad
@ross

"Because each write of a raidz is striped across the disks the
effective IOPS of the vdev is equal to that of a single disk. This can
be improved by utilizing multiple (smaller) raidz vdevs which are
striped, but not by mirroring them."

So with random reads, would it perform better on a raid5 layout since the FS 
blocks are written to each disk instead of a stripe?

With zfs's implementation of raid10, would we still get data protection and 
checksumming?

"How many luns are you working with now? 15?  
Is the storage direct attached or is it coming from a storage server
that may have the physical disks in a raid configuration already?
If direct attached, create a pool of mirrors. If it's coming from a
storage server where the disks are in a raid already, just create a
striped pool and set copies=2."

We're not using a SAN but a Sun X4270 with sixteen SAS drives (two dedicated to 
the OS, two for SSD, raidz 11+1).
There's a total of seven datasets from a single pool.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-29 Thread Brad
Thanks for the suggestion!

I have heard mirrored vdev configurations are preferred for Oracle, but what's 
the difference between a raidz mirrored vdev vs a raid10 setup?

We have tested a zfs stripe configuration before with 15 disks and our tester 
was extremely happy with the performance.  After talking to our tester, she 
doesn't feel comfortable with the current raidz setup.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-28 Thread Brad
"This doesn't make sense to me. You've got 32 GB, why not use it?
Artificially limiting the memory use to 20 GB seems like a waste of
good money."

I'm having a hard time convincing the DBAs to increase the size of the SGA to 
20GB because their philosophy is that, no matter what, you'll eventually have to 
hit disk to pick up data that's not stored in cache (ARC or L2ARC).  The typical 
database server in our environment holds over 3TB of data.

If the performance does not improve then we'll possibly have to change the raid 
layout from raidz to a raid10.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-28 Thread Brad
"Try an SGA more like 20-25 GB. Remember, the database can cache more
effectively than any file system underneath. The best I/O is the I/O
you don't have to make."

We'll be turning up the SGA size from 4GB to 16GB.
The arc size will be set from 8GB to 4GB.
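
For reference, capping the ARC at 4GB is typically done with an /etc/system entry
like the following (value in bytes; it takes effect after a reboot), the same
mechanism shown elsewhere in this thread:

set zfs:zfs_arc_max = 4294967296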

"This can be a red herring. Judging by the number of IOPS below,
it has not improved. At this point, I will assume you are using
disks that have NCQ or CTQ (eg most SATA and all FC/SAS drives).
If you only issue one command at a time, you effectively disable
NCQ and thus cannot take advantage of its efficiencies."

Here's another sample of the data, taken at another time after the number of 
concurrent I/Os changed from 10 to 1.  We're using Seagate Savvio 10K SAS 
drives... I could not find out whether the drives support NCQ or not.  What's the 
recommended value to set concurrent I/Os to?  (See the sketch after the iostat output below.)

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
1402.2 7805.3 2.7 36.2 0.2 54.9 0.0 6.0 0 940 c1
10.8 1.0 0.1 0.0 0.0 0.1 0.0 7.0 0 7 c1t0d0
117.1 640.7 0.2 1.8 0.0 4.5 0.0 5.9 1 76 c1t1d0
116.9 638.2 0.2 1.7 0.0 4.6 0.0 6.1 1 78 c1t2d0
116.4 639.1 0.2 1.8 0.0 4.6 0.0 6.0 1 78 c1t3d0
116.6 638.1 0.2 1.7 0.0 4.6 0.0 6.1 1 77 c1t4d0
113.2 638.0 0.2 1.8 0.0 4.6 0.0 6.1 1 77 c1t5d0
116.6 635.3 0.2 1.7 0.0 4.5 0.0 6.0 1 76 c1t6d0
116.2 637.8 0.2 1.8 0.0 4.7 0.0 6.2 1 79 c1t7d0
115.3 636.7 0.2 1.8 0.0 4.4 0.0 5.8 1 77 c1t8d0
115.4 637.8 0.2 1.8 0.0 4.5 0.0 5.9 1 77 c1t9d0
114.8 635.0 0.2 1.8 0.0 4.3 0.0 5.7 1 76 c1t10d0
114.9 639.9 0.2 1.8 0.0 4.7 0.0 6.2 1 78 c1t11d0
115.1 638.7 0.2 1.8 0.0 4.4 0.0 5.9 1 77 c1t12d0
1.6 140.0 0.0 15.1 0.0 0.6 0.0 4.4 0 8 c1t13d0
1.3 9.1 0.0 0.1 0.0 0.0 0.0 1.0 0 0 c1t14d0
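
For reference, the setting in question is the zfs_vdev_max_pending tunable; a hedged
sketch of checking and adjusting it on a live system (the value 10 is just an example):

echo zfs_vdev_max_pending/D | mdb -k         # print the current value
echo zfs_vdev_max_pending/W0t10 | mdb -kw    # change it to 10 on the running kernel

It can also be set persistently in /etc/system with "set zfs:zfs_vdev_max_pending = 10".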
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-27 Thread Brad
Richard - the l2arc is c1t13d0.  What tools can be used to show the l2arc stats?
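
For reference, assuming this build exposes them, the L2ARC counters live in the
arcstats kstat:

kstat -p zfs:0:arcstats | grep l2_   # l2_hits, l2_misses, l2_size, l2_read_bytes, ...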

  raidz1     2.68T   580G    543    453  4.22M  3.70M
    c1t1d0       -      -    258    102   689K   358K
    c1t2d0       -      -    256    103   684K   354K
    c1t3d0       -      -    258    102   690K   359K
    c1t4d0       -      -    260    103   687K   354K
    c1t5d0       -      -    255    101   686K   358K
    c1t6d0       -      -    263    103   685K   354K
    c1t7d0       -      -    259    101   689K   358K
    c1t8d0       -      -    259    103   687K   354K
    c1t9d0       -      -    260    102   689K   358K
    c1t10d0      -      -    263    103   686K   354K
    c1t11d0      -      -    260    102   687K   359K
    c1t12d0      -      -    263    104   684K   354K
  c1t14d0     396K  29.5G      0     65      7  3.61M
cache            -      -      -      -      -      -
  c1t13d0    29.7G  11.1M    157     84  3.93M  6.45M

We've added 16GB to the box bring the overall total to 32GB.
arc_max is set to 8GB:
set zfs:zfs_arc_max = 8589934592

arc_summary output:
ARC Size:
 Current Size: 8192 MB (arcsize)
 Target Size (Adaptive):   8192 MB (c)
 Min Size (Hard Limit):1024 MB (zfs_arc_min)
 Max Size (Hard Limit):8192 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:  39%3243 MB (p)
 Most Frequently Used Cache Size:60%4948 MB (c-p)

ARC Efficency:
 Cache Access Total: 154663786
 Cache Hit Ratio:  41%   64221251   [Defined State for Buffer]
 Cache Miss Ratio: 58%   90442535   [Undefined State for Buffer]
 REAL Hit Ratio:   41%   64221251   [MRU/MFU Hits Only]

 Data Demand   Efficiency:38%
 Data Prefetch Efficiency:DISABLED (zfs_prefetch_disable)

CACHE HITS BY CACHE LIST:
  Anon:                        --%  Counter Rolled.
  Most Recently Used:          17%  8906 (mru)             [ Return Customer ]
  Most Frequently Used:        82%  53102345 (mfu)         [ Frequent Customer ]
  Most Recently Used Ghost:    14%  9427708 (mru_ghost)    [ Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost:   6%  4344287 (mfu_ghost)    [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
  Demand Data:                 84%  5108
  Prefetch Data:                0%  0
  Demand Metadata:             15%  9777143
  Prefetch Metadata:            0%  0
CACHE MISSES BY DATA TYPE:
  Demand Data:                 96%  87542292
  Prefetch Data:                0%  0
  Demand Metadata:              3%  2900243
  Prefetch Metadata:            0%  0


Also disabled file-level prefetch and the device-level vdev cache:
set zfs:zfs_prefetch_disable = 1
set zfs:zfs_vdev_cache_max = 0x1

After reading about some issues with concurrent I/Os, I tweaked the setting down 
from 35 to 1 and it reduced the response times greatly (down to roughly 2 to 8 ms):
set zfs:zfs_vdev_max_pending=1

It did increase the actv... I'm still unsure about the side effects here:
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
2295.2 398.7 4.2 7.2 0.0 18.6 0.0 6.9 0 1084 c1
0.0 0.8 0.0 0.0 0.0 0.0 0.0 0.1 0 0 c1t0d0
190.3 22.9 0.4 0.0 0.0 1.5 0.0 7.0 0 87 c1t1d0
180.9 20.6 0.3 0.0 0.0 1.7 0.0 8.5 0 95 c1t2d0
195.0 43.0 0.3 0.2 0.0 1.6 0.0 6.8 0 93 c1t3d0
193.2 21.7 0.4 0.0 0.0 1.5 0.0 6.8 0 88 c1t4d0
195.7 34.8 0.3 0.1 0.0 1.7 0.0 7.5 0 97 c1t5d0
186.8 20.6 0.3 0.0 0.0 1.5 0.0 7.3 0 88 c1t6d0
188.4 21.0 0.4 0.0 0.0 1.6 0.0 7.7 0 91 c1t7d0
189.6 21.2 0.3 0.0 0.0 1.6 0.0 7.4 0 91 c1t8d0
193.8 22.6 0.4 0.0 0.0 1.5 0.0 7.1 0 91 c1t9d0
192.6 20.8 0.3 0.0 0.0 1.4 0.0 6.8 0 88 c1t10d0
195.7 22.2 0.3 0.0 0.0 1.5 0.0 6.7 0 88 c1t11d0
184.7 20.3 0.3 0.0 0.0 1.4 0.0 6.8 0 84 c1t12d0
7.3 82.4 0.1 5.5 0.0 0.0 0.0 0.2 0 1 c1t13d0
1.3 23.9 0.0 1.3 0.0 0.0 0.0 0.2 0 0 c1t14d0

I'm still in talks with the DBA about raising the SGA from 4GB to 6GB to see if 
it helps.

The changes that showed a lot of improvement are disabling file/device-level 
prefetch and reducing concurrent I/Os from 35 to 1 (I tried 10, but it didn't 
help much).  Is there anything else that could be tweaked to increase write 
performance?  Record sizes are set accordingly: 8K, and 128K for the redo logs.
-- 
This message posted from opensolaris.org

[zfs-discuss] repost - high read iops

2009-12-26 Thread Brad
repost - Sorry for ccing the other forums.

I'm running into an issue where there seems to be a high number of read IOPS 
hitting the disks, and physical free memory is fluctuating between 200MB and 450MB 
out of 16GB total. We have the L2ARC configured on a 32GB Intel X25-E SSD and the 
slog on another 32GB X25-E SSD.

According to our tester, Oracle writes are extremely slow (high latency).

Below is a snippet of iostat:

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
4898.3 34.2 23.2 1.4 0.1 385.3 0.0 78.1 0 1246 c1
0.0 0.8 0.0 0.0 0.0 0.0 0.0 16.0 0 1 c1t0d0
401.7 0.0 1.9 0.0 0.0 31.5 0.0 78.5 1 100 c1t1d0
421.2 0.0 2.0 0.0 0.0 30.4 0.0 72.3 1 98 c1t2d0
403.9 0.0 1.9 0.0 0.0 32.0 0.0 79.2 1 100 c1t3d0
406.7 0.0 2.0 0.0 0.0 33.0 0.0 81.3 1 100 c1t4d0
414.2 0.0 1.9 0.0 0.0 28.6 0.0 69.1 1 98 c1t5d0
406.3 0.0 1.8 0.0 0.0 32.1 0.0 79.0 1 100 c1t6d0
404.3 0.0 1.9 0.0 0.0 31.9 0.0 78.8 1 100 c1t7d0
404.1 0.0 1.9 0.0 0.0 34.0 0.0 84.1 1 100 c1t8d0
407.1 0.0 1.9 0.0 0.0 31.2 0.0 76.6 1 100 c1t9d0
407.5 0.0 2.0 0.0 0.0 33.2 0.0 81.4 1 100 c1t10d0
402.8 0.0 2.0 0.0 0.0 33.5 0.0 83.2 1 100 c1t11d0
408.9 0.0 2.0 0.0 0.0 32.8 0.0 80.3 1 100 c1t12d0
9.6 10.8 0.1 0.9 0.0 0.4 0.0 20.1 0 17 c1t13d0
0.0 22.7 0.0 0.5 0.0 0.5 0.0 22.8 0 33 c1t14d0

Is this an indicator that we need more physical memory? From 
http://blogs.sun.com/brendan/entry/test, the order that a read request is 
satisfied is:

1) ARC
2) vdev cache of L2ARC devices
3) L2ARC devices
4) vdev cache of disks
5) disks

Using arc_summary.pl, we determined that prefetch was not helping much, so we 
disabled it.

CACHE HITS BY DATA TYPE:
Demand Data: 22% 158853174
Prefetch Data: 17% 123009991 <---not helping???
Demand Metadata: 60% 437439104
Prefetch Metadata: 0% 2446824

The write iops started to kick in more and latency reduced on spinning disks:

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
1629.0 968.0 17.4 7.3 0.0 35.9 0.0 13.8 0 1088 c1
0.0 1.9 0.0 0.0 0.0 0.0 0.0 1.7 0 0 c1t0d0
126.7 67.3 1.4 0.2 0.0 2.9 0.0 14.8 0 90 c1t1d0
129.7 76.1 1.4 0.2 0.0 2.8 0.0 13.7 0 90 c1t2d0
128.0 73.9 1.4 0.2 0.0 3.2 0.0 16.0 0 91 c1t3d0
128.3 79.1 1.3 0.2 0.0 3.6 0.0 17.2 0 92 c1t4d0
125.8 69.7 1.3 0.2 0.0 2.9 0.0 14.9 0 89 c1t5d0
128.3 81.9 1.4 0.2 0.0 2.8 0.0 13.1 0 89 c1t6d0
128.1 69.2 1.4 0.2 0.0 3.1 0.0 15.7 0 93 c1t7d0
128.3 80.3 1.4 0.2 0.0 3.1 0.0 14.7 0 91 c1t8d0
129.2 69.3 1.4 0.2 0.0 3.0 0.0 15.2 0 90 c1t9d0
130.1 80.0 1.4 0.2 0.0 2.9 0.0 13.6 0 89 c1t10d0
126.2 72.6 1.3 0.2 0.0 2.8 0.0 14.2 0 89 c1t11d0
129.7 81.0 1.4 0.2 0.0 2.7 0.0 12.9 0 88 c1t12d0
90.4 41.3 1.0 4.0 0.0 0.2 0.0 1.2 0 6 c1t13d0
0.0 24.3 0.0 1.2 0.0 0.0 0.0 0.2 0 0 c1t14d0


Is it true that if your MFU stats go over 50%, more memory is needed?
CACHE HITS BY CACHE LIST:
Anon: 10% 74845266 [ New Customer, First Cache Hit ]
Most Recently Used: 19% 140478087 (mru) [ Return Customer ]
Most Frequently Used: 65% 475719362 (mfu) [ Frequent Customer ]
Most Recently Used Ghost: 2% 20785604 (mru_ghost) [ Return Customer Evicted, 
Now Back ]
Most Frequently Used Ghost: 1% 9920089 (mfu_ghost) [ Frequent Customer Evicted, 
Now Back ]
CACHE HITS BY DATA TYPE:
Demand Data: 22% 158852935
Prefetch Data: 17% 123009991
Demand Metadata: 60% 437438658
Prefetch Metadata: 0% 2446824

My theory is that since there's not enough memory for the ARC to cache data, reads 
fall through to the L2ARC, miss there as well, and have to go to disk for the 
request. This causes contention between reads and writes, which inflates the service 
times.

uname: 5.10 Generic_141445-09 i86pc i386 i86pc
Sun Fire X4270: 11+1 raidz (SAS)
l2arc Intel X25-E
slog Intel X25-E
Thoughts?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] high read iops - more memory for arc?

2009-12-24 Thread Brad
I'm running into an issue where there seems to be a high number of read IOPS 
hitting the disks, and physical free memory is fluctuating between 200MB and 450MB 
out of 16GB total.  We have the L2ARC configured on a 32GB Intel X25-E SSD and the 
slog on another 32GB X25-E SSD.

According to our tester, Oracle writes are extremely slow (high latency).   

Below is a snippet of iostat:

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
4898.3 34.2 23.2 1.4 0.1 385.3 0.0 78.1 0 1246 c1
0.0 0.8 0.0 0.0 0.0 0.0 0.0 16.0 0 1 c1t0d0
401.7 0.0 1.9 0.0 0.0 31.5 0.0 78.5 1 100 c1t1d0
421.2 0.0 2.0 0.0 0.0 30.4 0.0 72.3 1 98 c1t2d0
403.9 0.0 1.9 0.0 0.0 32.0 0.0 79.2 1 100 c1t3d0
406.7 0.0 2.0 0.0 0.0 33.0 0.0 81.3 1 100 c1t4d0
414.2 0.0 1.9 0.0 0.0 28.6 0.0 69.1 1 98 c1t5d0
406.3 0.0 1.8 0.0 0.0 32.1 0.0 79.0 1 100 c1t6d0
404.3 0.0 1.9 0.0 0.0 31.9 0.0 78.8 1 100 c1t7d0
404.1 0.0 1.9 0.0 0.0 34.0 0.0 84.1 1 100 c1t8d0
407.1 0.0 1.9 0.0 0.0 31.2 0.0 76.6 1 100 c1t9d0
407.5 0.0 2.0 0.0 0.0 33.2 0.0 81.4 1 100 c1t10d0
402.8 0.0 2.0 0.0 0.0 33.5 0.0 83.2 1 100 c1t11d0
408.9 0.0 2.0 0.0 0.0 32.8 0.0 80.3 1 100 c1t12d0
9.6 10.8 0.1 0.9 0.0 0.4 0.0 20.1 0 17 c1t13d0
0.0 22.7 0.0 0.5 0.0 0.5 0.0 22.8 0 33 c1t14d0

Is this an indicator that we need more physical memory?  From 
http://blogs.sun.com/brendan/entry/test, the order that a read request is 
satisfied is:

1) ARC
2) vdev cache of L2ARC devices
3) L2ARC devices
4) vdev cache of disks
5) disks

Using arc_summary.pl, we determined that prefetch was not helping much, so we 
disabled it.

CACHE HITS BY DATA TYPE:
  Demand Data:        22%  158853174
  Prefetch Data:      17%  123009991   <---not helping???
  Demand Metadata:    60%  437439104
  Prefetch Metadata:   0%  2446824

The write iops started to kick in more and latency reduced on spinning disks:
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
1629.0 968.0 17.4 7.3 0.0 35.9 0.0 13.8 0 1088 c1
0.0 1.9 0.0 0.0 0.0 0.0 0.0 1.7 0 0 c1t0d0
126.7 67.3 1.4 0.2 0.0 2.9 0.0 14.8 0 90 c1t1d0
129.7 76.1 1.4 0.2 0.0 2.8 0.0 13.7 0 90 c1t2d0
128.0 73.9 1.4 0.2 0.0 3.2 0.0 16.0 0 91 c1t3d0
128.3 79.1 1.3 0.2 0.0 3.6 0.0 17.2 0 92 c1t4d0
125.8 69.7 1.3 0.2 0.0 2.9 0.0 14.9 0 89 c1t5d0
128.3 81.9 1.4 0.2 0.0 2.8 0.0 13.1 0 89 c1t6d0
128.1 69.2 1.4 0.2 0.0 3.1 0.0 15.7 0 93 c1t7d0
128.3 80.3 1.4 0.2 0.0 3.1 0.0 14.7 0 91 c1t8d0
129.2 69.3 1.4 0.2 0.0 3.0 0.0 15.2 0 90 c1t9d0
130.1 80.0 1.4 0.2 0.0 2.9 0.0 13.6 0 89 c1t10d0
126.2 72.6 1.3 0.2 0.0 2.8 0.0 14.2 0 89 c1t11d0
129.7 81.0 1.4 0.2 0.0 2.7 0.0 12.9 0 88 c1t12d0
90.4 41.3 1.0 4.0 0.0 0.2 0.0 1.2 0 6 c1t13d0
0.0 24.3 0.0 1.2 0.0 0.0 0.0 0.2 0 0 c1t14d0


Is it true that if your MFU stats go over 50%, more memory is needed?
CACHE HITS BY CACHE LIST:
  Anon:                        10%  74845266              [ New Customer, First Cache Hit ]
  Most Recently Used:          19%  140478087 (mru)       [ Return Customer ]
  Most Frequently Used:        65%  475719362 (mfu)       [ Frequent Customer ]
  Most Recently Used Ghost:     2%  20785604 (mru_ghost)  [ Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost:   1%  9920089 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
  Demand Data:        22%  158852935
  Prefetch Data:      17%  123009991
  Demand Metadata:    60%  437438658
  Prefetch Metadata:   0%  2446824

My theory is that since there's not enough memory for the ARC to cache data, reads 
fall through to the L2ARC, miss there as well, and have to go to disk for the 
request.  This causes contention between reads and writes, which inflates the service 
times.

Thoughts?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-22 Thread Brad Diggs
Have you considered running your script with ZFS prefetching disabled altogether  
to see if the results are consistent between runs?
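
If it helps, a minimal sketch of the two usual ways to do that, using the same
tunable referenced elsewhere in this thread. Add the following to /etc/system
(takes effect at the next boot):

set zfs:zfs_prefetch_disable = 1

or flip it on the running kernel (reverts at reboot):

echo zfs_prefetch_disable/W0t1 | mdb -kw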

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com

On Jul 15, 2009, at 9:59 AM, Bob Friesenhahn wrote:


On Wed, 15 Jul 2009, Ross wrote:

Yes, that makes sense.  For the first run, the pool has only just  
been mounted, so the ARC will be empty, with plenty of space for  
prefetching.


I don't think that this hypothesis is quite correct.  If you use  
'zpool iostat' to monitor the read rate while reading a large  
collection of files with total size far larger than the ARC, you  
will see that there is no fall-off in read performance once the ARC  
becomes full.  The performance problem occurs when there is still  
metadata cached for a file but the file data has since been expunged  
from the cache.  The implication here is that zfs speculates that  
the file data will be in the cache if the metadata is cached, and  
this results in a cache miss as well as disabling the file read- 
ahead algorithm.  You would not want to do read-ahead on data that  
you already have in a cache.


Recent OpenSolaris seems to take a 2X performance hit rather than  
the 4X hit that Solaris 10 takes.  This may be due to improvement of  
existing algorithm function performance (optimizations) rather than  
a related design improvement.


I wonder if there is any tuning that can be done to counteract  
this? Is there any way to tell ZFS to bias towards prefetching  
rather than preserving data in the ARC?  That may provide better  
performance for scripts like this, or for random access workloads.


Recent zfs development focus has been on how to keep prefetch from  
damaging applications like database where prefetch causes more data  
to be read than is needed.  Since OpenSolaris now apparently  
includes an option setting which blocks file data caching and  
prefetch, this seems to open the door for use of more aggressive  
prefetch in the normal mode.


In summary, I agree with Richard Elling's hypothesis (which is the  
same as my own).


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Brad Diggs
You might want to have a look at my blog on filesystem cache tuning...  It will  
probably help you to avoid memory contention between the ARC and your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com

On Jul 4, 2009, at 2:48 AM, Phil Harman wrote:

ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
instead of the Solaris page cache. But mmap() uses the latter. So if  
anyone maps a file, ZFS has to keep the two caches in sync.


cp(1) uses mmap(2). When you use cp(1) it brings pages of the files  
it copies into the Solaris page cache. As long as they remain there  
ZFS will be slow for those files, even if you subsequently use  
read(2) to access them.


If you reboot, your cpio(1) tests will probably go fast again, until  
someone uses mmap(2) on the files again. I think tar(1) uses  
read(2), but from my iPod I can't be sure. It would be interesting  
to see how tar(1) performs if you run that test before cp(1) on a  
freshly rebooted system.


I have done some work with the ZFS team towards a fix, but it is  
only currently in OpenSolaris.


The other thing that slows you down is that ZFS only flushes to disk  
every 5 seconds if there are no synchronous writes. It would be  
interesting to see iostat -xnz 1 while you are running your tests.  
You may find the disks are writing very efficiently for one second  
in every five.


Hope this helps,
Phil

blogs.sun.com/pgdh


Sent from my iPod

On 4 Jul 2009, at 05:26, Bob Friesenhahn  
 wrote:



On Fri, 3 Jul 2009, Bob Friesenhahn wrote:


Copy MethodData Rate
==
cpio -pdum75 MB/s
cp -r32 MB/s
tar -cf - . | (cd dest && tar -xf -)26 MB/s


It seems that the above should be amended.  Running the cpio-based  
copy again results in zpool iostat only reporting a read bandwidth  
of 33 MB/second.  The system seems to get slower and slower as it  
runs.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs

2009-06-16 Thread Brad Reese
Hi Victor,

Yes, you may access the system via ssh. Please contact me at bar001 at uark dot 
edu and I will reply with details of how to connect.

Thanks,

Brad
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs

2009-06-15 Thread Brad Reese
Hi Victor,

'zdb -e -bcsv -t 2435913 tank' ran for about a week with no output. We had yet 
another brown out and then the comp shut down (have a UPS on the way). A few 
days before that I started the following commands, which also had no output:

zdb -e -bcsv -t 2435911 tank
zdb -e -bcsv -t 2435897 tank

I've given up on these because I don't think they'll finish...should I try 
again?

Right now I am trying the following commands which so far have no output:

zdb -e -bcsvL -t 2435913 tank
zdb -e -bsvL -t 2435913 tank
zdb -e -bb -t 2435913 tank

'zdb -e - -t 2435913 tank' has output and is very long...is there anything 
I should be looking for? Without -t 243... this command failed on dmu_read, now 
it just keeps going forever.

Your help is much appreciated.

Thanks,

Brad
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs

2009-06-10 Thread Brad Reese
Hi Victor,

Sorry it took a while for me to reply, I was traveling and had limited network 
access.

'zdb -e -bcsv -t 2435913 tank' has been running for a few days with no 
output...want to try something else?

Here's the output of 'zdb -e -u -t 2435913 tank':

Uberblock

magic = 00bab10c
version = 4
txg = 2435911
guid_sum = 16655261404755214374
timestamp = 1240287900 UTC = Mon Apr 20 23:25:00 2009

Thanks,

Brad
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs

2009-06-02 Thread Brad Reese
Hi Victor,

Here's the output of 'zdb -e -bcsvL tank' (similar to above but with -c). 

Thanks,

Brad

Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <0, 11, 0, 0> [L0 packed nvlist] 
4000L/4000P DVA[0]=<0:2500014000:4000> DVA[1]=<0:4400014000:4000> fletcher4 
uncompressed LE contiguous birth=2435914 fill=1 
cksum=2cdaaa4db0:2b105dbdf910e:14b020cadaf6:1720f5444d0b5366 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 12, 0, 0> [L0 bplist] 4000L/4000P 
DVA[0]=<0:258000:4000> DVA[1]=<0:448000:4000> fletcher4 uncompressed LE 
contiguous birth=2435914 fill=1 
cksum=16b3e12568:143f7c0b11757:91395667fbce35f:9f686628032bddf2 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 24, 0, 0> [L0 SPA space map] 
1000L/1000P DVA[0]=<0:2500024000:1000> DVA[1]=<0:440002c000:1000> fletcher4 
uncompressed LE contiguous birth=2435914 fill=1 
cksum=76284dc2e9:1438efeb0fb9b:1d2f57253c8d409:d4a881948152382b -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 30, 0, 4> [L0 SPA space map] 
1000L/1000P DVA[0]=<0:256000:1000> DVA[1]=<0:446000:1000> fletcher4 
uncompressed LE contiguous birth=2435914 fill=1 
cksum=5d6df356b5:eed102f1beb0:15280d72b8c8588:5925604865323b6b -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 35, 0, 0> [L0 SPA space map] 
1000L/1000P DVA[0]=<0:257000:1000> DVA[1]=<0:447000:1000> fletcher4 
uncompressed LE contiguous birth=2435914 fill=1 
cksum=335f09c578:8b7b18876592:c58c7a556ca72b:c22e91ead638c69c -- skipping

Error counts:

errno  count
   50  5
block traversal size 431585053184 != alloc 431585209344 (unreachable 156160)

bp count: 4078410
bp logical:433202894336  avg: 106218
bp physical:   431089822720  avg: 105700compression:   1.00
bp allocated:  431585053184  avg: 105821compression:   1.00
SPA allocated: 431585209344 used: 57.75%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp  %Total  Type
     -      -       -       -       -       -       -  deferred free
     1    512     512      1K      1K    1.00    0.00  object directory
     3  1.50K   1.50K   3.00K      1K    1.00    0.00  object array
     1    16K     16K     32K     32K    1.00    0.00  packed nvlist
     -      -       -       -       -       -       -  packed nvlist size
   114  13.9M   1.05M   2.11M   18.9K   13.25    0.00  bplist
     -      -       -       -       -       -       -  bplist header
     -      -       -       -       -       -       -  SPA space map header
 1.36K  6.57M   3.88M   7.94M   5.82K    1.69    0.00  SPA space map
     -      -       -       -       -       -       -  ZIL intent log
 49.4K   791M    220M    442M   8.95K    3.60    0.11  DMU dnode
     4     4K   2.50K   7.50K   1.88K    1.60    0.00  DMU objset
     -      -       -       -       -       -       -  DSL directory
     2     1K      1K      2K      1K    1.00    0.00  DSL directory child map
     1    512     512      1K      1K    1.00    0.00  DSL dataset snap map
     2     1K      1K      2K      1K    1.00    0.00  DSL props
     -      -       -       -       -       -       -  DSL dataset
     -      -       -       -       -       -       -  ZFS znode
     -      -       -       -       -       -       -  ZFS V0 ACL
 3.77M   403G    401G    401G    106K    1.00   99.87  ZFS plain file
 72.4K   114M   49.0M   98.8M   1.37K    2.32    0.02  ZFS directory
     1    512     512      1K      1K    1.00    0.00  ZFS master node
     3  19.5K   1.50K   3.00K      1K   13.00    0.00  ZFS delete queue
     -      -       -       -       -       -       -  zvol object
     -      -       -       -       -       -       -  zvol prop
     -      -       -       -       -       -       -  other uint8[]
     -      -       -       -       -       -       -  other uint64[]
     -      -       -       -       -       -       -  other ZAP
     -      -       -       -       -       -       -  persistent error log
     1   128K   5.00K   10.0K   10.0K   25.60    0.00  SPA history
     -      -       -       -       -       -       -  SPA history offsets
     -      -       -       -       -       -       -  Pool properties
     -      -       -       -       -       -       -  DSL permissions
     -      -       -       -       -       -       -  ZFS ACL
     -      -       -       -       -       -       -  ZFS SYSACL
     -      -       -       -       -       -       -  FUID table
     -      -       -       -       -       -       -  FUID table size
     -      -       -       -       -       -       -  DSL dataset next clones
     -      -       -       -       -       -       -  scrub work queue
 3.89M   403G    401G    402G    103K    1.00  100.00  Total

capacity   operations   bandwidth   errors 
description 

Re: [zfs-discuss] zpool import hangs

2009-06-01 Thread Brad Reese
Here's the output of 'zdb -e -bsvL tank' (without -c) in case it helps. I'll 
post with -c if it finishes.

Thanks,

Brad

Traversing all blocks ...
block traversal size 431585053184 != alloc 431585209344 (unreachable 156160)

bp count: 4078410
bp logical:433202894336  avg: 106218
bp physical:   431089822720  avg: 105700compression:   1.00
bp allocated:  431585053184  avg: 105821compression:   1.00
SPA allocated: 431585209344 used: 57.75%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp  %Total  Type
     -      -       -       -       -       -       -  deferred free
     1    512     512      1K      1K    1.00    0.00  object directory
     3  1.50K   1.50K   3.00K      1K    1.00    0.00  object array
     1    16K     16K     32K     32K    1.00    0.00  packed nvlist
     -      -       -       -       -       -       -  packed nvlist size
   114  13.9M   1.05M   2.11M   18.9K   13.25    0.00  bplist
     -      -       -       -       -       -       -  bplist header
     -      -       -       -       -       -       -  SPA space map header
 1.36K  6.57M   3.88M   7.94M   5.82K    1.69    0.00  SPA space map
     -      -       -       -       -       -       -  ZIL intent log
 49.4K   791M    220M    442M   8.95K    3.60    0.11  DMU dnode
     4     4K   2.50K   7.50K   1.88K    1.60    0.00  DMU objset
     -      -       -       -       -       -       -  DSL directory
     2     1K      1K      2K      1K    1.00    0.00  DSL directory child map
     1    512     512      1K      1K    1.00    0.00  DSL dataset snap map
     2     1K      1K      2K      1K    1.00    0.00  DSL props
     -      -       -       -       -       -       -  DSL dataset
     -      -       -       -       -       -       -  ZFS znode
     -      -       -       -       -       -       -  ZFS V0 ACL
 3.77M   403G    401G    401G    106K    1.00   99.87  ZFS plain file
 72.4K   114M   49.0M   98.8M   1.37K    2.32    0.02  ZFS directory
     1    512     512      1K      1K    1.00    0.00  ZFS master node
     3  19.5K   1.50K   3.00K      1K   13.00    0.00  ZFS delete queue
     -      -       -       -       -       -       -  zvol object
     -      -       -       -       -       -       -  zvol prop
     -      -       -       -       -       -       -  other uint8[]
     -      -       -       -       -       -       -  other uint64[]
     -      -       -       -       -       -       -  other ZAP
     -      -       -       -       -       -       -  persistent error log
     1   128K   5.00K   10.0K   10.0K   25.60    0.00  SPA history
     -      -       -       -       -       -       -  SPA history offsets
     -      -       -       -       -       -       -  Pool properties
     -      -       -       -       -       -       -  DSL permissions
     -      -       -       -       -       -       -  ZFS ACL
     -      -       -       -       -       -       -  ZFS SYSACL
     -      -       -       -       -       -       -  FUID table
     -      -       -       -       -       -       -  FUID table size
     -      -       -       -       -       -       -  DSL dataset next clones
     -      -       -       -       -       -       -  scrub work queue
 3.89M   403G    401G    402G    103K    1.00  100.00  Total

                                 capacity     operations    bandwidth        errors
description                    used  avail   read  write   read  write   read write cksum
tank                           402G   294G    463      0  1.27M      0      0     0     1
  mirror                       402G   294G    463      0  1.27M      0      0     0     4
    /dev/dsk/c2d0p0                            69      0  4.05M      0      0     0     4
    /dev/dsk/c1d0p0                            67      0  3.96M      0      0     0     4
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs

2009-06-01 Thread Brad Reese
Hi Victor,

zdb -e -bcsvL tank
(let this go for a few hours...no output. I will let it go overnight)

zdb -e -u tank
Uberblock

magic = 00bab10c
version = 4
txg = 2435914
guid_sum = 16655261404755214374
timestamp = 1240517036 UTC = Thu Apr 23 15:03:56 2009

Thanks for your help,

Brad
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs

2009-05-31 Thread Brad Reese
Hello,

I've run into a problem with zpool import that seems very similar to the 
following thread as far as I can tell:

http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15

The suggested solution was to use a later version of open solaris (b99 or 
later) but that did not work. I've tried the following versions of solaris 
without success:

Solaris 10 u4 (original system)
Solaris 10 u6
Opensolaris 2008.11
Opensolaris 2008.11 b99
SXCE b113

Any help with this will be greatly appreciated...my last backup was four months 
ago so a lot of my thesis work will be lost. I mistakenly thought a mirrored 
zpool on new drives would be good enough for a while. 

So here's what happened: we had a power outage one day, and as soon as I tried 
to boot the server again it entered an endless reboot cycle. So I thought the OS 
drive (which is not mirrored) had become corrupted and reinstalled the OS. Then when I try 
zpool import it hangs forever. I even left it going for a couple days in case 
it was trying to correct corrupted data. The same thing happens no matter what 
version of solaris I use. The symptoms and diagnostic results (see below) seem 
to be very similar to the post above but the solution doesn't work.

Please let me know if you need any other information.

Thanks,

Brad


bash-3.2# zpool import
  pool: tank
id: 4410438565134310480
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

tankONLINE
  mirrorONLINE
c2d0p0  ONLINE
c1d0p0  ONLINE
bash-3.2# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway
bash-3.2# zpool import -f tank
(then it hangs here forever, can't be killed)
(the following commands were performed while this was running)


bash-3.2# fmdump -eV
TIME   CLASS
May 27 2009 22:22:55.308533986 ereport.fs.zfs.checksum
nvlist version: 0
class = ereport.fs.zfs.checksum
ena = 0xd22e37db9000401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x3d350681ea839c50
vdev = 0x85cc302105002c5d
(end detector)

pool = tank
pool_guid = 0x3d350681ea839c50
pool_context = 0
pool_failmode = wait
vdev_guid = 0x85cc302105002c5d
vdev_type = disk
vdev_path = /dev/dsk/c2d0p0
vdev_devid = id1,c...@ast3750640as=5qd3myrh/q
parent_guid = 0x8fb729a008f16e65
parent_type = mirror
zio_err = 50
zio_offset = 0x2500407000
zio_size = 0x1000
zio_objset = 0x0
zio_object = 0x23
zio_level = 0
zio_blkid = 0x0
__ttl = 0x1
__tod = 0x4a1e038f 0x1263dae2
(and many others just like this)



bash-3.2# echo "0t3735::pid2proc|::walk thread|::findtsack -v" | mdb -k
stack pointer for thread df6ed220: e1ca8c54
  e1ca8c94 swtch+0x188()
  e1ca8ca4 cv_wait+0x53(e08442aa, e084426c, , f9c305ac)
  e1ca8ce4 txg_wait_synced+0x90(e0844100, 252b4c, 0, 0)
  e1ca8d34 spa_config_update_common+0x88(d6c429c0, 0, 0, e1ca8d68)
  e1ca8d84 spa_import_common+0x3cf()
  e1ca8db4 spa_import+0x18(dbfcf000, e3c0d018, 0, f9c65810)
  e1ca8de4 zfs_ioc_pool_import+0xcd(dbfcf000, 0, 0)
  e1ca8e14 zfsdev_ioctl+0x124()
  e1ca8e44 cdev_ioctl+0x31(2d8, 5a02, 80418d0, 13, dfde91f8, e1ca8f00)
  e1ca8e74 spec_ioctl+0x6b(d7a593c0, 5a02, 80418d0, 13, dfde91f8, e1ca8f00)
  e1ca8ec4 fop_ioctl+0x49(d7a593c0, 5a02, 80418d0, 13, dfde91f8, e1ca8f00)
  e1ca8f84 ioctl+0x171()
  e1ca8fac sys_sysenter+0x106()



bash-3.2# echo "::threadlist -v" | mdb -k
d4ed8dc0 fec1f5580   0  60 d5033604
  PC: _resume_from_idle+0xb1THREAD: txg_sync_thread()
  stack pointer for thread d4ed8dc0: d4ed8ba8
swtch+0x188()
cv_wait+0x53()
zio_wait+0x55()  
vdev_uberblock_sync_list+0x19e()
vdev_config_sync+0x11c()
spa_sync+0x5a5()
txg_sync_thread+0x308()
thread_start+8()
(just chose one seemingly relevant thread from the long list)



bash-3.2# zdb -e -bb tank
Traversing all blocks to verify nothing leaked ...
Assertion failed: space_map_load(&msp->ms_map, &zdb_space_map_ops, 0x0, 
&msp->ms_smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 1420, function 
zdb_leak_init
Abort (core dumped)



bash-3.2# zdb -e - tank
Dataset mos [META], ID 0, cr_txg 4, 10.3M, 137 objects, rootbp [L0 DMU objset] 
400L/400P DVA[0]=<0:255c00:400> DVA[1]=<0:445c00:400> 
DVA[2]=<0:633000:400> fletcher4 uncompressed LE contiguous birth=2435914 
fill=137 cksum=5224494b4:4524146f316:1d44c6f4690ea:84ef3bd0c105a0

Object

Re: [zfs-discuss] Data size grew.. with compression on

2009-03-30 Thread Brad Plecs

I've run into this too... I believe the issue is that the block
size/allocation unit size in ZFS is much larger than the default size
on older filesystems (ufs, ext2, ext3).

The result is that if you have lots of small files smaller than the
block size, they take up more total space on the filesystem because
they occupy at least the block size amount.

See the 'recordsize' ZFS filesystem property, though re-reading the
man pages, I'm not 100% sure that tuning this property will have the
intended effect.
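
For what it's worth, a quick sketch of checking and changing it (dataset name
hypothetical; a new recordsize only affects files written after the change):

zfs get recordsize tank/data
zfs set recordsize=8K tank/data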

BP 


> I rsynced an 11gb pile of data from a remote linux machine to a zfs
> filesystem with compression turned on.
> 
> The data appears to have grown in size rather than been compressed.
> 
> Many, even most of the files are formats that are already compressed,
> such as mpg jpg avi and several others.  But also many text files
> (*.html) are in there.  So didn't expect much compression but also
> didn't expect the size to grow.
> 
> I realize these are different filesystems that may report
> differently.  Reiserfs on the linux machine and zfs on osol.
> 
> in bytes:
> 
>  Osol:11542196307
> linux:11525114469
> =
>  17081838
> 
> Or (If I got the math right) about  16.29 MB bigger on the zfs side
> with compression on.
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
bpl...@cs.umd.edu
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any way to set casesensitivity=mixed on the main pool?

2009-02-04 Thread Brad
If you have an older Solaris release using ZFS and Samba, and you upgrade to a 
version with CIFS support, how do you ensure the file systems/pools have 
casesensitivity mixed?
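
As far as I know, casesensitivity is a create-time-only property, so for an existing
filesystem the usual approach is to create a new dataset with the property set and
copy the data across; a rough sketch (dataset names hypothetical):

zfs create -o casesensitivity=mixed tank/share_mixed
zfs get casesensitivity tank/share_mixed
# then migrate the existing data, for example:
cd /tank/share && find . | cpio -pdum /tank/share_mixed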
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting

2009-01-28 Thread Brad Hill
Yes. I have disconnected the bad disk and booted with nothing in the slot, and 
also with a known-good replacement disk in the same SATA port. It doesn't change 
anything.

Running 2008.11 on the box and 2008.11 snv_101b_rc2 on the LiveCD. I'll give it 
a shot booting from the latest build and see if that makes any kind of 
difference.

Thanks for the suggestions.

Brad

> Just a thought, but have you physically disconnected
> the bad disk?  It's not unheard of for a bad disk to
> cause problems with others.
> 
> Failing that, it's the "corrupted data" bit that's
> worrying me, it sounds like you may have other
> corruption on the pool (always a risk with single
> parity raid), but I'm worried that it's not giving
> you any more details as to what's wrong.
> 
> Also, what version of OpenSolaris are you running?
> Could you maybe try booting off a CD of the latest
> build?  There are often improvements in the way ZFS
> copes with errors, so it's worth a try.  I don't
> think it's likely to help, but I wouldn't discount
>  it.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistan

2009-01-27 Thread Brad Hill
I do, thank you. The disk that went out sounds like it had a head crash or some 
such - loud clicking shortly after spin-up then it spins down and gives me 
nothing. BIOS doesn't even detect it properly to do a firmware update.


> Do you know 7200.11 has firmware bugs? 
> 
> Go to seagate website to check.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting

2009-01-27 Thread Brad Hill
r...@opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hoping someone has seen that before... the Google is seriously letting me down 
on that one.

> I guess you could try 'zpool import -f'.  This is a
> pretty odd status,
> I think.  I'm pretty sure raidz1 should survive a
> single disk failure.
> 
> Perhaps a more knowledgeable list member can explain.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistan

2009-01-27 Thread Brad Hill
Any ideas on this? It looks like a potential bug to me, or there is something 
that I'm not seeing.

Thanks again!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-24 Thread Brad Hill
> I've seen reports of a recent Seagate firmware update
> bricking drives again.
> 
> What's the output of 'zpool import' from the LiveCD?
>  It sounds like
> ore than 1 drive is dropping off.


r...@opensolaris:~# zpool import
  pool: tank
id: 16342816386332636568
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

tankFAULTED  corrupted data
  raidz1DEGRADED
c6t0d0  ONLINE
c6t1d0  ONLINE
c6t2d0  ONLINE
c6t3d0  UNAVAIL  cannot open
c6t4d0  ONLINE

  pool: rpool
id: 9891756864015178061
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

rpool   ONLINE
  c3d0s0ONLINE
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-22 Thread Brad Hill
> I would get a new 1.5 TB and make sure it has the new
> firmware and replace 
> c6t3d0 right away - even if someone here comes up
> with a magic solution, you 
> don't want to wait for another drive to fail.

The replacement disk showed up today but I'm unable to replace the one marked 
UNAVAIL:

r...@blitz:~# zpool replace tank c6t3d0
cannot open 'tank': pool is unavailable

> I would in this case also immediately export the pool (to prevent any 
> write attempts) and see about a firmware update for the failed drive 
> (probably need windows for this).

While I didn't export first, I did boot with a livecd and tried to force the 
import with that:

r...@opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hopefully someone on this list understands what situation I am in and how to 
resolve it. Again, many thanks in advance for any suggestions you all have to 
offer.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raidz1 p

2009-01-19 Thread Brad Hill
Sure, and thanks for the quick reply.

Controller: Supermicro AOC-SAT2-MV8 plugged into a 64-bit PCI-X 133 bus
Drives: 5 x Seagate 7200.11 1.5TB disks for the raidz1.
A single 36GB Western Digital 10k RPM Raptor as the system disk. Its mate is installed 
but not yet mirrored.
Motherboard: Tyan Thunder K8W S2885 (Dual AMD CPU) with 1GB ECC Ram

Anything else I can provide?

(thanks again)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Raidz1 p

2009-01-19 Thread Brad Hill
Greetings!

I lost one out of five disks on a machine with a raidz1 and I'm not sure 
exactly how to recover from it. The pool is marked as FAULTED which I certainly 
wasn't expecting with only one bum disk. 

r...@blitz:/# zpool status -v tank
  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankFAULTED  0 0 1  corrupted data
  raidz1DEGRADED 0 0 6
c6t0d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c6t3d0  UNAVAIL  0 0 0  cannot open
c6t4d0  ONLINE   0 0 0


Any recovery guidance I may gain from the esteemed experts of this group would 
be extremely appreciated. I recently migrated to opensolaris + zfs on the 
impassioned advice of a coworker and will lose some data that has been modified 
since the move but not yet backed up.

Many thanks in advance...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Aggregate Pool I/O

2009-01-18 Thread Brad
Well if I do fsstat mountpoint on all the filesystems in the ZFS pool, then I 
guess my aggregate number for read and write bandwidth should equal the 
aggregate numbers for the pool? Yes?

The downside is that fsstat has the same granularity issue as zpool iostat. 
What I'd really like is nread and nwrite numbers instead of r/s w/s. That way, 
if I miss some polls I can smooth out the results.
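
One possibility, hedged since it is from memory: fsstat run without an interval prints 
cumulative totals rather than a rate, and those totals appear to come from the vopstats 
kstats, which can be read raw and diffed on your own schedule:

fsstat /tank/fs1                  # no interval argument: cumulative totals, not a rate
kstat -p -m unix -n vopstats_zfs  # raw counters (nread, nwrite, read_bytes, ...) to diff yourself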

kstat -c disk sd::: is interesting, but seems to be only for locally-attached 
disks, right? I am using iSCSI although soon will also have pools with local 
disks.

For device data, I'd really like the per-pool and per-pool per device 
breakdowns provided by zpool iostat, if only it weren't summarized in a 
5-character field. Perhaps I should simply be asking for sample code that 
accesses libzfs directly.

I have rolled my own cron scheduler so I can have the sub-second queries.

Thanks for the info!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Aggregate Pool I/O

2009-01-17 Thread Brad
I'd like to track a server's ZFS pool I/O throughput over time. What's a good 
data source to use for this? I like zpool iostat for this, but if I poll at two 
points in time I would get a number since boot (e.g. 1.2M) and a current number 
(e.g. 1.3K). If I use the current number then I've lost data between polling 
intervals. But if I use the number since boot it's not precise enough to be 
useful.

Is there a kstat equivalent to the I/O since boot? Some other good data source?

And then is there a similar kstat equivalent to iostat? Would both data values 
then allow me to trend file i/O versus physical disk I/O?

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool add dumping core

2009-01-10 Thread Brad Plecs
> Are you sure this isn't a case of CR 6433264 which
> was fixed
> long ago, but arrived in patch 118833-36 to Solaris
> 10?

It certainly looks similar, but this system already had 118833-36 when the 
error occurred, so if this bug is truly fixed, it must be something else.  Then 
again, I wasn't adding spares, I was adding a raidz1 group, so maybe it was 
patched for adding spares but not other vdevs.  

I looked at the bug ID but couldn't tell if there was a simple test I could 
perform to determine if this was the same or a related bug, or something 
completely new.   The error message is the same, except for the reported line 
number. 

Here's some mdb output similar to what was in the original bug report: 

r...@kronos:/ # mdb core
Loading modules: [ libumem.so.1 libnvpair.so.1 libuutil.so.1 libc.so.1 
libavl.so.1 libsysevent.so.1 ld.so.1 ]
> $c
libc.so.1`_lwp_kill+8(6, 0, ff1c3058, ff12bed8, , 6)
libc.so.1`abort+0x110(ffbfb760, 1, 0, fcba0, ff1c13d8, 0)
libc.so.1`_assert+0x64(213a8, 213d8, 277, 8d990, fc8bc, 32008)
0x1afe8(11, 0, 1a2d78, dff40, 16f2a400, 4)
0x1b028(8df60, 8cfd0, 0, 0, 0, 4)
make_root_vdev+0x9c(abe48, 0, 1, 0, 8df60, 8cfd0)
0x1342c(8, abe48, 0, 7, 0, ffbffdca)
main+0x154(9, ffbffce4, 9, 3, 33400, ffbffdc6)
_start+0x108(0, 0, 0, 0, 0, 0)

I'm happy to further poke at the core file or provide other data if anyone's 
interested...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool add dumping core

2009-01-10 Thread Brad Plecs
Problem solved... after the resilvers completed, the status reported that the 
filesystem needed an upgrade. 

I did a zpool upgrade -a, and after that completed and there was no resilvering 
going on, the zpool add ran successfully. 

I would like to suggest, however, that the behavior be fixed -- it should 
report something more intelligent, either "cannot add to pool during resilver", 
or "cannot add to pool until the filesystem is upgraded", whichever is correct, 
instead of dumping core.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool add dumping core

2009-01-09 Thread Brad Plecs
I'm trying to add some additional devices to my existing pool, but it's not 
working.  I'm adding a raidz group of 5 300 GB drives, but the command always 
fails: 

r...@kronos:/ # zpool add raid raidz c8t8d0 c8t13d0 c7t8d0 c3t8d0 c5t8d0
Assertion failed: nvlist_lookup_string(cnv, "path", &path) == 0, file 
zpool_vdev.c, line 631
Abort (core dumped)

The disks all work and were labeled easily using 'format' after zfs and other 
tools refused to look at them. 
Creating a UFS filesystem with newfs on them runs with no issues, but I can't 
add them to the existing zpool.  

I can use the same devices to create a NEW zpool without issue. 

I fully patched up this system after encountering this problem, no change. 

The zpool to which I am adding them is fairly large and in a degraded state 
(three resilvers running, one that never seems to complete and two related to 
trying to add these new disks), but I didn't think that should prevent me from 
adding another vdev. 

For those who suggest waiting 20 minutes for the resilver to finish, it's been 
estimating less than 30 minutes for the last 12 hours, and we're running out of 
space, so I wanted to add the new devices sooner rather than later. 

Can anyone help? 

extra details below:  

r...@kronos:/ # uname -a
SunOS kronos 5.10 Generic_137137-09 sun4u sparc SUNW,Sun-Fire-480R

r...@kronos:/ # smpatch analyze 
137276-01 SunOS 5.10: uucico patch
122470-02 Gnome 2.6.0: GNOME Java Help Patch
121430-31 SunOS 5.8 5.9 5.10: Live Upgrade Patch
121428-11 SunOS 5.10: Live Upgrade Zones Support Patch

r...@kronos:patch # zpool list
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
raid  4.32T  4.23T  92.1G97%  DEGRADED  -

r...@kronos:patch # zpool status   
  pool: raid
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
 scrub: resilver in progress for 12h22m, 97.25% done, 0h20m to go
config:

NAMESTATE READ WRITE CKSUM
raidDEGRADED 0 0 0
  raidz1ONLINE   0 0 0
c9t0d0  ONLINE   0 0 0
c6t0d0  ONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c4t0d0  ONLINE   0 0 0
c10t0d0 ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c9t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c10t1d0 ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c9t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c2t3d0  ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c10t3d0 ONLINE   0 0 0
  raidz1DEGRADED 0 0 0
c9t4d0  ONLINE   0 0 0
spare   DEGRADED 0 0 0
  c5t13d0   ONLINE   0 0 0
  c6t4d0FAULTED  0 12.3K 0  too many errors
c2t4d0  ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c10t4d0 ONLINE   0 0 0
  raidz1DEGRADED 0 0 0
c9t5d0  ONLINE   0 0 0
spare   DEGRADED 0 0 0
  replacing DEGRADED 0 0 0
c6t5d0s0/o  UNAVAIL  0 0 0  cannot open
c6t5d0  ONLINE   0 0 0
  c11t13d0  ONLINE   0 0 0
c2t5d0  ONLINE   0 0 0
c4t5d0  ONLINE   0 0 0
c10t5d0 ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t9d0  ONLINE   0 0 0
c7t9d0  ONLINE   0 0 0
c3t9d0  ONLINE   0 0 0
c8t9d0  ONLINE   0 0 0
c11t9d0 ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t10d0 ONLINE   0 0 0
c7t10d0 ONLINE   0 0 0
c3t10d0 ONLINE   0 0 0
c8t10d0 ONLINE   0 0 0
c11t10d0ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t11d0 ONLINE   0 0 0
  


Re: [zfs-discuss] ZFS filesystem creation during JumpStart

2008-12-15 Thread Brad Hudson
Thanks for the response Peter.  However, I'm not looking to create a different 
boot environment (bootenv).  I'm actually looking for a way within JumpStart to 
separate out the ZFS filesystems from a new installation to have better control 
over quotas and reservations for applications that usually run rampant later.  
In particular, I would like better control over the following (e.g. the ability 
to explicitly create them at install time):

rpool/opt - /opt
rpool/usr - /usr
rpool/var - /var
rpool/home - /home

Of the above /home can easily be created post-install, but the others need to 
have the flexibility of being explicitly called out in the JumpStart profile 
from the initial install to provide better ZFS accounting/controls.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS filesystem creation during JumpStart

2008-12-15 Thread Brad Hudson
Does anyone know of a way to specify the creation of ZFS file systems for a ZFS 
root pool during a JumpStart installation?  For example, creating the following 
during the install:

Filesystem   Mountpoint
rpool/var /var
rpool/var/tmp /var/tmp
rpool/home /home

The creation of separate filesystems allows the use of quotas/reservations via 
ZFS, whereas these are not created/protected during a JumpStart install with 
ZFS root.
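
For reference, and from memory (so verify against the JumpStart documentation): the 
ZFS-root profile syntax only supports splitting out /var, and the rest would have to 
come from a finish script after the install. A rough sketch:

# profile fragment: mirrored ZFS root pool with a separate /var dataset
pool rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv installbe bename s10zfsBE dataset /var

# finish-script fragment for datasets that can be created post-install
zfs create -o mountpoint=/home -o quota=20g rpool/home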
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and SAN

2008-07-27 Thread Brad
> - on a sun cluster, luns are seen on both nodes. Can
> we prevent mistakes like creating a pool on already
> assigned luns ? for example, veritas wants a "force"
> flag. With ZFS i can do :
> node1: zpool create X add lun1 lun2
> node2 : zpool create Y add lun1 lun2
> and then, results are unexpected, but pool X will
> never switch again ;-) resource and zone are dead.

For our iSCSI SAN, we use iSNS to put LUNs into separate
discovery domains (default + domain per host). So as part
of creating or expanding a pool we first move LUNs to the
appropriate host's domain. Create would fail on node2 because
it wouldn't have visibility to the luns. Would that address your issue?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can't rm file when "No space left on device"...

2008-06-10 Thread Brad Diggs
Great point.  Hadn't thought of it in that way.
I haven't tried truncating a file prior to trying
to remove it.  Either way though, I think it is a
bug if once the filesystem fills up, you can't remove
a file.

Brad

On Thu, 2008-06-05 at 21:13 -0600, Keith Bierman wrote:
> On Jun 5, 2008, at 8:58 PM   6/5/, Brad Diggs wrote:
> 
> > Hi Keith,
> >
> > Sure you can truncate some files but that effectively corrupts
> > the files in our case and would cause more harm than good. The
> > only files in our volume are data files.
> >
> 
> 
> 
> So an rm is ok, but a truncation is not?
> 
> Seems odd to me, but if that's your constraint so be it.
> 
-- 
-----
  _/_/_/  _/_/  _/ _/   Brad Diggs
 _/  _/_/  _/_/   _/Communications Area Market
_/_/_/  _/_/  _/  _/ _/ Senior Directory Architect
   _/  _/_/  _/   _/_/
  _/_/_/   _/_/_/   _/ _/   Office:  972-992-0002
E-Mail:  [EMAIL PROTECTED]
 M  I  C  R  O  S  Y  S  T  E  M  S

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can't rm file when "No space left on device"...

2008-06-04 Thread Brad Diggs
Hello,

A customer recently brought to my attention that ZFS can get
into a situation where the filesystem is full but no files 
can be removed.  The workaround is to remove a snapshot and
then you should have enough free space to remove a file.  
Here is a sample series of commands to reproduce the 
problem.

# mkfile 1g /tmp/disk.raw
# zpool create -f zFullPool /tmp/disk.raw
# sz=`df -k /zFullPool | awk '{ print $2 }' | tail -1`
# mkfile $((${sz}-1024))k /zFullPool/f1
# zfs snapshot [EMAIL PROTECTED]
# sz=`df -k /zFullPool | awk '{ print $2 }' | tail -1`
# mkfile ${sz}k /zFullPool/f2
/zFullPool/f2: initialized 401408 of 1031798784 bytes: No space left on
device
# df -k /zFullPool
Filesystem            kbytes    used   avail capacity  Mounted on
zFullPool            1007659 1007403       0   100%    /zFullPool
# rm -f /zFullPool/f1
# ls -al /zFullPool
total 2014797
drwxr-xr-x   2 root     sys            4 Jun  4 12:15 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw------T   1 root     root  1030750208 Jun  4 12:15 f1
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2
# rm -f /zFullPool/f2
# ls -al /zFullPool
total 2014797
drwxr-xr-x   2 root     sys            4 Jun  4 12:15 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw------T   1 root     root  1030750208 Jun  4 12:15 f1
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2

At this point, the only way in which I can free up sufficient
space to remove either file is to first remove the snapshot.

# zfs destroy [EMAIL PROTECTED]
# rm -f /zFullPool/f1
# ls -al /zFullPool
total 1332
drwxr-xr-x   2 root     sys            3 Jun  4 12:17 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2

Is there an existing bug on this that is going to address
enabling the removal of a file without the pre-requisite 
removal of a snapshot?
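
For what it's worth, my understanding is that a remove is itself a
copy-on-write transaction, so ZFS needs a sliver of free space to commit it,
and blocks pinned by the snapshot cannot be reclaimed.  Short of destroying
the snapshot, one thing worth trying is truncating a file that no snapshot
references (f2 above was created after the snapshot, so its blocks are not
pinned) and then removing it:

# truncate first to give the pool some breathing room, then remove
cat /dev/null > /zFullPool/f2
rm /zFullPool/f2

No guarantees this always succeeds on a 100% full pool, but it avoids
touching the snapshots.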

Thanks in advance,
Brad
-- 
-
  _/_/_/  _/_/  _/ _/   Brad Diggs
 _/  _/_/  _/_/   _/Communications Area Market
_/_/_/  _/_/  _/  _/ _/ Senior Directory Architect
   _/  _/_/  _/   _/_/
  _/_/_/   _/_/_/   _/ _/   Office:  972-992-0002
E-Mail:  [EMAIL PROTECTED]
 M  I  C  R  O  S  Y  S  T  E  M  S

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Shrinking a zpool?

2008-05-06 Thread Brad Bender
Solaris 10 update 5 was released 05/2008, but no zpool shrink :-(  Any update?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How do you determine the zfs_vdev_cache_size current value?

2008-04-29 Thread Brad Diggs
How do you ascertain the current zfs vdev cache size (e.g. 
zfs_vdev_cache_size) via mdb or kstat or any other cmd?
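
For reference, two things that should work on a stock Solaris kernel, though
treat them as a sketch rather than gospel: read the tunable live with mdb -k,
and look at the vdev cache kstats, which report hit/miss counters rather than
the size itself.

# current value of the tunable, printed in decimal
echo 'zfs_vdev_cache_size/D' | mdb -k

# related counters: delegations, hits, misses
kstat -m zfs -n vdev_cache_stats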

Thanks in advance,
Brad
-- 
The Zone Manager
http://TheZoneManager.COM
http://opensolaris.org/os/project/zonemgr

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RFE: Start with desired end state in mind...

2008-02-29 Thread Brad Diggs
I love the send and receive feature of zfs.  However, the one feature it
lacks is the ability to specify, on the receive end, how I want the
destination zfs filesystem to be created before receiving the data being
sent.

For example, let's say that I would like to do a compression study to
determine which level of compression of the gzip algorithm would save
the most space for my data.  One of the easiest ways to do that 
locally or remotely would be to use send/receive like so:

zfs snapshot zpool/[EMAIL PROTECTED]
gz=1
while [ ${gz} -le 9 ]
do
   zfs send zpool/[EMAIL PROTECTED] | \
 zfs receive -o compression=gzip-${gz} zpool/gz${gz}data
   zfs list zpool/gz${gz}data
   gz=$((gz+1))
done
zfs destroy zpool/[EMAIL PROTECTED]
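
Until a receive-side -o like the one sketched above exists, the closest
workaround I know of leans on property inheritance: a dataset received
without -p/-R inherits compression from its parent, so pre-creating one
parent per gzip level gets the same study done.  Dataset names below are
illustrative, and gzip compression assumes a build that has it:

zfs snapshot zpool/data@comptest
gz=1
while [ ${gz} -le 9 ]
do
   # the parent carries the property; the received child inherits it
   zfs create -o compression=gzip-${gz} zpool/gz${gz}
   zfs send zpool/data@comptest | zfs receive zpool/gz${gz}/data
   zfs get -r compressratio zpool/gz${gz}
   gz=$((gz+1))
done
zfs destroy zpool/data@comptest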

Another example.  Let's assume that the zfs encryption feature was
available today.  Further, let's assume that I have a filesystem that
has compression and encryption enabled.  I want to duplicate that exact
zfs filesystem on another system through send/receive.  Today the
receive feature does not give me the ability to specify the desired end
state configuration of the destination zfs filesystem before receiving
the data.  I think that would be a great feature.

Just some food for thought.

Thanks in advance,
Brad

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is gzip planned to be in S10U5?

2008-02-13 Thread Brad Diggs
Hello,

Is the gzip compression algorithm planned to be in Solaris 10 Update 5?

Thanks in advance,
Brad
-- 
The Zone Manager
http://TheZoneManager.COM
http://opensolaris.org/os/project/zonemgr

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] UFS on zvol Cache Questions...

2008-02-08 Thread Brad Diggs
Hello Darren,

Please find responses in line below...

On Fri, 2008-02-08 at 10:52 +, Darren J Moffat wrote:
> Brad Diggs wrote:
> > I would like to use ZFS but with ZFS I cannot prime the cache
> > and I don't have the ability to control what is in the cache 
> > (e.g. like with the directio UFS option).
> 
> Why do you believe you need that at all ?  

My application is directory server.  The #1 resource that 
directory needs to make maximum utilization of is RAM.  In 
order to do that, I want to control every aspect of RAM
utilization both to safely use as much RAM as possible AND
avoid contention among things trying to use RAM.

Lets consider the following example.  A customer has a 
50M entry directory.  The sum of the data (db3 files) is
approximately 60GB.  However, there is another 2GB for the
root filesystem, 30GB for the changelog, 1GB for the 
transaction logs, and 10GB for the informational logs.

The system on which directory server will run has only 
64GB of RAM.  The system is configured with the following
partitions:

  FS       Used(GB)  Description
  /            2     root
  /db         60     directory data
  /logs       41     changelog, txn logs, and info logs
  swap        10     system swap

I prefer to keep the directory db cache and entry caches
relatively small.  So the db cache is 2GB and the entry 
cache is 100M.  This leaves roughly 63GB of RAM for my 60GB
of directory data and Solaris. The only way to ensure that
the directory data (/db) is the only thing in the filesystem
cache is to set directio on / (root) and (/logs).
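
For completeness, the UFS side of that is the forcedirectio mount option;
the device and mount point below are placeholders:

# mount a UFS filesystem with directio
mount -F ufs -o forcedirectio /dev/dsk/c0t0d0s5 /logs

# or persistently via the mount options field in /etc/vfstab
/dev/dsk/c0t0d0s5  /dev/rdsk/c0t0d0s5  /logs  ufs  2  yes  forcedirectio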

> What do you do to "prime" the cache with UFS 

cd /db
for i in `find . -name '*.db3'`
do
  dd if="${i}" of=/dev/null
done

> and what benefit do you think it is giving you ?

Priming the directory server data into filesystem cache 
reduces ldap response time for directory data in the
filesystem cache.  This could mean the difference between
a sub ms response time and a response time on the order of
tens or hundreds of ms depending on the underlying storage
speed.  For telcos in particular, minimal response time is 
paramount.

Another common scenario is when we do benchmark bakeoffs
with another vendor's product.  If the data isn't pre-
primed, then ldap response time and throughput will be
artificially degraded until the data is primed into either
the filesystem or directory (db or entry) cache.  Priming
via ldap operations can take many hours or even days 
depending on the number of entries in the directory server.
However, priming the same data via dd takes minutes to hours
depending on the size of the files.  

As you know in benchmarking scenarios, time is the most limited
resource that we typically have.  Thus, priming via dd is much
preferred.

Lastly, in order to achieve optimal use of available RAM, we
use directio for the root (/) and other non-data filesystems.
This makes certain that the only data in the filesystem cache
is the directory data.

> Have you tried just using ZFS and found it doesn't perform as you need 
> or are you assuming it won't because it doesn't have directio ?

We have done extensive testing with ZFS and love it.  The three 
areas lacking for our use cases are as follows:
 * No ability to control what is in cache. e.g. no directio
 * No absolute ability to apply an upper boundary to the amount
   of RAM consumed by ZFS.  I know that the arc cache has a 
   control that seems to work well. However, the arc cache is
   only part of ZFS ram consumption.
 * No ability to rapidly prime the ZFS cache with the data that 
   I want in the cache.
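
For reference, the ARC control mentioned in the second point is the
zfs_arc_max tunable in /etc/system (the 4GB value below is only an example,
and it caps the ARC rather than every byte ZFS allocates):

* /etc/system -- cap the ZFS ARC at 4GB (example value); needs a reboot
set zfs:zfs_arc_max = 0x100000000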

I hope that helps give understanding to where I am coming from!

Brad

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] UFS on zvol Cache Questions...

2008-02-07 Thread Brad Diggs
Hello,

I have a unique deployment scenario where the marriage
of ZFS zvol and UFS seems like a perfect match.  Here are
the list of feature requirements for my use case:

* snapshots
* rollback
* copy-on-write
* ZFS level redundancy (mirroring, raidz, ...)
* compression
* filesystem cache control (control what's in and out)
* priming the filesystem cache (dd if=file of=/dev/null)
* control the upper boundary of RAM consumed by the
  filesystem.  This helps me to avoid contention between
  the filesystem cache and my application.

Before zfs came along, I could achieve all but rollback,
copy-on-write and compression through UFS+some volume manager.

I would like to use ZFS but with ZFS I cannot prime the cache
and I don't have the ability to control what is in the cache 
(e.g. like with the directio UFS option).

If I create a ZFS zvol and format it as a UFS filesystem, it
seems like I get the best of both worlds.  Can anyone poke 
holes in this strategy?

I think the biggest possible risk factor is if the ZFS zvol
still uses the arc cache.  If this is the case, I may be 
double-dipping on the filesystem cache.  e.g. The UFS filesystem
uses some RAM and ZFS zvol uses some RAM for filesystem cache.
Is this a true statement or does the zvol use a minimal amount
of system RAM?

Lastly, if I were to try this scenario, does anyone know how to
monitor the RAM consumed by the zvol and UFS?  e.g. Is there a 
dtrace script for monitoring ZFS or UFS memory consumption?
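
For the monitoring piece, two quick checks should cover most of it without
any special tooling (treat them as a sketch): the ARC kstats show what the
ZFS/zvol side is holding, and ::memstat in mdb gives the overall kernel
memory breakdown, with UFS-cached pages showing up as ordinary page cache:

# current ARC footprint in bytes (covers the zvol's ZFS-side caching)
kstat -p zfs:0:arcstats:size

# overall memory breakdown, including page cache used by UFS
echo ::memstat | mdb -k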

Thanks in advance,
Brad

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS quota

2007-08-27 Thread Brad Plecs

> OK, you asked for "creative" workarounds... here's one (though it requires 
> that the filesystem be briefly unmounted, which may be deal-killing):

That is, indeed, creative.  :)   And yes, the unmount makes it 
impractical in my environment.  

I ended up going back to rsync, because we had more and more
complaints as the snapshots accumulated, but am now just rsyncing to
another system, which in turn runs snapshots on the backup copy.  It's
still time- and i/o-consuming, and the users can't recover their own
files, but at least I'm not eating up 200% of the space otherwise
necessary on the expensive new hardware raid and fielding daily 
over-quota (when not really over-quota) complaints. 
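
For anyone wanting to copy that arrangement, the skeleton is roughly the
following (host, pool, and path names are made up):

# on the primary: push the live data to the backup server
rsync -a --delete /export/home/ backuphost:/backup/home/

# on the backup server: keep a dated snapshot of each pass
zfs snapshot backup/home@`date +%Y%m%d-%H%M`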

Thanks for the suggestion.  Looking forward to the new feature... 

BP 



> 
> zfs create pool/realfs
> zfs set quota=1g pool/realfs
> 
> again:
> zfs umount pool/realfs
> zfs rename pool/realfs pool/oldfs
> zfs snapshot pool/[EMAIL PROTECTED]
> zfs clone pool/[EMAIL PROTECTED] pool/realfs
> zfs set quota=1g pool/realfs  (6364688 would be useful here)
> zfs set quota=none pool/oldfs
> zfs promote pool/oldfs
> zfs destroy pool/backupfs
> zfs rename pool/oldfs pool/backupfs
> backup pool/[EMAIL PROTECTED]
> sleep $backupinterval
> goto again
> 
> FYI, we are working on "fs-only" quotas.
> 
> --matt

-- 
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS quota

2007-08-22 Thread Brad Plecs
Just wanted to voice another request for this feature.

I was forced on a previous Solaris10/ZFS system to rsync whole filesystems, and 
snapshot the backup copy to prevent the snapshots from negatively impacting 
users.  This obviously has the effect of reducing available space on the system 
by over half.  It also robs you of lots of I/O bandwidth while all that data is 
rsyncing, and  means that users can't see their snapshots, only a sysadmin with 
access to the backup copy can.  

We've got a new system that isn't doing the rsync, and users very quickly 
discovered over-quota problems when their directories appeared empty and 
deleting files didn't help.  They required sysadmin intervention to increase 
their filesystem quotas to accommodate the snapshots and their real data.  
Trying to anticipate the space required for the snapshots and giving them 
that as a quota is more or less hopeless, plus it gives them that much more 
rope with which to hang themselves with massive snapshots.

I hate to start rsyncing again, but may be forced to; policing the snapshot 
space consumption is getting painful, but the online snapshot feature is too 
valuable to discard altogether.

Or if there are other creative solutions, I'm all ears...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.

2007-05-23 Thread Brad Plecs
> > At the moment, I'm hearing that using h/w raid under my zfs may be
> >better for some workloads and the h/w hot spare would be nice to
> >have across multiple raid groups, but the checksum capabilities in
> >zfs are basically nullified with single/multiple h/w lun's
> >resulting in "reduced protection."  Therefore, it sounds like I
> >should be strongly leaning towards not using the hardware raid in
> >external disk arrays and use them like a JBOD.

> The big reasons for continuing to use hw raid is speed, in some cases, 
> and heterogeneous environments where you can't farm out non-raid 
> protected LUNs and raid protected LUNs from the same storage array. In 
> some cases the array will require a raid protection setting, like the 
> 99x0, before you can even start farming out storage.

Just a data point -- I've had miserable luck with ZFS JBOD drives
failing.  They consistently wedge my machines (Ultra-45, E450, V880,
using SATA, SCSI drives) when one of the drives fails.  The system
recovers okay and without data loss after a reboot, but a total drive
failure (when a drive stops talking to the system) is not handled
well.

Therefore I would recommend a hardware raid for high-availability
applications.

Note, it's not clear that this is a ZFS problem.  I suspect it's a
solaris or hardware controller or driver problem, so this may not be
an issue if you find a controller that doesn't freak on a drive
failure.

BP 

-- 
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option

2007-04-17 Thread Brad Green
Did you find a resolution to this issue?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS over NFS extra slow?

2007-01-03 Thread Brad Plecs
Write cache was enabled on all the ZFS drives, but disabling it gave a 
negligible speed improvement:  (FWIW, the pool has 50 drives) 

(write cache on) 

/bin/time tar xf /tmp/vbulletin_3-6-4.tar

real   51.6
user0.0
sys 1.0

(write cache off) 

/bin/time tar xf /tmp/vbulletin_3-6-4.tar

real   49.2
user0.0
sys 1.0


...this is a production system, so I attribute the 2-second (4%) difference 
more to variable system activity than to the write cache.

I suppose I could test with larger samples, but since this is still ten times 
slower than I want, I think this effectively discounts the disk write cache 
as anything significant.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS over NFS extra slow?

2007-01-02 Thread Brad Plecs
Ah, thanks -- reading that thread did a good job of explaining what I was 
seeing.  I was going nuts trying to isolate the problem.

Is work being done to improve this performance?  100% of my users are coming 
in over NFS, and that's a huge hit.  Even on single large files, writes are 
slower by a factor of 2 to 10 compared to copying via scp or onto a non-ZFS 
filesystem.

Thanks!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Brad Plecs
I had a user report extreme slowness on a ZFS filesystem mounted over NFS over 
the weekend. 
After some extensive testing, the extreme slowness appears to only occur when a 
ZFS filesystem is mounted over NFS.  

One example is doing a 'gtar xzvf php-5.2.0.tar.gz'... over NFS onto a ZFS 
filesystem.  This takes: 

real5m12.423s
user0m0.936s
sys 0m4.760s

Locally on the server (to the same ZFS filesystem) takes: 

real0m4.415s
user0m1.884s
sys 0m3.395s

The same job over NFS to a UFS filesystem takes

real1m22.725s
user0m0.901s
sys 0m4.479s

Same job locally on server to same UFS filesystem: 

real0m10.150s
user0m2.121s
sys 0m4.953s


This is easily reproducible even with single large files, but the 
multiple-small-file case seems to illustrate some awful sync latency between 
each file.

Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an 
fsync penalty, but they don't seem relevant since the local ZFS performance 
is quite good.
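
(For context, the usual explanation in those threads: an NFS client commits
each file as it is closed, and an untar over NFS closes thousands of them,
each of which ZFS honours synchronously through the ZIL, whereas a local tar
never asks for that.  This is why the penalty only shows up over NFS even
though local ZFS is fast.  Watching the pool while the untar runs makes the
pattern easy to see; the pool name here is assumed:

# in another terminal during the NFS untar; the steady trickle of small
# writes every interval is the per-file commit traffic
zpool iostat tank 1
)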
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

