Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-23 Thread Thomas Burgess
On Wed, Dec 23, 2009 at 10:36 PM, Ian Collins  wrote:

> An EFI label isn't "OS specific formatting"!

at the risk of sounding really stupid...is an EFI label the same as using
GUID partitions? I think i remember reading about setting up GUID-partitioned
disks in FreeBSD.  If so, i could try to use these for the new pool if it
would help in migration.
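
From what i can tell, an EFI label is basically a GPT (GUID partition table),
so they seem to be the same idea.  Something like this is the whole-disk
approach i'd try (made-up device names), with the suggested virtualbox test
to confirm opensolaris actually picks the pool up:

   (on the FreeBSD box, using whole raw disks)
   # zpool create migrate raidz da4 da5 da6 da7
   # zpool export migrate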
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-23 Thread Ian Collins

Mattias Pantzare wrote:
>>> I'm not sure how to go about it.  Basically, how should i format my
>>> drives in FreeBSD, create a ZPOOL which can be imported into OpenSolaris.
>>
>> I'm not sure about BSD, but Solaris ZFS works with whole devices.  So there
>> isn't any OS specific formatting involved.  I assume BSD does the same.
>
> That is not true. ZFS will use an EFI partition table with one
> partition if you give it the whole disk.

An EFI label isn't "OS specific formatting"!

> My guess is that you should put it in an EFI partition. But a normal
> partition should work.

ZFS will write one if you add whole drives to a pool.

> I would test this in virtualbox or vmware if I were you.

Sensible idea.
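
A quick way to see the label for yourself (device name is hypothetical, and
the path may need an s0 suffix depending on release): give ZFS a whole disk,
then dump the partition table it wrote:

   # zpool create tank c1t2d0           (whole disk, no slice suffix)
   # prtvtoc /dev/rdsk/c1t2d0

You should see something like an EFI label with slice 0 covering nearly the
whole disk plus a small reserved slice 8.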

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] what is the best way to hook up my drives

2009-12-23 Thread Thomas Burgess
I am planning on building an opensolaris server to replace my NAS.

My case has room for 20 hotswap sata drives and 1 or 2 internal drives.  I
was planning on going with 5 raidz vdevs each with 4 drives, and maybe a hot
spare inside the case in one of the extra slots.

I am going to use 2 Supermicro AOC-SAT2-MV8 cards (pci-x 133 each with 8
sata ports)

The motherboard is going to be either Supermicro mbd-c2sbx or Supermicro
mbd-x7sbe, both of which have 6 onboard sata ports.  So, I've got a couple of
things i could do...if possible i'd like to avoid buying another raid card
UNLESS i can find one for under 100 dollars...what i'm wondering is: what is
the best way to construct my vdevs?

Should i put all 4 drives of a vdev on a single raid card, with the next vdev
on the other card?  Should i instead spread each vdev across controllers (one
drive on one card, two on the other, and one on the motherboard), alternating
which card gets the two drives?  Any suggestions would be helpful.


It's mainly going to be used for high def x264 movie playback over samba or
nfs to 4 or 5 htpc's.

My current machine is FreeBSD 8.0 with ZFS and has 3 raidz vdevs each with 4
drives, but they are all on the same controller (which is a highpoint
rocketraid 2340)

I used FreeBSD because solaris didn't work with this card, but with me
upgrading, this soon won't be an issue.  If i were to buy a 3rd raid card to
use one of the pci-e slots, would it be best to avoid using the onboard
ports altogether, or would it then make more sense to create vdevs with one
drive connected to each controller (3 raid cards, 1 onboard)?
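
To make that concrete, here is roughly the layout i have in mind (controller
and device names are made up; c1/c2 would be the AOC-SAT2-MV8s, c3 the extra
card and c4 the onboard ports):

   # zpool create tank \
       raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 \
       raidz c1t1d0 c2t1d0 c3t1d0 c4t1d0 \
       raidz c1t2d0 c2t2d0 c3t2d0 c4t2d0 \
       raidz c1t3d0 c2t3d0 c3t3d0 c4t3d0 \
       raidz c1t4d0 c2t4d0 c3t4d0 c4t4d0

so that losing any one controller only costs each raidz vdev a single drive.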


Is there a compatible 8 port pci-e card under 150 dollars?

thanks.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz data loss stories?

2009-12-23 Thread Eric D. Mudama

On Tue, Dec 22 at 12:33, James Risner wrote:

As for whether or not to do raidz, for me the issue is performance.
I can't handle the raidz write penalty.  If I needed triple drive
protection, a 3way mirror setup would be the only way I would go.  I
don't yet quite understand why a 4+ drive raidz3 vdev is better than
a 3-drive mirror vdev, other than that a 6-drive raidz3 setup gives 3 drives
of space while a 6-drive 3-way mirror setup gives only 2 drives of space.


That's a pretty big "other than", since the difference is 50% more
space for the raidz3 in your case, and the difference grows as the
number of drives increases.
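
To put rough numbers on it: at 6 drives it is 3 usable drives versus 2, but
at 12 drives a single 12-wide raidz3 yields 12 - 3 = 9 drives of usable space
while 3-way mirrors yield 12 / 3 = 4.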

I concur with some of the other thoughts, that migration towards L2ARC
plus big slow sata pools is becoming a more recommended configuration.
The recent automated pool recovery and ZIL removal improvements make
that design much more practical.

--eric


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] quotas on zfs at solaris 10 update 9 (10/09)

2009-12-23 Thread Jorgen Lundman



Len Zaifman wrote:

Because we have users who will create millions of files in a directory it would 
be nice to report the number of files a user has or a group has in a filesystem.

Is there a way (other than find) to get this?


I don't know if there is a good way, but I have noticed that with ZFS, the
size column in "ls -l" (which used to reflect blocks) actually reports the
number of entries in the directory (one less than "ls -la | wc -l" shows).


drwxr-xr-x  13 root bin   13 Oct 28 02:58 spool
^^

# ls -la spool | wc -l
  14

Which means you can probably add things up a little faster.
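
For example, something like this (a rough sketch, with a made-up path) sums
the size field over every directory in a tree, which approximates the total
entry count without stat'ing each file; note it counts "." and ".." for every
directory, so it over-counts slightly:

   # find /export/home/someuser -type d -exec ls -ld {} \; | \
       awk '{ sum += $5 } END { print sum }'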



--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS- When do you add more memory?

2009-12-23 Thread Bob Friesenhahn

On Wed, 23 Dec 2009, Yanjun (Tiger) Hu wrote:


Hi Jim,

I think Tony was asking a very valid question. It reminds me 
http://developers.sun.com/solaris/articles/sol8memory.html#where.


The question is valid, but the answer will be misleading.  Regardless of
whether a memory page represents part of a memory-mapped file, traditional
filesystem cache, an application heap/stack, or the zfs ARC, it is still
caching data that the application has used and may use again.  The zfs ARC
is smarter, so it is better at discarding the data which is least likely to
be used when there is memory pressure.  Even though the ARC is smarter,
there is still an expensive disk access required to restore that data if
the application accesses it again.


I find the 'arc_summary.pl' script available from
http://cuddletech.com/arc_summary/ to be quite a useful tool for seeing a
memory breakdown, including ARC sizes.
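
If you just want the raw numbers, the ARC kstats are a quick alternative
(statistic names can vary a little between builds):

   # kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max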


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS- When do you add more memory?

2009-12-23 Thread Jim Laurent
I think he's looking for a single, intuitively obvious, easy to access
indicator of memory usage along the lines of the vmstat free column (before
ZFS), which showed the current amount of free RAM.


On Dec 23, 2009, at 4:09 PM, Jim Mauro wrote:

> Hi Anthony -
> 
> I don't get this. How does the presence (or absence) of the ARC change
> the methodology for doing memory capacity planning?
> 
> Memory capacity planning is all about identifying and measuring consumers.
> Memory consumers;
> - The kernel.
> - User processes.
> - The ZFS ARC, which is technically part of the kernel, but it should be
>   tracked and measured separately.
> 
> prstat, pmap, kstat -n arcstats and "echo ::memstat | mdb -k" will tell you
> pretty much everything you need to know about how much memory
> is being consumed by the consumers.
> 
> Now granted, it is harder than I'm making it sound here.
> User processes can be tricky if they share memory, but typically
> that's not a big deal unless it's a database load or an application that
> explicitly uses shared memory as a form of IPC.
> 
> The ARC stuff is also tricky, because it's really hard to determine what
> the active working set is for file system data: you want the ARC big
> enough to deliver acceptable performance, but not so big as to
> potentially cause short-term memory shortfalls. It requires some
> observation and periodic collection of statistics, the most important
> statistic being the level of performance of the customer's workload.
> 
> As an aside, there's nothing about this that requires it be posted
> to zfs-discuss-confidential. I posted to zfs-disc...@opensolaris.org.
> 
> 
> Thanks,
> /jim
> 
> 
> Anthony Benenati wrote:
>> Jim,
>> 
>> The issue with using scan rate alone is that, if you are looking for why
>> you have significant performance degradation and the scan rate is high,
>> it's a good indicator that you may have a memory issue; however, it
>> doesn't help if you want to preemptively prevent future system
>> degradation, since it's not predictive. There are no thresholds that can
>> be correlated to memory size for capacity planning.
>> 
>> I should be more clear with my question. How does a client determine when 
>> and how much memory they need in the future if they can't track memory 
>> utilization without including ARC?  Most customers  monitor their memory 
>> utilization and take action when they see memory utilization at a policy 
>> determined threshold to prevent potential future performance degradation or 
>> for capacity planning. 
>> From what I've read searching the aliases, there doesn't seem to be a good 
>> way to determine how much memory is being used by the system without 
>> including ARC. If that's the case,  it seems to me we either need to give 
>> them that capability or offer an alternative for capacity planning.
>> 
>> Tony
>> 
>> On Dec 23, 2009, at 12:45 PM, Jim Laurent wrote:
>> 
>>> I believe that the SR (scan rate) column in vmstat is still the best 
>>> indicator of when the applications space is limited.  The scan rate 
>>> indicates that the page scanner is running and looking for applications 
>>> pages to move out of physical RAM onto SWAP.
>>> 
>>> More information about ZFS memory usage is available at:
>>> 
>>> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Memory_and_Dynamic_Reconfiguration_Recommendations
>>> 
>>> 
>>> On Dec 22, 2009, at 6:02 PM, anthony.benen...@sun.com 
>>>  wrote:
>>> 
 My customer asks the following:
 
 "Since we started using ZFS to manage our systems root file systems we've 
 noticed that the memory utilization on those systems is near 100%. We 
 understand that this is ZFS's arc cache taking advantage of the unused 
 system memory however it doesn't look very good for our system monitors & 
 graphs.
Is there any way to report the memory utilization of these systems without
taking into account ZFS's arc cache memory utilization?"

While I have a couple of imperfect ideas on answering his question as stated,
the reason behind these requests is to determine when the system is reaching
its maximum memory utilization, whereby they may need to add more memory, or
to help resolve a performance issue which may or may not be caused by a
memory deficiency. Since with ZFS memory utilization is no longer a good
indicator of a memory deficiency, what should you be looking at to determine
if you have one? Is it similar to UFS, such as a high scan rate, excessive
paging, etc.? If so, how do you determine thresholds?
 
 Any  documents, comments or opinions would be welcome.
 
 Thanks,
 Tony
>>> 
>>> * Jim Laurent *
>>> Architect
>>> 
>>> Phone x24859/+1 703 204 4859
>>> Mobile 703-624-7000
>>> Fax 703-208-5858
>>> Email jim.laur...@sun.com  
>>> 

Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-23 Thread Mattias Pantzare
>> I'm not sure how to go about it.  Basically, how should i format my
>> drives in FreeBSD, create a ZPOOL which can be imported into OpenSolaris.
>
> I'm not sure about BSD, but Solaris ZFS works with whole devices.  So there 
> isn't any OS specific formatting involved.  I assume BSD does the same.

That is not true. ZFS will use an EFI partition table with one
partition if you give it the whole disk.

My guess is that you should put it in an EFI partition. But a normal
partition should work.

I would test this in virtualbox or vmware if I were you.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-23 Thread Mattias Pantzare
>> UFS is a totally different issue, sync writes are always sync'ed.
>>
>> I don't work for Sun, but it would be unusual for a company to accept
>> willful negligence as a policy.  Ambulance chasing lawyers love that
>> kind of thing.
>
> The Thor replaces a geriatric Enterprise system running Solaris 8 over
> UFS. For these workloads it beat the pants out of our current setup
> and somehow the "but you're safer now" argument doesn't go over very
> well :)
>
> We are under the impression that a setup that serves NFS over UFS has
> the same assurance level as a setup using "ZFS without ZIL". Is this
> impression false?

That impression is false!

No ZIL is especially bad for NFS applications.

If you have disabled the ZIL and you reboot your NFS server while a client
is writing, you will have writes that have "disappeared" and no error
will be logged on the server or the client. The result will be
corrupted files and no way to know that they are corrupted.

UFS (with or without logging) behaves like ZFS with the ZIL enabled.

The ZIL is _not_ optional in the way the log is in UFS.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-23 Thread Daniel Carosone
On Thu, Dec 24, 2009 at 12:07:03AM +0100, Jeroen Roodhart wrote:
> We are under the impression that a setup that serves NFS over UFS has
> the same assurance level as a setup using "ZFS without ZIL". Is this
> impression false?

Completely.  It's closer to "UFS mount -o async", without the risk of
UFS corruption, but still with the risk of client-side corruption of
the NFS write semantics. 

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Richard Elling

On Dec 23, 2009, at 3:00 PM, Michael Herf wrote:

For me, arcstat.pl is a slam-dunk predictor of dedup throughput. If  
my "miss%" is in the single digits, dedup write speeds are  
reasonable. When the arc misses go way up, dedup writes get very  
slow. So my guess is that this issue depends entirely on whether or  
not the DDT is in RAM or not. I don't have any L2ARC.


Yep, seems consistent with my tests.  I'm currently seeing 43.6 million
zap-unique entries consume approximately 12 GBytes of metadata space.  This
is dog slow to write on a machine with only 8 GBytes of RAM and a single HDD
in the pool.  The writes are relatively fast, but all of the time is spent
doing random reads, which is not a recipe for success with HDDs.

I don't know the ARC design exactly, but I can imagine that DDT is  
getting flushed out by other filesystem activity, even though  
keeping it in RAM is very critical to write performance.


e.g., currently I'm doing a big chmod -R, an rsync, and a zfs send/ 
receive (when jobs like this take a week, it piles up.) And right  
now my miss% is consistently >50% on a machine with 6GB ram. My  
writes are terrifically slow, like 3MB/sec.


Can anyone comment if it is possible to tell the kernel/ARC "keep  
more DDT in RAM"?

If not, could it be possible in a future kernel?


I'm leaning to the notion that an SSD cache device will be required for any
sort of dedup performance.  Or a huge amount of RAM, of course :-)
Unfortunately, the DDT routines are not currently instrumented (b129), so it
might take a while to fully understand what instrumentation would be most
useful.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-23 Thread Thomas Burgess
On Wed, Dec 23, 2009 at 6:07 PM, Ian Collins  wrote:

>
>
>  Is the pool on slices or whole drives?  If the latter, you should be able
> to import the pool (unless BSD introduces any incompatibilities).


It's on whole disks but if i remember right those disks are tied to the
highpoint raid card.  I didn't know about passthrough at the time.

This is why i'm asking about making a NEW pool.   I figure i can make a new
pool connected as passthrough devices.  What i don't know is how i should
partition the drives, or if i should at all.  FreeBSD has its own type of
partitions, then it has internal slices.  On the current pool, the drives
showed up as /dev/da0, /dev/da1, /dev/da2, and so on.  The only
thing i did was use glabel to make sure the zpool always used the same drive
in case the letters changed.  So my current pool is actually created using
the glabel names and not the /dev/da0 /dev/da1 devices, but i'm fairly sure
it's about the same.

What i'm thinking is, if i add the drives to the current 8 slots i have free
(i have 4 sata ports on the current motherboard and 4 more left on the raid
card)  and set the 4 on the raidcard as passthrough devices, then create a
new zpool and copy the important stuff over, then i should be able to import
that into opensolaris.
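
Roughly what i have in mind for that step (pool name is made up, and i'd try
it on scratch disks first):

   (FreeBSD, once the new pool is built and the data copied)
   # zpool export migrate

   (OpenSolaris)
   # zpool import            <- should list the pool if it can read the labels
   # zpool import migrate    (possibly with -f if it warns about another system)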

What i DON'T know is...will opensolaris recognize the drives...do i need
to use some sort of partitioning scheme (GEOM, GPT, whatever) or should i
just try to use raw drives?

I do not want to lose my data...i figured this is the best place to ask.

I'm not sure about BSD, but Solaris ZFS works with whole devices.  So there
> isn't any OS specific formatting involved.  I assume BSD does the same.
>
> > Also, is it possible to install opensolaris to compact flash cards?  The
> > reason i ask is that i know the root pool can't be raidz.
>
> Yes, I had a system booting off CF cards (in an IDE/CF adapter).
>
> --
>
> Ian.
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-23 Thread Ian Collins

On Thu 24/12/09 10:31 , "Thomas Burgess" wonsl...@gmail.com sent:
> I was wondering what the best method of moving a pool from FreeBSD 8.0 to
> OpenSolaris is.
> 
> When i originally built my system, it was using hardware which wouldn't
> work in opensolairs, but i'm about to do an upgrade so i should be able to
> use Opensolaris when i'm done.
> 

> I know i probably can't import my current pool into opensolaris, but i was
> thinking i could use the 8 drives and create a pool which i COULD import,
> using that to back up my data.

 Is the pool on slices or whole drives?  If the latter, you should be able to 
import the pool (unless BSD introduces any incompatibilities).

> I'm not sure how to go about it.  Basically, how should i format my
> drives in FreeBSD, create a ZPOOL which can be imported into OpenSolaris.

I'm not sure about BSD, but Solaris ZFS works with whole devices.  So there 
isn't any OS specific formatting involved.  I assume BSD does the same.

> Also, is it possible to install opensolaris to compact flash cards?  The
> reason i ask is that i know the root pool can't be raidz.

Yes, I had a system booting off CF cards (in an IDE/CF adapter).
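
If you want redundancy on the boot device, a rough sketch of mirroring the
root pool across the two CF cards after install (device names are
hypothetical, and the second card needs an SMI label with an s0 slice):

   # zpool attach rpool c1t0d0s0 c1t1d0s0
   # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0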
 
-- 

Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-23 Thread Jeroen Roodhart

Hi Richard, ZFS-discuss.

> Message: 2
> Date: Wed, 23 Dec 2009 09:49:18 -0800
> From: Richard Elling 
> To: Auke Folkerts 
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Benchmarks results for ZFS + NFS,using
> SSD's as slog devices (ZIL)
> Message-ID: <40070921-f894-4146-9e4c-7570d52c8...@gmail.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Some questions below...
>
> On Dec 23, 2009, at 8:27 AM, Auke Folkerts wrote:
>


Filling in for Auke here,

>> > The raw data as well as the graphs that I created are available on
>> > request, should people be interested.
>
> Yes, can you post somewhere?

I've put the results here, tests are run under nv129:

http://www.science.uva.nl/~jeroen/solaris11_iozone_nfs2zfs

Original measurements (with iozone headers) are in:

http://www.science.uva.nl/~jeroen/solaris11_iozone_nfs2zfs/originals/


>
> Questions:
> 1. Client wsize?

We usually set these to 342768 but this was tested with CentOS
defaults: 8192 (we're doing this over NFSv3).
> 2. Client NFS version?

NFSv3 (earlier tests show about 15% improvement using v4, but we still
use v3 in production).


> 3. logbias settings?

logbias=throughput for the runs labeled "throughput", otherwise defaults.
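
(That is just the per-dataset property, i.e. something like

   # zfs set logbias=throughput tank/export

with a made-up filesystem name here.)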


> 4. Did you test with a Solaris NFS client?  If not, why not?

We didn't, because our production environment consists of Solaris
servers and Linux/MS Windows clients.


> UFS is a totally different issue, sync writes are always sync'ed.
>
> I don't work for Sun, but it would be unusual for a company to accept
> willful negligence as a policy.  Ambulance chasing lawyers love that
> kind of thing.

The Thor replaces a geriatric Enterprise system running Solaris 8 over
UFS. For these workloads it beat the pants out of our current setup
and somehow the "but you're safer now" argument doesn't go over very
well :)

We are under the impression that a setup that serves NFS over UFS has
the same assurance level as a setup using "ZFS without ZIL". Is this
impression false?

If it isn't, then offering a tradeoff between "the same assurance level as
you are used to, with better performance" and "a better assurance level, but
with significant performance hits for random IO" doesn't seem too wrong
to me. In the first case you still have the ZFS guarantees once data
is "on disk"...

Thanks in advance for your insights,

With kind regards,

Jeroen

- --
Jeroen Roodhart
IT Consultant
 University of Amsterdam   
j.r.roodh...@uva.nl  Informatiseringscentrum 
Tel. 020 525 7203
- --
See http://www.science.uva.nl/~jeroen for openPGP public key

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Michael Herf
For me, arcstat.pl is a slam-dunk predictor of dedup throughput. If my
"miss%" is in the single digits, dedup write speeds are reasonable. When the
arc misses go way up, dedup writes get very slow. So my guess is that this
issue depends entirely on whether or not the DDT is in RAM or not. I don't
have any L2ARC.

I don't know the ARC design exactly, but I can imagine that DDT is getting
flushed out by other filesystem activity, even though keeping it in RAM is
very critical to write performance.

e.g., currently I'm doing a big chmod -R, an rsync, and a zfs send/receive
(when jobs like this take a week, it piles up.) And right now my miss% is
consistently >50% on a machine with 6GB ram. My writes are terrifically
slow, like 3MB/sec.

Can anyone comment if it is possible to tell the kernel/ARC "keep more DDT
in RAM"?
If not, could it be possible in a future kernel?
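
(One knob that looks relevant is the ARC metadata limit, which caps how much
of the ARC metadata, and hence the DDT, may occupy. It is an unsupported,
release-dependent tunable, and the value below is only an example:

   * /etc/system
   set zfs:zfs_arc_meta_limit = 0x100000000

Raising it should at least keep metadata from being capped at the default
quarter of the ARC.)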

mike


On Wed, Dec 23, 2009 at 9:35 AM, Richard Elling wrote:

> On Dec 23, 2009, at 7:45 AM, Markus Kovero wrote:
>
>  Hi, I threw 24GB of ram and couple latest nehalems at it and dedup=on
>> seemed to cripple performance without actually using much cpu or ram. it's
>> quite unusable like this.
>>
>
> What does the I/O look like?  Try "iostat -zxnP 1" and see if there are a
> lot
> of small (2-3 KB) reads.  If so, use "iopattern.ksh -r" to see how random
> the reads are.
>
> http://www.richardelling.com/Home/scripts-and-programs-1/iopattern
>
> If you see 100% small random reads from the pool (ignore writes), then
> that is the problem. Solution TBD.
>  -- richard
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] getting decent NFS performance

2009-12-23 Thread Marion Hakanson
erik.trim...@sun.com said:
> The suggestion was to make the SSD on each machine an iSCSI volume, and  add
> the two volumes as a mirrored ZIL into the zpool. 

I've mentioned the following before

For a poor-person's slog which gives decent NFS performance, we have had
good results with allocating a slice on (e.g.) an X4150's internal disk,
behind the internal Adaptec RAID controller.  Said controller has only
256MB of NVRAM, but it made a big difference with NFS performance (look
for the "tar unpack" results at the bottom of the page):

http://acc.ohsu.edu/~hakansom/j4400_bench.html

You can always replace them when funding for your Zeus SSD's comes in (:-).

Regards,

-- 
Marion Hakanson 
OHSU Advanced Computing Center


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-23 Thread Thomas Burgess
I was wondering what the best method of moving a pool from FreeBSD 8.0 to
OpenSolaris is.

When i originally built my system, it was using hardware which wouldn't work
in opensolaris, but i'm about to do an upgrade so i should be able to use
Opensolaris when i'm done.

My current system uses a Highpoint RocketRaid 2340.  It has 12 1TB hard
drives, an intel core2 quad Q9550, and 8gb of ddr2 800 desktop ram.

I'm going to buy 8 more 1TB drives, a Supermicro MBD-X7SBE motherboard, 2
Supermicro AOC-SAT2-MV8 raid cards, and 8 gb of ddr2 800 registered ecc
memory.

I know i probably can't import my current pool into opensolaris, but i was
thinking i could use the 8 drives and create a pool which i COULD import,
using that to back up my data.

I'm not sure how to go about it.  Basically, how should i format my drives
in FreeBSD, create a ZPOOL which can be imported into OpenSolaris.

Once i can do that, i'd like to create a new pool in opensolaris
using the old drives, copy the data back, then destroy the pool i used to
migrate the data, so i can add those drives to my new opensolaris pool.
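
Roughly what i'm picturing for the copy step (pool names are placeholders,
and this assumes the zfs versions on both sides support recursive send):

   # zfs snapshot -r tank@migrate
   # zfs send -R tank@migrate | zfs recv -d migratepool

and then the same thing in the other direction once the new pool exists.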


Also, is it possible to install opensolaris to compact flash cards?  The
reason i ask is that i know the root pool can't be raidz.

My FreeBSD install is on 2 compact flash cards which i used to boot the
system.  I used 2 sata to compact flash adapters.
http://www.newegg.com/Product/Product.aspx?Item=N82E16812186051&cm_re=sata_to_compact_flash-_-12-186-051-_-Product

I would like to use these 2 cards as my root pool if possible.  They are 8
gb each.

Thanks for any help/suggestions
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS- When do you add more memory?

2009-12-23 Thread Yanjun (Tiger) Hu

Hi Jim,

I think Tony was asking a very valid question. It reminds me 
http://developers.sun.com/solaris/articles/sol8memory.html#where.


Regards,
Tiger

Jim Mauro wrote:

Hi Anthony -

I don't get this. How does the presence (or absence) of the ARC change
the methodology for doing memory capacity planning?

Memory capacity planning is all about identifying and measuring 
consumers.

Memory consumers;
- The kernel.
- User processes.
- The ZFS ARC, which is technically part of the kernel, but it should be
   tracked and measured separately.

prstat, pmap, kstat -n arcstats and "echo ::memstat | mdb -k" will 
tell you

pretty much everything you need to know about how much memory
is being consumed by the consumers.

Now granted, it is harder than I'm making it sound here.
User processes can be tricky if they share memory, but typically
that's not a big deal unless it's a database load or an application that
explicitly uses shared memory as a form of IPC.

The ARC stuff is also tricky, because it's really hard to determine what
the active working set is for file system data: you want the ARC big
enough to deliver acceptable performance, but not so big as to
potentially cause short-term memory shortfalls. It requires some
observation and periodic collection of statistics, the most important
statistic being the level of performance of the customer's workload.

As an aside, there's nothing about this that requires it be posted
to zfs-discuss-confidential. I posted to zfs-disc...@opensolaris.org.


Thanks,
/jim


Anthony Benenati wrote:

Jim,

The issue with using scan rate alone is that, if you are looking for why
you have significant performance degradation and the scan rate is high,
it's a good indicator that you may have a memory issue; however, it
doesn't help if you want to preemptively prevent future system
degradation, since it's not predictive. There are no thresholds that
can be correlated to memory size for capacity planning.


I should be more clear with my question. How does a client determine 
when and how much memory they need in the future if they can't track 
memory utilization without including ARC?  Most customers  monitor 
their memory utilization and take action when they see memory 
utilization at a policy determined threshold to prevent potential 
future performance degradation or for capacity planning.
From what I've read searching the aliases, there doesn't seem to be a 
good way to determine how much memory is being used by the system 
without including ARC. If that's the case,  it seems to me we either 
need to give them that capability or offer an alternative for 
capacity planning.


Tony

On Dec 23, 2009, at 12:45 PM, Jim Laurent wrote:

I believe that the SR (scan rate) column in vmstat is still the best 
indicator of when the applications space is limited.  The scan rate 
indicates that the page scanner is running and looking for 
applications pages to move out of physical RAM onto SWAP.


More information about ZFS memory usage is available at:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Memory_and_Dynamic_Reconfiguration_Recommendations 




On Dec 22, 2009, at 6:02 PM, anthony.benen...@sun.com 
 wrote:



My customer asks the following:

"Since we started using ZFS to manage our systems root file systems 
we've noticed that the memory utilization on those systems is near 
100%. We understand that this is ZFS's arc cache taking advantage 
of the unused system memory however it doesn't look very good for 
our system monitors & graphs.
Is there any way to report the memory utilization of these systems
without taking into account ZFS's arc cache memory utilization?"


While I have a couple of imperfect ideas on answering his question as
stated, the reason behind these requests is to determine when the system
is reaching its maximum memory utilization, whereby they may need to add
more memory, or to help resolve a performance issue which may or may not
be caused by a memory deficiency. Since with ZFS memory utilization is no
longer a good indicator of a memory deficiency, what should you be looking
at to determine if you have one? Is it similar to UFS, such as a high scan
rate, excessive paging, etc.? If so, how do you determine thresholds?


Any  documents, comments or opinions would be welcome.

Thanks,
Tony


 * Jim Laurent *
Architect

Phone x24859/+1 703 204 4859
Mobile 703-624-7000
Fax 703-208-5858
Email jim.laur...@sun.com  


*Sun Microsystems, Inc.*


7900 Westpark Dr, A110
McLean, VA 22102 US





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Yanjun (Tiger) Hu - Sun Professional Services Canada
Cell: 416-892-0999 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] ZFS- When do you add more memory?

2009-12-23 Thread Jim Mauro

Hi Anthony -

I don't get this. How does the presence (or absence) of the ARC change
the methodology for doing memory capacity planning?

Memory capacity planning is all about identifying and measuring consumers.
Memory consumers;
- The kernel.
- User processes.
- The ZFS ARC, which is technically part of the kernel, but it should be
   tracked and measured separately.

prstat, pmap, kstat -n arcstats and "echo ::memstat | mdb -k" will tell you
pretty much everything you need to know about how much memory
is being consumed by the consumers.
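
For example (output formats vary a little by release):

   # echo ::memstat | mdb -k          (kernel / ZFS file data / anon / free breakdown)
   # prstat -s rss -n 10              (largest process consumers by RSS)
   # kstat -p zfs:0:arcstats:size     (current ARC size, in bytes)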

Now granted, it is harder than I'm making it sound here.
User processes can be tricky if they share memory, but typically
that's not a big deal unless it's a database load or an application that
explicitly uses shared memory as a form of IPC.

The ARC stuff is also tricky, because it's really hard to determine what
the active working set is for file system data: you want the ARC big
enough to deliver acceptable performance, but not so big as to
potentially cause short-term memory shortfalls. It requires some
observation and periodic collection of statistics, the most important
statistic being the level of performance of the customer's workload.

As an aside, there's nothing about this that requires it be posted
to zfs-discuss-confidential. I posted to zfs-disc...@opensolaris.org.


Thanks,
/jim


Anthony Benenati wrote:

Jim,

The issue with using scan rate alone is that, if you are looking for why you
have significant performance degradation and the scan rate is high, it's a
good indicator that you may have a memory issue; however, it doesn't
help if you want to preemptively prevent future system degradation,
since it's not predictive. There are no thresholds that can be
correlated to memory size for capacity planning.


I should be more clear with my question. How does a client determine 
when and how much memory they need in the future if they can't track 
memory utilization without including ARC?  Most customers  monitor 
their memory utilization and take action when they see memory 
utilization at a policy determined threshold to prevent potential 
future performance degradation or for capacity planning. 

From what I've read searching the aliases, there doesn't seem to be a 
good way to determine how much memory is being used by the system 
without including ARC. If that's the case,  it seems to me we either 
need to give them that capability or offer an alternative for capacity 
planning.


Tony

On Dec 23, 2009, at 12:45 PM, Jim Laurent wrote:

I believe that the SR (scan rate) column in vmstat is still the best 
indicator of when the applications space is limited.  The scan rate 
indicates that the page scanner is running and looking for 
applications pages to move out of physical RAM onto SWAP.


More information about ZFS memory usage is available at:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Memory_and_Dynamic_Reconfiguration_Recommendations


On Dec 22, 2009, at 6:02 PM, anthony.benen...@sun.com 
 wrote:



My customer asks the following:

"Since we started using ZFS to manage our systems root file systems 
we've noticed that the memory utilization on those systems is near 
100%. We understand that this is ZFS's arc cache taking advantage of 
the unused system memory however it doesn't look very good for our 
system monitors & graphs.
Is there any way to report the memory utilization of these systems
without taking into account ZFS's arc cache memory utilization?"


While I have a couple of imperfect ideas on answering his question as
stated, the reason behind these requests is to determine when the system is
reaching its maximum memory utilization, whereby they may need to add more
memory, or to help resolve a performance issue which may or may not be
caused by a memory deficiency. Since with ZFS memory utilization is no
longer a good indicator of a memory deficiency, what should you be looking
at to determine if you have one? Is it similar to UFS, such as a high scan
rate, excessive paging, etc.? If so, how do you determine thresholds?


Any  documents, comments or opinions would be welcome.

Thanks,
Tony


  * Jim Laurent *
Architect

Phone x24859/+1 703 204 4859
Mobile 703-624-7000
Fax 703-208-5858
Email jim.laur...@sun.com  


*Sun Microsystems, Inc.*


7900 Westpark Dr, A110
McLean, VA 22102 US





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] quotas on zfs at solaris 10 update 9 (10/09)

2009-12-23 Thread Len Zaifman
2 Things

1) solaris 10/09 is solaris 10 update 8, not 9 - sorry for the confusion
2) setting userquota@user at solaris 10u8 and looking from a linux nfs client 
with quotas installed:


Disk quotas for user leonardz (uid 1006):
 Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
hpffs04:/cfgwas/home
                        1030*    1024    1024               0       0       0
hpffs26:/zfs_hpf/home
3251842  4194304 4194304   0   0   0


I can see the quotas set under zfs.

That is great.

Because we have users who will create millions of files in a directory it would 
be nice to report the number of files a user has or a group has in a filesystem.

Is there a way (other than find) to get this?
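
(For the space-usage side of my earlier question, "zfs userspace" and the
userused@ property look like the answer, assuming the build has them, e.g.:

   # zfs userspace rpool/home
   # zfs get userused@username rpool/home

with "username" being whichever account you care about; but neither reports
a file count, which is what I'm after here.)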

Len Zaifman
Systems Manager, High Performance Systems
The Centre for Computational Biology
The Hospital for Sick Children
555 University Ave.
Toronto, Ont M5G 1X8

tel: 416-813-5513
email: leona...@sickkids.ca

From: zfs-discuss-boun...@opensolaris.org [zfs-discuss-boun...@opensolaris.org] 
On Behalf Of Len Zaifman [leona...@sickkids.ca]
Sent: December 10, 2009 2:49 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] quotas on zfs at solaris 10 update 9 (10/09)

We have just updated a major file server to solaris 10 update 9 so that we can 
control user and group disk usage on a single filesystem.

We were using qfs and one nice thing about samquota was that it told you your 
soft limit, your hard limit and your usage on disk space and on the number of 
files.

Is there, on solaris 10 U9, a command which will report

usera    filespace (kb)    number of files
limit    1048576           10
used     1024              3

or something like that? I can get the space limit, but not the space usage or 
the number of files using zfs get userqu...@u rpool/home



Len Zaifman
Systems Manager, High Performance Systems
The Centre for Computational Biology
The Hospital for Sick Children
555 University Ave.
Toronto, Ont M5G 1X8

tel: 416-813-5513
email: leona...@sickkids.ca

This e-mail may contain confidential, personal and/or health 
information(information which may be subject to legal restrictions on use, 
retention and/or disclosure) for the sole use of the intended recipient. Any 
review or distribution by anyone other than the person for whom it was 
originally intended is strictly prohibited. If you have received this e-mail in 
error, please contact the sender and delete all copies.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

This e-mail may contain confidential, personal and/or health 
information(information which may be subject to legal restrictions on use, 
retention and/or disclosure) for the sole use of the intended recipient. Any 
review or distribution by anyone other than the person for whom it was 
originally intended is strictly prohibited. If you have received this e-mail in 
error, please contact the sender and delete all copies.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovering ZFS stops after syseventconfd can't fork

2009-12-23 Thread Carson Gaspar

Paul Armstrong wrote:

I'm surprised at the number as well.

Running it again, I'm seeing it jump fairly high just before the fork errors:
bash-4.0# ps -ef | grep zfsdle | wc -l
   20930

(the next run of ps failed due to the fork error).
So maybe it is running out of processes.

ZFS file data from ::memstat just went down to 29MiB (from 22GiB) too which may 
or may not be related.

Message was edited by: psa


Note that I saw the exact same thing when my pool got trashed. My fix 
was to rename 
/etc/sysevent/config/SUNW,EC_dev_status,ESC_dev_dle,sysevent.conf


I _suspect_ the problem is that the developers don't expect zfsdle to 
hang. So they don't bother to use a lock or check if one is already 
running. They just spawn more, and more, and more...


It would be lovely if someone who understands what this creature is were 
to fix this rather catastrophic bug.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-23 Thread Richard Elling

Some questions below...

On Dec 23, 2009, at 8:27 AM, Auke Folkerts wrote:


Hello,


We have performed several tests to measure the performance
using SSD drives for the ZIL.

Tests are performed using a X4540 "Thor" with a zpool consisting of
3 14-disk RaidZ2 vdevs. This fileserver is connected to a Centos 5.4
machine which mounts a filesystem on the zpool via NFS, over a
dedicated, direct, 1Gb ethernet link.

The issue we're trying to resolve by using SSD's, is the much-discussed
slow NFS performance when using synchronous IO. Unfortunately,
asynchronous IO is not possible, since the Solaris NFS server is
synchronous by default, and the linux clients are unable to request
asynchronous NFS traffic.

The SSD devices we've used are OCZ Vertex Turbo 30Gb disks.

Data was gathered using iozone from the centos machine:
(iozone -c -e -i 0 -i 1 -i 2 -o -a).

The raw data as well as the graphs that I created are available on
request, should people be interested.


Yes, can you post somewhere?


Since we are interested in using the Thor as an NFS file server
for homedirectories, we are mostly concerned about random write
performance.

We have made the following observations:

- Using SSD devices as ZIL logs yields a factor 2 improvement in throughput
  when using a recordsize <= 128k, in comparison to using the internal
  ZIL devices of the pool (ie. not setting up slog devices).

- With recordsizes of 1MB and up, having the ZIL reside on the raw disks of
  the pool (no separate slog devices) outperforms using SSD's as a slog device.

- Disabling the ZIL altogether yields significantly better performance
  (at least a factor 10).

We had hoped that using SSD's would yield better performance.  It is
possible we will see an improvement with Intel X25-E series SSD's,
but those haven't arrived yet so we can't test that.

An alternative test we performed was extracting a 138Mb tarfile
consisting of ~2000 small files. With the ZIL disabled, extracting
the file took 4 seconds.

With the ZIL enabled, but with no specific slog devices in the pool
(thus using the disks in the pool), extraction took 72seconds. Using
the SSD's as log devices, the time required was reduced to 34 seconds.
This corresponds to the ~factor 2 improvement we noticed using our
iozone benchmark.  For this specific workload, we noticed no  
difference

in using 1 or 2 (striped) slog SSD's.


Questions:
1. Client wsize?
2. Client NFS version?
3. logbias settings?
4. Did you test with a Solaris NFS client?  If not, why not?


At the bottom line, lets end up with a few specific questions:

1. Is this performance using SSD's as expected? Can we expect better
   performance using Intel X25-E SSD's?
2. Disabling the ZIL looks like a serious option, after these performance
   benchmarks. I would expect to see disabling the ZIL as an "officially
   supported option", given that we all have used UFS for years, which is no
   better in terms of reliability. Is there an "Official Sun Response" to this?


UFS is a totally different issue, sync writes are always sync'ed.

I don't work for Sun, but it would be unusual for a company to accept
willful negligence as a policy.  Ambulance chasing lawyers love that
kind of thing.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Richard Elling

On Dec 23, 2009, at 7:45 AM, Markus Kovero wrote:

Hi, I threw 24GB of ram and couple latest nehalems at it and  
dedup=on seemed to cripple performance without actually using much  
cpu or ram. it's quite unusable like this.


What does the I/O look like?  Try "iostat -zxnP 1" and see if there are a lot
of small (2-3 KB) reads.  If so, use "iopattern.ksh -r" to see how random
the reads are.

http://www.richardelling.com/Home/scripts-and-programs-1/iopattern

If you see 100% small random reads from the pool (ignore writes), then
that is the problem. Solution TBD.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool is in a degraded state and I need some suggestions

2009-12-23 Thread Len Zaifman
from zpool history
zpool create -f zfs_hpf c6t600A0B8000495A51081F492C644Dd0 
c6t600A0B8000495B1C053148B41F54d0 c6t600A0B8000495B1C053248B42036d0 
c6t600A0B8000495B1C05B948CA87A2d0

these are raid5 devices from a 2540 disk controller: we did not use raidz on top.

we cleaned as follows:

2009-12-17.20:47:41 zpool scrub zfs_hpf


and our errors went away

the fc connection to the above array failed twice, and each time we needed a
scrub to clean up the corrupt file errors.
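
For reference, the sequence looks roughly like this (zpool clear is the
generic way to reset the error counters; in our case the scrub alone was
enough):

   # zpool scrub zfs_hpf
   # zpool status -v zfs_hpf      (wait for the scrub to complete)
   # zpool clear zfs_hpf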

Len Zaifman
Systems Manager, High Performance Systems
The Centre for Computational Biology
The Hospital for Sick Children
555 University Ave.
Toronto, Ont M5G 1X8

tel: 416-813-5513
email: leona...@sickkids.ca

From: Chris Williams [chris.d.willi...@gmail.com]
Sent: December 23, 2009 12:22 PM
To: Len Zaifman
Subject: Re: [zfs-discuss] ZFS pool is in a degraded state and I need some  
suggestions

Was your ZFS pool a mirror or raidz?  I didn't think doing a scrub
would help much since it is neither.

Thanks
Chris

On Wed, Dec 23, 2009 at 12:12 PM, Len Zaifman  wrote:
> Chris:
>
> This happened to us recently due to some hardware failures.
>
> zpool scrub poolname
>
> cleared this up for us. We did not try rm damaged file at all.
> Len Zaifman
> Systems Manager, High Performance Systems
> The Centre for Computational Biology
> The Hospital for Sick Children
> 555 University Ave.
> Toronto, Ont M5G 1X8
>
> tel: 416-813-5513
> email: leona...@sickkids.ca
> 
> From: zfs-discuss-boun...@opensolaris.org 
> [zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Williams 
> [chris.d.willi...@gmail.com]
> Sent: December 23, 2009 12:04 PM
> To: zfs-discuss@opensolaris.org
> Subject: [zfs-discuss] ZFS pool is in a degraded state and I need some  
> suggestions
>
> I have a system that took a RAID6 hardware array and created a ZFS pool on 
> top of it (pool only has one device in it which is the entire RAID6 HW 
> array).  A few weeks ago, the Sun v440 somehow got completely wrapped around 
> the axle and the operating system had to be rebuilt.  Once the system was 
> rebuilt, I did a zfs import on the pool. (BTW, I didn't build the 
> system...just an engineer trying to help out)
>
> Doing a zpool status -v, I saw some files that were damaged.  The issue I am 
> seeing now is that when I delete the damaged files in question, this is how 
> it shows up in the output from zpool status -v
>
> pool1/data1:<0xba2c7>
>
> So my zpool status -v is still showing 1958 errors but instead of showing the 
> paths to the files, I am seeing similar messages to the one above for the 
> files that I deleted.
>
> Other than rebuilding the pool from scratch (which might happen), is there 
> any way to get rid of this error?  It doesn't look like any new errors 
> are occurring, just the original damaged files from when the system died.  I 
> thought about running a scrub but I don't know if that will do much since it 
> isn't a mirror or raidz.
>
> Any ideas would be great
> Thanks!
> Chris
>
> PS: I know this array probably should have been built treating the disks as a 
> JBOD and have ZFS do the raiding.  Unfortunately, nobody asked me when it was 
> built :)
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> This e-mail may contain confidential, personal and/or health 
> information(information which may be subject to legal restrictions on use, 
> retention and/or disclosure) for the sole use of the intended recipient. Any 
> review or distribution by anyone other than the person for whom it was 
> originally intended is strictly prohibited. If you have received this e-mail 
> in error, please contact the sender and delete all copies.
>

This e-mail may contain confidential, personal and/or health 
information(information which may be subject to legal restrictions on use, 
retention and/or disclosure) for the sole use of the intended recipient. Any 
review or distribution by anyone other than the person for whom it was 
originally intended is strictly prohibited. If you have received this e-mail in 
error, please contact the sender and delete all copies.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool is in a degraded state and I need some suggestions

2009-12-23 Thread Len Zaifman
Chris:

This happened to us recently due to some hardware failures.

zpool scrub poolname

cleared this up for us. We did not try rm damaged file at all.
Len Zaifman
Systems Manager, High Performance Systems
The Centre for Computational Biology
The Hospital for Sick Children
555 University Ave.
Toronto, Ont M5G 1X8

tel: 416-813-5513
email: leona...@sickkids.ca

From: zfs-discuss-boun...@opensolaris.org [zfs-discuss-boun...@opensolaris.org] 
On Behalf Of Chris Williams [chris.d.willi...@gmail.com]
Sent: December 23, 2009 12:04 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] ZFS pool is in a degraded state and I need some  
suggestions

I have a system that took a RAID6 hardware array and created a ZFS pool on top 
of it (pool only has one device in it which is the entire RAID6 HW array).  A 
few weeks ago, the Sun v440 somehow got completely wrapped around the axle and 
the operating system had to be rebuilt.  Once the system was rebuilt, I did a 
zfs import on the pool. (BTW, I didn't build the system...just an engineer 
trying to help out)

Doing a zpool status -v, I saw some files that were damaged.  The issue I am 
seeing now is that when I delete the damaged files in question, this is how it 
shows up in the output from zpool status -v

pool1/data1:<0xba2c7>

So my zpool status -v is still showing 1958 errors but instead of showing the 
paths to the files, I am seeing similar messages to the one above for the files 
that I deleted.

Other than rebuilding the pool from scratch (which might happen), is there any 
way to get rid of this error?  It doesn't look like any new errors are 
occurring, just the original damaged files from when the system died.  I 
thought about running a scrub but I don't know if that will do much since it 
isn't a mirror or raidz.

Any ideas would be great
Thanks!
Chris

PS: I know this array probably should have been built treating the disks as a 
JBOD and have ZFS do the raiding.  Unfortunately, nobody asked me when it was 
built :)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

This e-mail may contain confidential, personal and/or health 
information(information which may be subject to legal restrictions on use, 
retention and/or disclosure) for the sole use of the intended recipient. Any 
review or distribution by anyone other than the person for whom it was 
originally intended is strictly prohibited. If you have received this e-mail in 
error, please contact the sender and delete all copies.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS pool is in a degraded state and I need some suggestions

2009-12-23 Thread Chris Williams
I have a system that took a RAID6 hardware array and created a ZFS pool on top 
of it (pool only has one device in it which is the entire RAID6 HW array).  A 
few weeks ago, the Sun v440 somehow got completely wrapped around the axle and 
the operating system had to be rebuilt.  Once the system was rebuilt, I did a 
zfs import on the pool. (BTW, I didn't build the system...just an engineer 
trying to help out)

Doing a zpool status -v, I saw some files that were damaged.  The issue I am 
seeing now is that when I delete the damaged files in question, this is how it 
shows up in the output from zpool status -v

pool1/data1:<0xba2c7>

So my zpool status -v is still showing 1958 errors but instead of showing the 
paths to the files, I am seeing similar messages to the one above for the files 
that I deleted.

Other than rebuilding the pool from scratch (which might happen), is there any 
way to get rid of this error?  It doesn't look like any new errors are 
occurring, just the original damaged files from when the system died.  I 
thought about running a scrub but I don't know if that will do much since it 
isn't a mirror or raidz.

Any ideas would be great
Thanks!
Chris

PS: I know this array probably should have been built treating the disks as a 
JBOD and have ZFS do the raiding.  Unfortunately, nobody asked me when it was 
built :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-23 Thread Auke Folkerts
Hello, 


We have performed several tests to measure the performance 
using SSD drives for the ZIL. 

Tests are performed using a X4540 "Thor" with a zpool consisting of
3 14-disk RaidZ2 vdevs. This fileserver is connected to a Centos 5.4
machine which mounts a filesystem on the zpool via NFS, over a 
dedicated, direct, 1Gb ethernet link. 

The issue we're trying to resolve by using SSD's, is the much-discussed
slow NFS performance when using synchronous IO. Unfortunately,
asynchronous IO is not possible, since the Solaris NFS server is 
synchronous by default, and the linux clients are unable to request
asynchronous NFS traffic.

The SSD devices we've used are OCZ Vertex Turbo 30Gb disks.

Data was gathered using iozone from the centos machine: 
(iozone -c -e -i 0 -i 1 -i 2 -o -a).

The raw data as well as the graphs that I created are available on
request, should people be interested.

Since we are interested in using the Thor as an NFS file server
for homedirectories, we are mostly concerned about random write
performance.

We have made the following observations:

 - Using SSD devices as ZIL logs yields a factor 2 improvement in throughput
   when using a recordsize <= 128k, in comparison to using the internal
   ZIL devices of the pool (ie. not setting up slog devices).

 - With recordsizes of 1MB and up, having the ZIL reside on the raw disks of
   the pool (no separate slog devices) outperforms using SSD's as a slog device.

 - Disabling the ZIL altogether yields significantly better performance 
   (at least a factor 10).
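
(For anyone reproducing the "ZIL disabled" case: the usual, unsupported way
to do this on builds of this vintage is an /etc/system setting along the
lines of "set zfs:zil_disable = 1", followed by a reboot.)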

We had hoped that using SSD's would yield better performance.  It is
possible we will see an improvement with Intel X25-E series SSD's,
but those haven't arrived yet so we can't test that.

An alternative test we performed was extracting a 138Mb tarfile
consisting of ~2000 small files. With the ZIL disabled, extracting
the file took 4 seconds.

With the ZIL enabled, but with no specific slog devices in the pool
(thus using the disks in the pool), extraction took 72seconds. Using
the SSD's as log devices, the time required was reduced to 34 seconds.
This corresponds to the ~factor 2 improvement we noticed using our
iozone benchmark.  For this specific workload, we noticed no difference
in using 1 or 2 (striped) slog SSD's.


At the bottom line, let's end with a few specific questions:

1. Is this performance using SSD's as expected? Can we expect better performance
   using Intel X25-E SSD's?
2. Disabling the ZIL looks like a serious option, after these performance 
   benchmarks. I would expect to see disabling the ZIL as an "officially
   supported option", given that we all have used UFS for years, which is no
   better in terms of reliability. Is there an "Official Sun Response" to this?



with kind regards,
Auke Folkerts
University of Amsterdam


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS crypto video

2009-12-23 Thread David Magda
Deirdre has posted a video of the presentation Darren Moffat gave at 
the November 2009 Solaris Security Summit:


http://blogs.sun.com/video/entry/zfs_crypto_data_encryption_for

Slides (470 KB PDF):

http://wikis.sun.com/download/attachments/164725359/osol-sec-sum-09-zfs.pdf

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz data loss stories?

2009-12-23 Thread Bob Friesenhahn

On Tue, 22 Dec 2009, Marty Scholes wrote:


If there is a RAIDZ write penalty over mirroring, I am unaware of 
it.  In fact, sequential writes are faster under RAIDZ.


There is always an IOPS penalty for raidz when writing or reading, 
given a particular zfs block size.  There may be a write penalty for 
mirroring, but this depends heavily on whether the I/O paths are 
saturated or operate in parallel.  It is true that a mirror requires a 
write for each mirror device, but if the I/O subsystem has the 
bandwidth for it, the cost of this can be astonishingly insignificant. 
It becomes significant when the I/O path is shared with limited 
bandwidth and the writes are large.
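
As a rough illustration of the IOPS point: every zfs block in a raidz vdev is
spread across its data disks, so a small random read has to touch most of the
drives, and a wide raidz vdev therefore delivers on the order of a single
disk's random-read IOPS, while a pool of mirrors scales read IOPS with the
number of vdevs.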


As to whether sequential writes are faster under raidz, I have yet to 
see any actual evidence of that.  Perhaps someone can provide some 
actual evidence?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Markus Kovero
Hi, I threw 24GB of ram and a couple of the latest nehalems at it, and dedup=on 
seemed to cripple performance without actually using much cpu or ram. It's 
quite unusable like this.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] getting decent NFS performance

2009-12-23 Thread Erik Trimble

Andrey Kuzmin wrote:

And how do you expect the mirrored iSCSI volume to work after
failover, with secondary (ex-primary) unreachable?

Regards,
Andrey
  
As a normal degraded mirror.  No problem. 

The suggestion was to make the SSD on each machine an iSCSI volume, and 
add the two volumes as a mirrored ZIL into the zpool.



It's a (relatively) simple and ingenious suggestion. 
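
In zpool terms it would look something like this (pool and device names are
made up; the second device is the partner host's SSD as it appears locally
once the iSCSI initiator is configured):

   # zpool add tank log mirror c3t2d0 c5t0d0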


-Erik



On Wed, Dec 23, 2009 at 9:40 AM, Erik Trimble  wrote:
  

Charles Hedrick wrote:


Is ISCSI reliable enough for this?

  

YES.

The original idea is a good one, and one that I'd not thought of.  The (old)
iSCSI implementation is quite mature, if not anywhere as nice
(feature/flexibility-wise) as the new COMSTAR stuff.

I'm thinking that just putting in a straight-through cable between the two
machine is the best idea here, rather than going through a switch.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] getting decent NFS performance

2009-12-23 Thread Andrey Kuzmin
And how do you expect the mirrored iSCSI volume to work after
failover, with secondary (ex-primary) unreachable?

Regards,
Andrey




On Wed, Dec 23, 2009 at 9:40 AM, Erik Trimble  wrote:
> Charles Hedrick wrote:
>>
>> Is ISCSI reliable enough for this?
>>
>
> YES.
>
> The original idea is a good one, and one that I'd not thought of.  The (old)
> iSCSI implementation is quite mature, if not anywhere as nice
> (feature/flexibility-wise) as the new COMSTAR stuff.
>
> I'm thinking that just putting in a straight-through cable between the two
> machine is the best idea here, rather than going through a switch.
>
> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss