Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-06 Thread Greg Mason
I am currently trying to get two of these things running Illumian. I don't have 
any particular performance requirements, so I'm thinking of using some sort of 
supported hypervisor (either RHEL with KVM or VMware ESXi) to get around the 
driver support issues and passing the disks through to an Illumian guest.

The H310 does indeed support pass-through (its non-RAID mode), but one thing to 
keep in mind is that I was only able to configure a single boot disk. I 
configured the rear two drives as a hardware RAID 1 and set the virtual disk 
as the boot disk, so that I can still boot the system if an OS disk fails.

Once Illumos is better supported on the R720 and the PERC H310, I plan to get 
rid of the hypervisor silliness and run Illumos on bare metal.

-Greg

Sent from my iPhone
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Group Quotas

2010-08-18 Thread Greg Mason
> 
> Also the linux NFSv4 client is bugged (as in hang-the-whole-machine bugged).
> I am deploying a new osol fileserver for home directories and I'm using NFSv3 
> + automounter (because I am also using one dataset per user, and thus I have 
> to mount each home dir separately).

We are also in the same boat here. I have about 125TB of ZFS storage in 
production currently, running OSOL, across 5 X4540s. We tried the NFSv4 route, 
and crawled back to NFSv3 and the linux automounter because NFSv4 on Linux is 
*that* broken. As in hung-disk-io-that-wedges-the-whole-box broken. We know 
that NFSv3 was never meant for the scale we're using it at, but we have no 
choice in the matter.

On the topic of Linux clients, NFS and ZFS: We've also found that Linux is bad 
at handling lots of mounts/umounts. We will occasionally find a client where 
the automounter requested a mount, but it never actually completed. It'll show 
as mounted in /proc/mounts, but won't *actually* be mounted. A umount -f for 
the affected filesystem fixes this. On ~250 clients in an HPC environment, 
we'll see such an error every week or so.
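When a client acts up, the check and fix look roughly like this on the Linux
client side (the mount point path here is hypothetical):

  # the kernel still lists the mount...
  grep /mnt/home/someuser /proc/mounts
  # ...but the directory is empty or throws errors
  ls /mnt/home/someuser
  # force the stale entry off and let the automounter remount it on next access
  umount -f /mnt/home/someuser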

I'm hoping that recent versions of Linux (e.g. RHEL 6) are a bit better at 
NFSv4, but I'm not holding my breath.

--
Greg Mason
HPC Administrator
Michigan State University
Institute for Cyber Enabled Research
High Performance Computing Center

web: www.icer.msu.edu
email: gma...@msu.edu




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS flar image.

2009-09-14 Thread Greg Mason
As an alternative, I've been taking a snapshot of rpool on the golden 
system, sending it to a file, and creating a boot environment from the 
archived snapshot on target systems. After fiddling with the snapshots a 
little, I then either appropriately anonymize the system or provide it 
with its identity. When it boots up, it's ready to go.


The only downfall to my method is that I still have to run the full 
OpenSolaris installer, and I can't exclude anything in the archive.


Essentially, it's a poor man's flash archive.
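The rough shape of the procedure is below. Dataset, file, and BE names are
hypothetical, and the beadm step is an assumption that may need some
mountpoint/canmount fiddling depending on the release:

  # on the golden system
  zfs snapshot rpool/ROOT/opensolaris@golden
  zfs send rpool/ROOT/opensolaris@golden > /export/images/golden.zfs

  # on each target, after running the stock OpenSolaris installer
  zfs receive rpool/ROOT/golden-be < /export/images/golden.zfs
  beadm activate golden-be   # then anonymize or set the host identity before reboot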

-Greg

cindy.swearin...@sun.com wrote:

Hi RB,

We have a draft of the ZFS/flar image support here:

http://opensolaris.org/os/community/zfs/boot/flash/

Make sure you review the Solaris OS requirements.

Thanks,

Cindy

On 09/14/09 11:45, RB wrote:
Is it possible to create a flar image of a ZFS root filesystem to install 
it on other machines?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ssd for zil on a dell 2950

2009-08-20 Thread Greg Mason



How about the bug "removing slog not possible"? What if this slog fails? Is 
there a plan for such a situation (the pool becomes inaccessible in this case)?
  

You can "zpool replace" a bad slog device now.

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ssd for zil on a dell 2950

2009-08-20 Thread Greg Mason
Something our users do quite a bit of is untarring archives with a lot 
of small files. Many small, quick writes are another of the common 
workloads our users have.


Real-world test: our old Linux-based NFS server allowed us to unpack a 
particular tar file (the source for boost 1.37) in around 2-4 minutes, 
depending on load. This machine wasn't special at all, but it had fancy 
SGI disk on the back end, and was using the Linux-specific async NFS option.


We turned up our X4540s, and this same tar unpack took over 17 minutes! 
We disabled the ZIL for testing, and we dropped this to under 1 minute. 
With the X25-E as a slog, we were able to run this test in 2-4 minutes, 
same as the old storage.


That said, I strongly recommend using Richard Elling's zilstat. He's 
posted about it previously on this list. It will help you determine if 
adding a slog device will help your workload or not. I didn't know about 
this script at the time of our testing, so it ended up being some trial 
and error, running various tests on different hardware setups (which 
means creating and destroying quite a few pools).
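If anyone wants to repeat that kind of trial and error, it boils down to
something like this (device names and tar file are hypothetical; our runs were
actually against an NFS mount of the pool):

  # baseline: scratch pool with no slog
  zpool create testpool mirror c1t1d0 c1t2d0
  cd /testpool && time tar xf /var/tmp/boost_1_37_0.tar
  cd / && zpool destroy testpool

  # same test with an SSD slog attached
  zpool create testpool mirror c1t1d0 c1t2d0 log c4t0d0
  cd /testpool && time tar xf /var/tmp/boost_1_37_0.tar
  cd / && zpool destroy testpool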


-Greg

Jorgen Lundman wrote:


Does un-taring something count? It is what I used for our tests.

I tested with ZIL disable, zil cache on /tmp/zil, CF-card (300x) and 
cheap SSD. Waiting for X-25E SSDs to arrive for testing those:


http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html

If you want a quick answer, disable ZIL (you need to unmount/mount, 
export/import or reboot) on your ZFS volume and try it. That is the 
theoretical maximum. You can get close to this using various 
technologies, SSD and all that.
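The usual way to flip that switch is the system-wide tunable below; shown only
as a sketch for test systems, never for production data:

  # /etc/system -- disables the ZIL globally after a reboot (testing only)
  set zfs:zil_disable = 1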


I am no expert on this, I knew nothing about it 2 weeks ago.

But for our provisioning engine to untar Movable-Types for customers, 
going from 5 mins to 45 secs is quite an improvement. I could theoretically get 
that down to 11 seconds (ZIL disabled).


Lund


Monish Shah wrote:

Hello Greg,

I'm curious how much performance benefit you gain from the ZIL 
accelerator. Have you measured that?  If not, do you have a gut feel 
about how much it helped?  Also, for what kind of applications does 
it help?


(I know it helps with synchronous writes.  I'm looking for real world 
answers like: "Our XYZ application was running like a dog and we 
added an SSD for ZIL and the response time improved by X%.")


Of course, I would welcome a reply from anyone who has experience 
with this, not just Greg.


Monish

- Original Message - From: "Greg Mason" 
To: "HUGE | David Stahl" 
Cc: "zfs-discuss" 
Sent: Thursday, August 20, 2009 4:04 AM
Subject: Re: [zfs-discuss] Ssd for zil on a dell 2950


Hi David,

We are using them in our Sun X4540 filers. We are actually using 2 SSDs
per pool, to improve throughput (since the logbias feature isn't in an
official release of OpenSolaris yet). I kind of wish they made an 8G or
16G part, since the 32G capacity is kind of a waste.

We had to go the NewEgg route though. We tried to buy some Sun-branded
disks from Sun, but that's a different story. To summarize, we had to
buy the NewEgg parts to ensure a project stayed on-schedule.

Generally, we've been pretty pleased with them. Occasionally, we've had
an SSD that wasn't behaving well. Looks like you can replace log devices
now though... :) We use the 2.5" to 3.5" SATA adapter from IcyDock, in a
Sun X4540 drive sled. If you can attach a standard sata disk to a Dell
sled, this approach would most likely work for you as well. Only issue
with using the third-party parts is that the involved support
organizations for the software/hardware will make it very clear that
such a configuration is quite unsupported. That said, we've had pretty
good luck with them.

-Greg




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ssd for zil on a dell 2950

2009-08-19 Thread Greg Mason

Hi David,

We are using them in our Sun X4540 filers. We are actually using 2 SSDs 
per pool, to improve throughput (since the logbias feature isn't in an 
official release of OpenSolaris yet). I kind of wish they made an 8G or 
16G part, since the 32G capacity is kind of a waste.
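Attaching the pair looks something like this (pool and device names
hypothetical); listing two devices after "log" creates two separate log vdevs,
which is what spreads the ZIL writes:

  zpool add tank log c4t0d0 c4t1d0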


We had to go the NewEgg route though. We tried to buy some Sun-branded 
disks from Sun, but that's a different story. To summarize, we had to 
buy the NewEgg parts to ensure a project stayed on-schedule.


Generally, we've been pretty pleased with them. Occasionally, we've had 
an SSD that wasn't behaving well. Looks like you can replace log devices 
now though... :) We use the 2.5" to 3.5" SATA adapter from IcyDock, in a 
Sun X4540 drive sled. If you can attach a standard sata disk to a Dell 
sled, this approach would most likely work for you as well. Only issue 
with using the third-party parts is that the involved support 
organizations for the software/hardware will make it very clear that 
such a configuration is quite unsupported. That said, we've had pretty 
good luck with them.


-Greg

--
Greg Mason
System Administrator
High Performance Computing Center
Michigan State University


HUGE | David Stahl wrote:
We have a setup with ZFS/ESX/NFS and I am looking to move our zil to a 
solid state drive.
So far I am looking into this one 
http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013

Does anyone have any experience with this drive as a poor man's logzilla?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unexpected behavior with "nbmand=on" set

2009-08-19 Thread Greg Mason

It's not too often we see good news on the zfs-discuss list, so here's some:

We at the High Performance Computing Center at MSU have finally worked 
out the root cause of a long-standing issue with our OpenSolaris NFS 
servers. It was a minor configuration issue, involving a ZFS file system 
property.


A little backstory: We chose to go with Sun X4540s, running OpenSolaris 
and ZFS for our home directory space. We initially implemented 100TB of 
usable space. All was well for a while, but then some mostly annoying 
issues started popping up:


1. 0-byte files named '4913' were appearing in user directories. We 
discovered that vi was doing:


open("4913")
close("4913")
remove("4913")

The remove() operation would fail intermittently. With assistance from 
the helpful folks at SGI (because we originally thought this was a Linux 
NFSv4 client problem), testing revealed that this behavior is caused by 
the NFS server on Solaris occasionally returning NFS4ERR_FILE_OPEN, 
which is not handled by the client. According to a Linux NFS kernel 
developer, "the error is usually due to ordering issues with 
asynchronous RPC calls." 
http://www.linux-nfs.org/Linux-2.6.x/2.6.18/linux-2.6.18-068-handle_nfs4err_file_open.dif 
We applied a patch to the Linux NFSv4 client, which told the client to 
wait and retry when the client received that error.


2. There was also an issue with gedit. When opening then saving an 
already existing file, it did:


open("file")
rename("file","file~")

rename() returned "Input/Output Error." After applying the fix for #1, 
rename() hung indefinitely. We also noticed a similar problem with gcc.


Interestingly, running this test locally on the OpenSolaris server, on the 
same file system, resulted in a "permission denied" error. If we mounted 
this same file system over NFSv4 on another OpenSolaris system, we received 
the same "permission denied" error.



Yesterday, we discovered the property 'nbmand' was set on the ZFS file 
systems in question. This was a leftover from our initial testing with 
Solaris CIFS. It was set because the documentation at 
http://dlc.sun.com/osol/docs/content/SSMBAG/managingsmbsharestm.html and 
http://204.152.191.100/wiki/index.php/Getting_Started_With_the_Solaris_CIFS_Service 
instructed that nbmand should be turned on when using CIFS. What isn't 
mentioned, however, is that nbmand can adversely affect the behavior of 
NFSv4 and even local file systems. The ZFS admin guide also states that 
nbmand applies only to CIFS clients, when it actually applies to NFSv4 
clients as well as local file system access.


I think nbmand is also a bit slow in releasing its locks, which explains 
the behavior of bug #1. The only tests we've run so far show that the 
"slow" locking behavior goes away when nbmand is turned off. Would filing 
a bug about this slow behavior of nbmand be the correct thing to do at 
this point? If so, where is the proper place to file it? I've been told 
such bug reports go to the OpenSolaris Bugzilla, but I'm not sure whether 
this should instead be filed at bugs.opensolaris.org.


Disabling nbmand on a test file system resolved both bugs, as well as 
other known issues that our users have been running into. All the 
various known issues this caused can be found at the MSU HPCC wiki: 
https://wiki.hpcc.msu.edu/display/Issues/Known+Issues, under "Home 
Directory file system."
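For anyone hitting the same thing, the property is easy to check and clear
(dataset name hypothetical; note that an nbmand change only takes effect once
the filesystem is remounted):

  zfs get nbmand tank/home
  zfs set nbmand=off tank/home
  # remount (or export/import the pool) for the new setting to apply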


-Greg


--
Greg Mason
System Administrator
High Performance Computing Center
Michigan State University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Shrinking a zpool?

2009-08-06 Thread Greg Mason



What is the downtime for doing a send/receive? What is the downtime
for zpool export, reconfigure LUN, zpool import?
  
We have a similar situation. Our home directory storage is based on many 
X4540s. Currently, we use rsync to migrate volumes between systems, but 
our process could very easily be switched over to zfs send/receive (and 
very well may be in the near future).


What this looks like, if using zfs send/receive, is we perform an 
initial send (get the bulk of the data over), and then at a planned 
downtime, do an incremental send to "catch up" the destination. This 
"catch up" phase is usually a very small fraction of the overall size of 
the volume. The only downtime required runs from just before you take the 
final snapshot (the last incremental) until that send finishes and you turn 
up whatever service(s) run on the destination system. If the filesystem has 
a lot of write activity, you can run multiple incrementals to decrease the 
size of that last snapshot. As far as backing out goes, if all hell breaks 
loose (of course that never happens, right? :), you can simply destroy the 
destination filesystem and continue running on the original system.


When everything checks out (which you can safely assume once the recv 
finishes, thanks to how ZFS send/recv works), you then just have to 
destroy the original filesystem. It's true that this doesn't 
shrink the pool, but it's at least a workaround to be able to swing 
filesystems around to different systems. If you had only one filesystem 
in the pool, you could then safely destroy the original pool. This does 
mean you'd need 2x the size of the LUN during the transfer though.
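In sketch form, the swing looks like this (host, dataset, and snapshot names
hypothetical):

  # initial bulk copy while the source stays in service
  zfs snapshot tank/home@bulk
  zfs send tank/home@bulk | ssh newhost zfs receive tank/home

  # at the planned downtime: quiesce writes, then send the small catch-up delta
  zfs snapshot tank/home@final
  zfs send -i @bulk tank/home@final | ssh newhost zfs receive -F tank/home
  # -F rolls the target back to @bulk in case anything touched it in between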


For replication of ZFS filesystems, we use a similar process, just with a 
lot more incremental sends.


Greg Mason
System Administrator
High Performance Computing Center
Michigan State University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Greg Mason
> >> I think it is a great idea, assuming the SSD has good write performance.
> > This one claims up to 230MB/s read and 180MB/s write and it's only $196.
> >
> > http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393
> >
> > Compared to this one (250MB/s read and 170MB/s write) which is $699.
> >
> Oops. Forgot the link:
> 
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014
> > Are those claims really trustworthy? They sound too good to be true!
> >
> >  -Kyle

Kyle-

The less expensive SSD is an MLC device. The Intel SSD is an SLC device.
That right there accounts for the cost difference. The SLC device (Intel
X25-E) will last quite a bit longer than the MLC device.

-Greg

-- 
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about user/group quotas

2009-07-09 Thread Greg Mason
Thanks for the link Richard,

I guess the next question is, how safe would it be to run snv_114 in
production? Running something that would be technically "unsupported"
makes a few folks here understandably nervous...

-Greg

On Thu, 2009-07-09 at 10:13 -0700, Richard Elling wrote:
> Greg Mason wrote:
> > I'm trying to find documentation on how to set and work with user and
> > group quotas on ZFS. I know it's quite new, but googling around I'm just
> > finding references to a ZFS quota and refquota, which are
> > filesystem-wide settings, not per user/group.
> >   
> 
> Cindy does an excellent job of keeping the ZFS Admin Guide up to date.
> http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf
> See the section titled "Setting User or Group Quotas on a ZFS File System"
>  -- richard
> > Also, after reviewing a few bugs, I'm a bit confused about which build
> > has user quota support. I recall that snv_111 has user quota support,
> > but not in rquotad. According to bug 6501037, ZFS user quota support is
> > in snv_114. 
> >
> > We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're
> > also curious about being able to utilize ZFS user quotas, as we're
> > having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be
> > able to use NFSv3 for now (one large ZFS filesystem, with user quotas
> > set), until the flaws with our Linux NFS clients can be addressed.
> >
> >   
> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Question about user/group quotas

2009-07-09 Thread Greg Mason
I'm trying to find documentation on how to set and work with user and
group quotas on ZFS. I know it's quite new, but googling around I'm just
finding references to a ZFS quota and refquota, which are
filesystem-wide settings, not per user/group.
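(For reference, the per-user syntax documented for the builds that have the
feature looks roughly like this; user, group, and dataset names are
hypothetical.)

  zfs set userquota@alice=10G tank/home
  zfs set groupquota@staff=500G tank/home
  zfs get userquota@alice tank/home
  zfs userspace tank/home      # per-user usage summary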

Also, after reviewing a few bugs, I'm a bit confused about which build
has user quota support. I recall that snv_111 has user quota support,
but not in rquotad. According to bug 6501037, ZFS user quota support is
in snv_114. 

We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're
also curious about being able to utilize ZFS user quotas, as we're
having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be
able to use NFSv3 for now (one large ZFS filesystem, with user quotas
set), until the flaws with our Linux NFS clients can be addressed.

-- 
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] importing pool with missing slog followup

2009-06-09 Thread Greg Mason
In my testing, I've seen that trying to duplicate zpool disks with dd
often results in a disk that's unreadable. I believe it has something to
do with the block sizes of dd.

In order to make my own slog backups, I just used cat instead. I plugged
the slog SSD into another system (not a necessary step, but easier in my
case), catted the disk to a file, then put the slog SSD back. I imagine
this needs to be done with the zpool in a cleanly-exported state; I
haven't tested it otherwise. I've also tested replacing an SSD with this
method: just cat the file back onto the new disk. The zpool is then
imported on boot like nothing happened, even though the physical hardware
has changed.
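In sketch form (pool, device, and path names hypothetical; only with the pool
cleanly exported):

  zpool export tank
  cat /dev/rdsk/c4t0d0s0 > /backup/slog.img     # save the slog SSD's contents
  # ...swap in the replacement SSD, then restore and re-import:
  cat /backup/slog.img > /dev/rdsk/c4t0d0s0
  zpool import tank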

A question I have is, does "zpool replace" now work for slog devices as
of snv_111b?

-Greg

On Fri, 2009-06-05 at 20:57 -0700, Paul B. Henson wrote:
> My research into recovering from a pool whose slog goes MIA while the pool
> is off-line resulted in two possible methods, one requiring prior
> preparation and the other a copy of the zpool.cache including data for the
> failed pool.
> 
> The first method is to simply dump a copy of the slog device right after
> you make it (just dd if=/dev/dsk/ of=slog.dump>). If the device ever
> failed, theoretically you could restore the image onto a replacement (dd
> if=slog.dump of=/dev/dsk/) and import the pool.
> 
> My initial testing of that method was promising, however that testing was
> performed by intentionally corrupting the slog device, and restoring the
> copy back onto the original device. However, when I tried restoring the
> slog dump onto a different device, that didn't work out so well. zpool
> import recognized the different device as a log device for the pool, but
> still complained there were unknown missing devices and refused to import
> the pool. It looks like the device serial number is stored as part of the
> zfs label, resulting in confusion when that label is restored onto a
> different device. As such, this method is only usable if the underlying
> fault is simply corruption, and the original device is available to restore
> onto.
> 
> The second method is described at:
> 
>   http://opensolaris.org/jive/thread.jspa?messageID=377018
> 
> Unfortunately, the included binary does not run under S10U6, and after half
> an hour or so of trying to get the source code to compile under S10U6 I
> gave up (I found some of the missing header files in the S10U6 grub source
> code package which presumably match the actual data structures in use under
> S10, but there was additional stuff missing which as I started copying it
> out of opensolaris code just started getting messier and messier). Unless
> someone with more zfs-fu than me creates a binary for S10, this approach is
> not going to be viable.
> 
> Unofficially I was told that there is expected to be a fix for this issue
> putback into Nevada around July, but whether or not that might be available
> in U8 wasn't said. So, barring any official release of a fix or unofficial
> availability of a workaround for S10, in the (admittedly unlikely) failure
> mode of a slog device failure on an inactive pool, have good backups :).
> 
> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?

2009-04-15 Thread Greg Mason



And it looks like the Intel fragmentation issue is fixed as well:
http://techreport.com/discussions.x/16739


FYI, Intel recently had a new firmware release. IMHO, odds are that
this will be as common as HDD firmware releases, at least for the
next few years.
http://news.cnet.com/8301-13924_3-10218245-64.html?tag=mncol


It should also be noted that the Intel X25-M != the Intel X25-E. The 
X25-E hasn't had any of the performance and fragmentation issues.


The X25-E is an SLC SSD, while the X25-M is an MLC SSD, hence the X25-M's 
more complex firmware.


-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Greg Mason

Harry,

ZFS will only compress data if it is able to gain more than 12% of space 
by compressing the data (I may be wrong on the exact percentage). If ZFS 
can't get at least that 12% compression, it doesn't bother and will 
just store the block uncompressed.


Also, the default ZFS compression algorithm isn't gzip, so you aren't 
going to get the greatest compression possible, but it is quite fast.


Depending on the type of data, it may not compress well at all, leading 
ZFS to store that data completely uncompressed.
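For what it's worth, the relevant knobs look like this (dataset name
hypothetical); note that changing compression only affects newly written
blocks:

  zfs get compression,compressratio tank/data
  zfs set compression=on tank/data      # default algorithm (lzjb): fast, modest ratio
  zfs set compression=gzip tank/data    # tighter, but much more CPU-hungry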


-Greg



All good info, thanks.  Still, one thing doesn't quite work in your line
of reasoning.  The data on the Gentoo Linux end is uncompressed, whereas
it is compressed on the ZFS side.

A number of the files are themselves compressed formats such as jpg,
mpg, avi, and pdf (maybe a few more), which aren't going to compress further
to speak of, but thousands of the files are text files (html).  So
compression should show some reduction in size.

Your calculation appears to be based on both ends being uncompressed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs as a cache server

2009-04-09 Thread Greg Mason

Francois,

Your best bet is probably a stripe of mirrors. i.e. a zpool made of many 
mirrors.


This way you have redundancy, and fast reads as well. You'll also enjoy 
pretty quick resilvering in the event of a disk failure as well.


For even faster reads, you can add dedicated L2ARC cache devices (folks 
typically use SSDs or very fast (15k RPM) SAS drives for this).
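A minimal sketch of that layout (device names hypothetical):

  zpool create cachepool \
      mirror c1t0d0 c1t1d0 \
      mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0 \
      cache c2t0d0             # optional dedicated L2ARC device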


-Greg

Francois wrote:

Hello list,

What would be the best zpool configuration for a cache/proxy server
(probably based on squid) ?

In other words with which zpool configuration I could expect best
reading performance ? (there'll be some writes too but much less).


Thanks.

--
Francois

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] GSoC 09 zfs ideas?

2009-03-03 Thread Greg Mason

Just my $0.02, but would pool shrinking be the same as vdev evacuation?

I'm quite interested in vdev evacuation as an upgrade path for 
multi-disk pools. This would be yet another reason for folks to use 
ZFS at home (you only have to buy cheap disks), but it would also be 
good to have that ability from an enterprise perspective, as I'm sure 
we've all engineered ourselves into a corner at one time or another...


It's a much cleaner, safer, and possibly much faster alternative to 
systematically pulling drives and letting zfs resilver onto a larger 
disk, in order to upgrade a pool in-place, and in production.


basically, what I'm thinking is:

zpool remove mypool 

Allow time for ZFS to vacate the vdev(s), and then light up the "OK to 
remove" light on each evacuated disk.


-Greg

Blake Irvin wrote:

Shrinking pools would also solve the right-sizing dilemma.

Sent from my iPhone

On Feb 28, 2009, at 3:37 AM, Joe Esposito  wrote:


I'm using opensolaris and zfs at my house for my photography storage
as well as for an offsite backup location for my employer and several
side web projects.

I have an 80g drive as my root drive.  I recently took posesion of 2
74g 10k drives which I'd love to add as a mirror to replace the 80 g
drive.

From what I gather it is only possible if I zfs export my storage
array and reinstall solaris on the new disks.

So I guess I'm hoping zfs shrink and grow commands show up sooner or 
later.


Just a data point.

Joe Esposito
www.j-espo.com

On 2/28/09, "C. Bergström"  wrote:

Blake wrote:

Gnome GUI for desktop ZFS administration



On Fri, Feb 27, 2009 at 9:13 PM, Blake  wrote:


zfs send is great for moving a filesystem with lots of tiny files,
since it just handles the blocks :)



I'd like to see:

pool-shrinking (and an option to shrink disk A when i want disk B to
become a mirror, but A is a few blocks bigger)


This may be interesting... I'm not sure how often you need to shrink a
pool though?  Could this be classified more as a Home or SME level 
feature?

install to mirror from the liveCD gui


I'm not working on OpenSolaris at all, but for when my projects
installer is more ready /we/ can certainly do this..

zfs recovery tools (sometimes bad things happen)


Agreed.. part of what I think keeps zfs so stable though is the complete
lack of dependence on any recovery tools..  It forces customers to bring
up the issue instead of dirty hack and nobody knows.

automated installgrub when mirroring an rpool


This goes back to an installer option?

./C

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Write caches on X4540

2009-02-12 Thread Greg Mason
Well, since the cache flush command is disabled, I would like this 
to happen as early as practically possible in the bootup process, as ZFS 
will not be issuing cache flush commands to the disks.


I'm not really sure what happens in the case where the cache flush 
command is disabled, something makes its way into the write cache, and then 
the cache is disabled. Is the write cache flushed to disk when the cache is 
disabled? If so, then I guess it's less critical when this happens in the 
bootup process, or whether the setting is permanent...


-Greg

A Darren Dunham wrote:

On Thu, Feb 12, 2009 at 10:33:40AM -0500, Greg Mason wrote:
What I'm looking for is a faster way to do this than format -e -d   
-f 

Re: [zfs-discuss] Write caches on X4540

2009-02-12 Thread Greg Mason


Are you sure that the write cache is back on after a restart?



Yes, I've checked with format -e, on each drive.

When disabling the write cache with format, it also prints a warning 
stating that the setting will not persist across restarts.


What I'm looking for is a faster way to do this than format -e -d  
-f 

Re: [zfs-discuss] Write caches on X4540

2009-02-12 Thread Greg Mason

We use several X4540's over here as well, what type of workload do you
have, and how much performance increase did you see by disabling the
write caches?



In one instance, we saw our tests go from completing in around 2.5 minutes 
(with write caches) to around a minute and a half without them.


I'm trying to optimize our machines for a write-heavy environment, as 
our users will undoubtedly hit this limitation of the machines.


-Greg

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Write caches on X4540

2009-02-11 Thread Greg Mason

We're using some X4540s, with OpenSolaris 2008.11.

According to my testing, to optimize our systems for our specific 
workload, I've determined that we get the best performance with the 
write cache disabled on every disk, and with zfs:zfs_nocacheflush=1 set 
in /etc/system.


The only issue is making this write cache setting permanent, or at least quick to apply.

Right now, I've scripted format to run on boot, disabling 
the write cache on all disks. This takes around two minutes. I'd like to 
avoid needing to take this time on every bootup (which is more often 
than you'd think, we've got quite a bit of construction happening, which 
necessitates bringing everything down periodically). This would also be 
painful in the event of unplanned downtime for one of our Thors.
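Roughly what that boot-time script does, for anyone curious (the command-file
contents and the disk-name extraction are assumptions on my part; verify
against format(1M) on your own system before trusting it):

  # /var/tmp/wce-off.cmd -- walks format's expert-mode cache menus
  cache
  write_cache
  disable
  quit
  quit
  quit

  # loop over every disk format can see and apply the command file
  for disk in $(format < /dev/null 2>/dev/null | awk '/^ +[0-9]+\./ {print $2}'); do
      format -e -d "$disk" -f /var/tmp/wce-off.cmd
  done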


So, basically, my question is: Is there a way to quickly or permanently 
disable the write cache on every disk in an X4540?


Thanks,

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Send & Receive (and why does 'ls' modify a snapshot?)

2009-02-04 Thread Greg Mason
Tony,

I believe you want to use "zfs recv -F" to force a rollback on the 
receiving side.

I'm wondering if your ls is updating the atime somewhere, which would 
indeed be a change...

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Add SSD drive as L2ARC(?) cache to existing ZFS raid?

2009-02-03 Thread Greg Mason
Orvar,

With my testing, i've seen a 5x improvement with small file creation 
when working specifically with NFS. This is after I added an SSD for the 
ZIL.

I recommend Richard Elling's zilstat (he posted links earlier). It'll 
let you see if a dedicated device for the ZIL will help your specific 
workload.
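A typical sampling run looks something like this (I'm quoting the invocation
from memory, so check the script's own usage text):

  # six 10-second samples of ZIL write activity while the workload runs
  ./zilstat.ksh 10 6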

My understanding is that you'll get "more bang for the buck" using an 
SSD for the ZIL rather than the L2ARC. Performing some of your own 
benchmarks is really the only way to see what will help improve performance 
for your specific workload. I recommend reading up on the ZFS ARC and 
L2ARC, to help try to determine if testing a dedicated L2ARC device is 
even worthwhile for your uses. I know it wasn't really helpful for me, 
as our read performance is already great.

As for a specific SSD, I've tested the Intel X25E. It's around $600 or 
so. It's got about half the performance of the snazzy, pricey STEC Zeus 
drives. With the specific workload I was trying to accelerate, I wasn't 
hitting any of the limits of the Intel SSDs (but I was definitely WAY 
past the performance limits of a standard hard disk). Again, all of this 
was for accelerating the ZIL, not for use on the L2ARC, so YMMV.

Fishworks does this. They use an SSD both for the read cache as well as 
the ZIL.

-Greg

Orvar Korvar wrote:
> So are there no guidelines for adding an SSD disk as a home user? Which is 
> the best SSD disk to add? What percentage improvements are typical? Or will 
> a home user not benefit from adding an SSD drive? Is it only enterprise SSD 
> drives that work, together with some esoteric software from Fishworks? Does it 
> require enterprise hardware to get a boost from an SSD? Not possible? Or?
> 
> No one has done this yet? What does the Fishworks team say?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
I'll give this a script a shot a little bit later today.

For ZIL sizing, I'm using either 1 or 2 32G Intel X25-E SSDs in my 
tests, which, according to what I've read, is 2-4 times larger than the 
maximum that ZFS can possibly use. We've got 32G of system memory in 
these Thors, and (if I'm not mistaken) the maximum amount of in-play 
data can be 16G, 1/2 the system memory.

Also, because I know people will be asking, has anybody ever tried to 
recover from something like a system crash with a ZFS pool that has the 
ZIL disabled? What kind of nightmares would I be facing in such a 
situation? Would I simply just risk losing that in-play data, or could 
more serious things happen? I know disabling the ZIL is an Extremely Bad 
Idea, but I need to tell people exactly why...

-Greg

Jim Mauro wrote:
 > You have SSD's for the ZIL (logzilla) enabled, and ZIL IO
 > is what is hurting your performance...Hmmm
 >
 > I'll ask the stupid question (just to get it out of the way) - is
 > it possible that the logzilla is undersized?
 >
 > Did you gather data using Richard Elling's zilstat (included below)?
 >
 > Thanks,
 > /jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
> If there was a latency issue, we would see such a problem with our 
> existing file server as well, which we do not. We'd also have much 
> greater problems than just file server performance.
> 
> So, like I've said, we've ruled out the network as an issue.

I should also add that I've tested these Thors with the ZIL disabled, 
and they scream! With the cache flush disabled, they also do quite well.

The specific issue I'm trying to solve is the ZIL being slow when serving NFS.

I really don't want to have to do something drastic like disabling the 
ZIL to get the performance I need...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
Jim Mauro wrote:
> 
>> This problem only manifests itself when dealing with many small files 
>> over NFS. There is no throughput problem with the network.
> But there could be a _latency_ issue with the network.

If there was a latency issue, we would see such a problem with our 
existing file server as well, which we do not. We'd also have much 
greater problems than just file server performance.

So, like I've said, we've ruled out the network as an issue.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
7200 RPM SATA disks.

Tim wrote:
> 
> 
> On Fri, Jan 30, 2009 at 8:24 AM, Greg Mason wrote:
> 
> A Linux NFS file server, with a few terabytes of fibre-attached disk,
> using XFS.
> 
> I'm trying to get these Thors to perform at least as well as the current
> setup. A performance hit is very hard to explain to our users.
> 
> 
> What type of spindles were in the FC attached disk?
> 
> --Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
I should also add that this "creating many small files" issue is the 
ONLY case where the Thors are performing poorly, which is why I'm 
focusing on it.

Greg Mason wrote:
> A Linux NFS file server, with a few terabytes of fibre-attached disk, 
> using XFS.
> 
> I'm trying to get these Thors to perform at least as well as the current 
> setup. A performance hit is very hard to explain to our users.
> 
>> Perhaps I missed something, but what was your previous setup?
>> I.e. what did you upgrade from?
>> Neil.
>>
>>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
A Linux NFS file server, with a few terabytes of fibre-attached disk, 
using XFS.

I'm trying to get these Thors to perform at least as well as the current 
setup. A performance hit is very hard to explain to our users.

> Perhaps I missed something, but what was your previous setup?
> I.e. what did you upgrade from?
> Neil.
> 
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-29 Thread Greg Mason
This problem only manifests itself when dealing with many small files 
over NFS. There is no throughput problem with the network.

I've run tests with the write cache disabled on all disks, and the cache 
flush disabled. I'm using two Intel SSDs for ZIL devices.

This setup is faster than using the two Intel SSDs with write caches 
enabled on all disks, and with the cache flush enabled.

My test used to run around 3.5 to 4 minutes; now it is completing in 
around 2.5 minutes. I still think this is a bit slow, but I still have 
quite a bit of testing to perform. I'll keep the list updated with my 
findings.

I've already established both via this list and through other research 
that ZFS has performance issues over NFS when dealing with many small 
files. This may actually be an issue with NFS itself, where 
NVRAM-backed storage is needed for decent performance with small files. 
Typically such an NVRAM cache is supplied by a hardware raid controller 
in a disk shelf.

I find it very hard to explain to a user why an "upgrade" is a step down 
in performance. For the users these Thors are going to serve, such a 
drastic performance hit is a deal breaker...

I've done my homework on this issue, I've ruled out the network as an 
issue, as well as the NFS clients. I've narrowed my particular 
performance issue down to the ZIL, and how well ZFS plays with NFS.

-Greg

Jim Mauro wrote:
> Multiple Thors (more than 2?), with performance problems.
> Maybe it's the common demnominator - the network.
> 
> Can you run local ZFS IO loads and determine if performance
> is expected when NFS and the network are out of the picture?
> 
> Thanks,
> /jim
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-29 Thread Greg Mason
The funny thing is that I'm showing a performance improvement over the 
configuration with write caches enabled and cache flushing enabled.

The only way these pools are being accessed is over NFS. Well, at least 
the only way I care about when it comes to high performance.

I'm pretty sure it would give a performance hit locally, but I don't 
care about local disk performance, I only care about the performance 
over NFS.

Anton B. Rang wrote:
> If all write caches are truly disabled, then disabling the cache flush won't 
> affect the safety of your data.
> 
> It will change your performance characteristics, almost certainly for the 
> worse.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] write cache and cache flush

2009-01-29 Thread Greg Mason
So, I'm still beating my head against the wall, trying to find our 
performance bottleneck with NFS on our Thors.

We've got a couple Intel SSDs for the ZIL, using 2 SSDs as ZIL devices. 
Cache flushing is still enabled, as are the write caches on all 48 disk 
devices.

What I'm thinking of doing is disabling all write caches, and disabling 
the cache flushing.
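The cache-flush side of that is the /etc/system tunable (takes effect after a
reboot):

  set zfs:zfs_nocacheflush = 1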

What would this mean for the safety of data in the pool?

And, would this even do anything to address the performance issue?

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Add SSD drive as L2ARC(?) cache to existing ZFS raid?

2009-01-29 Thread Greg Mason
How were you running this test?

were you running it locally on the machine, or were you running it over 
something like NFS?

What is the rest of your storage like? just direct-attached (SAS or 
SATA, for example) disks, or are you using a higher-end RAID controller?

-Greg

kristof wrote:
> Kebabber,
> 
> You can't expose zfs filesystems over iSCSI.
> 
> You can only expose ZFS volumes (raw zvols) over iSCSI.
> 
> PS: 2 weeks ago I did a few tests, using filebench.
> 
> I saw little to no improvement using a 32GB Intel X25E SSD.
> 
> Maybe this is because filebench is flushing the cache in between tests.
> 
> I also compared iscsi boot time (using gpxe as boot loader) ,
> 
> We are using a raidz storage pool (4 disks). Here again, adding the X25E as a cache 
> device did not speed up the boot process, so I did not see a real improvement.
> 
> PS: We have 2 master volumes (xp and vista) which we clone to provision 
> additional guests. 
> 
> I'm now waiting for new SSD disks (STEC Zeus 18GB and STEC Mach 100GB), since 
> those are used in the Sun 7000 products. I hope they perform better.
> 
> Kristof
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD drives in Sun Fire X4540 or X4500 for dedicated ZIL device

2009-01-23 Thread Greg Mason
If I'm not mistaken (and somebody please correct me if I'm wrong), the 
Sun 7000 series storage appliances (the Fishworks boxes) use enterprise 
SSDs, with dram caching. One such product is made by STEC.

My understanding is that the Sun appliances use one SSD for the ZIL, and 
one as a read cache. For the 7210 (which is basically a Sun Fire X4540), 
that gives you 46 disks and 2 SSDs.

-Greg


Bob Friesenhahn wrote:
> On Thu, 22 Jan 2009, Ross wrote:
> 
>> However, now I've written that, Sun use SATA (SAS?) SSD's in their 
>> high end fishworks storage, so I guess it definately works for some 
>> use cases.
> 
> But the "fishworks" (Fishworks is a development team, not a product) 
> write cache device is not based on FLASH.  It is based on DRAM.  The 
> difference is like night and day. Apparently there can also be a read 
> cache which is based on FLASH.
> 
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD drives in Sun Fire X4540 or X4500 for dedicated ZIL device

2009-01-22 Thread Greg Mason
We're evaluating the possibility of speeding up NFS operations on our 
X4540s with dedicated log devices. Specifically, we are evaluating 
replacing one or two of our spare SATA disks with SATA SSDs.

Has anybody tried using SSD device(s) as dedicated ZIL devices in a 
X4540? Are there any known technical issues with using a SSD in a X4540?

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
>
> Good idea.  Thor has a CF slot, too, if you can find a high speed
> CF card.
> -- richard

We're already using the CF slot for the OS. We haven't really found  
any CF cards that would be fast enough anyways :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
So, what we're looking for is a way to improve performance, without  
disabling the ZIL, as it's my understanding that disabling the ZIL  
isn't exactly a safe thing to do.

We're looking for the best way to improve performance, without  
sacrificing too much of the safety of the data.

The current solution we are considering is disabling the cache  
flushing (as per a previous response in this thread), and adding one  
or two SSD log devices, as this is similar to the Sun storage  
appliances based on the Thor. Thoughts?

-Greg

On Jan 19, 2009, at 6:24 PM, Richard Elling wrote:
>>
>> We took a rough stab in the dark, and started to examine whether or  
>> not it was the ZIL.
>
> It is. I've recently added some clarification to this section in the
> Evil Tuning Guide which might help you to arrive at a better solution.
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
> Feedback is welcome.
> -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
We're running into a performance problem with ZFS over NFS. When working 
with many small files (i.e. unpacking a tar file with source code), a 
Thor (over NFS) is about 4 times slower than our aging existing storage 
solution, which isn't exactly speedy to begin with (17 minutes versus 3 
minutes).

We took a rough stab in the dark, and started to examine whether or not 
it was the ZIL.

Performing IO tests locally on the Thor shows no real IO problems, but 
running IO tests over NFS, specifically, with many smaller files we see 
a significant performance hit.

Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
test again. It completed in just under a minute, around 3 times faster 
than our existing storage. This was more like it!

Are there any tunables for the ZIL to try to speed things up? Or would 
it be best to look into using a high-speed SSD for the log device?

And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
We do, however, need to provide our users with a certain level of 
performance, and what we've got with the ZIL on the pool is completely 
unacceptable.

Thanks for any pointers you may have...

--

Greg Mason
Systems Administrator
Michigan State University
High Performance Computing Center
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using ZFS for replication

2009-01-15 Thread Greg Mason
zfs-auto-snapshot (SUNWzfs-auto-snapshot) is what I'm using. The only trick 
is that on the other end, we have to manage our own retention of the 
snapshots we send to our offsite/backup boxes.
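Our retention handling on the receiving side boils down to something like
this (dataset name and retention count hypothetical; GNU head from
/usr/gnu/bin assumed for the negative line count):

  # keep the 30 most recent snapshots of the received dataset, destroy the rest
  zfs list -H -t snapshot -o name -s creation \
      | grep '^backup/home@' \
      | /usr/gnu/bin/head -n -30 \
      | xargs -n 1 zfs destroy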

zfs-auto-snapshot can handle the sending of snapshots as well.

We're running this in OpenSolaris 2008.11 (snv_100).

Another use I've seen is using zfs-auto-snapshot to take and manage 
snapshots on both ends, using rsync to replicate the data, but that's 
less than ideal for most folks...

-Greg

Ian Mather wrote:
> Fairly new to ZFS. I am looking to replicate data between two thumper boxes.
> Found quite a few articles about using zfs incremental snapshot send/receive. 
> Just a cheeky question to see if anyone has anything working in a live 
> environment and is happy to share the scripts, to save me reinventing the 
> wheel. Thanks in advance.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss