Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-27 Thread Paul Choi


No. From what I've seen, ZFS will periodically flush the writes that went 
through the ZIL out to the pool disks. You may run into a read-starvation 
situation where ZFS is so busy flushing to disk that you won't get reads. If 
you have VMs where developers expect low-latency interactivity, they get 
unhappy. Trust me. :)
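
One way to watch for this (the pool name "tank" below is a placeholder) is to 
look at per-interval I/O statistics while the VMs are busy:

# zpool iostat -v tank 1

Reads dropping toward zero while write bandwidth spikes periodically is the 
flush pattern described above.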


One way to address this is to have an ARC that's large enough; another is to 
add a cache device (L2ARC) to the zpool.
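
As a sketch (pool and device names are placeholders): the ARC already grows 
toward most of RAM by default, so "large enough" mostly means having the RAM 
and not capping it too low with zfs_arc_max in /etc/system. Adding L2ARC 
looks like:

# zpool add tank cache c2t2d0     (attach an SSD as an L2ARC cache device)
# zpool iostat -v tank            (the SSD now shows up under a "cache" section)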


I have a config where ~20 ESX VMs share a single OpenSolaris NFS server. 
It has an Intel X25-E for ZIL and an X25-M for cache. It seems to be doing 
OK. There are actually two of these setups. For one of them, the cache 
SSD died recently, and you can feel it when ZFS goes to disk for some 
uncached piece of data. I'll be replacing the cache SSD next week.
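
For reference, cache devices can be swapped online; a rough sketch, with the 
pool and device names made up:

# zpool remove tank c2t3d0        (drop the dead cache device)
# zpool add tank cache c2t4d0     (add the replacement SSD)
# zpool status tank               (confirm the new cache device is listed)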


-Paul


On 8/27/10 1:22 PM, John wrote:

Wouldn't it be possible to saturate the SSD ZIL with enough backlogged sync 
writes?

What I mean is, doesn't the data in the ZIL eventually need to make it to the 
pool? And if the pool as a whole (spinning disks) can't keep up with 30+ VMs' 
worth of write requests, couldn't you fill up the ZIL that way?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-19 Thread Paul Choi


Apparently, I must not be using the right web form...
I would update the case sometimes via the web, and it seemed like no one 
actually saw it. Or some other engineer would come along and ask me the 
same set of questions that had already been answered (and recorded in the 
case records!).


Another story: I had a bad DIMM in an X4240. The support tech was almost 
dismissive of the idea that we had a bad DIMM. I provided him with explorer 
outputs and IPMI outputs, reseated the DIMM, rebooted, etc. Didn't hear from 
him for about a week. I complained. He said I forgot to give him the full 
output of "prtdiag -v" to verify the size of each DIMM... as if you can't 
tell that from the explorer output. Silence for another week, I complained 
again, and then I heard from the parts department that the part was being 
shipped. Not exactly friendly support.


When it was just Sun, their support was pretty good. Around the time it 
was announced that Oracle was going to acquire Sun, Sun's support just 
went south. I wouldn't recommend Sun servers on the basis of the quality 
of the support I've been getting.


-Paul

On 8/18/10 2:39 PM, John D Groenveld wrote:

In message <4c6c4e30.7060...@ianshome.com>, Ian Collins writes:

If you count Monday this week as lately, we have never had to wait more
than 24 hours for replacement drives for our 45x0 or 7000 series

Same here, but two weeks ago for a failed drive in an X4150.

Last week SunSolve was sending my service order requests to
/dev/null, but someone manually entered them after I submitted
web feedback.

John
groenv...@acm.org




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is dedupe ready for prime time?

2010-05-18 Thread Paul Choi
I've been reading this list for a while; there's lots of discussion 
about b134 and deduplication. I see some stuff about snapshots not being 
destroyed, and maybe some recovery issues. What I'd like to know is: is 
ZFS with deduplication stable enough to use?


I have two NFS servers, each running OpenSolaris 2009.06 (111b), as 
datastores for VMware ESX hosts. It works great right now, with ZIL 
offload and L2ARC SSDs. I still get occasional complaints from 
developers saying the storage is slow - which I'm guessing is that read 
latency is not stellar on shared storage. Write latency is probably 
not an issue due to the ZIL offload. I'm guessing deduplication would 
solve a lot of this read-latency problem, since it would mean fewer read I/Os.
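
For reference, and only on a build that actually ships the feature (b128 or 
later - the 2009.06 hosts above do not), dedup is a per-dataset property; the 
names below are placeholders:

# zfs set dedup=on tank/esx-datastore   (affects only data written from now on)
# zpool get dedupratio tank             (rough measure of how much is deduped)

Any read-latency benefit would come from identical VM blocks being cached 
once in ARC/L2ARC instead of once per VM.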


But is it stable? Can I do nightly recursive snapshots and periodically 
destroy old snapshots without worrying about a dozen VMs suddenly losing 
their datastore? I'd love to hear about your experiences.
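
For concreteness, a minimal sketch of the rotation being asked about (dataset 
and snapshot names are made up):

# zfs snapshot -r tank/esx@nightly-20100518   (recursive snapshot of the tree)
# zfs destroy -r tank/esx@nightly-20100411    (drop an old nightly, recursively)
# zfs list -t snapshot -r tank/esx            (confirm what's left)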


Thanks,

-Paul Choi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is dedupe ready for prime time?

2010-05-18 Thread Paul Choi

Roy,

Thanks for the info. Yeah, the bug you mentioned is pretty critical. In 
terms of SSDs, I have an Intel X25-M for L2ARC and an X25-E for ZIL, and the 
host has 24 GB of RAM. I'm just waiting for that "2010.03" release, or 
whatever we want to call it when it's released...
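
For anyone trying to gauge whether that kind of RAM/L2ARC budget is enough, 
zdb on dedup-capable builds can simulate dedup against an existing pool; a 
hedged sketch (the pool name is a placeholder):

# zdb -S tank     (prints a DDT histogram and an estimated dedup ratio)

Each unique block costs a few hundred bytes of dedup table that wants to live 
in ARC or L2ARC, so the histogram gives a rough idea of the memory bill.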


-Paul

On 5/18/10 12:49 PM, Roy Sigurd Karlsbakk wrote:

- Paul Choi <paulc...@plaxo.com> wrote:

I've been reading this list for a while; there's lots of discussion
about b134 and deduplication. I see some stuff about snapshots not being
destroyed, and maybe some recovery issues. What I'd like to know is: is
ZFS with deduplication stable enough to use?

No, currently ZFS dedup is not ready for production. There are several bugs 
filed, and the most problematic ones can render the system unusable for days 
in some situations. Also, if you use dedup, plan your memory carefully and 
spend money on L2ARC, since dedup _will_ require either massive amounts of 
RAM or some good SSDs for L2ARC.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented 
intelligibly. It is an elementary imperative for all pedagogues to avoid 
excessive use of idioms of foreign origin. In most cases, adequate and 
relevant synonyms exist in Norwegian.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is it possible to replicate an entire zpool with AVS?

2009-08-18 Thread Paul Choi

Hello,

Is it possible to replicate an entire zpool with AVS? From what I can see, 
you can replicate a zvol, because AVS is filesystem-agnostic. I can 
create zvols within a pool, and AVS can replicate those, but 
that's not really what I want.


If I create a zpool called "disk1", no device node for it shows up under 
/dev/zvol:

paulc...@nfs01b:/dev/zvol# find /dev/zvol
/dev/zvol
/dev/zvol/dsk
/dev/zvol/dsk/rpool
/dev/zvol/dsk/rpool/dump
/dev/zvol/dsk/rpool/swap
/dev/zvol/rdsk
/dev/zvol/rdsk/rpool
/dev/zvol/rdsk/rpool/dump
/dev/zvol/rdsk/rpool/swap
paulc...@nfs01b:/dev/zvol#

The only zvol entries I see are for zvols that have been explicitly created.
Any tricks to using AVS with a zpool? Or should I just opt for periodic 
zfs snapshots and zfs send/receive?
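
For comparison, the send/receive route would look roughly like this (the 
remote host "otherhost" and the target pool "backup" are placeholders):

# zfs snapshot -r disk1@rep-1
# zfs send -R disk1@rep-1 | ssh otherhost zfs receive -Fd backup
# zfs snapshot -r disk1@rep-2
# zfs send -R -i disk1@rep-1 disk1@rep-2 | ssh otherhost zfs receive -Fd backup

The first pair does the full copy; later pairs ship only the incremental 
changes between snapshots.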

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does zpool clear delete corrupted files

2009-06-02 Thread Paul Choi
Hm. That's odd. "zpool clear" should've cleared the list of errors - 
unless you were accessing files at the same time, so more checksum errors 
were being reported as you read.
As for "zpool scrub", there's not much benefit in your case, since you are 
reading from the zpool anyway and checksums are verified as you read - 
and I assume you're going to read every single file there is. "zpool 
scrub" is good when you want to ensure that the checksums are good for the 
whole zpool, including files you haven't read recently.
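
For reference (pool name is a placeholder), kicking one off and checking on 
it looks like:

# zpool scrub tank
# zpool status tank     (shows "scrub in progress" and a percent-complete)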


Well, good luck with your recovery efforts.

-Paul


Jonathan Loran wrote:


Well, I tried to clear the errors, but zpool clear didn't clear them. 
I think the errors are in the metadata in such a way that they can't 
be cleared. I'm actually a bit scared to scrub it before I grab a 
backup, so I'm going to do that first. After the backup, I need to 
break the mirror to pull the x4540 out, and I just hope that can 
succeed. If not, we'll be losing some data between the time the 
backup is taken and the time I roll out the new storage.

Let this be a double warning to all you zfs-ers out there:  Make sure 
you have redundancy at the zfs layer, and also do backups. 
 Unfortunately for me, penny pinching has precluded both for us until 
now.


Jon

On Jun 1, 2009, at 4:19 PM, A Darren Dunham wrote:


On Mon, Jun 01, 2009 at 03:19:59PM -0700, Jonathan Loran wrote:


Kinda scary then.  Better make sure we delete all the bad files before  
I back it up.


That shouldn't be necessary.  Clearing the error count doesn't disable
checksums.  Every read is going to verify checksums on the file data
blocks.  If it can't find at least one copy with a valid checksum,
you should just get an I/O error trying to read the file, not invalid
data.
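
Given that, one slow but direct way to find the genuinely unreadable files, 
rather than trusting the error list, is simply to read everything and log 
what fails; a rough sketch, with the mountpoint as a placeholder:

# find /pool/fs -type f | while read f; do cat "$f" >/dev/null 2>&1 || echo "$f" >> /var/tmp/unreadable.txt; done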

What's odd is we've checked a few hundred files, and most of them  
don't seem to have any corruption.  I'm thinking what's wrong is the  
metadata for these files is corrupted somehow, yet we can read them  
just fine.


Are you still getting errors?

--
Darren




- Jonathan Loran -
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146
jlo...@ssl.berkeley.edu
AST:7731^29u18e3








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does zpool clear delete corrupted files

2009-06-01 Thread Paul Choi
"zpool clear" just clears the list of errors (and the checksum error counts) 
from the pool's stats. It does not modify the filesystem in any manner. You 
run "zpool clear" to make the zpool forget that it ever had any issues.
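
In practice (pool name is a placeholder), that sequence looks like:

# zpool status -v tank > /var/tmp/zpool-errors.txt   (save the file list; clear wipes it)
# zpool clear tank                                   (reset the error counters)
# zpool status -x tank                               (anything still unhealthy shows up again)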


-Paul

Jonathan Loran wrote:


Hi list,

First off:  


# cat /etc/release
                       Solaris 10 6/06 s10x_u2wos_09a X86
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 09 June 2006

Here's an (almost) disaster scenario that came to life over the past 
week. We have a very large zpool containing over 30TB, composed 
(foolishly) of three concatenated iSCSI SAN devices. There's no 
redundancy in this pool at the zfs level. We are actually in the 
process of migrating this to an x4540 + J4500 setup, but since the 
x4540 is part of the existing pool, we need to mirror it, 
then detach it so we can build out the replacement storage.

What happened was that some time after I had attached the mirror to the 
x4540, the scsi_vhci/network connection went south, and the 
server panicked. In the 2.5 years this system has been up, that had 
never happened before. When we got the thing glued back together, it 
immediately started resilvering from the beginning and reported about 
1.9 million data errors. The list from zpool status -v gave over 883k 
bad files. That is a small percentage (about 1%) of the total number of 
files in this volume: over 80 million.

My question is this: when we clear the pool with zpool clear, what 
happens to all of the bad files? Are they deleted from the pool, or 
do the error counters just get reset, leaving the bad files intact? 
I'm going to perform a full backup of this guy (not so easy on my 
budget), and I would rather only get the good files.


Thanks,

Jon


- Jonathan Loran -
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146
jlo...@ssl.berkeley.edu
AST:7731^29u18e3








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does zpool clear delete corrupted files

2009-06-01 Thread Paul Choi
If you run "zpool scrub" on the zpool, it'll do its best to identify the 
file(s) or filesystems/snapshots that have issues. Since the pool has no 
redundancy, it won't be able to self-heal any checksum errors... It'll take a 
long time, though, to scrub 30TB...


-Paul

Jonathan Loran wrote:


Kinda scary then.  Better make sure we delete all the bad files before 
I back it up.


What's odd is we've checked a few hundred files, and most of them 
don't seem to have any corruption.  I'm thinking what's wrong is the 
metadata for these files is corrupted somehow, yet we can read them 
just fine.  I wish I could tell which ones are really bad, so we 
wouldn't have to recreate them unnecessarily.  They are mirrored in 
various places, or can be recreated via reprocessing, but 
recreating/restoring that many files is no easy task.


Thanks,

Jon

On Jun 1, 2009, at 2:41 PM, Paul Choi wrote:

zpool clear just clears the list of errors (and # of checksum 
errors) from its stats. It does not modify the filesystem in any 
manner. You run zpool clear to make the zpool forget that it ever 
had any issues.


-Paul

Jonathan Loran wrote:


Hi list,

First off:
# cat /etc/release
                       Solaris 10 6/06 s10x_u2wos_09a X86
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 09 June 2006

Here's an (almost) disaster scenario that came to life over the past 
week.  We have a very large zpool containing over 30TB, composed 
(foolishly) of three concatenated iSCSI SAN devices.  There's no 
redundancy in this pool at the zfs level.  We are actually in the 
process of migrating this to an x4540 + J4500 setup, but since the 
x4540 is part of the existing pool, we need to mirror it, then 
detach it so we can build out the replacement storage.
What happened was some time after I had attached the mirror to the 
x4540, the scsi_vhci/network connection went south, and the server 
panicked.  Since this system has been up, over the past 2.5 years, 
this has never happened before.  When we got the thing glued back 
together, it immediately started resilvering from the beginning, and 
reported about 1.9 million data errors.  The list from zpool status 
-v gave over 883k bad files.  This is a small percentage of the 
total number of files in this volume: over 80 million (1%).
My question is this:  When we clear the pool with zpool clear, what 
happens to all of the bad files?  Are they deleted from the pool, or 
do the error counters just get reset, leaving the bad files 
intact?  I'm going to perform a full backup of this guy (not so easy 
on my budget), and I would rather only get the good files.


Thanks,

Jon


- Jonathan Loran -
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146
jlo...@ssl.berkeley.edu
AST:7731^29u18e3



 










- Jonathan Loran -
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146
jlo...@ssl.berkeley.edu
AST:7731^29u18e3







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Monitoring ZFS host memory use

2009-05-06 Thread Paul Choi

Ben Rockwood has written a very useful utility called arc_summary:
http://www.cuddletech.com/blog/pivot/entry.php?id=979
It's really good for looking at ARC usage (including memory usage).

You might be able to make some guesses based on "kstat -n zfs_file_data" 
and "kstat -n zfs_file_data_buf". Look for mem_inuse.


Running "::memstat" in "mdb -k" also shows kernel memory usage (which 
probably includes ZFS overhead) and "ZFS File Data" memory usage. But it's 
painfully slow to run; kstat is probably better.
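
For reference, the commands in question look like this:

# kstat -p zfs:0:arcstats:size                     (current ARC size, in bytes)
# kstat -p -n zfs_file_data_buf | grep mem_inuse   (ZFS file data buffers in use)
# echo ::memstat | mdb -k                          (page-level breakdown; slow on big boxes)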


-Paul Choi


Richard Elling wrote:



Bob Friesenhahn wrote:

On Wed, 6 May 2009, Troy Nancarrow (MEL) wrote:


Please forgive me if my searching-fu has failed me in this case, but
I've been unable to find any information on how people are going about
monitoring and alerting regarding memory usage on Solaris hosts using
ZFS.

The problem is not that the ZFS ARC is using up the memory, but that the
script Nagios is using to check memory usage simply sees, say, 96% RAM
used, and alerts.


Memory is meant to be used.  96% RAM use is good since it represents 
an effective use of your investment.


Actually, I think a percentage of RAM is a bogus metric to measure.
For example, on a 2-TByte system, a 4% free-memory threshold means treating
80 GBytes as off-limits. Perhaps you should look for a more meaningful 
threshold.
-- richard
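
Along those lines, a rough sketch of a check that alerts on an absolute 
free-memory floor instead of a percentage (the script name, threshold, and 
exit codes follow Nagios convention and are all adjustable assumptions):

#!/bin/sh
# check_freemem.sh - alert on absolute free memory rather than percent-of-RAM,
# so a large (and reclaimable) ARC doesn't trigger false alarms.
THRESH_MB=${1:-2048}                      # floor, in MB; pick per host
PAGES=`kstat -p unix:0:system_pages:freemem | awk '{print $2}'`
PGSZ=`pagesize`
FREE_MB=`echo "$PAGES $PGSZ" | awk '{printf "%d", $1 * $2 / 1048576}'`
if [ "$FREE_MB" -lt "$THRESH_MB" ]; then
        echo "CRITICAL: only ${FREE_MB}MB free"
        exit 2
fi
echo "OK: ${FREE_MB}MB free"
exit 0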

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss