[zfs-discuss] ZFS and Linux

2008-05-01 Thread Mertol Ozyoney
Hi All,

 

What is the status of ZFS on Linux, and which kernels are supported?

 

Regards

Mertol

 

 



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]

 

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread Darren J Moffat
Chris Siebenmann wrote:
 | Still, I'm curious -- why lots of pools?  Administration would be
 | simpler with a single pool containing many filesystems.
 
  The short answer is that it is politically and administratively easier
 to use (at least) one pool per storage-buying group in our environment.

I think the root cause of the issue is that multiple groups are buying 
physical rather than virtual storage, yet it is all being attached to a 
single system.  It will likely be a huge uphill battle, but: if all the 
physical storage could be purchased by one group, and a combination of 
ZFS reservations and quotas used on top-level datasets (i.e. one level 
down from the pool) to allocate the virtual storage, with appropriate 
amounts charged back to the groups, you would technically be able to use 
ZFS as it was intended, with far fewer (hopefully 1 or 2) pools.
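
For example, roughly (the pool, group, device names and sizes below are 
made up):

# zpool create tank c1t0d0 c1t1d0 c1t2d0 c1t3d0
# zfs create tank/groupA
# zfs set reservation=2T tank/groupA
# zfs set quota=2T tank/groupA
# zfs create tank/groupB
# zfs set reservation=1T tank/groupB
# zfs set quota=1T tank/groupB

Each group is then guaranteed (reservation) and capped (quota) at exactly 
the space it paid for, while everything shares the one pool's spindles and 
free space.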

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Linux

2008-05-01 Thread Mario Goebbels
 What is the status of ZFS on linux and what are the kernel’s supported?

There's sort of an experimental port to FUSE. Last I heard about it, it
isn't exactly stable, and the ARC is missing too, or at least gimped.
There won't be in-kernel ZFS due to license issues (CDDL vs. GPL).

-mg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Linux

2008-05-01 Thread Darren J Moffat
Mario Goebbels wrote:
 What is the status of ZFS on linux and what are the kernel’s supported?
 
 There's sort of an experimental port to FUSE. Last I heard about it, it
 isn't exactly stable and the ARC's missing too, or at least gimped.
 There won't be in kernel ZFS due to license issues (CDDL vs. GPL).

As I pointed out the last time this discussion came up, there are parts 
of the ZFS code base in GRUB already, under the GPL.

Also, if ZFS can be implemented completely outside of the Linux kernel 
source tree as a plugin module, then it falls into the same category of 
modules as proprietary binary device drivers.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Linux

2008-05-01 Thread Mario Goebbels
 Also if ZFS can be implemented completely outside of the Linux kernel
 source tree as a plugin module then it falls into the same category of
 modules as proprietary binary device drivers.

The Linux community has a strange attitude about proprietary drivers.
Otherwise I wouldn't have to put up with the restricted driver manager
and its antics on my laptop that runs Ubuntu (no Solaris until hard disk
APM can be managed), or read by proxy about the endless bickering from
Linus (and his buddies) about it.

Also, they're still spreading FUD regarding those supposed layering
violations. I wouldn't get my hopes up for now.

-mg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Linux

2008-05-01 Thread Joerg Schilling
Mario Goebbels [EMAIL PROTECTED] wrote:

  What is the status of ZFS on linux and what are the kernel's supported?

 There's sort of an experimental port to FUSE. Last I heard about it, it
 isn't exactly stable and the ARC's missing too, or at least gimped.
 There won't be in kernel ZFS due to license issues (CDDL vs. GPL).

A license issue would only exist if someone claimed that ZFS had been 
developed as a part of Linux. But then these people would be creating a 
copyright problem ;-)


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread David Collier-Brown
Darren J Moffat [EMAIL PROTECTED] wrote:
 Chris Siebenmann wrote:
| Still, I'm curious -- why lots of pools?  Administration would be
| simpler with a single pool containing many filesystems.

 The short answer is that it is politically and administratively easier
to use (at least) one pool per storage-buying group in our environment.
 
 
 I think the root cause of the issue is that multiple groups are buying 
 physical rather than virtual storage yet it is all being attached to a 
 single system.  I will likely be a huge up hill battle but: if all the 
 physical storage could be purchased by one group and a combination of 
 ZFS reservations and quotas used on top level (eg one level down from 
 the pool) datasets to allocate the virtual storage, and appropriate 
 amounts charged to the groups, you could technical be able to use ZFS 
 how it was intended with much fewer (hopefully 1 or 2) pools.

The scenario Chris describes is one I see repeatedly at customers
buying SAN storage (as recently as last month!), and it is considered
a best practice on the business side.

We may want to make this issue and its management visible, as
people moving from SAN to ZFS are likely to trip over it.

In particular, I'd like to see a blueprint or at least a 
wiki discussion by someone from the SAN world on how to 
map those kinds of purchases to ZFS pools, how few one 
wants to have, what happens when it goes wrong, and how 
to mitigate it (;-))

--dave
ps: as always, having asked for something, I'm also volunteering to
help provide it: I'm not a storage or ZFS guy, but I am an author,
and will happily help my Smarter Colleagues[tm] to write it up.

-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
[EMAIL PROTECTED] |  -- Mark Twain
(905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
bridge: (877) 385-4099 code: 506 9191#



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread Chris Siebenmann
| I think the root cause of the issue is that multiple groups are buying
| physical rather than virtual storage yet it is all being attached to a
| single system.

 They're actually buying constant-sized chunks of virtual storage, which
is provided through a pool of SAN-based disk space. This means that
we're always going to have a certain number of logical pools of storage
space to manage that are expanded in fixed-size chunks; the question is
whether to manage them as separate ZFS pools or to aggregate them into
fewer ZFS pools and then use quotas on sub-hierarchies.

(With local storage you wouldn't have much choice; the physical disk
size is not likely to map nicely into the constant-sized chunks you sell
to people. With SAN storage you can pretty much make the 'disks' that
Solaris sees map straight to the chunk size.)
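
(To make the comparison concrete, with made-up pool/device names: when a 
group buys another chunk, the many-pools model is just

# zpool add groupA-pool c4t1d0

on that group's own pool, while the aggregated model would be

# zpool add tank c4t1d0
# zfs set quota=3T tank/groupA

i.e. grow the shared pool by the new LUN and bump that group's quota by 
the chunk size.)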

- cks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Sorry for the delay. Here is the output for a couple of seconds:

# iostat -xce 1
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 1.50.7   20.84.2  0.0  0.09.0   0   1   0   0   0   0   2 
 1  0 98
sd0   0.30.2   29.4   25.1  0.1  0.0  174.5   0   0   0   0   0   0 
sd1   0.30.2   33.2   25.0  0.1  0.0  166.8   0   0   0   0   0   0 
sd2   0.20.2   26.8   24.8  0.1  0.0  180.3   0   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   0 
 0  0 100
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   1 
 1  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   2 
 1  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0  26 
 3  0 70
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   2 
 0  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.01.00.02.0  0.0  0.00.1   0   0   0   0   0   0   1 
 1  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0  18 
 1  0 81
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   6 
 1  0 94
sd0   0.00.00.00.0 35.0  0.00.0 100   

Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread Richard Elling
David Collier-Brown wrote:
 Darren J Moffat [EMAIL PROTECTED] wrote:
   
 Chris Siebenmann wrote:
 
 | Still, I'm curious -- why lots of pools?  Administration would be
 | simpler with a single pool containing many filesystems.

 The short answer is that it is politically and administratively easier
 to use (at least) one pool per storage-buying group in our environment.
   
 I think the root cause of the issue is that multiple groups are buying 
 physical rather than virtual storage yet it is all being attached to a 
 single system.  I will likely be a huge up hill battle but: if all the 
 physical storage could be purchased by one group and a combination of 
 ZFS reservations and quotas used on top level (eg one level down from 
 the pool) datasets to allocate the virtual storage, and appropriate 
 amounts charged to the groups, you could technical be able to use ZFS 
 how it was intended with much fewer (hopefully 1 or 2) pools.
 

 The scenario Chris describes is one I see repeatedly at customers
 buying SAN storage (as late as last month!) and is considered
 a best practice on the business side.

 We may want to make this issue and it's management visible, as
 people moving from SAN to ZFS are likely to trip over it.

 In particular, I'd like to see a blueprint or at least a 
 wiki discussion by someone from the SAN world on how to 
 map those kinds of purchases to ZFS pools, how few one 
 wants to have, what happens when it goes wrong, and how 
 to mitigate it (;-))
   

There are two issues here.  One is the number of pools, but the other
is the small amount of RAM in the server.  To be honest, most laptops
today come with 2 GBytes, and most servers are in the 8-16 GByte
range (hmmm... I suppose I could look up the average size we sell...)

 --dave
 ps: as always, having asked for something, I'm also volunteering to
 help provide it: I'm not a storage or ZFS guy, but I am an author,
 and will happily help my Smarter Colleagues[tm] to write it up.

   

Should be relatively straightforward... we would need some help from
someone in this situation to provide some sort of performance results
(on a system with plenty of RAM).
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread Chris Siebenmann
| There are two issues here.  One is the number of pools, but the other
| is the small amount of RAM in the server.  To be honest, most laptops
| today come with 2 GBytes, and most servers are in the 8-16 GByte range
| (hmmm... I suppose I could look up the average size we sell...)

 Speaking as a sysadmin (and a Sun customer), why on earth would I have
to provision 8 GB+ of RAM on my NFS fileservers? I would much rather
have that memory in the NFS client machines, where it can actually be
put to work by user programs.

(If I have decently provisioned NFS client machines, I don't expect much
from the NFS fileserver's cache. Given that the clients have caches too,
I believe that the server's cache will mostly be hit for things that the
clients cannot cache because of NFS semantics, like NFS GETATTR requests
for revalidation and the like.)

- cks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Richard Elling
Simon Breden wrote:
 Sorry for the delay. Here is the output for a couple of seconds:
   

This is the smoking gun...

 # iostat -xce 1
  extended device statistics  errors ---   
cpu
 devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  
 us sy wt id
 cmdk0 1.50.7   20.84.2  0.0  0.09.0   0   1   0   0   0   0   
 2  1  0 98
 sd0   0.30.2   29.4   25.1  0.1  0.0  174.5   0   0   0   0   0   0 
 sd1   0.30.2   33.2   25.0  0.1  0.0  166.8   0   0   0   0   0   0 
 sd2   0.20.2   26.8   24.8  0.1  0.0  180.3   0   0   0   0   0   0 
 sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
   

I/O is moving, but not much activity...

  extended device statistics  errors ---   
cpu
 devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  
 us sy wt id
 cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   
 0  0  0 100
 sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
 sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
 sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
 sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
   

I/O is stuck at the device.  %w[ait] of 100 means that something is
stuck on the device.  This is probably not a ZFS issue, per se, but
somehow the ZFS workload triggers it.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread Bart Smaalders
Chris Siebenmann wrote:
 | There are two issues here.  One is the number of pools, but the other
 | is the small amount of RAM in the server.  To be honest, most laptops
 | today come with 2 GBytes, and most servers are in the 8-16 GByte range
 | (hmmm... I suppose I could look up the average size we sell...)
 
  Speaking as a sysadmin (and a Sun customer), why on earth would I have
 to provision 8 GB+ of RAM on my NFS fileservers? I would much rather
 have that memory in the NFS client machines, where it can actually be
 put to work by user programs.

This depends entirely on the amount of disk & CPU on the fileserver...

A Thumper w/ 48 TB of disk and two dual-core CPUs is probably somewhat 
under-provisioned w/ 8 GB of RAM.

- Bart


-- 
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
You will contribute more with mercurial than with thunderbird.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread David Collier-Brown
Chris Siebenmann [EMAIL PROTECTED] wrote:
|  Speaking as a sysadmin (and a Sun customer), why on earth would I have
| to provision 8 GB+ of RAM on my NFS fileservers? I would much rather
| have that memory in the NFS client machines, where it can actually be
| put to work by user programs.
|
| (If I have decently provisioned NFS client machines, I don't expect much
| from the NFS fileserver's cache. Given that the clients have caches too,
| I believe that the server's cache will mostly be hit for things that the
| clients cannot cache because of NFS semantics, like NFS GETATTR requests
| for revalidation and the like.)

That's certainly true for the NFS part of the NFS fileserver, but to get
the ZFS feature-set, you trade off cycles and memory.  If we investigate
this a bit, we should be able to figure out a rule of thumb for how
little memory we need for an NFS-home-directories workload without 
cutting into performance.

--dave
-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
[EMAIL PROTECTED] |  -- Mark Twain
(905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
bridge: (877) 385-4099 code: 506 9191#
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Rob Logan

hmm, three drives with 35 I/O requests in the queue
and none active? remind me not to buy a drive
with that FW..

1) upgrade the FW in the drives, or

2) turn off NCQ with:
echo 'set sata:sata_max_queue_depth = 0x1' >> /etc/system
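
(That appends the line

set sata:sata_max_queue_depth = 0x1

to /etc/system; the setting takes effect on the next reboot and caps the 
queue depth at 1, effectively disabling NCQ.)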


Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-05-01 Thread Richard Elling
Bart Smaalders wrote:
 Chris Siebenmann wrote:
   
 | There are two issues here.  One is the number of pools, but the other
 | is the small amount of RAM in the server.  To be honest, most laptops
 | today come with 2 GBytes, and most servers are in the 8-16 GByte range
 | (hmmm... I suppose I could look up the average size we sell...)

  Speaking as a sysadmin (and a Sun customer), why on earth would I have
 to provision 8 GB+ of RAM on my NFS fileservers? I would much rather
 have that memory in the NFS client machines, where it can actually be
 put to work by user programs.
 

Chris, thanks for the comment.  We will have to be very specific here.
There is a minimum expected RAM per pool of 10 MBytes.  So it is
more of a function of the number of pools than the services rendered.
I can see that crafting the message clearly will take some time.
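
(As a back-of-the-envelope example: a box carved into 50 pools would need 
on the order of 50 x 10 MBytes = 500 MBytes just for per-pool overhead, 
before the ARC caches any data.)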


 This depends entirely on the amount of disk & CPU on the fileserver...

 A Thumper w/ 48 TB of disk and two dual-core CPUS is prob. somewhat 
 under-provisioned
 w/ 8 GB of RAM.
   

Except the smallest thumper we sell has 16 GBytes... :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS boot (post build 88) questions

2008-05-01 Thread Wyllys Ingersoll
Are there any updated guides/blogs on how to configure ZFS boot on a build 88 
or later system?

If I already have an existing zpool, will I be able to just add a root/boot 
dataset, or does the root/boot dataset have to have its own pool?

I have several working systems that have small UFS partitions for booting and 
LiveUpgrading and then the rest of the disk(s) are in a large zpool.  I'm 
hoping I can migrate smoothly to a ZFS boot system and have all my disks in the 
same zpool without destroying everything and starting over.

-Wyllys
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Thanks a lot Richard. To give a bit more info, I've copied my /var/adm/messages 
from booting up the machine:

And @picker: I guess the 35 requests are stacked up waiting for the hanging 
request to be serviced?

The question I have is where do I go from now, to get some more info on what is 
causing cp to have problems.

I will now try another tack: use rsync to copy the directory to a disk outside 
the pool (i.e. my home directory on the boot drive), to see if it is happy 
doing that.


May  1 17:04:15 zfsbox su: [ID 810491 auth.crit] 'su root' failed for simon on 
/dev/pts/5
May  1 17:48:15 zfsbox genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 
Version snv_85 64-bit
May  1 17:48:15 zfsbox genunix: [ID 172908 kern.notice] Copyright 1983-2008 Sun 
Microsystems, Inc.  All rights reserved.
May  1 17:48:15 zfsbox Use is subject to license terms.
May  1 17:48:15 zfsbox unix: [ID 126719 kern.info] features: 
13f6fffcpuid,tscp,cmp,cx16,sse3,nx,asysc,sse2,sse,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg
May  1 17:48:15 zfsbox unix: [ID 168242 kern.info] mem = 4192764K (0xffe7f000)
May  1 17:48:15 zfsbox rootnex: [ID 466748 kern.info] root nexus = i86pc
May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] pseudo0 at root
May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] pseudo0 is /pseudo
May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] scsi_vhci0 at root
May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] scsi_vhci0 is /scsi_vhci
May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] isa0 at root
May  1 17:48:15 zfsbox pci_autoconfig: [ID 139057 kern.info] NOTICE: reprogram 
io-range on ppb[0/f/0]: 0x2000 ~ 0x2fff
May  1 17:48:15 zfsbox pci_autoconfig: [ID 595143 kern.info] NOTICE: add 
io-range on subtractive ppb[0/6/0]: 0x3000 ~ 0x3fff
May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local nmi: 
0 1 1 1
May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local nmi: 
1 1 1 1
May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local nmi: 
2 1 1 1
May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local nmi: 
3 1 1 1
May  1 17:48:15 zfsbox pcplusmp: [ID 177709 kern.info] pcplusmp: vector 0x9 
ioapic 0x4 intin 0x9 is bound to cpu 1
May  1 17:48:15 zfsbox pseudo: [ID 129642 kern.info] pseudo-device: acpippm0
May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] acpippm0 is 
/pseudo/[EMAIL PROTECTED]
May  1 17:48:15 zfsbox pseudo: [ID 129642 kern.info] pseudo-device: ppm0
May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] ppm0 is /pseudo/[EMAIL 
PROTECTED]
May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] npe0 at root: space 0 
offset 0
May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] npe0 is /[EMAIL 
PROTECTED],0
May  1 17:48:15 zfsbox pcplusmp: [ID 803547 kern.info] pcplusmp: ide (ata) 
instance 0 vector 0xe ioapic 0x4 intin 0xe is bound to cpu 0
May  1 17:48:15 zfsbox genunix: [ID 640982 kern.info]   IDE device at targ 0, 
lun 0 lastlun 0x0
May  1 17:48:15 zfsbox genunix: [ID 846691 kern.info]   model HDS722516VLAT80
May  1 17:48:15 zfsbox genunix: [ID 479077 kern.info]   ATA/ATAPI-6 supported, 
majver 0x7c minver 0x19
May  1 17:48:15 zfsbox genunix: [ID 640982 kern.info]   ATAPI device at targ 1, 
lun 0 lastlun 0x0
May  1 17:48:15 zfsbox genunix: [ID 846691 kern.info]   model TSSTcorpDVD-ROM 
SH-D162C
May  1 17:48:15 zfsbox npe: [ID 236367 kern.info] PCI Express-device: [EMAIL 
PROTECTED], ata0
May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] ata0 is /[EMAIL 
PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]
May  1 17:48:15 zfsbox genunix: [ID 773945 kern.info]   UltraDMA mode 2 selected
May  1 17:48:15 zfsbox genunix: [ID 773945 kern.info]   UltraDMA mode 5 selected
May  1 17:48:15 zfsbox gda: [ID 243001 kern.info] Disk0:Vendor 
'Gen-ATA ' Product 'HDS722516VLAT80 '
May  1 17:48:17 zfsbox ata: [ID 496167 kern.info] cmdk0 at ata0 target 0 lun 0
May  1 17:48:17 zfsbox genunix: [ID 936769 kern.info] cmdk0 is /[EMAIL 
PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
May  1 17:48:20 zfsbox unix: [ID 190185 kern.info] SMBIOS v2.4 loaded (2169 
bytes)
May  1 17:48:20 zfsbox genunix: [ID 408114 kern.info] /cpus (cpunex0) online
May  1 17:48:20 zfsbox pseudo: [ID 129642 kern.info] pseudo-device: dld0
May  1 17:48:20 zfsbox genunix: [ID 936769 kern.info] dld0 is /pseudo/[EMAIL 
PROTECTED]
May  1 17:48:20 zfsbox pcplusmp: [ID 803547 kern.info] pcplusmp: 
pciexclass,060400 (pcie_pci) instance 1 vector 0x18 ioapic 0xff intin 0xff is 
bound to cpu 1
May  1 17:48:20 zfsbox npe: [ID 236367 kern.info] PCI Express-device: 
pci10de,[EMAIL PROTECTED], pcie_pci1
May  1 17:48:20 zfsbox genunix: [ID 936769 kern.info] pcie_pci1 is /[EMAIL 
PROTECTED],0/pci10de,[EMAIL PROTECTED]
May  1 17:48:21 zfsbox pcplusmp: [ID 803547 kern.info] pcplusmp: pci10de,163 
(nvidia) instance 0 vector 0x10 ioapic 0x4 intin 0x10 is bound to cpu 0
May  1 17:48:21 zfsbox pcie_pci: [ID 586369 kern.info] PCIE-device: 

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,

Simon Breden wrote:
 Thanks a lot Richard. To give a bit more info, I've copied my 
 /var/adm/messages from booting up the machine:

 And @picker: I guess the 35 requests are stacked up waiting for the hanging 
 request to be serviced?

 The question I have is where do I go from now, to get some more info on what 
 is causing cp to have problems.

 I will now try another tack: use rsync to copy the directory to a disk 
 outside the pool (i.e. my home directory on the boot drive), to see if it is 
 happy doing that.
   
What does truss show the cp doing? 
max


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
This list seems out of sync (delayed) with email messages I receive.

Why is that?

Which are the best tools to use when reading / replying to these posts? 

Anyway from my email I can see that Max has sent me a question about truss -- 
here is my reply:

Hi Max,

I haven't used truss before, but give me the command line + switches and I'll 
be happy to run it.

Simon
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,


Simon Breden wrote:
 Hi Max,

 I haven't used truss before, but give me the command line + switches 
 and I'll be happy to run it.

 Simon
# truss -p pid_from_cp

where pid_from_cp is... the pid of the cp process that is hung.  The 
pid you can get from ps.

I am curious if the cp is stuck on a specific file, or is just very 
slow, or is hung in the kernel.
Also, can you kill the cp when it hangs?

thanks,
max

 2008/5/1 [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 
 [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]:

 Hi Simon,

 Simon Breden wrote:

 Thanks a lot Richard. To give a bit more info, I've copied my
 /var/adm/messages from booting up the machine:

 And @picker: I guess the 35 requests are stacked up waiting
 for the hanging request to be serviced?

 The question I have is where do I go from now, to get some
 more info on what is causing cp to have problems.

 I will now try another tack: use rsync to copy the directory
 to a disk outside the pool (i.e. my home directory on the boot
 drive), to see if it is happy doing that.
  

 What does truss show the cp doing? max




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper / X4500 marvell driver issues

2008-05-01 Thread Carson Gaspar
Doug wrote:
 When we installed the Marvell driver patch 125205-07 on our X4500 a few 
 months ago and it started crashing, Sun support just told us to back out that 
 patch.  The system has been stable since then.
 
 We are still running Solaris 10 11/06 on that system.  Is there an advantage 
 to using 125205-07 and the IDR you mention compared to just not using NCQ?  
 Better performance?  If so, how much better?

Everything depends on your I/O workload. We are updating thousands of 
RRD files, so it's _extremely_ random, relatively small I/Os. In our 
case, NCQ is a huge win (about a 25% improvement, as I recall). If you 
do mostly sequential I/O, it will probably make only a small difference.

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
This mailing list seems broken and out of sync -- your post shows as 'Guest' and 
appears as a new post in the main zfs-discuss list -- and the main thread is 
out of sync with the replies, and I just got a Java exception trying to post to 
the main thread -- what's going on here?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Hi Max,

I re-ran the cp command and when it hung I ran 'ps -el', looked up the cp 
command, got its PID and then ran:

# truss -p PID_of_cp

and it output nothing at all -- i.e. it hung too -- just showing a flashing 
cursor.

The system is still operational as I am typing into the browser.

Before I ran the cp command I did a 'tail -f /var/adm/messages' and there is no 
output. I also did a 'tail -f /var/log/syslog' and there is also no output.

If I try 'kill -15 PID_of_cp' and then 'ps -el' cp is still running.
And if I try 'kill -9 PID_of_cp' and then 'ps -el' cp is still running.

What next ?

For what it's worth, here is further output from iostat. The first line looks 
interesting, but it appears to be buffered, as it's the same each time I run it 
-- possibly the last info from cp before it died???  (see below)

So, any ideas what I can do next? Just going to reboot to kill the hanging cp 
-- wow, this is like Windows now :(  (attempt to keep humor)

bash-3.2$ iostat -xce 1
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk021.42.7  191.4   32.4  0.4  0.1   20.8   3   5   0   0   0   0   4 
 2  0 94
sd0   9.9   12.6 1193.8 1478.2 11.0  0.0  489.0  32   4   0   0   0   0 
sd1   7.5   12.6  836.2 1479.1 10.9  0.0  543.7  32   4   0   0   0   0 
sd2   7.7   12.6  877.1 1479.4 10.9  0.0  537.1  32   4   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   1 
 1  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   2 
 0  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   1 
 1  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   2 
 0  0 98
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
 extended device statistics  errors --- 
 cpu
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us 
sy wt id
cmdk0 0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0   4 
 1  0 96
sd0   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd1   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd2   0.00.00.00.0 35.0  0.00.0 100   0   0   0   0   0 
sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   5   0   5 
^C
bash-3.2$ 



and I just saw this:  (seems to be from yesterday -- I don't know if it's 
relevant to this though ??)



# fmdump -e
TIME CLASS
Apr 30 01:12:14.9802 ereport.fs.zfs.checksum 
Apr 30 01:12:14.9896 ereport.fs.zfs.checksum 
Apr 30 01:12:14.9896 ereport.fs.zfs.checksum 
Apr 30 01:12:14.9896 ereport.fs.zfs.checksum 
Apr 30 01:12:14.9896 ereport.fs.zfs.checksum 
Apr 30 01:12:14.9896 ereport.fs.zfs.data 
Apr 30 01:12:14.9802 ereport.fs.zfs.checksum 
Apr 30 

Re: [zfs-discuss] ZFS still crashing after patch

2008-05-01 Thread Rustam
Today my production server crashed 4 times. THIS IS A NIGHTMARE!
Self-healing file system?! For me ZFS is a SELF-KILLING filesystem.

I cannot fsck it, there's no such tool.
I cannot scrub it, it crashes 30-40 minutes after scrub starts.
I cannot use it, it crashes a number of times every day! And with every crash 
the number of checksum failures grows:

NAME    STATE     READ WRITE CKSUM
box5    ONLINE       0     0     0
...after a few hours...
box5    ONLINE       0     0     4
...after a few hours...
box5    ONLINE       0     0    62
...after another few hours...
box5    ONLINE       0     0   120
...crash! and we start again...
box5    ONLINE       0     0     0
...etc...

Actually 120 is the record; sometimes it crashes as soon as it boots.

and always there's a permanent error:
errors: Permanent errors have been detected in the following files:
box5:0x0

and very wise self-healing advice:
http://www.sun.com/msg/ZFS-8000-8A
Restore the file in question if possible.  Otherwise restore the entire pool 
from backup.

Thanks, but if I restore it from backup it won't be ZFS anymore, that's for 
sure.

It's not an I/O problem. AFAIK, the default ZFS I/O error behavior is to wait 
for repair (I've got 10U4, where it's non-configurable). Then why does it panic?

Recently there were discussions on the failure of the OpenSolaris community. 
Now it's been more than half a month since I reported this error. Nobody even 
posted something like RTFM. Come on guys, I know you are there and busy with 
enterprise customers... but at least give me some troubleshooting ideas. I'm 
totally lost.

Just a reminder: it's a heavily loaded fs with 3-4 million files and folders.

Link to original post:
http://www.opensolaris.org/jive/thread.jspa?threadID=57425
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS still crashing after patch

2008-05-01 Thread Bob Friesenhahn
On Thu, 1 May 2008, Rustam wrote:

 Today my production server crashed  4 times. THIS IS NIGHTMARE!
 Self-healing file system?! For me ZFS is SELF-KILLING filesystem.

 I cannot fsck it, there's no such tool.
 I cannot scrub it, it crashes 30-40 minutes after scrub starts.
 I cannot use it, it crashes a number of times every day! And with every crash 
 number of checksum failures is growing:

Is your ZFS pool configured with redundancy (e.g. mirrors, raidz) or is 
it non-redundant?  If non-redundant, then there is not much that ZFS 
can really do if a device begins to fail.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,

Simon Breden wrote:
 Hi Max,

 I re-ran the cp command and when it hanged I ran 'ps -el' looked up the cp 
 command, got it's PID and then ran:

 # truss -p PID_of_cp

 and it output nothing at all -- i.e. it hanged too -- just showing a flashing 
 cursor.

 The system is still operational as I am typing into the browser.

 Before I ran the cp command I did a 'tail -f /var/adm/messages' and there is 
 no output. I also did a 'tail -f /var/log/syslog' and there is also no output.

 If I try 'kill -15 PID_of_cp' and then 'ps -el' cp is still running.
 And if I try 'kill -9 PID_of_cp' and then 'ps -el' cp is still running.

 What next ?
   
You can try the following:

# mdb -k
::pgrep cp   -- this should give you a line with the cp you are 
running.  Next to cp is an address, use this address in the next line:

address_from_pgrep::walk thread | ::threadlist -v

This will give you a stack trace.  Please post it.

$q  -- this gets you out of mdb

max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Keep getting Java exceptions posting to the proper thread for this -- just lost 
an hour --- WTF???

Had to reply to my own post as Max's reply (which I saw in my email inbox) has 
not appeared here. Again, what is wrong with this forum software -- it seems so 
buggy, or am I missing something here (I'm quite prepared to be wrong, but I'm 
getting really p*ssed off with this forum engine as I'm wasting time and 
getting Java errors about missing parent which I will post if I get more errors 
trying to post with this broken forum software -- if it allows me to)


Thanks for your advice Max, and here is my reply to your suggestion:


# mdb -k
Loading modules: [ unix genunix specfs dtrace cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba 
s1394 nca lofs zfs random md sppp smbsrv nfs ptm ipc crypto ]
 ::pgrep cp
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R    889    868    889    868    501 0x4a004000 ff01deca9048 cp
 ff01deca9048::walk thread | ::threadlist -v
ADDR PROC  LWP CLS PRI WCHAN
ff01e0045840 ff01deca9048 ff01de9d9210   2  60 ff01d861ca80
  PC: _resume_from_idle+0xf1    CMD: cp -pr testdir z1
  stack pointer for thread ff01e0045840: ff0007fcdf00
  [ ff0007fcdf00 _resume_from_idle+0xf1() ]
swtch+0x17f()
cv_wait+0x61()
zio_wait+0x5f()
dbuf_read+0x1b5()
dbuf_findbp+0xe8()
dbuf_prefetch+0x9b()
dmu_zfetch_fetch+0x43()
dmu_zfetch_dofetch+0xc2()
dmu_zfetch_find+0x3a1()
dmu_zfetch+0xa5()
dbuf_read+0xe3()
dmu_buf_hold_array_by_dnode+0x1c4()
dmu_read+0xd4()
zfs_fillpage+0x15e()
zfs_getpage+0x187()
fop_getpage+0x9f()
segvn_fault+0x9ef()
as_fault+0x5ae()
pagefault+0x95()
trap+0x1286() 
0xfb8001d9()  
fuword8+0x21()
zfs_write+0x147() 
fop_write+0x69()  
write+0x2af() 
write32+0x1e()
sys_syscall32+0x101() 
  

 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Just to reduce my stress levels and to give the webmaster some useful info to 
help fix this broken forum:

I tried posting a reply to the main thread for 'cp -r hanged copying a 
directory' and got the following error -- at a guess, it seems like it can't 
find the parent thread/message's id in the database:

I got this multiple times when trying to post, and as no information is given 
in the message on how to contact the system administrator -- here is the 
message and exception stack trace:


 An error in the system has occurred. Please contact the system administrator 
if the problem persists.

type: java.lang.IllegalArgumentException

java.lang.IllegalArgumentException: Parent key 230968 not found when adding 
child 231029.
at com.jivesoftware.util.LongTree.addChild(Unknown Source)
at com.jivesoftware.forum.database.DbTreeWalker.addChild(Unknown Source)
at com.jivesoftware.forum.database.DbForumThread.addMessage(Unknown 
Source)
at com.jivesoftware.forum.proxy.ForumThreadProxy.addMessage(Unknown 
Source)
at org.opensolaris.jive.action.ForumPostAction.createMessage(Unknown 
Source)
at com.jivesoftware.forum.action.PostAction.execute(Unknown Source)
at com.jivesoftware.forum.action.PostAction.doPost(Unknown Source)
at sun.reflect.GeneratedMethodAccessor881.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at 
com.opensymphony.xwork.DefaultActionInvocation.invokeAction(DefaultActionInvocation.java:300)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:166)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.jivesoftware.forum.action.JiveExceptionInterceptor.intercept(Unknown Source)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.jivesoftware.base.action.JiveObjectLoaderInterceptor.intercept(Unknown 
Source)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.webwork.interceptor.FileUploadInterceptor.intercept(FileUploadInterceptor.java:71)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
at 
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:164)
at 
com.opensymphony.xwork.DefaultActionProxy.execute(DefaultActionProxy.java:116)
at 
com.opensymphony.webwork.dispatcher.ServletDispatcher.serviceAction(ServletDispatcher.java:272)
at com.jivesoftware.base.util.JiveWebWorkServlet.service(Unknown Source)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at sun.reflect.GeneratedMethodAccessor156.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at 
org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:243)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAsPrivileged(Subject.java:517)
at 
org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:275)
at 
org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:161)
at 

Re: [zfs-discuss] ZFS still crashing after patch

2008-05-01 Thread Phillip Wagstrom -- Area SSE MidAmerica
Rustam wrote:
 Today my production server crashed  4 times. THIS IS NIGHTMARE! 
 Self-healing file system?! For me ZFS is SELF-KILLING filesystem.
 
 I cannot fsck it, there's no such tool. I cannot scrub it, it crashes
 30-40 minutes after scrub starts. I cannot use it, it crashes a
 number of times every day! And with every crash number of checksum
 failures is growing:
 
 NAME    STATE     READ WRITE CKSUM
 box5    ONLINE       0     0     0
 ...after a few hours...
 box5    ONLINE       0     0     4
 ...after a few hours...
 box5    ONLINE       0     0    62
 ...after another few hours...
 box5    ONLINE       0     0   120
 ...crash! and we start again...
 box5    ONLINE       0     0     0
 ...etc...
 
 actually 120 is record, sometimes it crashed as soon as it boots.
 
 and always there's a permanent error: errors: Permanent errors have
 been detected in the following files: box5:0x0
 
 and very wise self-healing advice: http://www.sun.com/msg/ZFS-8000-8A
  Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
 
 Thanks, but if I restore it from backup it won't be ZFS anymore,
 that's for sure.

That's a bit harsh.  ZFS is telling you that you have corrupted data 
based on the checksums.  Other types of filesystems would likely simply 
pass the corrupted data on silently.

 It's not I/O problem. AFAIK, default ZFS I/O error behavior is wait
 to repair (i've 10U4, non-configurable). Then why it panics?

Do you have the panic messages?  ZFS won't cause panics based on bad 
checksums.  It will, by default, cause a panic if it can't write data out to 
any device, or if it completely loses access to non-redundant devices, or 
loses both redundant devices at the same time.

 Recently there were discussions on failure of OpenSolaris community.
 Now it's been more than half a month since I reported such an error.
 Nobody even posted something like RTFM. Come on guys, I know you
 are there and busy with enterprise customers... but at least give me
 some troubleshooting ideas. i'm totally lost.
 
 just to remind, it's heavily loaded fs with 3-4 million files and
 folders.
 
 Link to original post: 
 http://www.opensolaris.org/jive/thread.jspa?threadID=57425

This seems to show the same number of checksum errors across 2 
different channels and 4 different drives.  Given that, I'd assume that 
this is likely a dual-channel HBA of some sort.  It would appear that 
you either have bad hardware or some sort of driver issue.

Regards,
Phil

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon,
Simon Breden wrote:

 Thanks for your advice Max, and here is my reply to your suggestion:


 # mdb -k
 Loading modules: [ unix genunix specfs dtrace cpu.generic 
 cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba 
 s1394 nca lofs zfs random md sppp smbsrv nfs ptm ipc crypto ]
   
 ::pgrep cp
 
 S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
 R    889    868    889    868    501 0x4a004000 ff01deca9048 cp
   
 ff01deca9048::walk thread | ::threadlist -v
 
 ADDR PROC  LWP CLS PRI WCHAN
 ff01e0045840 ff01deca9048 ff01de9d9210   2  60 ff01d861ca80
   PC: _resume_from_idle+0xf1    CMD: cp -pr testdir z1
   stack pointer for thread ff01e0045840: ff0007fcdf00
   [ ff0007fcdf00 _resume_from_idle+0xf1() ]
 swtch+0x17f()
 cv_wait+0x61()
 zio_wait+0x5f()
 dbuf_read+0x1b5()
 dbuf_findbp+0xe8()
 dbuf_prefetch+0x9b()
 dmu_zfetch_fetch+0x43()
 dmu_zfetch_dofetch+0xc2()
 dmu_zfetch_find+0x3a1()
 dmu_zfetch+0xa5()
 dbuf_read+0xe3()
 dmu_buf_hold_array_by_dnode+0x1c4()
 dmu_read+0xd4()
 zfs_fillpage+0x15e()
 zfs_getpage+0x187()
 fop_getpage+0x9f()
 segvn_fault+0x9ef()
 as_fault+0x5ae()
 pagefault+0x95()
 trap+0x1286() 
 0xfb8001d9()  
 fuword8+0x21()
 zfs_write+0x147() 
 fop_write+0x69()  
 write+0x2af() 
 write32+0x1e()
 sys_syscall32+0x101() 
   
   
So, a write has been issued; zfs is retrieving a page and is waiting for 
the page-in to complete.  I'll take a further look tomorrow,
but maybe someone else reading this has an idea.  (It is midnight here.)

max

  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Richard Elling
Simon Breden wrote:
 Thanks a lot Richard. To give a bit more info, I've copied my 
 /var/adm/messages from
 booting up the machine:
   

 And @picker: I guess the 35 requests are stacked up waiting for the hanging 
 request to be serviced?

 The question I have is where do I go from now, to get some more info on what 
 is causing cp to have problems.

 I will now try another tack: use rsync to copy the directory to a disk 
 outside the pool (i.e. my home directory on the boot drive), to see if it is 
 happy doing that.


 May  1 17:04:15 zfsbox su: [ID 810491 auth.crit] 'su root' failed for simon 
 on /dev/pts/5
 May  1 17:48:15 zfsbox genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 
 Version snv_85 64-bit
 May  1 17:48:15 zfsbox genunix: [ID 172908 kern.notice] Copyright 1983-2008 
 Sun Microsystems, Inc.  All rights reserved.
 May  1 17:48:15 zfsbox Use is subject to license terms.
 May  1 17:48:15 zfsbox unix: [ID 126719 kern.info] features: 
 13f6fffcpuid,tscp,cmp,cx16,sse3,nx,asysc,sse2,sse,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg
 May  1 17:48:15 zfsbox unix: [ID 168242 kern.info] mem = 4192764K (0xffe7f000)
 May  1 17:48:15 zfsbox rootnex: [ID 466748 kern.info] root nexus = i86pc
 May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] pseudo0 at root
 May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] pseudo0 is /pseudo
 May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] scsi_vhci0 at root
 May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] scsi_vhci0 is /scsi_vhci
 May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] isa0 at root
 May  1 17:48:15 zfsbox pci_autoconfig: [ID 139057 kern.info] NOTICE: 
 reprogram io-range on ppb[0/f/0]: 0x2000 ~ 0x2fff
 May  1 17:48:15 zfsbox pci_autoconfig: [ID 595143 kern.info] NOTICE: add 
 io-range on subtractive ppb[0/6/0]: 0x3000 ~ 0x3fff
 May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local 
 nmi: 0 1 1 1
 May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local 
 nmi: 1 1 1 1
 May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local 
 nmi: 2 1 1 1
 May  1 17:48:15 zfsbox pcplusmp: [ID 658230 kern.info] NOTICE: apic: local 
 nmi: 3 1 1 1
 May  1 17:48:15 zfsbox pcplusmp: [ID 177709 kern.info] pcplusmp: vector 0x9 
 ioapic 0x4 intin 0x9 is bound to cpu 1
 May  1 17:48:15 zfsbox pseudo: [ID 129642 kern.info] pseudo-device: acpippm0
 May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] acpippm0 is 
 /pseudo/[EMAIL PROTECTED]
 May  1 17:48:15 zfsbox pseudo: [ID 129642 kern.info] pseudo-device: ppm0
 May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] ppm0 is /pseudo/[EMAIL 
 PROTECTED]
 May  1 17:48:15 zfsbox rootnex: [ID 349649 kern.info] npe0 at root: space 0 
 offset 0
 May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] npe0 is /[EMAIL 
 PROTECTED],0
 May  1 17:48:15 zfsbox pcplusmp: [ID 803547 kern.info] pcplusmp: ide (ata) 
 instance 0 vector 0xe ioapic 0x4 intin 0xe is bound to cpu 0
 May  1 17:48:15 zfsbox genunix: [ID 640982 kern.info] IDE device at 
 targ 0, lun 0 lastlun 0x0
 May  1 17:48:15 zfsbox genunix: [ID 846691 kern.info] model 
 HDS722516VLAT80
 May  1 17:48:15 zfsbox genunix: [ID 479077 kern.info] ATA/ATAPI-6 
 supported, majver 0x7c minver 0x19
 May  1 17:48:15 zfsbox genunix: [ID 640982 kern.info] ATAPI device at 
 targ 1, lun 0 lastlun 0x0
 May  1 17:48:15 zfsbox genunix: [ID 846691 kern.info] model 
 TSSTcorpDVD-ROM SH-D162C
 May  1 17:48:15 zfsbox npe: [ID 236367 kern.info] PCI Express-device: [EMAIL 
 PROTECTED], ata0
 May  1 17:48:15 zfsbox genunix: [ID 936769 kern.info] ata0 is /[EMAIL 
 PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]
 May  1 17:48:15 zfsbox genunix: [ID 773945 kern.info] UltraDMA mode 2 
 selected
 May  1 17:48:15 zfsbox genunix: [ID 773945 kern.info] UltraDMA mode 5 
 selected
 May  1 17:48:15 zfsbox gda: [ID 243001 kern.info] Disk0:  Vendor 
 'Gen-ATA ' Product 'HDS722516VLAT80 '
 May  1 17:48:17 zfsbox ata: [ID 496167 kern.info] cmdk0 at ata0 target 0 lun 0
 May  1 17:48:17 zfsbox genunix: [ID 936769 kern.info] cmdk0 is /[EMAIL 
 PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
 May  1 17:48:20 zfsbox unix: [ID 190185 kern.info] SMBIOS v2.4 loaded (2169 
 bytes)
 May  1 17:48:20 zfsbox genunix: [ID 408114 kern.info] /cpus (cpunex0) online
 May  1 17:48:20 zfsbox pseudo: [ID 129642 kern.info] pseudo-device: dld0
 May  1 17:48:20 zfsbox genunix: [ID 936769 kern.info] dld0 is /pseudo/[EMAIL 
 PROTECTED]
 May  1 17:48:20 zfsbox pcplusmp: [ID 803547 kern.info] pcplusmp: 
 pciexclass,060400 (pcie_pci) instance 1 vector 0x18 ioapic 0xff intin 0xff is 
 bound to cpu 1
 May  1 17:48:20 zfsbox npe: [ID 236367 kern.info] PCI Express-device: 
 pci10de,[EMAIL PROTECTED], pcie_pci1
 May  1 17:48:20 zfsbox genunix: [ID 936769 kern.info] pcie_pci1 is /[EMAIL 
 PROTECTED],0/pci10de,[EMAIL PROTECTED]
 May  1 17:48:21 zfsbox pcplusmp: [ID 803547 kern.info] pcplusmp: 

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Richard Elling
[forget the BUI forum, e-mail works better, IMHO]

Simon Breden wrote:
 Thanks a lot Richard. To give a bit more info, I've copied my 
 /var/adm/messages from booting up the machine:
   

I don't see any major issues related to this problem in the messages.

 And @picker: I guess the 35 requests are stacked up waiting for the hanging 
 request to be serviced?
   

The 35 iops are queued in the device driver, waiting to be sent to the
device (wait).  The iops queued to the device are in the active queue
(actv).

 The question I have is where do I go from now, to get some more info on what 
 is causing cp to have problems.
   

As your later mdb test confirmed, ZFS is patiently waiting on an I/O
to complete.  Solaris has sent that I/O to a device and hasn't heard a
reply yet.  By default, the sd driver will retry every 60 seconds, and
eventually fail the I/O, which will bounce back up to ZFS as an I/O
error.  Please double-check your drive firmware.
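
(One quick way to see what firmware each drive currently reports, so you 
can compare it against the vendor's latest, is:

# iostat -En

which prints a Vendor / Product / Revision line for every disk.)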
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS still crashing after patch

2008-05-01 Thread Rustam
 Is your ZFS pool configured with redundancy (e.g mirrors, raidz) or is
 it non-redundant? If non-redundant, then there is not much that ZFS
 can really do if a device begins to fail.

It's RAID 10 (more info here: 
http://www.opensolaris.org/jive/thread.jspa?threadID=57425):

NAME        STATE     READ WRITE CKSUM
box5        ONLINE       0     0     4
  mirror    ONLINE       0     0     2
    c1d0    ONLINE       0     0     4
    c2d0    ONLINE       0     0     4
  mirror    ONLINE       0     0     2
    c2d1    ONLINE       0     0     4
    c1d1    ONLINE       0     0     4

Actually, there's no damaged data so far. I don't get any 'unable to 
read/write' kind of errors. It's just these very strange checksum errors, 
synchronized across all disks.

 That's a bit harsh.  ZFS is telling you that you have corrupted data 
 based on the checksums.  Other types of filesystems would likely simply 
 pass the corrupted data on silently.

Checksums are good, no complaints about that.

 Do you have the panic messages?  ZFS won't cause panics based on bad 
 checksums.  It will by default cause panic if it can't write data out to 
 any device or if it completely loses access to non-redundant devices or 
 loses both redundant devices at the same time.

A number of panic messages and a crash dump stack trace are attached to the 
original post (http://www.opensolaris.org/jive/thread.jspa?threadID=57425). 
Here is a short snippet:

 ::status
debugging crash dump vmcore.5 (64-bit) from core
operating system: 5.10 Generic_127112-07 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fe800017f8d0 addr=238 
occurred in module unix due to a NULL pointer dereference
dump content: kernel pages only

 ::stack
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
spa_scrub_io_start+0xf1()
spa_scrub_cb+0x13d()
traverse_callback+0x6a()
traverse_segment+0x118()
traverse_more+0x7b()
spa_scrub_thread+0x147()
thread_start+8()

 Since this seems to show the same number of checksum errors across 2 
 different channels and 4 different drives.  Given that, I'd assume that 
 this is likely a dual-channel HBA of some sort.  It would appear that 
 you either have bad hardware or some sort of driver issue.

You're right, this is the dual-channel Intel ICH6 SATA controller. 10U4 has 
native support/drivers for this SATA controller (AHCI drivers, afaik). The thing 
is that this hardware and ZFS have been in production for almost 2 years (ok, 
not the best argument), yet this problem only appeared recently (in the last 20 
days). It's even stranger because I didn't make any OS/driver upgrade or patch 
during the last 2-3 months.

However, this is a good point. I've seen some new SATA/AHCI drivers available in 
10U5. Maybe I should try to upgrade and see if it helps. Thanks, Phil.

--
Rustam
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS still crashing after patch

2008-05-01 Thread Bob Friesenhahn
On Thu, 1 May 2008, Rustam wrote:

 operating system: 5.10 Generic_127112-07 (i86pc)

Seems kind of old.  I am using Generic_127112-11 here.

Probably many hundreds of nasty bugs have been eliminated since the 
version you are using.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss