Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hi Eric,

Hard to say. I'll use MDB next time it happens for more info. The
applications using any zpool lock up.

-J

On Jan 3, 2008 3:33 PM, Eric Schrock <[EMAIL PROTECTED]> wrote:
> When you say "starts throwing sense errors," does that mean every I/O to
> the drive will fail, or some arbitrary percentage of I/Os will fail? If
> it's the latter, ZFS is trying to do the right thing by recognizing
> these as transient errors, but eventually the ZFS diagnosis should kick
> in. What does '::spa -ve' in 'mdb -k' show in one of these situations?
> How about '::zio_state'?
>
> - Eric
Re: [zfs-discuss] [osol-help] ZFS woes
Scott L. Burson wrote:
> Hi,
>
> This is in build 74, on x64, on a Tyan S2882-D with dual Opteron 275 and
> 24GB of ECC DRAM.

Not an answer, but zfs-discuss is probably the best place to ask, so I've
taken the liberty of CCing that list.

> I seem to have lost the entire contents of a ZFS raidz pool. The pool is in
> a state where, if ZFS looks at it, I get a kernel panic. To make it possible
> to boot the machine, I had to boot into safe mode and rename
> `/etc/zfs/zpool.cache' (fortunately, this was my only pool on the machine).
>
> Okay, from the beginning. I bought the drives in October: three 500GB
> Western Digital WD5000ABYS SATA drives, installed them in the box in place
> of three 250GB Seagates I had been using, and created the raidz pool. For
> the first couple of months everything was hunky dory. Then, a couple of
> weeks ago, I moved the machine to a different location in the building,
> which wouldn't even be worth mentioning except that that's when I started to
> have problems. The first time I powered it up, one of the SATA drives didn't
> show up; I reseated the drive connectors and tried again, and it seemed
> fine. I thought that was odd, since I hadn't had one of those connectors
> come loose on me before, but I scrubbed the pool, cleared the errors on the
> drive, and thought that was the end of it.
>
> It wasn't. `zpool status' continued to report errors, only now they were
> write and read errors, and spread across all three drives. I started to copy
> the most critical parts of the filesystem contents onto other machines (very
> fortunately, as it turned out). After a while, the drive that had previously
> not shown up was marked faulted, and the other two were marked degraded.
> Then, yesterday, there was a much larger number of errors -- over 3000 read
> errors -- on a different drive, and that drive was marked faulted and the
> other two (i.e. including the one that had previously been faulted) were
> marked degraded. Also, `zpool status' told me I had lost some "files"; these
> turned out to be all, or mostly, directories, some containing substantial
> trees.
>
> By this point I had already concluded I was going to have to replace a
> drive, and had picked up a replacement. I installed it in place of the drive
> that was now marked faulted, and powered up. I was met with repeated panics
> and reboots. I managed to copy down part of the backtrace:
>
>   unix:die+c8
>   unix:trap+1351
>   unix:cmntrap+e9
>   unix:mutex_enter+b
>   zfs:metaslab_free+97
>   zfs:zio_dva_free+29
>   zfs:zio_next_stage+b3
>   zfs:zio_gang_pipeline+??
>
> (This may contain typos, and I didn't get the offset on that last frame.)
>
> At this point I tried replacing the drive I had just removed (removing the
> new, blank drive), but that didn't help. So, as mentioned above, I tried
> booting into safe mode and renaming `/etc/zfs/zpool.cache' -- just on a
> hunch, but I figured there had to be some such way to make ZFS forget about
> the pool -- and that allowed me to boot.
>
> I used good old `format' to run read tests on the drives overnight -- no bad
> blocks were detected.
>
> So, there are a couple lines of discussion here. On the one hand, it seems I
> have a hardware problem, but I haven't yet diagnosed it. More on this below.
> On the other, even in the face of hardware problems, I have to report some
> disappointment with ZFS. I had really been enjoying the warm fuzzy feeling
> ZFS gave me (and I was talking it up to my colleagues; I'm the only one here
> using it). Now I'm in a worse state than I would probably be with UFS on
> RAID, where `fsck' would probably have managed to salvage a lot of the
> filesystem (I would certainly be able to mount it! -- unless the drives were
> all failing catastrophically, which doesn't seem to be happening).
>
> One could say, there are two aspects to filesystem robustness: integrity
> checking and recovery. ZFS, with its block checksums, gets an A in integrity
> checking, but now appears to do very poorly in recovering in the face of
> substantial but not total hardware degradation, when that degradation is
> sufficiently severe that the redundancy of the pool can't correct for it.
>
> Perhaps this is a vanishingly rare case and I am just very unlucky.
> Nonetheless I would like to make some suggestions. (1) It would still be
> nice to have a salvager. (2) I think it would make sense, at least as an
> option, to add even more redundancy to ZFS's on-disk layout; for instance,
> it could keep copies of all directories.
>
> Okay, back to my hardware problems. I know you're going to tell me I
> probably have a bad power supply, and I can't rule that out, but it's an
> expensive PSU and generously sized for the box; and the box had been rock
> stable for a good 18 months before this happened. I'm naturally more
> inclined to suspect the new components, which are the SATA drives. (I
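As an aside for readers who hit the same panic-at-boot loop: the workaround
Scott describes -- moving the cache file aside so ZFS does not try to open the
pool at boot -- amounts to roughly the sketch below. This is a minimal,
hedged example; the pool name 'tank' is made up, and re-importing the pool may
simply reproduce the panic.

    # From safe mode / single user: stop ZFS from auto-opening the pool at boot
    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
    reboot

    # Later, once the hardware is sorted out, locate and re-import the pool
    # deliberately:
    zpool import          # scans attached devices and lists importable pools
    zpool import tank     # attempts the import (may re-trigger the panic)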
Re: [zfs-discuss] [zones-discuss] ZFS shared /home between zones
James C. McPherson wrote:
>
> The ws command hates it - "hmm, the underlying device for
> /scratch is /scratch maybe if I loop around stat()ing
> it it'll turn into a pumpkin"
>
> :-)

As does dmake, which is a real PITA for a developer!

Ian
Re: [zfs-discuss] zfs panic on boot
> space_map_add+0xdb(ff014c1a21b8, 472785000, 1000)
> space_map_load+0x1fc(ff014c1a21b8, fbd52568, 1, ff014c1a1e88, ff0149c88c30)
> running snv79.

Hmm.. did you spend any time in snv_74 or snv_75 that might have gotten
http://bugs.opensolaris.org/view_bug.do?bug_id=6603147

zdb -e would be interesting, but the damage might have been done.

Rob
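For anyone wanting to try Rob's suggestion, a minimal sketch of running zdb
against an unimported pool; 'tank' stands in for the real pool name, and the
exact option combination is only an example:

    # -e makes zdb read the pool configuration from the device labels rather
    # than /etc/zfs/zpool.cache, so it works on a pool that is not imported
    zdb -e tank

    # Walk and verify allocated blocks (can take a long time on a big pool)
    zdb -e -bb tank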
Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool
Joerg Schilling wrote:
> Carsten Bormann <[EMAIL PROTECTED]> wrote:
>> On Dec 29 2007, at 08:33, Jonathan Loran wrote:
>>> We snapshot the file as it exists at the time of the mv in the old file
>>> system until all referring file handles are closed, then destroy the
>>> single file snap. I know, not easy to implement, but that is the correct
>>> behavior, I believe.
>> Exactly.
>
> This is an interesting problem. Your proposal would imply that a file may
> have different identities in different filesystems:
>
>   - different st_dev
>   - different st_ino
>   - different link count
>
> This cannot be implemented with a single "inode data" anymore.

At first, as I mentioned in my earlier email, I was thinking we needed to
emulate the cross-fs rename/link/etc. behavior as it is currently implemented,
where a file appears to actually be copied. But now I'm not so sure.

In Unixland, the ideal has always been to have the whole file system, kit and
caboodle, singly rooted at /. Heck, even devices are in the file system. Of
course, reality has required that, programmatically, we be aware of which
file system our cwd is in; at a minimum, it's returned in the various stat
structs (st_dev).

I can see I'm getting long winded, but I'm thinking: what is the value of
having different behavior for a file move across ZFS filesystems within the
same pool than for a move between directories? I'm not addressing the previous
discussion about how to treat file handles, etc., but rather the sharing of
open file blocks, linked across ZFS boundaries, before and after such a mv. I
think the test is this: can we find a scenario where something would break if
we did share the file blocks across ZFS boundaries after such a mv? For every
example I've been able to think of, when I ask "what if I had moved the file
from one directory to another, instead of across ZFS boundaries -- would it
have been different?", the answer has been no.

Comments please.

Jon

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  [EMAIL PROTECTED]
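As a concrete reference point for the thread, this is the behavior being
debated: mv(1) first tries rename(2), and across ZFS dataset boundaries --
even inside one pool -- the kernel returns EXDEV, so mv quietly falls back to
copy-and-unlink. A hedged sketch (dataset and file names are made up, and the
truss output line is approximate):

    # Two datasets in the same pool (names are illustrative)
    zfs create tank/src
    zfs create tank/dst
    mkfile 64m /tank/src/bigfile

    # Trace the syscalls: rename(2) fails with EXDEV, then mv copies the data
    truss -f -t rename,unlink mv /tank/src/bigfile /tank/dst/
    #   rename("/tank/src/bigfile", "/tank/dst/bigfile") Err#18 EXDEV
    #   (followed by a full data copy and an unlink of the original)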
Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
When you say "starts throwing sense errors," does that mean every I/O to the
drive will fail, or some arbitrary percentage of I/Os will fail? If it's the
latter, ZFS is trying to do the right thing by recognizing these as transient
errors, but eventually the ZFS diagnosis should kick in. What does '::spa -ve'
in 'mdb -k' show in one of these situations? How about '::zio_state'?

- Eric

On Thu, Jan 03, 2008 at 03:11:39PM -0700, Jason J. W. Williams wrote:
> Hi Albert,
>
> Thank you for the link. ZFS isn't offlining the disk in b77.
>
> -J

--
Eric Schrock, FishWorks                    http://blogs.sun.com/eschrock
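For anyone who wants to gather the data Eric is asking for while the hang is
in progress, a minimal sketch of the mdb session (run as root against the live
kernel; the dcmd names are exactly as Eric gives them, the comments are only
my gloss):

    # Inspect live kernel ZFS state during the hang
    mdb -k
    > ::spa -ve        # per-pool state with vdevs and error counts
    > ::zio_state      # outstanding zios and where in the pipeline they sit
    > $q               # quit mdb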
Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
This should be pretty much fixed on build 77. It will lock up for the duration
of a single command timeout, but ZFS should recover quickly without queueing
up additional commands. Since the default timeout is 60 seconds, and we retry
3 times, and we do a probe afterwards, you may see hangs of up to 6 minutes.
Unfortunately there's not much we can do, since that's the minimum amount of
time to do two I/O operations to a single drive (one that fails and one to do
a basic probe of the disk). You can tune down 'sd_io_time' to a more
reasonable value to get shorter command timeouts, but this may break slow
things (like powered down CD-ROM drives).

Other options at the ZFS level could be imagined, but would require per-pool
tunables:

  1. Allowing I/O to complete as soon as it was on enough devices, instead
     of replicating to all devices.

  2. Inventing a per-pool tunable that controlled timeouts independent
     of SCSI timeouts.

Neither of these is trivial, and both potentially compromise data integrity,
hence the lack of such features. There's no easy solution to the problem, but
we're happy to hear ideas.

- Eric

On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
> Hello,
>
> There seems to be a persistent issue we have with ZFS where one of the
> SATA disks in a zpool on a Thumper starts throwing sense errors, ZFS
> does not offline the disk and instead hangs all zpools across the
> system. If it is not caught soon enough, application data ends up in
> an inconsistent state. We've had this issue with b54 through b77 (as
> of last night).
>
> We don't seem to be the only folks with this issue reading through the
> archives. Are there any plans to fix this behavior? It really makes
> ZFS less than desirable/reliable.
>
> Best Regards,
> Jason

--
Eric Schrock, FishWorks                    http://blogs.sun.com/eschrock
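For reference, the sd_io_time tuning Eric mentions is normally set in
/etc/system (or patched on the running kernel with mdb). A hedged sketch --
the 20-second value is only an example, and as noted above, shortening the
timeout can break legitimately slow devices:

    * In /etc/system (takes effect at the next boot; value is in seconds):
    set sd:sd_io_time = 0x14

    # Or on the running kernel, as root, effective immediately:
    echo "sd_io_time/W 0t20" | mdb -kw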
Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hi Eric,

I'd really like to suggest a helpful idea, but all I can suggest is an end
result. Running ZFS on top of STK arrays doing the RAID, they offline their
bad disks very quickly and the applications never notice. In the X4500s, ZFS
times out and locks up the applications. If ZFS is going to be able to compete
with the more traditional arrays, it seems the failure behavior has to be just
as seamless.

-J

On Jan 3, 2008 3:03 PM, Eric Schrock <[EMAIL PROTECTED]> wrote:
> This should be pretty much fixed on build 77. It will lock up for the
> duration of a single command timeout, but ZFS should recover quickly
> without queueing up additional commands. [...]
Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hi Albert,

Thank you for the link. ZFS isn't offlining the disk in b77.

-J

On Jan 3, 2008 3:07 PM, Albert Chin <[EMAIL PROTECTED]> wrote:
> http://blogs.sun.com/eschrock/entry/zfs_and_fma
>
> FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
> http://www.opensolaris.org/os/community/arc/caselog/2007/283/
> http://www.opensolaris.org/os/community/on/flag-days/all/
>
> --
> albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
> There seems to be a persistent issue we have with ZFS where one of the
> SATA disks in a zpool on a Thumper starts throwing sense errors, ZFS
> does not offline the disk and instead hangs all zpools across the
> system. If it is not caught soon enough, application data ends up in
> an inconsistent state. We've had this issue with b54 through b77 (as
> of last night).
>
> We don't seem to be the only folks with this issue reading through the
> archives. Are there any plans to fix this behavior? It really makes
> ZFS less than desirable/reliable.

http://blogs.sun.com/eschrock/entry/zfs_and_fma

FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
http://www.opensolaris.org/os/community/arc/caselog/2007/283/
http://www.opensolaris.org/os/community/on/flag-days/all/

--
albert chin ([EMAIL PROTECTED])
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hello,

There seems to be a persistent issue we have with ZFS where one of the SATA
disks in a zpool on a Thumper starts throwing sense errors, ZFS does not
offline the disk and instead hangs all zpools across the system. If it is not
caught soon enough, application data ends up in an inconsistent state. We've
had this issue with b54 through b77 (as of last night).

We don't seem to be the only folks with this issue reading through the
archives. Are there any plans to fix this behavior? It really makes ZFS less
than desirable/reliable.

Best Regards,
Jason
Re: [zfs-discuss] Bugid 6535160
We loaded Nevada_78 on a peer T2000 unit. Imported the same ZFS pool. I
didn't even upgrade the pool since we wanted to be able to move it back to
10u4. Cut 'n paste of my colleague's email with the results:

Here's the latest Pepsi Challenge results. Sol10u4 vs Nevada78. Same tuning
options, same zpool, same storage, same SAN switch - you get the idea. The
only difference is the OS.

Sol10u4:

4984: 82.878: Per-Operation Breakdown
closefile4           404ops/s    0.0mb/s    0.0ms/op     19us/op-cpu
readfile4            404ops/s    6.3mb/s    0.1ms/op    109us/op-cpu
openfile4            404ops/s    0.0mb/s    0.1ms/op    112us/op-cpu
closefile3           404ops/s    0.0mb/s    0.0ms/op     25us/op-cpu
fsyncfile3           404ops/s    0.0mb/s   18.7ms/op   1168us/op-cpu
appendfilerand3      404ops/s    6.3mb/s    0.2ms/op    192us/op-cpu
readfile3            404ops/s    6.3mb/s    0.1ms/op    111us/op-cpu
openfile3            404ops/s    0.0mb/s    0.1ms/op    111us/op-cpu
closefile2           404ops/s    0.0mb/s    0.0ms/op     24us/op-cpu
fsyncfile2           404ops/s    0.0mb/s   19.0ms/op   1162us/op-cpu
appendfilerand2      404ops/s    6.3mb/s    0.2ms/op    173us/op-cpu
createfile2          404ops/s    0.0mb/s    0.3ms/op    334us/op-cpu
deletefile1          404ops/s    0.0mb/s    0.2ms/op    173us/op-cpu

4984: 82.879: IO Summary: 318239 ops 5251.8 ops/s, (808/808 r/w) 25.2mb/s,
1228us cpu/op, 9.7ms latency

Nevada78:

1107: 82.554: Per-Operation Breakdown
closefile4          1223ops/s    0.0mb/s    0.0ms/op     22us/op-cpu
readfile4           1223ops/s   19.4mb/s    0.1ms/op    112us/op-cpu
openfile4           1223ops/s    0.0mb/s    0.1ms/op    128us/op-cpu
closefile3          1223ops/s    0.0mb/s    0.0ms/op     29us/op-cpu
fsyncfile3          1223ops/s    0.0mb/s    4.6ms/op    256us/op-cpu
appendfilerand3     1223ops/s   19.1mb/s    0.2ms/op    191us/op-cpu
readfile3           1223ops/s   19.9mb/s    0.1ms/op    116us/op-cpu
openfile3           1223ops/s    0.0mb/s    0.1ms/op    127us/op-cpu
closefile2          1223ops/s    0.0mb/s    0.0ms/op     28us/op-cpu
fsyncfile2          1223ops/s    0.0mb/s    4.4ms/op    239us/op-cpu
appendfilerand2     1223ops/s   19.1mb/s    0.1ms/op    159us/op-cpu
createfile2         1223ops/s    0.0mb/s    0.5ms/op    389us/op-cpu
deletefile1         1223ops/s    0.0mb/s    0.2ms/op    198us/op-cpu

1107: 82.581: IO Summary: 954637 ops 15903.4 ops/s, (2447/2447 r/w) 77.5mb/s,
590us cpu/op, 2.6ms latency

That's a 3-4x improvement in ops/sec and average fsync time. Here are the
results from our UFS software mirror for comparison:

4984: 211.056: Per-Operation Breakdown
closefile4           465ops/s    0.0mb/s    0.0ms/op     23us/op-cpu
readfile4            465ops/s   12.6mb/s    0.1ms/op    142us/op-cpu
openfile4            465ops/s    0.0mb/s    0.1ms/op     83us/op-cpu
closefile3           465ops/s    0.0mb/s    0.0ms/op     24us/op-cpu
fsyncfile3           465ops/s    0.0mb/s    6.0ms/op    498us/op-cpu
appendfilerand3      465ops/s    7.3mb/s    1.7ms/op    282us/op-cpu
readfile3            465ops/s   11.1mb/s    0.1ms/op    132us/op-cpu
openfile3            465ops/s    0.0mb/s    0.1ms/op     84us/op-cpu
closefile2           465ops/s    0.0mb/s    0.0ms/op     26us/op-cpu
fsyncfile2           465ops/s    0.0mb/s    5.9ms/op    445us/op-cpu
appendfilerand2      465ops/s    7.3mb/s    1.1ms/op    231us/op-cpu
createfile2          465ops/s    0.0mb/s    2.2ms/op    443us/op-cpu
deletefile1          465ops/s    0.0mb/s    2.0ms/op    269us/op-cpu

4984: 211.057: IO Summary: 366557 ops 6049.2 ops/s, (931/931 r/w) 38.2mb/s,
912us cpu/op, 4.8ms latency

So either we're hitting a pretty serious zfs bug, or they're purposely holding
back performance in Solaris 10 so that we all have a good reason to upgrade to
11. ;)

-Nick
Re: [zfs-discuss] zfs panic on boot
I'm seeing this too. Nothing unusual happened before the panic. Just a
shutdown (init 5) and later startup. I have the crashdump and copy of the
problem zpool (on swan). Here's the stack trace:

> $C
ff0004463680 vpanic()
ff00044636b0 vcmn_err+0x28(3, f792ecf0, ff0004463778)
ff00044637a0 zfs_panic_recover+0xb6()
ff0004463830 space_map_add+0xdb(ff014c1a21b8, 472785000, 1000)
ff00044638e0 space_map_load+0x1fc(ff014c1a21b8, fbd52568, 1,
    ff014c1a1e88, ff0149c88c30)
ff0004463920 metaslab_activate+0x66(ff014c1a1e80, 4000)
ff00044639e0 metaslab_group_alloc+0x24e(ff014bdeb000, 4000, 3a6734, 1435b,
    ff014baa9840, 2)
ff0004463ab0 metaslab_alloc_dva+0x1da(ff01477880c0, ff014beefa70, 4000,
    ff014baa9840, 2, 0, 3a6734, 0)
ff0004463b50 metaslab_alloc+0x82(ff01477880c0, ff014beefa70, 4000,
    ff014baa9840, 3, 3a6734, 0, 0)
ff0004463ba0 zio_dva_allocate+0x62(ff014934c458)
ff0004463bd0 zio_execute+0x7f(ff014934c458)
ff0004463c60 taskq_thread+0x1a7(ff014bfb77a0)
ff0004463c70 thread_start+8()

This is on a Ferrari laptop (AMD x64) running snv79. I'd love to rescue my
zpool. Any suggestions?

Thanks,
Gordon
Re: [zfs-discuss] [zones-discuss] ZFS shared /home between zones
In general you should not allow a Solaris system to be both an NFS server and
NFS client for the same filesystem, irrespective of whether zones are
involved. Among other problems, you can run into kernel deadlocks in some
(rare) circumstances. This is documented in the NFS administration docs.

A loopback mount is definitely the recommended approach.
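A minimal sketch of the recommended loopback approach, for concreteness; the
zone name, mount point, and backing path are all made up:

    # Share /export/home from the global zone into a non-global zone via lofs
    zonecfg -z myzone
    zonecfg:myzone> add fs
    zonecfg:myzone:fs> set dir=/home
    zonecfg:myzone:fs> set special=/export/home
    zonecfg:myzone:fs> set type=lofs
    zonecfg:myzone:fs> end
    zonecfg:myzone> verify
    zonecfg:myzone> commit
    zonecfg:myzone> exit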
Re: [zfs-discuss] hot spare and resilvering problem
Hi,

> Do you have snapshots taking place (like in a cron job) during the
> resilver process? If so, you may be hitting a bug that the resilver
> will restart from the beginning whenever a new snapshot occurs. If
> you disable the snapshots during the resilver then it should complete
> to 100%.

No, I don't have snapshots taking place. I found that when I query the pool
with "zpool status" it restarts the resilvering process, which is strange...
Anyway, after ~10 days the resilvering has finally completed to 100%:

  resilver completed with 0 errors on Wed Jan 2 12:46:10 2008

The filesystem is still slow, however. When I try to run zpool iostat it takes
a few hours to produce output; same with zfs create. I can't even post the
output of "zpool status -v", as it takes that long to complete. We have 11
disks (+1 hot spare) in a raidz config, so why is the filesystem so slow even
now that the hot spare has replaced the faulty disk?

thanks,
Maciej
Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool
Carsten Bormann <[EMAIL PROTECTED]> wrote:

> On Dec 29 2007, at 08:33, Jonathan Loran wrote:
>
> > We snapshot the file as it exists at the time of
> > the mv in the old file system until all referring file handles are
> > closed, then destroy the single file snap. I know, not easy to
> > implement, but that is the correct behavior, I believe.
>
> Exactly.
>
> Note that apart from open descriptors, there may be other links to the
> file on the old FS; it has to be clear whether writes to the file in
> the new FS change the file in the old FS or not. I'd rather say they
> shouldn't.
> Yes, this would be different from the normal rename(2) semantics with
> respect to multiply linked files. And yes, the semantics of link(2)
> should also be consistent with this.

This is an interesting problem. Your proposal would imply that a file may
have different identities in different filesystems:

  - different st_dev
  - different st_ino
  - different link count

This cannot be implemented with a single "inode data" anymore. Well, it is
not impossible, as my WOFS (mentioned before) implements hardlinks via
"inode relative symlinks". In order to allow this, a file would need a
storage-pool-global serial number that allows matching the different inode
sets for the file.

Jörg

--
EMail: [EMAIL PROTECTED] (home)  Jörg Schilling  D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/
       ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool
On Dec 29 2007, at 08:33, Jonathan Loran wrote:

> We snapshot the file as it exists at the time of
> the mv in the old file system until all referring file handles are
> closed, then destroy the single file snap. I know, not easy to
> implement, but that is the correct behavior, I believe.

Exactly.

Note that apart from open descriptors, there may be other links to the file
on the old FS; it has to be clear whether writes to the file in the new FS
change the file in the old FS or not. I'd rather say they shouldn't.

Yes, this would be different from the normal rename(2) semantics with respect
to multiply linked files. And yes, the semantics of link(2) should also be
consistent with this.

Gruesse, Carsten