[zfs-discuss] Snapshot size as reported by the USED property

2010-08-30 Thread Peter Radig
I create snapshots on my datasets quite frequently. My understanding of the 
USED property of a snapshot is that it indicates the amount of data that was 
written to the dataset after the snapshot was taken. But now I'm seeing a 
snapshot with USED == 0 where there was definitely write activity after it was 
taken.

Is my understanding wrong, or am I seeing something that is not supposed to happen? I
am on NCP3.
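
(As far as I can tell from the zfs man page, a snapshot's USED is the space that would
be freed if that snapshot alone were destroyed, i.e. blocks that have become unique to
it; space still shared with other snapshots only shows up in the dataset's
usedbysnapshots total. The commands I have been using to break this down -- tank/fs
below is just a placeholder for the dataset name:

zfs list -t snapshot -r tank/fs
zfs get usedbysnapshots,usedbydataset,usedbychildren,usedbyrefreservation tank/fs
zfs list -o space -r tank/fs
)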

Thanks,
Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] pool died during scrub

2010-08-30 Thread Jeff Bacon
I have a bunch of sol10U8 boxes with ZFS pools, most all raidz2 8-disk
stripe. They're all supermicro-based with retail LSI cards.

I've noticed a tendency for things to go a little bonkers during the
weekly scrub (they all scrub over the weekend), and that's when I'll
lose a disk here and there. OK, fine, that's sort of the point, and
they're SATA drives so things happen. 

I've never lost a pool though, until now. This is Not Fun. 

> ::status
debugging crash dump vmcore.0 (64-bit) from ny-fs4
operating system: 5.10 Generic_142901-10 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=fe80007cb850 addr=28 occurred
in module zfs due to a NULL pointer dereference
dump content: kernel pages only
> $C
fe80007cb960 vdev_is_dead+2()
fe80007cb9a0 vdev_mirror_child_select+0x65()
fe80007cba00 vdev_mirror_io_start+0x44()
fe80007cba30 zio_vdev_io_start+0x159()
fe80007cba60 zio_execute+0x6f()
fe80007cba90 zio_wait+0x2d()
fe80007cbb40 arc_read_nolock+0x668()
fe80007cbbd0 dmu_objset_open_impl+0xcf()
fe80007cbc20 dsl_pool_open+0x4e()
fe80007cbcc0 spa_load+0x307()
fe80007cbd00 spa_open_common+0xf7()
fe80007cbd10 spa_open+0xb()
fe80007cbd30 pool_status_check+0x19()
fe80007cbd80 zfsdev_ioctl+0x1b1()
fe80007cbd90 cdev_ioctl+0x1d()
fe80007cbdb0 spec_ioctl+0x50()
fe80007cbde0 fop_ioctl+0x25()
fe80007cbec0 ioctl+0xac()
fe80007cbf10 _sys_sysenter_post_swapgs+0x14b()

  pool: srv
id: 9515618289022845993
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

srv                      UNAVAIL  missing device
  raidz2   ONLINE
c2t5000C5001F2CCE1Fd0  ONLINE
c2t5000C5001F34F5FAd0  ONLINE
c2t5000C5001F48D399d0  ONLINE
c2t5000C5001F485EC3d0  ONLINE
c2t5000C5001F492E42d0  ONLINE
c2t5000C5001F48549Bd0  ONLINE
c2t5000C5001F370919d0  ONLINE
c2t5000C5001F484245d0  ONLINE
  raidz2   ONLINE
c2t5F000B5C8187d0  ONLINE
c2t5F000B5C8157d0  ONLINE
c2t5F000B5C9101d0  ONLINE
c2t5F000B5C8167d0  ONLINE
c2t5F000B5C9120d0  ONLINE
c2t5F000B5C9151d0  ONLINE
c2t5F000B5C9170d0  ONLINE
c2t5F000B5C9180d0  ONLINE
  raidz2   ONLINE
c2t5000C50010A88E76d0  ONLINE
c2t5000C5000DCD308Cd0  ONLINE
c2t5000C5001F1F456Dd0  ONLINE
c2t5000C50010920E06d0  ONLINE
c2t5000C5001F20C81Fd0  ONLINE
c2t5000C5001F3C7735d0  ONLINE
c2t5000C500113BC008d0  ONLINE
c2t5000C50014CD416Ad0  ONLINE

Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.


All of this would be ok... except THOSE ARE THE ONLY DEVICES THAT WERE
PART OF THE POOL. How can it be missing a device that didn't exist? 

A zpool import -fF results in the above kernel panic. This also
creates /etc/zfs/zpool.cache.tmp, which then results in the pool being
imported, which leads to a continuous reboot/panic cycle. 

I can't simply use b134 to import the pool without its logs, since that
would imply upgrading the pool first, which is hard to do if it's not
imported.

My zdb skills are lacking - zdb -l gets you about so far and that's it.
(where the heck are the other options to zdb even written down, besides
in the code?)
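
(For what it's worth, the handful of zdb invocations I know of beyond -l -- offered as
a sketch, the device path below is just an example, and whether they get anywhere on a
pool in this state is another question:

zdb -l /dev/rdsk/c2t5000C5001F2CCE1Fd0s0      # dump the vdev labels on one disk
zdb -C srv                                    # dump the cached pool configuration
zdb -e -C srv                                 # read the config from the labels instead
zdb -e -d srv                                 # list datasets of a pool that isn't imported
)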

OK, so this isn't the end of the world, but it's 15TB of data I'd really
rather not have to re-copy across a 100Mbit line. What concerns me more
is that ZFS would do this in the first place - it's not supposed to
corrupt itself!!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs lists discrepancy after added a new vdev to pool

2010-08-30 Thread Darin Perusich

On Saturday, August 28, 2010 06:04:17 am Mattias Pantzare wrote:
 On Sat, Aug 28, 2010 at 02:54, Darin Perusich
 
 darin.perus...@cognigencorp.com wrote:
  Hello All,
  
  I'm sure this has been discussed previously but I haven't been able to
  find an answer to this. I've added another raidz1 vdev to an existing
  storage pool and the increased available storage isn't reflected in the
  'zfs list' output. Why is this?
  
 The system in question is running Solaris 10 5/09 s10s_u7wos_08, kernel
 Generic_139555-08. The system does not have the latest patches, which
  might be the cure.
  
  Thanks!
  
 
 I think you have to explain your problem more, 392G is more than 196G?

This is actually the wrong output, it was the end of a LONG day. Here's the 
correct output.

zpool create datapool raidz1 c1t50060E800042AA70d0 c1t50060E800042AA70d1
zpool list
NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
datapool   398G   191K   398G     0%  ONLINE  -

zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
datapool    91K   196G     1K  /datapool

zpool add datapool raidz c1t50060E800042AA70d2 c1t50060E800042AA70d3

zpool list
NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
datapool   796G   231K   796G     0%  ONLINE  -

zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
datapool   111K   392G    18K  /datapool

-- 
Darin Perusich
Unix Systems Administrator
Cognigen Corporation
395 Youngs Rd.
Williamsville, NY 14221
Phone: 716-633-3463
Email: darin...@cognigencorp.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs lists discrepancy after added a new vdev to pool

2010-08-30 Thread Darin Perusich

On Saturday, August 28, 2010 12:27:36 am Edho P Arief wrote:
 On Sat, Aug 28, 2010 at 7:54 AM, Darin Perusich
 
 darin.perus...@cognigencorp.com wrote:
  Hello All,
  
  I'm sure this has been discussed previously but I haven't been able to
  find an answer to this. I've added another raidz1 vdev to an existing
  storage pool and the increased available storage isn't reflected in the
  'zfs list' output. Why is this?
 
 you must do zpool export followed by zpool import

I tried this but it didn't have any effect.

-- 
Darin Perusich
Unix Systems Administrator
Cognigen Corporation
395 Youngs Rd.
Williamsville, NY 14221
Phone: 716-633-3463
Email: darin...@cognigencorp.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-30 Thread Eff Norwood
As I said, please, by all means, try it and post your benchmarks for the first hour, 
first day, first week, and then the first month. The data will be of interest to 
you. On a subjective basis, if you feel that an SSD is working just fine as 
your ZIL, run with it. Good luck!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs lists discrepancy after added a new vdev to pool

2010-08-30 Thread Darin Perusich

On Saturday, August 28, 2010 05:56:27 am Tomas Ögren wrote:
 On 27 August, 2010 - Darin Perusich sent me these 2,1K bytes:
  Hello All,
  
  I'm sure this has been discussed previously but I haven't been able to
  find an answer to this. I've added another raidz1 vdev to an existing
  storage pool and the increased available storage isn't reflected in the
  'zfs list' output. Why is this?
  
  The system in question is running Solaris 10 5/09 s10s_u7wos_08, kernel
  Generic_139555-08. The system does not have the latest patches, which
  might be the cure.
  
  Thanks!
  
  Here's what I'm seeing.
  zpool create datapool raidz1 c1t50060E800042AA70d0  c1t50060E800042AA70d1
 
 Just fyi, this is an inefficient variant of a mirror. More cpu required
 and lower performance.
 

This is a testing setup, the production pool is currently 1 raidz1 vdev split 
across 6 disks. Thanks for the heads up though.

-- 
Darin Perusich
Unix Systems Administrator
Cognigen Corporation
395 Youngs Rd.
Williamsville, NY 14221
Phone: 716-633-3463
Email: darin...@cognigencorp.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs lists discrepancy after added a new vdev to pool

2010-08-30 Thread Richard Elling
This is a FAQ: "Why doesn't the space that is reported by the zpool list command
and the zfs list command match?"
http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq
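
The short version for this particular pool (each LUN is roughly 199 GB going by the
numbers below, so treat this as back-of-the-envelope arithmetic):

  zpool list counts raw space, parity included:   2 vdevs x 2 x ~199 GB        = ~796 GB
  zfs list counts usable space, parity excluded:  2 vdevs x (2 - 1) x ~199 GB  = ~392 GB

which matches the 796G / 392G you are seeing, minus a little metadata overhead.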

 -- richard

On Aug 30, 2010, at 5:47 AM, Darin Perusich wrote:

 
 On Saturday, August 28, 2010 06:04:17 am Mattias Pantzare wrote:
 On Sat, Aug 28, 2010 at 02:54, Darin Perusich
 
 darin.perus...@cognigencorp.com wrote:
 Hello All,
 
 I'm sure this has been discussed previously but I haven't been able to
 find an answer to this. I've added another raidz1 vdev to an existing
 storage pool and the increased available storage isn't reflected in the
 'zfs list' output. Why is this?
 
 The system in question is running Solaris 10 5/09 s10s_u7wos_08, kernel
 Generic_139555-08. The system does not have the latest patches, which
 might be the cure.
 
 Thanks!
 
 
 I think you have to explain your problem more, 392G is more than 196G?
 
 This is actually the wrong output, it was the end of a LONG day. Here's the 
 correct output.
 
 zpool create datapool raidz1 c1t50060E800042AA70d0 c1t50060E800042AA70d1
 zpool list
 NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
 datapool   398G   191K   398G     0%  ONLINE  -
 
 zfs list
 NAME       USED  AVAIL  REFER  MOUNTPOINT
 datapool    91K   196G     1K  /datapool
 
 zpool add datapool raidz c1t50060E800042AA70d2 c1t50060E800042AA70d3
 
 zpool list
 NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
 datapool   796G   231K   796G     0%  ONLINE  -
 
 zfs list
 NAME       USED  AVAIL  REFER  MOUNTPOINT
 datapool   111K   392G    18K  /datapool
 
 -- 
 Darin Perusich
 Unix Systems Administrator
 Cognigen Corporation
 395 Youngs Rd.
 Williamsville, NY 14221
 Phone: 716-633-3463
 Email: darin...@cognigencorp.com
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Postmortem - file system recovered [SEC=UNCLASSIFIED]

2010-08-30 Thread Brian
I am afraid I can't describe the exact procedure that eventually fixed the file 
system as I merely observed it while Victor was logged into my system.  I am 
quoting from the explanation he provided but if he reads this perhaps he could 
add whatever details seem pertinent.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pool died during scrub

2010-08-30 Thread Mark J Musante

On Mon, 30 Aug 2010, Jeff Bacon wrote:




All of this would be ok... except THOSE ARE THE ONLY DEVICES THAT WERE 
PART OF THE POOL. How can it be missing a device that didn't exist?


The device(s) in question are probably the logs you refer to here:

 I can't simply use b134 to import the pool without its logs, since that 
 would imply upgrading the pool first, which is hard to do if it's not 
 imported.


The stack trace you show is indicative of a memory corruption that may 
have gotten out to disk.  In other words, ZFS wrote data to ram, ram was 
corrupted, then the checksum was calculated and the result was written 
out.


Do you have a core dump from the panic?  Also, what kind of DRAM does this 
system use?


If you're lucky, then there's no corruption and instead it's a stale 
config that's causing the problem.  Try removing /etc/zfs/zpool.cache and 
then doing a 'zpool import -a'.
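
Roughly this, as a sketch -- I'd keep a copy of the cache file rather than deleting it
outright:

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
zpool import            # scan the devices and show what looks importable
zpool import -a         # or import the pool by name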

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pool died during scrub

2010-08-30 Thread Jeff Bacon
  All of this would be ok... except THOSE ARE THE ONLY DEVICES THAT
WERE
  PART OF THE POOL. How can it be missing a device that didn't exist?
 
 The device(s) in question are probably the logs you refer to here:

There is a log, with a different GUID, from another pool from long ago.
It isn't valid. I clipped that: 

ny-fs4(71)# zpool import
  pool: srv
id: 6111323963551805601
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

srv   UNAVAIL  insufficient replicas
logs
srv   UNAVAIL  insufficient replicas
  mirror  ONLINE
c3t0d0s4  ONLINE   <-- box doesn't even have a c3
c0t0d0s4  ONLINE   <-- what it's looking at - leftover from who knows what

  pool: srv
id: 9515618289022845993
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:



  I can't obviously use b134 to import the pool without logs, since
that
  would imply upgrading the pool first, which is hard to do if it's
not
  imported.
 The stack trace you show is indicative of a memory corruption that may
 have gotten out to disk.  In other words, ZFS wrote data to ram, ram
was
 corrupted, then the checksum was calculated and the result was written
 out.

Now this worries me. Granted, the box works fairly hard, but ... no ECC
events to IPMI that I can see. Possible that the controller ka-futzed
somehow... but then presumably there should be SOME valid data to go
back to here somewhere?

The one fairly unusual item about this box is that it has another pool
with 12 15k SAS drives, which has a mysql database on it which gets
fairly well thrashed on a permanent basis.

 Do you have a core dump from the panic?  Also, what kind of DRAM
 does this system use?

It has 12 4GB DDR3-1066 ECC REG DIMMs. 

I can regenerate the panic on command (try to import the pool with -F
and it will go back into reboot loop mode). I pulled the stack from a
core dump. 


 If you're lucky, then there's no corruption and instead it's a
 stale config that's causing the problem.  Try removing
 /etc/zfs/zpool.cache and then doing an zpool import -a

Not nearly that lucky. It won't import. If it goes into reboot mode, the
only thing you can do is go to single-user, remove the cache, and reboot
so it forgets about the pool.
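
(For the record, the recovery dance on these x86 boxes is roughly: add -s to the GRUB
kernel line to come up single-user, then

mv /etc/zfs/zpool.cache /var/tmp/zpool.cache.bad
rm -f /etc/zfs/zpool.cache.tmp
reboot

after which the box at least stays up.)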



(Please, no rumblings from the peanut gallery about the evils of SATA or
SAS/SATA encapsulation. This is the only box in this mode. The mysql
database is an RTG stats database whose loss is not the end of the
world. The dataset is replicated in two other sites, this is a local
copy - just that it's 15TB, and as I said, recovery is, well,
time-consuming and therefore not the preferred option.

Real Production Boxes - slowly coming on line - are all using the
SuperMicro E26 dual-port backplane with 2TB constellation SAS drives on
paired LSI 9211-8is, with aforementioned ECC REG RAM, and I'm trying to
figure out how to either
 -- get my hands on SAS SSDs (of which there appears to be one, the new
OCZ Vertex 2 Pro), or
 -- install interposers in front of SATA SSDs so at least the
controllers aren't dealing with SATA encap - the big challenge being, of
all things, the form factor and the tray 

I think I'm going to yank the SAS drives out and migrate them so that
they're on a separate backplane and controller)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Intermittent ZFS hang

2010-08-30 Thread Charles J. Knipe
Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could 
help me troubleshoot.  We have a ZFS pool made up of 24 disks, arranged into 7 
raid-z devices of 4 disks each.  We're using it as an iSCSI back-end for VMWare 
and some Oracle RAC clusters.

Under normal circumstances performance is very good both in benchmarks and 
under real-world use.  Every couple days, however, I/O seems to hang for 
anywhere between several seconds and several minutes.  The hang seems to be a 
complete stop of all write I/O.  The following zpool iostat illustrates:

pool0       2.47T  5.13T    120      0   293K      0
pool0       2.47T  5.13T    127      0   308K      0
pool0       2.47T  5.13T    131      0   322K      0
pool0       2.47T  5.13T    144      0   347K      0
pool0       2.47T  5.13T    135      0   331K      0
pool0       2.47T  5.13T    122      0   295K      0
pool0       2.47T  5.13T    135      0   330K      0

While this is going on our VMs all hang, as do any zfs create commands or 
attempts to touch/create files in the zfs pool from the local system.  After 
several minutes the system un-hangs and we see very high write rates before 
things return to normal across the board.

Some more information about our configuration:  We're running OpenSolaris 
svn-134.  ZFS is at version 22.  Our disks are 15k RPM 300GB Seagate Cheetahs, 
mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/e 
controller.  We'd tried out most of this configuration previously on 
OpenSolaris 2009.06 without running into this problem.  The only thing that's 
new, aside from the newer OpenSolaris/ZFS is a set of four SSDs configured as 
log disks.

At first we blamed de-dupe, but we've disabled that.  Next we suspected the SSD 
log disks, but we've seen the problem with those removed, as well.

Has anyone seen anything like this before?  Are there any tools we can use to 
gather information during the hang which might be useful in determining what's 
going wrong?

Thanks for any insights you may have.

-Charles
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pool scrub clean, filesystem broken

2010-08-30 Thread Brian
I've posted a post-mortem followup thread:

http://opensolaris.org/jive/thread.jspa?threadID=133472
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Terrible ZFS performance on a Dell 1850 w/ PERC 4e/Si (Sol10U6)

2010-08-30 Thread Andrei Ghimus
I have the same problem you do: ZFS performance under Solaris 10 U8 is horrible.

When you say passthrough mode, do you mean non-RAID configuration?
And if so, could you tell me how you configured it?

The best I can manage is to configure each physical drive as a RAID 0 array 
then export that as a logical drive. 

All tips/suggestions are appreciated.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intermittent ZFS hang

2010-08-30 Thread Remco Lengers

 Charles,

Did you check for any HW issues reported during the hangs? fmdump -ev 
and the like?
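
For instance (from memory, so check the man pages):

fmdump -e        # one-line summary of the error telemetry (ereports)
fmdump -eV       # the same ereports in full detail
iostat -En       # per-device soft/hard/transport error counters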


..Remco

On 8/30/10 6:02 PM, Charles J. Knipe wrote:

Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could 
help me troubleshoot.  We have a ZFS pool made up of 24 disks, arranged into 7 
raid-z devices of 4 disks each.  We're using it as an iSCSI back-end for VMWare 
and some Oracle RAC clusters.

Under normal circumstances performance is very good both in benchmarks and 
under real-world use.  Every couple days, however, I/O seems to hang for 
anywhere between several seconds and several minutes.  The hang seems to be a 
complete stop of all write I/O.  The following zpool iostat illustrates:

pool0       2.47T  5.13T    120      0   293K      0
pool0       2.47T  5.13T    127      0   308K      0
pool0       2.47T  5.13T    131      0   322K      0
pool0       2.47T  5.13T    144      0   347K      0
pool0       2.47T  5.13T    135      0   331K      0
pool0       2.47T  5.13T    122      0   295K      0
pool0       2.47T  5.13T    135      0   330K      0

While this is going on our VMs all hang, as do any zfs create commands or attempts to 
touch/create files in the zfs pool from the local system.  After several minutes the system 
un-hangs and we see very high write rates before things return to normal across the 
board.

Some more information about our configuration:  We're running OpenSolaris 
svn-134.  ZFS is at version 22.  Our disks are 15kRPM 300gb Seagate Cheetahs, 
mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/e 
controller.  We'd tried out most of this configuration previously on 
OpenSolaris 2009.06 without running into this problem.  The only thing that's 
new, aside from the newer OpenSolaris/ZFS is a set of four SSDs configured as 
log disks.

At first we blamed de-dupe, but we've disabled that.  Next we suspected the SSD 
log disks, but we've seen the problem with those removed, as well.

Has anyone seen anything like this before?  Are there any tools we can use to 
gather information during the hang which might be useful in determining what's 
going wrong?

Thanks for any insights you may have.

-Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intermittent ZFS hang

2010-08-30 Thread David Blasingame Oracle

Charles,

Is it just ZFS that hangs (or, as it appears, slows down or blocks), or 
does the whole system hang? 


A couple of questions

What does iostat show during the time period of the slowdown?
What does mpstat show during the time of the slowdown?

You can look at the metadata statistics by running the following.

echo ::arc | mdb -k

When looking at a ZFS problem, I usually like to gather

echo ::spa | mdb -k

echo ::zio_state | mdb -k

I suspect you could drill down more with dtrace or lockstat to see where 
the slowdown is happening.
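
A couple of starting points, offered only as a sketch (the spa_sync probe is a guess
at where a write stall might sit, so adjust to taste):

lockstat -kIW -D 20 sleep 30     # kernel profile: top 20 call sites over 30 seconds
dtrace -qn 'fbt::spa_sync:entry { self->t = timestamp }
    fbt::spa_sync:return /self->t/ { printf("spa_sync took %d ms\n",
    (timestamp - self->t) / 1000000); self->t = 0; }'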


Dave


On 08/30/10 11:02, Charles J. Knipe wrote:

Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could 
help me troubleshoot.  We have a ZFS pool made up of 24 disks, arranged into 7 
raid-z devices of 4 disks each.  We're using it as an iSCSI back-end for VMWare 
and some Oracle RAC clusters.

Under normal circumstances performance is very good both in benchmarks and 
under real-world use.  Every couple days, however, I/O seems to hang for 
anywhere between several seconds and several minutes.  The hang seems to be a 
complete stop of all write I/O.  The following zpool iostat illustrates:

pool0       2.47T  5.13T    120      0   293K      0
pool0       2.47T  5.13T    127      0   308K      0
pool0       2.47T  5.13T    131      0   322K      0
pool0       2.47T  5.13T    144      0   347K      0
pool0       2.47T  5.13T    135      0   331K      0
pool0       2.47T  5.13T    122      0   295K      0
pool0       2.47T  5.13T    135      0   330K      0

While this is going on our VMs all hang, as do any zfs create commands or attempts to 
touch/create files in the zfs pool from the local system.  After several minutes the system 
un-hangs and we see very high write rates before things return to normal across the 
board.

Some more information about our configuration:  We're running OpenSolaris 
svn-134.  ZFS is at version 22.  Our disks are 15kRPM 300gb Seagate Cheetahs, 
mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/e 
controller.  We'd tried out most of this configuration previously on 
OpenSolaris 2009.06 without running into this problem.  The only thing that's 
new, aside from the newer OpenSolaris/ZFS is a set of four SSDs configured as 
log disks.

At first we blamed de-dupe, but we've disabled that.  Next we suspected the SSD 
log disks, but we've seen the problem with those removed, as well.

Has anyone seen anything like this before?  Are there any tools we can use to 
gather information during the hang which might be useful in determining what's 
going wrong?

Thanks for any insights you may have.

-Charles
  
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intermittent ZFS hang

2010-08-30 Thread Charles J. Knipe
David,

Thanks for your reply.  Answers to your questions are below.

 Is it just ZFS that hangs (or, as it appears, slows down
 or blocks), or does the whole system hang?

Only the ZFS storage is affected.  Any attempt to write to it blocks until the 
issue passes.  Other than that the system behaves normally.  I have not, as far 
as I remember, tried writing to the root pool while this is going on, I'll have 
to check that next time.  I suspect the problem is likely limited to a single 
pool.

 What does iostat show during the time period of the
 slowdown?
 What does mpstat show during the time of the
 slowdown?

 You can look at the metadata statistics by running
 the following.
 echo ::arc | mdb -k
 When looking at a ZFS problem, I usually like to
 gather
 echo ::spa | mdb -k
 echo ::zio_state | mdb -k

I will plan to dump information from all of these sources next time I can catch 
it in the act.  Any other diag commands you think might be useful?

 I suspect you could drill down more with dtrace or
 lockstat to see
 where the slowdown is happening.

I'm brand new to DTrace.  I'm doing some reading now toward being in a position 
to ask intelligent questions.
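
In the meantime I'll probably just leave a dumb capture loop running so we catch the
next hang in the act -- a sketch, with our pool name (pool0) filled in:

while true; do
    date
    zpool iostat -v pool0 1 2     # per-vdev ops and bandwidth
    iostat -xnz 1 2               # per-device service times and queue depths
    mpstat 1 2                    # CPU saturation / cross-calls
    echo ::arc | mdb -k           # ARC stats, as you suggested
    echo ::zio_state | mdb -k     # outstanding ZIOs
    sleep 10
done >> /var/tmp/zfs-hang.out 2>&1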

-Charles
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Eric D. Mudama

On Mon, Aug 30 at 15:05, Ray Van Dolson wrote:

I want to fix (as much as is possible) a misalignment issue with an
X-25E that I am using for both OS and as an slog device.

This is on x86 hardware running Solaris 10U8.

Partition table looks as follows:

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 1306       10.00GB    (1306/0/0) 20980890
  1 unassigned    wu       0                0         (0/0/0)          0
  2     backup    wm       0 - 3886       29.78GB    (3887/0/0) 62444655
  3 unassigned    wu    1307 - 3886       19.76GB    (2580/0/0) 41447700
  4 unassigned    wu       0                0         (0/0/0)          0
  5 unassigned    wu       0                0         (0/0/0)          0
  6 unassigned    wu       0                0         (0/0/0)          0
  7 unassigned    wu       0                0         (0/0/0)          0
  8       boot    wu       0 -    0        7.84MB    (1/0/0)       16065
  9 unassigned    wu       0                0         (0/0/0)          0

And here is fdisk:

Total disk size is 3890 cylinders
Cylinder size is 16065 (512 byte) blocks

                                      Cylinders
     Partition   Status    Type      Start   End    Length    %
     =========   ======    ========  =====   ====   ======   ===
         1       Active    Solaris       1   3889     3889   100

Slice 0 is where the OS lives and slice 3 is our slog.  As you can see
from the fdisk partition table (and from the slice view), the OS
partition starts on cylinder 1 -- which is not 4k aligned.

I don't think there is much I can do to fix this without reinstalling.

However, I'm most concerned about the slog slice and would like to
recreate its partition such that it begins on cylinder 1312.

So a few questions:

   - Would making s3 be 4k block aligned help even though s0 is not?
   - Do I need to worry about 4k block aligning the *end* of the
 slice?  eg instead of ending s3 on cylinder 3886, end it on 3880
 instead?

Thanks,
Ray


Do you specifically have benchmark data indicating unaligned or
aligned+offset access on the X25-E is significantly worse than aligned
access?

I'd thought the tier1 SSDs didn't have problems with these workloads.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Ray Van Dolson
On Mon, Aug 30, 2010 at 03:37:52PM -0700, Eric D. Mudama wrote:
 On Mon, Aug 30 at 15:05, Ray Van Dolson wrote:
 I want to fix (as much as is possible) a misalignment issue with an
 X-25E that I am using for both OS and as an slog device.
 
 This is on x86 hardware running Solaris 10U8.
 
 Partition table looks as follows:
 
 Part      Tag    Flag     Cylinders         Size            Blocks
   0       root    wm       1 - 1306       10.00GB    (1306/0/0) 20980890
   1 unassigned    wu       0                0         (0/0/0)          0
   2     backup    wm       0 - 3886       29.78GB    (3887/0/0) 62444655
   3 unassigned    wu    1307 - 3886       19.76GB    (2580/0/0) 41447700
   4 unassigned    wu       0                0         (0/0/0)          0
   5 unassigned    wu       0                0         (0/0/0)          0
   6 unassigned    wu       0                0         (0/0/0)          0
   7 unassigned    wu       0                0         (0/0/0)          0
   8       boot    wu       0 -    0        7.84MB    (1/0/0)       16065
   9 unassigned    wu       0                0         (0/0/0)          0
 
 And here is fdisk:
 
  Total disk size is 3890 cylinders
  Cylinder size is 16065 (512 byte) blocks
 
                                       Cylinders
      Partition   Status    Type      Start   End    Length    %
      =========   ======    ========  =====   ====   ======   ===
          1       Active    Solaris       1   3889     3889   100
 
 Slice 0 is where the OS lives and slice 3 is our slog.  As you can see
 from the fdisk partition table (and from the slice view), the OS
 partition starts on cylinder 1 -- which is not 4k aligned.
 
 I don't think there is much I can do to fix this without reinstalling.
 
 However, I'm most concerned about the slog slice and would like to
 recreate its partition such that it begins on cylinder 1312.
 
 So a few questions:
 
 - Would making s3 be 4k block aligned help even though s0 is not?
 - Do I need to worry about 4k block aligning the *end* of the
   slice?  eg instead of ending s3 on cylinder 3886, end it on 3880
   instead?
 
 Thanks,
 Ray
 
 Do you specifically have benchmark data indicating unaligned or
 aligned+offset access on the X25-E is significantly worse than aligned
 access?
 
 I'd thought the tier1 SSDs didn't have problems with these workloads.

I've been experiencing heavy Device Not Ready errors with this
configuration, and thought perhaps it could be exacerbated by the block
alignment issue.

See this thread[1].

So this would be a troubleshooting step to attempt to further isolate
the problem -- by eliminating the 4k alignment issue as a factor.

Just want to make sure I set up the alignment as optimally as possible.

Ray

[1] http://markmail.org/message/5rmfzvqwlmosh2oh
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Richard Elling
comment below...

On Aug 30, 2010, at 3:42 PM, Ray Van Dolson wrote:

 On Mon, Aug 30, 2010 at 03:37:52PM -0700, Eric D. Mudama wrote:
 On Mon, Aug 30 at 15:05, Ray Van Dolson wrote:
 I want to fix (as much as is possible) a misalignment issue with an
 X-25E that I am using for both OS and as an slog device.
 
 This is on x86 hardware running Solaris 10U8.
 
 Partition table looks as follows:
 
 Part      Tag    Flag     Cylinders         Size            Blocks
   0       root    wm       1 - 1306       10.00GB    (1306/0/0) 20980890
   1 unassigned    wu       0                0         (0/0/0)          0
   2     backup    wm       0 - 3886       29.78GB    (3887/0/0) 62444655
   3 unassigned    wu    1307 - 3886       19.76GB    (2580/0/0) 41447700
   4 unassigned    wu       0                0         (0/0/0)          0
   5 unassigned    wu       0                0         (0/0/0)          0
   6 unassigned    wu       0                0         (0/0/0)          0
   7 unassigned    wu       0                0         (0/0/0)          0
   8       boot    wu       0 -    0        7.84MB    (1/0/0)       16065
   9 unassigned    wu       0                0         (0/0/0)          0
 
 And here is fdisk:
 
Total disk size is 3890 cylinders
Cylinder size is 16065 (512 byte) blocks
 
                                       Cylinders
      Partition   Status    Type      Start   End    Length    %
      =========   ======    ========  =====   ====   ======   ===
          1       Active    Solaris       1   3889     3889   100
 
 Slice 0 is where the OS lives and slice 3 is our slog.  As you can see
 from the fdisk partition table (and from the slice view), the OS
 partition starts on cylinder 1 -- which is not 4k aligned.

To get to a fine alignment, you need an EFI label. However, Solaris does
not (yet) support booting from EFI labeled disks.  The older SMI labels 
are all cylinder aligned which gives you a 1/4 chance of alignment.
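
As a quick check of the cylinder arithmetic (a sketch, using the 16065-block,
512-byte-per-block cylinders from the fdisk output above): an offset is 4 KiB aligned
exactly when it is a multiple of 8 blocks, so a boundary N cylinders from the start of
the disk is aligned only when N * 16065 is a multiple of 8. Keep in mind that the
slice cylinder numbers are relative to the Solaris fdisk partition, which itself
starts at absolute cylinder 1 here.

expr 1312 \* 16065 % 8     # 0 -> a boundary 1312 cylinders from the disk start is aligned
expr 1307 \* 16065 % 8     # 3 -> one at 1307 cylinders is not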

 
 I don't think there is much I can do to fix this without reinstalling.
 
 However, I'm most concerned about the slog slice and would like to
 recreate its partition such that it begins on cylinder 1312.
 
 So a few questions:
 
   - Would making s3 be 4k block aligned help even though s0 is not?
   - Do I need to worry about 4k block aligning the *end* of the
 slice?  eg instead of ending s3 on cylinder 3886, end it on 3880
 instead?
 
 Thanks,
 Ray
 
 Do you specifically have benchmark data indicating unaligned or
 aligned+offset access on the X25-E is significantly worse than aligned
 access?
 
 I'd thought the tier1 SSDs didn't have problems with these workloads.
 
 I've been experiencing heavy Device Not Ready errors with this
 configuration, and thought perhaps it could be exacerbated by the block
 alignment issue.
 
 See this thread[1].
 
 So this would be a troubleshooting step to attempt to further isolate
 the problem -- by eliminating the 4k alignment issue as a factor.

In my experience, port expanders with SATA drives do not handle
the high I/O rate that can be generated by a modest server. We are
still trying to get to the bottom of these issues, but they do not appear
to be related to the OS, mpt driver, ZIL use, or alignment. 
 -- richard

 
 Just want to make sure I set up the alignment as optimally as possible.
 
 Ray
 
 [1] http://markmail.org/message/5rmfzvqwlmosh2oh
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Ray Van Dolson
On Mon, Aug 30, 2010 at 03:56:42PM -0700, Richard Elling wrote:
 comment below...
 
 On Aug 30, 2010, at 3:42 PM, Ray Van Dolson wrote:
 
  On Mon, Aug 30, 2010 at 03:37:52PM -0700, Eric D. Mudama wrote:
  On Mon, Aug 30 at 15:05, Ray Van Dolson wrote:
  I want to fix (as much as is possible) a misalignment issue with an
  X-25E that I am using for both OS and as an slog device.
  
  This is on x86 hardware running Solaris 10U8.
  
  Partition table looks as follows:
  
  Part      Tag    Flag     Cylinders         Size            Blocks
    0       root    wm       1 - 1306       10.00GB    (1306/0/0) 20980890
    1 unassigned    wu       0                0         (0/0/0)          0
    2     backup    wm       0 - 3886       29.78GB    (3887/0/0) 62444655
    3 unassigned    wu    1307 - 3886       19.76GB    (2580/0/0) 41447700
    4 unassigned    wu       0                0         (0/0/0)          0
    5 unassigned    wu       0                0         (0/0/0)          0
    6 unassigned    wu       0                0         (0/0/0)          0
    7 unassigned    wu       0                0         (0/0/0)          0
    8       boot    wu       0 -    0        7.84MB    (1/0/0)       16065
    9 unassigned    wu       0                0         (0/0/0)          0
  
  And here is fdisk:
  
 Total disk size is 3890 cylinders
 Cylinder size is 16065 (512 byte) blocks
  
                                        Cylinders
       Partition   Status    Type      Start   End    Length    %
       =========   ======    ========  =====   ====   ======   ===
           1       Active    Solaris       1   3889     3889   100
  
  Slice 0 is where the OS lives and slice 3 is our slog.  As you can see
  from the fdisk partition table (and from the slice view), the OS
  partition starts on cylinder 1 -- which is not 4k aligned.
 
 To get to a fine alignment, you need an EFI label. However, Solaris does
 not (yet) support booting from EFI labeled disks.  The older SMI labels 
 are all cylinder aligned which gives you a 1/4 chance of alignment.

Yep... our other boxes similar to this one are using whole disks as
ZIL, so we're able to use EFI.

The Device Not Ready errors happen there too (the SSDs are on an expander),
but only between 5 and 15 errors per day (vs. the 500 per hour on the
split OS/slog setup).

 
  
  I don't think there is much I can do to fix this without reinstalling.
  
  However, I'm most concerned about the slog slice and would like to
  recreate its partition such that it begins on cylinder 1312.
  
  So a few questions:
  
- Would making s3 be 4k block aligned help even though s0 is not?
- Do I need to worry about 4k block aligning the *end* of the
  slice?  eg instead of ending s3 on cylinder 3886, end it on 3880
  instead?
  
  Thanks,
  Ray
  
  Do you specifically have benchmark data indicating unaligned or
  aligned+offset access on the X25-E is significantly worse than aligned
  access?
  
  I'd thought the tier1 SSDs didn't have problems with these workloads.
  
  I've been experiencing heavy Device Not Ready errors with this
  configuration, and thought perhaps it could be exacerbated by the block
  alignment issue.
  
  See this thread[1].
  
  So this would be a troubleshooting step to attempt to further isolate
  the problem -- by eliminating the 4k alignment issue as a factor.
 
 In my experience, port expanders with SATA drives do not handle
 the high I/O rate that can be generated by a modest server. We are
 still trying to get to the bottom of these issues, but they do not appear
 to be related to the OS, mpt driver, ZIL use, or alignment. 
  -- richard

Very interesting.  We've been looking at Nexenta as we haven't been
able to reproduce our issues on OpenSolaris -- I was hoping this meant
NexentaStor wouldn't have the issue.

In any case -- any thoughts on whether or not I'll be helping anything
if I change my slog slice starting cylinder to be 4k aligned even
though slice 0 isn't?

 
  
  Just want to make sure I set up the alignment as optimally as possible.
  
  Ray
  
  [1] http://markmail.org/message/5rmfzvqwlmosh2oh

Thanks,
Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Edho P Arief
On Tue, Aug 31, 2010 at 6:03 AM, Ray Van Dolson rvandol...@esri.com wrote:
 In any case -- any thoughts on whether or not I'll be helping anything
 if I change my slog slice starting cylinder to be 4k aligned even
 though slice 0 isn't?


Some people claim that, due to how ZFS works, there will be a
performance hit as long as the reported sector size differs from the
physical sector size.

This thread [1] discusses what happens and how to handle
such drives on FreeBSD.

[1] http://marc.info/?l=freebsd-fs&m=126976001214266&w=2
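
If you want to see what ZFS assumed when a vdev was created, the ashift is recorded in
the vdev label -- something like the following (the device path is just an example;
ashift 9 means 512-byte sectors were assumed, 12 means 4 KiB):

zdb -l /dev/rdsk/c0t0d0s3 | grep ashift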

-- 
O ascii ribbon campaign - stop html mail - www.asciiribbon.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Eric D. Mudama

On Tue, Aug 31 at  6:12, Edho P Arief wrote:

On Tue, Aug 31, 2010 at 6:03 AM, Ray Van Dolson rvandol...@esri.com wrote:

In any case -- any thoughts on whether or not I'll be helping anything
if I change my slog slice starting cylinder to be 4k aligned even
though slice 0 isn't?



Some people claim that, due to how ZFS works, there will be a
performance hit as long as the reported sector size differs from the
physical sector size.

This thread [1] discusses what happens and how to handle
such drives on FreeBSD.

[1] http://marc.info/?l=freebsd-fs&m=126976001214266&w=2


Yes, but that's for a 4k rotating drive, which has a much different
latency profile than an SSD.  I was wondering if anyone had benchmark
data showing this alignment matters on the latest SSDs.  My guess is
no, but I have no data.



--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Ray Van Dolson
On Mon, Aug 30, 2010 at 04:12:48PM -0700, Edho P Arief wrote:
 On Tue, Aug 31, 2010 at 6:03 AM, Ray Van Dolson rvandol...@esri.com wrote:
  In any case -- any thoughts on whether or not I'll be helping anything
  if I change my slog slice starting cylinder to be 4k aligned even
  though slice 0 isn't?
 
 
 Some people claim that, due to how ZFS works, there will be a
 performance hit as long as the reported sector size differs from the
 physical sector size.
 
 This thread [1] discusses what happens and how to handle
 such drives on FreeBSD.
 
 [1] http://marc.info/?l=freebsd-fs&m=126976001214266&w=2

Thanks for the pointer -- these posts seem to reference data disks
within the pool rather than disks being used for slog.

Perhaps some of the same issues could arise, but I'm not sure that
variable stripe sizing in a RAIDZ pool would change how the ZIL / slog
devices are addressed.  I'm sure someone will correct me if I'm wrong
on that...

Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-08-30 Thread Christopher George
 I was wondering if anyone had benchmark data showing this alignment 
 matters on the latest SSDs. My guess is no, but I have no data.

I don't believe there can be any doubt that a Flash-based SSD (tier1 
or not) is negatively affected by partition misalignment.  It is intrinsic to 
the required asymmetric erase/program dual operation and the resulting 
read-modify-write (RMW) penalty when performing an unaligned write.  This is 
detailed in the following vendor benchmarking guidelines (SF-1500 controller):

http://www.smartm.com/files/salesLiterature/storage/AN001_Benchmark_XceedIOPSSATA_Apr2010_.pdf

Highlight from the link: "Proper partition alignment is one of the most critical 
attributes that can greatly boost the I/O performance of an SSD due to 
reduced read-modify-write operations."

It should be noted that the above only applies to Flash-based SSDs; an 
NVRAM-based SSD does *not* suffer the same fate, as its performance 
neither depends on nor varies with partition (mis)alignment.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss