[zfs-discuss] zpool status output confusion

2010-05-27 Thread Per Jorgensen
I get the following output when I run zpool status, but I am a little
confused about why c9t8d0 is aligned further to the left than the rest of the
disks in the pool. What does it mean?

$ zpool status blmpool
  pool: blmpool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
blmpool     ONLINE       0     0     0
  raidz2    ONLINE       0     0     0
    c9t0d0  ONLINE       0     0     0
    c9t1d0  ONLINE       0     0     0
    c9t3d0  ONLINE       0     0     0
    c9t4d0  ONLINE       0     0     0
    c9t5d0  ONLINE       0     0     0
    c9t6d0  ONLINE       0     0     0
    c9t7d0  ONLINE       0     0     0
  c9t8d0    ONLINE       0     0     0
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusion

2010-05-27 Thread Tomas Ögren
On 27 May, 2010 - Per Jorgensen sent me these 1,0K bytes:

 I get the following output when I run zpool status, but I am a
 little confused about why c9t8d0 is aligned further to the left than
 the rest of the disks in the pool. What does it mean?

Because someone forced it in without redundancy (or created it as such).
Your pool is bad, as c9t8d0 is without redundancy. If it fails, your
pool is toast.

zpool history <pool> should be able to tell you at least when it happened.
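
For example, the offending command should show up like this (timestamp
hypothetical; output format from memory):

$ zpool history blmpool | grep c9t8d0
2010-05-20.14:03:11 zpool add -f blmpool c9t8d0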

 $ zpool status blmpool
   pool: blmpool
  state: ONLINE
  scrub: none requested
 config:
 
 NAME        STATE     READ WRITE CKSUM
 blmpool     ONLINE       0     0     0
   raidz2    ONLINE       0     0     0
     c9t0d0  ONLINE       0     0     0
     c9t1d0  ONLINE       0     0     0
     c9t3d0  ONLINE       0     0     0
     c9t4d0  ONLINE       0     0     0
     c9t5d0  ONLINE       0     0     0
     c9t6d0  ONLINE       0     0     0
     c9t7d0  ONLINE       0     0     0
   c9t8d0    ONLINE       0     0     0
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusion

2010-05-27 Thread Victor Latushkin

On May 27, 2010, at 12:37 PM, Per Jorgensen wrote:

 I get the following output when I run zpool status, but I am a little
 confused about why c9t8d0 is aligned further to the left than the rest of the
 disks in the pool. What does it mean?

It means that it is another top-level vdev in your pool.
Basically you have two top-level vdevs: one is your raidz2 vdev containing 7
disks, and the other is a single-disk top-level vdev, c9t8d0.

I guess it was added like this: zpool add -f blmpool c9t8d0. Without -f it
would complain about mismatched replication levels. You can check the pool
history to see exactly when it was done.
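
For reference, this is roughly what zpool would have said without -f (wording
approximate, from memory; the exact text can vary between builds):

$ zpool add blmpool c9t8d0
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses raidz and new vdev is disk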


 
 $ zpool status blmpool
  pool: blmpool
 state: ONLINE
 scrub: none requested
 config:
 
NAME        STATE     READ WRITE CKSUM
blmpool     ONLINE       0     0     0
  raidz2    ONLINE       0     0     0
    c9t0d0  ONLINE       0     0     0
    c9t1d0  ONLINE       0     0     0
    c9t3d0  ONLINE       0     0     0
    c9t4d0  ONLINE       0     0     0
    c9t5d0  ONLINE       0     0     0
    c9t6d0  ONLINE       0     0     0
    c9t7d0  ONLINE       0     0     0
  c9t8d0    ONLINE       0     0     0
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] creating a fast ZIL device for $200

2010-05-27 Thread sensille
Neil Perrin wrote:
 Yes, I agree this seems very appealing. I have investigated and
 observed similar results. Just allocating larger intent log blocks but
 only writing to say the first half of them has seen the same effect.
 Despite the impressive results, we have not pursued this further mainly
 because of its maintainability. There is quite a variance between
 drives so, as mentioned, feedback profiling of the device is needed
 in the working system. The layering of the Solaris IO subsystem doesn't
 provide the feedback necessary and the ZIL code is layered on the SPA/DMU.
 Still it should be possible. Good luck!
 

Thanks :) Though I had hoped for a different answer. An integration into the
ZFS code would be much more elegant, but of course in a few years, when SSDs
are cheap, fast and reliable, the need for this optimization will be gone.


There seems to be some interest in this idea here. Would it make sense
to start a project for it? Currently I'm implementing a driver as a
proof of concept, but I need a lot of discussion about algorithms and
concepts, and maybe some code reviews.

Can I count on some support from here?

--Arne



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusion

2010-05-27 Thread Per Jorgensen
Thanks for the quick responses, and yes, the history shows just what you said :(

Is there a way I can get c9t8d0 out of the pool, or how do I get the pool back
to optimal redundancy?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS

2010-05-27 Thread Giovanni Tirloni
On Thu, May 27, 2010 at 2:39 AM, Marc Bevand m.bev...@gmail.com wrote:
 Hi,

 Brandon High bhigh at freaks.com writes:

 I only looked at the Megaraid  that he mentioned, which has a PCIe
 1.0 4x interface, or 1000MB/s.

 You mean x8 interface (theoretically plugged into that x4 slot below...)

 The board also has a PCIe 1.0 4x electrical slot, which is 8x
 physical. If the card was in the PCIe slot furthest from the CPUs,
 then it was only running 4x.

The tests were done with both cards connected to the PCIe 2.0 x8 slot #6,
which connects directly to the Intel 5520 chipset.

I completely overlooked the differences between PCIe 1.0 and 2.0. My fault.



 If Giovanni had put the Megaraid  in this slot, he would have seen
 an even lower throughput, around 600MB/s:

 This slot is provided by the ICH10R which as you can see on:
 http://www.supermicro.com/manuals/motherboard/5500/MNL-1062.pdf
 is connected to the northbridge through a DMI link, an Intel-
 proprietary PCIe 1.0 x4 link. The ICH10R supports a Max_Payload_Size
 of only 128 bytes on the DMI link:
 http://www.intel.com/Assets/PDF/datasheet/320838.pdf
 And as per my experience:
 http://opensolaris.org/jive/thread.jspa?threadID=54481tstart=45
 a 128-byte MPS allows using just about 60% of the theoretical PCIe
 throughput, that is, for the DMI link: 250MB/s * 4 links * 60% = 600MB/s.
 Note that the PCIe x4 slot supports a larger, 256-byte MPS but this is
 irrelevant as the DMI link will be the bottleneck anyway due to the
 smaller MPS.

  A single 3Gbps link provides in theory 300MB/s usable after 8b-10b
 encoding,
  but practical throughput numbers are closer to 90% of this figure, or
 270MB/s.
  6 disks per link means that each disk gets allocated 270/6 = 45MB/s.

 ... except that a SFF-8087 connector contains four 3Gbps connections.

 Yes, four 3Gbps links, but 24 disks per SFF-8087 connector. That's
 still 6 disks per 3Gbps (according to Giovanni, his LSI HBA was
 connected to the backplane with a single SFF-8087 cable).


Correct. The backplane on the SC846E1 only has one SFF-8087 cable to the HBA.


 It may depend on how the drives were connected to the expander. You're
 assuming that all 18 are on 3 channels, in which case moving drives
 around could help performance a bit.

 True, I assumed this and, frankly, this is probably what he did by
 using adjacent drive bays... A more optimal solution would be to spread
 the 18 drives in a 5+5+4+4 config so that the 2 most congested 3Gbps
 links are shared by only 5 drives, instead of 6, which would boost the
 throughput by 6/5 = 1.2x. Which would change my first overall 810MB/s
 estimate to 810*1.2 = 972MB/s.

The chassis has 4 columns of 6 disks. The 18 disks I was testing were
all in columns #1, #2 and #3.

Column #0 still has a pair of SSDs and more disks which I haven't used
in this test. I'll try to move things around to make use of the 4 port
multipliers and test again.

SuperMicro is going to release a 6Gb/s backplane that uses the LSI
SAS2X36 chipset in the near future, I've been told.

Good thing this is still a lab experiment. Thanks very much for the
invaluable help!

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusion

2010-05-27 Thread Ian Collins

On 05/27/10 09:16 PM, Per Jorgensen wrote:

Thanks for the quick responses, and yes, the history shows just what you said :(

Is there a way I can get c9t8d0 out of the pool, or how do I get the pool back
to optimal redundancy?
   


No, you will have to destroy the pool and start over.  Or if that isn't
an option, attach a mirror device to c9t8d0.
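
For example, assuming a spare disk c9t9d0 were available (a hypothetical
device name, not one from the original post):

# zpool attach blmpool c9t8d0 c9t9d0

That turns the lone disk into a two-way mirror, which restores some redundancy,
although a two-way mirror still tolerates only one failure versus raidz2's two.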


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] creating a fast ZIL device for $200

2010-05-27 Thread sensille

(resent because of mail problems)
Edward Ned Harvey wrote:

From: sensille [mailto:sensi...@gmx.net]

The only thing I'd like to point out is that ZFS doesn't do random writes on
a slog, but nearly linear writes. This might even be hurting performance more
than random writes, because you always hit the worst case of one full
rotation.


Um ... I certainly have a doubt about this.  My understanding is that hard
disks are already optimized for sustained sequential throughput.  I have a
really hard time believing Seagate, WD, etc, designed their drives such that
you read/write one track, then pause and wait for a full rotation, then
read/write one track, and wait again, and so forth.  This would limit the
drive to approx 50% duty cycle, and the market is very competitive.

Yes, I am really quite sure, without any knowledge at all, that the drive
mfgrs are intelligent enough to map the logical blocks in such a way that
sequential reads/writes which are larger than a single track will not suffer
such a huge penalty.  Just a small penalty to jump up one track, and wait
for a few degrees of rotation, not 360 degrees.


I'm afraid you got me wrong here. Of course the drives are optimized for
sequential reads/writes. If you give the drive a single read or write that
is larger than one track the drive acts exactly as you described. The same
holds if you give the drive multiple smaller consecutive reads/writes in
advance (NCQ/TCQ) so that the drive can coalesce them into one big op.

But this is not what happens in the case of ZFS/ZIL with a single application.
The application requests a synchronous op. This request goes down into
ZFS, which in turn allocates a ZIL block, writes it to the disk and issues a
cache flush. Only after the cache flush completes can ZFS acknowledge the
op to the application. Now the application can issue the next op, for which
ZFS will again allocate a ZIL block, probably immediately after the previous
one. It writes the block and issues a flush. But in the meantime the head
has traveled some sectors down the track. To physically write the block the
drive of course has to wait until the sector is under the head again, which
means waiting nearly one full rotation. If ZFS had chosen a block
appropriately further down the track, the probability would have been high
that the head had not yet passed it, and the write could have completed
without a big rotational delay.
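
To put a rough number on it (assuming a 7200 rpm drive, a figure I'm picking
for illustration, not one from this thread): one revolution takes
60 s / 7200 = 8.3 ms, so a small synchronous ZIL write that just misses its
sector can wait nearly 8.3 ms for the platter to come around again, while the
transfer of the block itself takes only a small fraction of that.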


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] creating a fast ZIL device for $200

2010-05-27 Thread sensille

(resent because of received bounce)
Edward Ned Harvey wrote:

From: sensille [mailto:sensi...@gmx.net]



So this brings me back to the question I indirectly asked in the middle of a
much longer previous email - 


Is there some way, in software, to detect the current position of the head?
If not, then I only see two possibilities:

Either you have some previous knowledge (or assumptions) about the drive
geometry, rotation speed, and wall clock time passed since the last write
completed, and use this (possibly vague or inaccurate) info to make your
best guess what available blocks are accessible with minimum latency next
...



That is my approach currently, and it works quite well. I obtain the prior
knowledge through a special measuring process run before first using the
disk. To keep the driver in sync with the disk during idle times it issues
dummy ops at regular intervals, say 20 per second.
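
In rough terms, the bookkeeping behind that is (my own sketch of the idea, not
code from the driver): if the rotation period T and the angular position of
the last completed op at time t0 are known, then at time t the head is about
((t - t0) mod T) / T of a revolution further on, and the driver aims for a
free log slot slightly beyond that point, plus the 0.4-0.7 ms command setup
time mentioned below.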


or else some sort of new hardware behavior would be necessary.  Possibly a
special type of drive, which always assumes a command to write to a
magical block number actually means write to the next available block or
something like that ... or reading from a magical block actually tells you
the position of the head or something like that...


That would be nice. But what would be much nicer is a drive with an extremely
small setup time. Current drives need the command 0.4-0.7ms in advance,
depending on manufacturer and drive type.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] creating a fast ZIL device for $200

2010-05-27 Thread Garrett D'Amore

On 5/27/2010 10:33 AM, sensille wrote:

(resent because of received bounce)
Edward Ned Harvey wrote:

From: sensille [mailto:sensi...@gmx.net]



So this brings me back to the question I indirectly asked in the middle of a
much longer previous email -

Is there some way, in software, to detect the current position of the head?
If not, then I only see two possibilities:

Either you have some previous knowledge (or assumptions) about the drive
geometry, rotation speed, and wall clock time passed since the last write
completed, and use this (possibly vague or inaccurate) info to make your
best guess what available blocks are accessible with minimum latency next
...

That is my approach currently, and it works quite well. I obtain the prior
knowledge through a special measuring process run before first using the
disk. To keep the driver in sync with the disk during idle times it issues
dummy ops at regular intervals, say 20 per second.

or else some sort of new hardware behavior would be necessary.  Possibly a
special type of drive, which always assumes a command to write to a
magical block number actually means write to the next available block or
something like that ... or reading from a magical block actually tells you
the position of the head or something like that...

That would be nice. But what would be much nicer is a drive with an extremely
small setup time. Current drives need the command 0.4-0.7 ms in advance,
depending on manufacturer and drive type.


Technology like DDRdrive X1 (which is well beyond $200) doesn't have 
this problem.  The setup times for that kind of hardware are measured in 
usec.  (I.e. measured in PCI cycles.)


- Garrett



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Windows file versioning integration?

2010-05-27 Thread Roy Sigurd Karlsbakk
Hi all

Since Windows Server 2003 or so, Windows has had some versioning support usable
from the client side when checking the properties of a file. Is it somehow
possible to use this functionality with ZFS snapshots?

-- 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases adequate and relevant synonyms exist
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Windows file versioning integration?

2010-05-27 Thread Richard Elling
On May 27, 2010, at 6:32 AM, Roy Sigurd Karlsbakk wrote:

 Hi all
 
 Since Windows 2003 Server or so, it has had some versioning support usable 
 from the client side if checking the properties on a file. Is it somehow 
 possible to use this functionality with ZFS snapshots?

Yes, there is some integration with VSS and snapshots.  But a more complete
and full featured solution looks like:
http://www.nexenta.com/corp/applications/delorean
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Windows file versioning integration?

2010-05-27 Thread iMx
  Hi all
 
  Since Windows 2003 Server or so, it has had some versioning support
  usable from the client side if checking the properties on a file. Is
  it somehow possible to use this functionality with ZFS snapshots?

http://blogs.sun.com/amw/entry/using_the_previous_versions_tab ;)

--
iMx
i...@streamvia.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-27 Thread Jan Kryl
Hi Frank,

On 24/05/10 16:52 -0400, Frank Middleton wrote:
  Many many moons ago, I submitted a CR into bugs about a
  highly reproducible panic that occurs if you try to re-share
  a  lofi mounted image. That CR has AFAIK long since
  disappeared - I even forget what it was called.
 
  This server is used for doing network installs. Let's say
  you have a 64 bit iso lofi-mounted and shared. You do the
  install, and then wish to switch to a 32 bit iso. You unshare,
  umount, delete the loopback, and then lofiadm the new iso,
  mount it and then share it. Panic, every time.
 
  Is this such a rare use-case that no one is interested? I have
  the backtrace and cores if anyone wants them, although
  such were submitted with the original CR. This is pretty
  frustrating since you start to run out of ideas for mountpoint
  names after a while unless you forget and get the panic.
 
  FWIW (even on a freshly booted system after a panic)
  # lofiadm zyzzy.iso /dev/lofi/1
  # mount -F hsfs /dev/lofi/1 /mnt
  mount: /dev/lofi/1 is already mounted or /mnt is busy
  # mount -O -F hsfs /dev/lofi/1 /mnt
  # share /mnt
  #
 
  If you unshare /mnt and then do this again, it will panic.
  This has been a bug since before Open Solaris came out.
 
  It doesn't happen if the iso is originally on UFS, but
  UFS really isn't an option any more.  FWIW the dataset
  containing the isos has the sharenfs attribute set,
  although it doesn't have to be actually mounted by
  any remote NFS for this panic to occur.
 
  Suggestions for a workaround most welcome!
 
the bug (6798273) has been closed as incomplete with the following
note:

I cannot reproduce any issue with the given testcase on b137.

So you should test this with b137 or a newer build. There have
been some extensive changes going into the treeclimb_* functions,
so the bug is probably fixed or will be in the near future.

Let us know if you can still reproduce the panic on a
recent build.

thanks
-jan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-27 Thread Kyle McDonald
On 5/27/2010 2:45 PM, Jan Kryl wrote:
 Hi Frank,

 On 24/05/10 16:52 -0400, Frank Middleton wrote:
   
  Many many moons ago, I submitted a CR into bugs about a
  highly reproducible panic that occurs if you try to re-share
  a  lofi mounted image. That CR has AFAIK long since
  disappeared - I even forget what it was called.

  This server is used for doing network installs. Let's say
  you have a 64 bit iso lofi-mounted and shared. You do the
  install, and then wish to switch to a 32 bit iso. You unshare,
  umount, delete the loopback, and then lofiadm the new iso,
  mount it and then share it. Panic, every time.

  Is this such a rare use-case that no one is interested? I have
  the backtrace and cores if anyone wants them, although
  such were submitted with the original CR. This is pretty
  frustrating since you start to run out of ideas for mountpoint
  names after a while unless you forget and get the panic.

  FWIW (even on a freshly booted system after a panic)
  # lofiadm zyzzy.iso /dev/lofi/1
  # mount -F hsfs /dev/lofi/1 /mnt
  mount: /dev/lofi/1 is already mounted or /mnt is busy
  # mount -O -F hsfs /dev/lofi/1 /mnt
  # share /mnt
  #

  If you unshare /mnt and then do this again, it will panic.
  This has been a bug since before Open Solaris came out.

  It doesn't happen if the iso is originally on UFS, but
  UFS really isn't an option any more.  FWIW the dataset
  containing the isos has the sharenfs attribute set,
  although it doesn't have to be actually mounted by
  any remote NFS for this panic to occur.

  Suggestions for a workaround most welcome!

 
 the bug (6798273) has been closed as incomplete with following
 note:

 I cannot reproduce any issue with the given testcase on b137.

 So you should test this with b137 or newer build. There have
 been some extensive changes going to treeclimb_* functions,
 so the bug is probably fixed or will be in near future.

 Let us know if you can still reproduce the panic on
 recent build.

   
I don't know if the code path is the same enough, but you should also try
it like this:

# mount -F hsfs zyzzy.iso /mnt

For many builds now, (Open)Solaris hasn't needed the 'lofiadm' step for
ISOs (and possibly other filesystem types that can be guessed).

I now put ISOs (for installs, just like you) directly in my /etc/vfstab.
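
For example (ISO path and mount point made up for illustration):

# mount -F hsfs /export/isos/sol-nv-b130-x86-dvd.iso /mnt

and the matching vfstab entry would look roughly like:

/export/isos/sol-nv-b130-x86-dvd.iso  -  /isos/b130  hsfs  -  yes  ro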

  -Kyle

 thanks
 -jan
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-27 Thread Carson Gaspar

Jan Kryl wrote:

the bug (6798273) has been closed as incomplete with following
note:

I cannot reproduce any issue with the given testcase on b137.

So you should test this with b137 or newer build. There have
been some extensive changes going to treeclimb_* functions,
so the bug is probably fixed or will be in near future.

Let us know if you can still reproduce the panic on
recent build.


The most recent build available outside of Oracle is still 134, or am I 
missing something?


--
Carson

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-27 Thread Garrett D'Amore

On 5/27/2010 12:21 PM, Carson Gaspar wrote:

Jan Kryl wrote:

the bug (6798273) has been closed as incomplete with following
note:

I cannot reproduce any issue with the given testcase on b137.

So you should test this with b137 or newer build. There have
been some extensive changes going to treeclimb_* functions,
so the bug is probably fixed or will be in near future.

Let us know if you can still reproduce the panic on
recent build.


The most recent build available outside of Oracle is still 134, or am 
I missing something?


That's the latest binary build.  It is possible to build something newer 
yourself, but doing so will take some unusual effort.


- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Cassandra Pugh
I was wondering if there is a special option to share out a set of nested
directories?  Currently if I share out a directory with /pool/mydir1/mydir2
on a system, mydir1 shows up, and I can see mydir2, but nothing in mydir2.
mydir1 and mydir2 are each a zfs filesystem, each shared with the proper
sharenfs permissions.
Did I miss a browse or traverse option somewhere?
-
Cassandra
Unix Administrator
From a little spark may burst a mighty flame.
-Dante Alighieri
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Garrett D'Amore
I share filesystems all the time this way, and have never had this 
problem.  My first guess would be a problem with NFS or directory 
permissions.  You are using NFS, right?


- Garrett

On 5/27/2010 1:02 PM, Cassandra Pugh wrote:
   I was wondering if there is a special option to share out a set of nested
   directories?  Currently if I share out a directory with /pool/mydir1/mydir2
   on a system, mydir1 shows up, and I can see mydir2, but nothing in mydir2.
   mydir1 and mydir2 are each a zfs filesystem, each shared with the proper
   sharenfs permissions.
   Did I miss a browse or traverse option somewhere?
   -
   Cassandra
   Unix Administrator
   From a little spark may burst a mighty flame.
   -Dante Alighieri



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Roy Sigurd Karlsbakk
- Cassandra Pugh cp...@pppl.gov wrote:




I was wondering if there is a special option to share out a set of nested 
directories? Currently if I share out a directory with /pool/mydir1/mydir2 
on a system, mydir1 shows up, and I can see mydir2, but nothing in mydir2. 
mydir1 and mydir2 are each a zfs filesystem, each shared with the proper 
sharenfs permissions. 
Did I miss a browse or traverse option somewhere? 
is mydir2 on a separate filesystem/dataset? 
-- 
Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases adequate and relevant synonyms exist
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-27 Thread Dennis Clarke

  FWIW (even on a freshly booted system after a panic)
  # lofiadm zyzzy.iso /dev/lofi/1
  # mount -F hsfs /dev/lofi/1 /mnt
  mount: /dev/lofi/1 is already mounted or /mnt is busy
  # mount -O -F hsfs /dev/lofi/1 /mnt
  # share /mnt
  #

  If you unshare /mnt and then do this again, it will panic.
  This has been a bug since before Open Solaris came out.


I just tried this with a UFS based filesystem just for a lark.

r...@aequitas:/# mkdir /testfs
r...@aequitas:/# mount -F ufs -o noatime,nologging /dev/dsk/c0d1s0 /testfs
r...@aequitas:/# ls -l /testfs/sol\-nv\-b130\-x86\-dvd.iso
-rw-r--r-- 1 root root 3818782720 Feb  5 16:02
/testfs/sol-nv-b130-x86-dvd.iso

r...@aequitas:/# lofiadm -a /testfs/sol-nv-b130-x86-dvd.iso
May 27 21:08:58 aequitas pseudo: pseudo-device: lofi0
May 27 21:08:58 aequitas genunix: lofi0 is /pseudo/l...@0
May 27 21:08:58 aequitas rootnex: xsvc0 at root: space 0 offset 0
May 27 21:08:58 aequitas genunix: xsvc0 is /x...@0,0
May 27 21:08:58 aequitas pseudo: pseudo-device: devinfo0
May 27 21:08:58 aequitas genunix: devinfo0 is /pseudo/devi...@0
/dev/lofi/1
r...@aequitas:/# mount -F hsfs -o ro /dev/lofi/1 /mnt
r...@aequitas:/# share -F nfs -o nosub,nosuid,sec=sys,ro,anon=0 /mnt

Then at a Sol 10 server :

# uname -a
SunOS jupiter 5.10 Generic_142900-11 sun4u sparc SUNW,Sun-Fire-480R

# dfshares aequitas
RESOURCE  SERVER ACCESSTRANSPORT
  aequitas:/mnt aequitas  - -
#
# mount -F nfs -o bg,intr,nosuid,ro,vers=4 aequitas:/mnt /mnt

# ls /mnt
Copyright                    autorun.inf
JDS-THIRDPARTYLICENSEREADME  autorun.sh
License                      boot
README.txt                   installer
Solaris_11                   sddtool
Sun_HPC_ClusterTools
# umount aequitas:/mnt
# dfshares aequitas
RESOURCE  SERVER ACCESSTRANSPORT
  aequitas:/mnt aequitas  - -

Then back at the snv_138 box I unshare and re-share and ... nothing bad
happens.

r...@aequitas:/# unshare /mnt
r...@aequitas:/# share -F nfs -o nosub,nosuid,sec=sys,ro,anon=0 /mnt
r...@aequitas:/# unshare /mnt
r...@aequitas:/#

Guess I must now try this with a ZFS fs under that iso file.
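
Presumably that test would look much the same with the iso sitting on a ZFS
dataset, something like (dataset name hypothetical, and using whatever lofi
device lofiadm prints):

r...@aequitas:/# zfs create -o mountpoint=/testzfs rpool/testzfs
r...@aequitas:/# cp /testfs/sol-nv-b130-x86-dvd.iso /testzfs/
r...@aequitas:/# lofiadm -a /testzfs/sol-nv-b130-x86-dvd.iso
r...@aequitas:/# mount -F hsfs -o ro /dev/lofi/2 /mnt
r...@aequitas:/# share -F nfs -o nosub,nosuid,sec=sys,ro,anon=0 /mnt

followed by the same unshare/re-share sequence as above.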


-- 
Dennis Clarke
dcla...@opensolaris.ca  - Email related to the open source Solaris
dcla...@blastwave.org   - Email related to open source for Solaris


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Cindy Swearingen

Cassandra,

Which Solaris release is this?

This is working for me between a Solaris 10 server and an OpenSolaris
client.


Nested mount points can be tricky, and I'm not sure whether you are looking
for the mirror-mount feature (where new directory contents become accessible
on the client), which is not available in the Solaris 10 release.

See the examples below.


Thanks,

Cindy

On the server:

# zpool create pool c1t3d0
# zfs create pool/myfs1
# cp /usr/dict/words /pool/myfs1/file.1
# zfs create -o mountpoint=/pool/myfs1/myfs2 pool/myfs2
# ls /pool/myfs1
file.1  myfs2
# cp /usr/dict/words /pool/myfs1/myfs2/file.2
# ls /pool/myfs1/myfs2/
file.2
# zfs set sharenfs=on pool/myfs1
# zfs set sharenfs=on pool/myfs2
# share
-   /pool/myfs1   rw   
-   /pool/myfs1/myfs2   rw   

On the client:

# ls /net/t2k-brm-03/pool/myfs1
file.1  myfs2
# ls /net/t2k-brm-03/pool/myfs1/myfs2
file.2
# mount -F nfs t2k-brm-03:/pool/myfs1 /mnt
# ls /mnt
file.1  myfs2
# ls /mnt/myfs2
file.2

On the server:

# touch /pool/myfs1/myfs2/file.3

On the client:

# ls /mnt/myfs2
file.2  file.3

On 05/27/10 14:02, Cassandra Pugh wrote:
 I was wondering if there is a special option to share out a set of nested
 directories?  Currently if I share out a directory with /pool/mydir1/mydir2
 on a system, mydir1 shows up, and I can see mydir2, but nothing in mydir2.
 mydir1 and mydir2 are each a zfs filesystem, each shared with the proper
 sharenfs permissions.
 Did I miss a browse or traverse option somewhere?
 -
 Cassandra
 Unix Administrator
 From a little spark may burst a mighty flame.
 -Dante Alighieri





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS

2010-05-27 Thread Marc Bevand
On Wed, May 26, 2010 at 6:09 PM, Giovanni Tirloni gtirl...@sysdroid.com wrote:
 On Wed, May 26, 2010 at 9:22 PM, Brandon High bh...@freaks.com wrote:

 I'd wager it's the PCIe x4. That's about 1000MB/s raw bandwidth, about
 800MB/s after overhead.

 Makes perfect sense. I was calculating the bottlenecks using the
 full-duplex bandwidth, so the one-way bottleneck wasn't apparent.

Actually both of you guys are wrong :-)

The Supermicro X8DTi mobo and LSISAS9211-4i HBA are both PCIe 2.0 compatible,
so the max theoretical PCIe x4 throughput is 4GB/s aggregate, or 2GB/s in each
direction, well above the 800MB/s bottleneck observed by Giovanni.

This bottleneck is actually caused by the backplane: Supermicro E1 chassis
like Giovanni's (SC846E1) include port multipliers that degrade performance
by putting 6 disks behind a single 3Gbps link.

A single 3Gbps link provides in theory 300MB/s usable after 8b-10b encoding,
but practical throughput numbers are closer to 90% of this figure, or 270MB/s.
6 disks per link means that each disk gets allocated 270/6 = 45MB/s.

So with 18 disks striped, this gives a max usable throughput of 18*45 = 810MB/s,
which matches exactly what Giovanni observed. QED!

-mrb
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] gang blocks at will?

2010-05-27 Thread Andriy Gapon
on 27/05/2010 07:11 Jeff Bonwick said the following:
 You can set metaslab_gang_bang to (say) 8k to force lots of gang block 
 allocations.

Bill, Jeff,

thanks a lot!
This helped to reproduce the issue and find the bug.

Just in case:
http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/144214

 On May 25, 2010, at 11:42 PM, Andriy Gapon wrote:
 
 I am working on improving some ZFS-related bits in FreeBSD boot chain.
 At the moment it seems that the things work mostly fine except for a case 
 where
 the boot code needs to read gang blocks.  We have some reports from users 
 about
 failures, but unfortunately their pools are not available for testing anymore
 and I can not reproduce the issue at will.
 I am sure that (Open)Solaris GRUB version has been properly tested, including
 the above environment.
 Could you please help me with ideas how to create a pool/filesystem/file that
 would have gang-blocks with high probability?
 Perhaps, there are some pre-made test pool images available?
 Or some specialized tool?

 Thanks a lot!
 -- 
 Andriy Gapon
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 


-- 
Andriy Gapon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusion

2010-05-27 Thread Brandon High
On Thu, May 27, 2010 at 2:16 AM, Per Jorgensen p...@combox.dk wrote:
 is there a way i can get c9t8d0 out of the pool , or how do i get the pool 
 back to optimal redundancy ?

It's not possible to remove vdevs right now. When the mythical
bp_rewrite shows up, then you can.

For now, the only thing you can do to save your pool is to attach another
disk (or two) to c9t8d0 as a mirror.
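
For instance (spare disk names are hypothetical):

# zpool attach blmpool c9t8d0 c9t9d0
# zpool attach blmpool c9t8d0 c9t10d0

The first gives you a two-way mirror; the second, optional, attach makes it a
three-way mirror, which matches raidz2's ability to survive two failures.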

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Brandon High
On Thu, May 27, 2010 at 1:02 PM, Cassandra Pugh cp...@pppl.gov wrote:
    I was wondering if there is a special option to share out a set of nested
    directories?  Currently if I share out a directory with /pool/mydir1/mydir2
    on a system, mydir1 shows up, and I can see mydir2, but nothing in mydir2.
    mydir1 and mydir2 are each a zfs filesystem, each shared with the proper
    sharenfs permissions.
    Did I miss a browse or traverse option somewhere?

What kind of client are you mounting on? Linux clients don't properly
follow nested exports.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Reshekel Shedwitz
Some tips…

(1) Do a zfs mount -a and a zfs share -a. Just in case something didn't get 
shared out correctly (though that's supposed to automatically happen, I think)

(2) The Solaris automounter (i.e. in a NIS environment) does not seem to 
automatically mount descendent filesystems (i.e. if the NIS automounter has a 
map for /public pointing to myserver:/mnt/zfs/public but on myserver, I create 
a descendent filesystem in /mnt/zfs/public/folder1, browsing to /public/folder1 
on another computer will just show an empty directory all the time).

If you're in that sort of environment, you need to add another map on NIS.

(3) Try using /net mounts. If you're not aware of how this works, you can
browse to /net/<computer name> to see all the NFS mounts. On Solaris, /net
*will* automatically mount descendent filesystems (unlike NIS).
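
For example (server name made up), simply listing the path

# ls /net/myserver/mnt/zfs/public/folder1

should trigger the mount of the descendent filesystem automatically.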
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Kyle McDonald
On 5/27/2010 9:30 PM, Reshekel Shedwitz wrote:
 Some tips…

 (1) Do a zfs mount -a and a zfs share -a. Just in case something didn't get 
 shared out correctly (though that's supposed to automatically happen, I think)

 (2) The Solaris automounter (i.e. in a NIS environment) does not seem to 
 automatically mount descendent filesystems (i.e. if the NIS automounter has a 
 map for /public pointing to myserver:/mnt/zfs/public but on myserver, I 
 create a descendent filesystem in /mnt/zfs/public/folder1, browsing to 
 /public/folder1 on another computer will just show an empty directory all the 
 time).
   
 The automounter behaves the same regardless of whether NIS is
involved or not (or LDAP, for that matter). The automounter can be
configured with local files, and that won't change its behavior.

The behavior you're describing has been the behavior of all flavors of NFS
since it was born, and also doesn't have anything to do with the
automounter - it was by design. No automounter I'm aware of is capable
of learning on its own that 'folder1' is a new filesystem (not a new
directory) and mounting it. So this isn't limited to Solaris.

 If you're in that sort of environment, you need to add another map on NIS.
   
Your example doesn't specify whether /public is a direct or indirect mount;
being in / kind of implies it's direct, and those mounts can be more
limiting (more so in the past), so most admins avoid using the
auto.direct map for these reasons.

If the example were /import/public, with /import being defined by the
auto.import map, then the solution to this problem is not an entirely
new entry in the map for /import/public/folder1, but to convert the
entry for /import/public into a hierarchical mount entry, specifying
explicitly the folder1 sub-mount. A hierarchical mount can even mount
folder1 from a different server than public came from.
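
As a sketch (server and paths borrowed from the earlier example in this
thread; exact formatting from memory), such a hierarchical auto.import entry
might look like:

public  \
    /         myserver:/mnt/zfs/public \
    /folder1  myserver:/mnt/zfs/public/folder1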

In the past (SunOS 4 and early Solaris timeframe) hierarchical mounts had
some limitations (mainly issues with unmounting them) that made people
wary of them. Most if not all of those have been eliminated.

In general the Solaris automounter is very reliable and flexible and can
be configured to do almost anything you want. Recent Linux automounters
(autofs4?) have come very close to the Solaris one; however, earlier
ones had some missing features, buggy features, and some different
interpretations of the maps.

But the issue described in this thread is not an automounter issue;
it's a design issue of NFS - at least for all versions of NFS before v4.
Version 4 has a feature that others have mentioned, called mirror
mounts, that tries to pass along the information required for the
client to re-create the sub-mount - even if the original file server
mounted the sub-filesystem from another server! It's a cool feature, but
NFSv4 support in clients isn't complete yet, so specifying the full
hierarchical mount tree in the automount maps is still required.

 (3) Try using /net mounts. If you're not aware of how this works, you can 
 browse to /net/computer name to see all the NFS mounts. On Solaris, /net 
 *will* automatically mount descendent filesystems (unlike NIS).
   
In general /net mounts are a bad idea. While the automounter will basically
scan the output of 'showmount -e' for everything the server exports, and
mount it all, that's not exactly what you always want. It will only pick up
sub-filesystems that are explicitly shared (which NFSv4 might also do,
I'm not sure) and it will miss branches of the tree if they are
mounted from another server.

Also, most automounters that I'm aware of will only mount all the
exported filesystems at the time of the access to /net/hostname, and
(unless it's unused long enough to be unmounted) will miss all changes in
what is exported on the server until the mount is triggered again.

On top of that, /net/hostname mounts encourage embedding the hostname of
the server in config files, scripts, and binaries (-R path for shared
libraries), and that's not good: you then can't move a filesystem
from one host to another, because you would need to maintain that
/net/hostname path forever - or edit many files and recompile programs.
(If I recall correctly, this was once used as one of the arguments
against shared libraries by some.)

Because of this, by using /net/hostname, you give up one of the biggest
benefits of the automounter - redirection. By making an auto.import map
that has an entry for 'public' you allow yourself to be able to clone
public to a new server, and modify the map to (over time as it is
unmounted and remounted) migrate the clients to the new server.

Lastly, using /net also disables the load-sharing and failover abilities
of read-only automounts, since you are by definition limiting yourself
to one hostname.

That was longer than I expected, but hopefully it will help some. :)

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Haudy Kazemi

Brandon High wrote:

On Thu, May 27, 2010 at 1:02 PM, Cassandra Pugh cp...@pppl.gov wrote:
  

   I was wondering if there is a special option to share out a set of nested
   directories?  Currently if I share out a directory with /pool/mydir1/mydir2
   on a system, mydir1 shows up, and I can see mydir2, but nothing in mydir2.
   mydir1 and mydir2 are each a zfs filesystem, each shared with the proper
   sharenfs permissions.
   Did I miss a browse or traverse option somewhere?



What kind of client are you mounting on? Linux clients don't properly
follow nested exports.

-B
  


This behavior is not limited to Linux clients or to NFS shares.  I've
seen it with Windows (SMB) clients and CIFS shares.  The CIFS version is
referenced here:


Nested ZFS Filesystems in a CIFS Share
http://mail.opensolaris.org/pipermail/cifs-discuss/2008-June/000358.html
http://bugs.opensolaris.org/view_bug.do?bug_id=6582165

Is there any commonality besides the observed behaviors?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS

2010-05-27 Thread Will Murnane
On Fri, May 28, 2010 at 00:56, Marc Bevand m.bev...@gmail.com wrote:
 Giovanni Tirloni gtirloni at sysdroid.com writes:

 The chassis has 4 columns of 6 disks. The 18 disks I was testing were
 all on columns #1 #2 #3.

 Good, so this confirms my estimations. I know you said the current
 ~810 MB/s are amply sufficient for your needs. Spreading the 18 drives
 across all 4 port multipliers
The Supermicro SC846E1 cases don't contain multiple (SATA) port
multipliers; they contain a single SAS expander, which shares
bandwidth among the controllers and drives: no column- or row-based
limitations should be present.

That backplane has two 8087 ports, IIRC: one labeled for the host, and
one for a downstream chassis.  I don't think there's actually any
physical or logical difference between the upstream and downstream
ports, so you might consider connecting two cables (ideally
from two SAS controllers, with multipath) and seeing if that goes any
faster.

Giovanni: When you say you saturated the system with a RAID-0 device,
what do you mean?  I think the suggested benchmark (read from all the
disks independently, using dd or some other sequential-transfer
mechanism like vdbench) would be more interesting in terms of finding
the limiting bus bandwidth than a ZFS-based or hardware-raid-based
benchmark.  Inter-disk synchronization and checksums and such can put
a damper on ZFS performance, so simple read-sequentially-from-disk can
often deliver surprising results.
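
Something along these lines, for instance (disk names and slice are
placeholders; watch iostat -xn 1 in another window for the aggregate rate):

# for d in c9t0d0 c9t1d0 c9t2d0; do
      dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=1024 &
  done; wait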

Note that such results aren't always useful (after all, the goal is to
run ZFS on the hardware, not dd!), but they may indicate that a certain
component of the system is or is not to blame.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss