Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
We just picked up the fastest SSD we could at the local Bic Camera, which turned out to be a CSSD-SM32NI, with a claimed 95MB/s write speed. I put it in place and moved the slog over:

0m49.173s
0m48.809s

So it is slower than the CF test, which is disappointing. Everyone else seems to use the Intel X25-M, which has a write speed of 170MB/s (2nd generation), so perhaps that is why it works better for them. It is curious that it is slower than the CF card. Perhaps because it shares the controller with so many other SATA devices?

Oh, and we'll probably have to get a 3.5" frame for it, as I doubt it'll stay standing after the next earthquake. :)

Lund

Jorgen Lundman wrote:

This thread started over in nfs-discuss, as it appeared to be an NFS problem initially, or at the very least an interaction between NFS and the ZIL. Here is a summary of the speeds we found when untarring something, always into a new/empty directory. We are only looking at write speed; reads are always very fast.

The reason we started to look at this was that the 7-year-old NetApp being phased out could untar the test file in 11 seconds, while the x4500/x4540 Suns took 5 minutes.

For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I had lying around, but it can be downloaded here if you want the same test: http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz

The command executed, generally, is:

# mkdir .test34
# time gtar --directory=.test34 -zxf /tmp/MTOS-4.261-ja.tar.gz

Solaris 10 1/06 intel client : netapp 6.5.1 FAS960 server : NFSv3 : 0m11.114s
Solaris 10 6/06 intel client : x4500 OpenSolaris snv_117 server : NFSv4 : 5m11.654s
Solaris 10 6/06 intel client : x4500 Solaris 10 10/08 server : NFSv3 : 8m55.911s
Solaris 10 6/06 intel client : x4500 Solaris 10 10/08 server : NFSv4 : 10m32.629s

Just untarring the tarball on the x4500 itself:

x4500 OpenSolaris snv_117 server : 0m0.478s
x4500 Solaris 10 10/08 server : 0m1.361s

So ZFS itself is very fast.

Next, replacing NFS with different protocols: identical setup, just swapping tar for rsync and nfsd for sshd. The baseline test, using:

rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX

Solaris 10 6/06 intel client : x4500 OpenSolaris snv_117 : rsync over nfsv4 : 3m44.857s
Solaris 10 6/06 intel client : x4500 OpenSolaris snv_117 : rsync+ssh : 0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Let's share it with SMB, and mount it:

OS X 10.5.6 intel client : x4500 OpenSolaris snv_117 : smb+untar : 0m24.480s

Neat: even SMB at default settings can beat NFS. This would indicate to me that nfsd is broken somehow, but then we tried again after disabling only the ZIL:

Solaris 10 6/06 : x4500 OpenSolaris snv_117, ZIL disabled : nfsv4 : 0m8.453s 0m8.284s 0m8.264s

Nice. So is this theoretically the fastest NFS speed we can reach? We run postfix+dovecot for mail, which would probably be safe without a ZIL. The other workload is FTP/WWW/CGI, which has more active writes/updates and is probably not as good a candidate. Comments?

Next, enable the ZIL but disable zfscacheflush (just as a test; I have been told disabling the cache flush is far more dangerous):

Solaris 10 6/06 : x4500 OpenSolaris snv_117, zfscacheflush disabled : nfsv4 : 0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a whole lot about slogs. First I tried creating a 2G slog on the boot mirror:

Solaris 10 6/06 : x4500 OpenSolaris snv_117, slog on boot pool : nfsv4 : 1m59.970s

Some improvement. For a lark, I created a 2GB file in /tmp/ and changed the slog to that. (I know, having the slog in volatile RAM is pretty much the same as disabling the ZIL.
But it should give me the theoretical maximum speed with the ZIL enabled, right?)

Solaris 10 6/06 : x4500 OpenSolaris snv_117, slog in /tmp/junk : nfsv4 : 0m8.916s

Nice! The same speed as with the ZIL disabled. Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card.

Solaris 10 6/06 : x4500 OpenSolaris snv_117, slog on 300X CF card : nfsv4 : 0m26.566s

Not too bad, really. But you have to reboot to see a CF card, fiddle with the BIOS boot order, etc. It is just not an easy addition on a live system, whereas a SATA-attached SSD disk can be hot-swapped. Also, I learned an interesting lesson about rebooting with the slog at /tmp/junk. I am hoping to pick up a SATA SSD device today and see what speeds we get out of that.

The rsync (1s) vs NFS (8s) gap I can accept as overhead of a much more complicated protocol, but why does it take 3 minutes to write the same data to the same pool with rsync (1s) vs NFS (3m)? The ZIL was on and the slog at its default in both cases, and both write the same data. Does nfsd add FD_SYNC to every close, regardless of whether the application requested it? This I have not yet wrapped my head around. For example, I know rsync
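For reference, the commands behind these slog experiments are simple. A minimal sketch with made-up pool and device names (and note that zil_disable is an unsupported, test-only tunable):

  # attach a dedicated slog device to an existing pool
  zpool add tank log c4t0d0
  # swap the slog for another device, e.g. a RAM-backed file in /tmp (test only!)
  mkfile 2g /tmp/junk
  zpool replace tank c4t0d0 /tmp/junk
  # disable the ZIL entirely for testing; takes effect after a reboot
  echo "set zfs:zil_disable = 1" >> /etc/system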
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Everyone else should be using the Intel X25-E. There's a massive difference between the M and E models, and for a slog it's IOPS and low latency that you need. I've heard that Sun uses X25-Es, but I'm sure the original reports had them using STEC. I have a feeling the 2nd-generation X25-Es are going to give STEC a run for their money, though.

If I were you, I'd see if you can get your hands on an X25-E for evaluation purposes. Also, if you're just running NFS over gigabit ethernet, a single X25-E may be enough, but at around 90MB/s sustained performance each, you might need to stripe a few of them to match the speeds your Thumper is capable of.

We're not running an x4500, but we were lucky enough to get our hands on some PCI 512MB NVRAM cards a while back, and I can confirm they make a huge difference to NFS speeds; for our purposes they're identical to ramdisk slog performance.
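Striping or mirroring slogs, as suggested above, is just a matter of how the devices are listed. A sketch with placeholder device names:

  # two SSDs striped as the intent log (ZFS load-balances across them)
  zpool add tank log c2t0d0 c2t1d0
  # or mirrored, if slog redundancy matters more than bandwidth
  zpool add tank log mirror c2t0d0 c2t1d0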
[zfs-discuss] zpool import hangs up forever...
After several errors on a QLogic HBA the pool cache was damaged, and ZFS cannot import the pool. There is no disk or CPU activity during the import...

# uname -a
SunOS orion 5.11 snv_111b i86pc i386 i86pc
# zpool import
  pool: data1
    id: 6305414271646982336
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:
        data1      ONLINE
          c14t0d0  ONLINE

and after

# zpool import -f data1

the terminal waits forever. Likewise, after

# zdb -e data1
Uberblock
        magic = 00bab10c
        version = 6
        txg = 2682808
        guid_sum = 14250651627001887594
        timestamp = 1247866318 UTC = Sat Jul 18 01:31:58 2009
Dataset mos [META], ID 0, cr_txg 4, 27.1M, 3050 objects
Dataset data1 [ZPL], ID 5, cr_txg 4, 5.74T, 52987 objects

the terminal waits forever too, and there are no helpful messages in the system log. I am out of ideas on how to recover this pool; any help will be greatly appreciated...
Re: [zfs-discuss] zpool import hangs up forever...
On 29.07.09 13:04, Pavel Kovalenko wrote:
> after several errors on QLogic HBA pool cache was damaged and zfs cannot import pool, there is no disk or cpu activity during import...
> # zpool import
>   pool: data1
>     id: 6305414271646982336
>  state: ONLINE
> status: The pool was last accessed by another system.
> action: The pool can be imported using its name or numeric identifier and the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>         data1      ONLINE
>           c14t0d0  ONLINE
> and after
> # zpool import -f data1
> terminal still waiting forever.

Try using

echo "0t<pid of zpool>::pid2proc|::walk thread|::findstack -v" | mdb -k

to find out what it is doing. Also see the fmdump -eV output for fresh error reports from ZFS.

> also after
> # zdb -e data1
> [uberblock and dataset summary as above]
> terminal still waiting forever too, and there are no helpful messages in system log,

Does zdb -e -t 2682807 data1 make any difference?

victor
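Spelled out, that diagnostic pipeline looks like this (the PID below is made up for illustration):

  # find the PID of the stuck import
  pgrep -lf "zpool import"       # suppose it prints: 1234 zpool import -f data1
  # walk the kernel stacks of all its threads
  echo "0t1234::pid2proc|::walk thread|::findstack -v" | mdb -k
  # and check FMA for recent ZFS error telemetry
  fmdump -eV | more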
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Hi James,

I'll not reply inline, since the forum software is completely munging your post.

On the X25-E: I believe there is a cache, and it's not backed up. While I haven't tested it, I would expect the X25-E to have the cache turned off while used as a ZIL. The 2nd-generation X25-E announced by Intel does have 'safe storage', as they term it. I believe it has more cache, a faster write speed, and is able to guarantee that the contents of the cache will always make it to stable storage. My guess would be that since it's designed for the server market, the cache on the current X25-E is irrelevant: the device is going to honor flush requests and the ZIL will be stable. I suspect the X25-E G2 will ignore flush requests, with Intel's engineers confident that the data in the cache is safe.

The NVRAM card we're using is an MM-5425, identical to the one used in the famous 'blog on slogs'; I was lucky to get my hands on a pair and some drivers :-)

I think the RAID controller approach is a nice idea too, and should work just as well. I'd love an 80GB ioDrive to use as our ZIL, as I think that's the best hardware solution out there right now, but until Fusion-io release Solaris drivers I'm going to have to stick with my 512MB...
Re: [zfs-discuss] Motherboard for home zfs/solaris file server
Hi,

thank you so much for this post. This is exactly what I was looking for. I've been eyeing the M3A76-CM board, but will now look at the 78 and M4A as well.

Actually, not that many Asus M3A, let alone M4A, boards show up yet on the OpenSolaris HCL, so I'd like to encourage everyone to share their hardware experience by clicking on the "submit hardware" link on: http://www.sun.com/bigadmin/hcl/data/os/

I've done it a couple of times, and it's really just a matter of 5-10 minutes in which you can help others know whether a certain component works, or whether a special driver or /etc/driver_aliases setting is required.

I'm also interested in getting the power consumption down. Right now I have the Athlon X2 5050e (45W TDP) on my list, but I'd also like to know more about the Athlon II X2 250 and whether it has better potential for power savings.

Neal, the M3A78 seems to have a Realtek RTL8111/8168B NIC chip. I pulled this off a Gentoo wiki, because strangely this information doesn't show up on the Asus website.

Also, thanks for the CF-to-PATA hint for the root pool mirror. I will try to find fast CF cards to boot from. The performance problems you see when writing may be related to master/slave issues, but I'm not a good enough PC tweaker to back that up.

Cheers,
Constantin

F. Wessels wrote:
> Hi, I'm using Asus M3A78 boards (with the SB700) for OpenSolaris and M2A* boards (with the SB600) for Linux, some of them with 4x1GB and others with 4x2GB of ECC memory. ECC faults will be detected and reported; I tested it with a small tungsten light. By moving the light source slowly towards the memory banks you heat them up in a controlled way, and at a certain point bit flips will occur.
> I recommend you go for an M4A board, since they support up to 16GB. I don't know if you can run OpenSolaris without a video card after installation; I think you can disable "halt on no video card" in the BIOS, but Simon Breden had some trouble with that, see his home server blog. You can go for one of the three M4A boards with a 780G onboard; those will give you 2 PCIe x16 connectors. I don't think the onboard NIC is supported. I always put an Intel (e1000) in, just to prevent any trouble.
> I don't have any trouble with the SB700 in AHCI mode. Hotplugging works like a charm, and transferring a couple of GBs over eSATA takes considerably less time than via USB. I have a PATA-to-dual-CF adapter and two industrial 16GB CF cards as a mirrored root pool. It takes forever to install Nevada, at least 14 hours; I suspect the CF cards lack caches. But I don't update that regularly, and am still on snv_104. I also have 2 mirrors and a hot spare; the sixth port is an eSATA port I use to transfer large amounts of data. This system consumes about 73 watts idle and 82 under I/O load (5 disks, a separate NIC, 8GB RAM and a BE-2400, all using just 73 watts!!!).
> Please note that frequency scaling is only supported on the K10 architecture, and don't expect too much power saving from it: a lower voltage yields far greater savings than a lower frequency. In September I'll do a post about the aforementioned M4A boards and an LSI SAS controller in one of the PCIe x16 slots.

-- 
Constantin Gonzalez           Sun Microsystems GmbH, Germany
Principal Field Technologist  http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91    http://google.com/search?q=constantin+gonzalez
Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
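To check whether a NIC such as that Realtek is bound to a driver, the /etc/driver_aliases dance looks roughly like this. The PCI ID and the rge binding below are assumptions; verify the ID with prtconf on your own box:

  # what compatible names does the card report?
  prtconf -pv | grep -i pci10ec
  # is a driver already claiming that ID?
  grep "pci10ec,8168" /etc/driver_aliases
  # if not, bind the ID to the Realtek GbE driver
  update_drv -a -i '"pci10ec,8168"' rge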
Re: [zfs-discuss] zpool import hangs up forever...
fortunately, after several hours the terminal came back --

# zdb -e data1
Uberblock
        magic = 00bab10c
        version = 6
        txg = 2682808
        guid_sum = 14250651627001887594
        timestamp = 1247866318 UTC = Sat Jul 18 01:31:58 2009
Dataset mos [META], ID 0, cr_txg 4, 27.1M, 3050 objects
Dataset data1 [ZPL], ID 5, cr_txg 4, 5.74T, 52987 objects

                    capacity   operations   bandwidth  ---- errors ----
description       used avail  read  write  read write  read write cksum
data1            5.74T 6.99T   772      0 96.0M     0     0     0    91
/dev/dsk/c14t0d0 5.74T 6.99T   772      0 96.0M     0     0     0   223
#

i've tried to run zdb -e -t 2682807 data1, and

# echo 0t::pid2proc|::walk thread|::findstack -v | mdb -k

shows

stack pointer for thread fbc2cca0: fbc4d980
[ fbc4d980 _resume_from_idle+0xf1() ]
  fbc4d9b0 swtch+0x147()
  fbc4da40 sched+0x3fd()
  fbc4da70 main+0x437()
  fbc4da80 _locore_start+0x92()

and fmdump -eV shows checksum errors, such as

Jul 28 2009 11:17:35.386268381 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x1baa23c52ce01c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x578154df5f3260c0
                vdev = 0x6e4327476e17daaa
        (end detector)
        pool = data1
        pool_guid = 0x578154df5f3260c0
        pool_context = 2
        pool_failmode = wait
        vdev_guid = 0x6e4327476e17daaa
        vdev_type = disk
        vdev_path = /dev/dsk/c14t0d0p0
        vdev_devid = id1,s...@n2661000612646364/q
        parent_guid = 0x578154df5f3260c0
        parent_type = root
        zio_err = 50
        zio_offset = 0x2313d58000
        zio_size = 0x4000
        zio_objset = 0x0
        zio_object = 0xc
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4a6ea60f 0x1705fcdd

and a second, nearly identical ereport at Jul 28 2009 11:17:35.386268179, differing only in zio_offset = 0x5c516eac000 and __tod = 0x4a6ea60f 0x1705fc13.

Can I hope that some data will be recovered after several more hours with the zpool import -f data1 command?
[zfs-discuss] resizing zpools by growing LUN
Hi all,

I need to know whether it is possible to expand the capacity of a zpool without loss of data, by growing the LUN (2TB) presented from an HP EVA to a Solaris 10 host.

I know there is a way in Solaris Express Community Edition b117 with the autoexpand property, but I still work with Solaris 10 U7. When will this feature be integrated into Solaris 10? And is there a workaround in the meantime? I have tried with the format tool, without effect.

Thanks for any info.
Jan
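No authoritative answer appears in this thread, but the workaround usually cited for pre-autoexpand releases is an export/relabel/import cycle. A rough, untested sketch (back up first; the pool name is a placeholder):

  # after the array has grown the LUN:
  zpool export tank
  format -e            # select the LUN and write a new label covering the added space
  zpool import tank    # vdev sizes are re-read at import time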
Re: [zfs-discuss] Set New File/Folder ZFS ACLs Automatically through Samba?
Jeff,

On Tue, 28 Jul 2009, Jeff Hulen wrote:
> Do any of you know how to set the default ZFS ACLs for newly created files and folders when those files and folders are created through Samba? I want all new files and folders to inherit only the extended (non-trivial) ACLs that are set on the parent folders. But when a file is created through Samba on the ZFS file system, it gets mode 744 (trivial) added to it. For directories, it gets mode 755 added to it.
>
> I've tried everything I could find and think of:
> 1.) Setting a umask.
> 2.) Editing /etc/sfw/smb.conf 'force create mode' and 'force directory mode', then `svcadm restart samba`.
> 3.) Adding trivial inheritable ACLs to the parent folder.
>
> Changes 1 and 2 had no effect. With number 3 I got folders to effectively do what I want, but not files. I set the ACLs of the parent to:
>
> drwx--+ 24 AD+administrator AD+records 2132 Jul 28 12:01 records/
>     user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
>     user:AD+administrator:rwxpdDaARWcCos:--:allow
>     group:AD+records:rwxpd-aARWc--s:fdi---:allow
>     group:AD+records:rwxpd-aARWc--s:--:allow
>     group:AD+release:r-x---a-R-c---:--:allow
>     owner@:rwxp---A-W-Co-:fd:allow
>     group@:rwxp--:fd:deny
>     everyone@:rwxp---A-W-Co-:fd:deny
>
> Then new directories and files get created like this from a Windows workstation connected to the server:
>
> drwx--+ 2 AD+testuser AD+domain users 2 Jul 28 12:01 test
>     user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
>     user:AD+administrator:rwxpdDaARWcCos:--:allow
>     group:AD+records:rwxpd-aARWc--s:fdi---:allow
>     group:AD+records:rwxpd-aARWc--s:--:allow
>     owner@:rwxp---A-W-Co-:fdi---:allow
>     owner@:---A-W-Co-:--:allow
>     group@:rwxp--:fdi---:deny
>     group@:--:--:deny
>     everyone@:rwxp---A-W-Co-:fdi---:deny
>     everyone@:---A-W-Co-:--:deny
>     owner@:--:--:deny
>     owner@:rwxp---A-W-Co-:--:allow
>     group@:-w-p--:--:deny
>     group@:r-x---:--:allow
>     everyone@:-w-p---A-W-Co-:--:deny
>     everyone@:r-x---a-R-c--s:--:allow
>
> -rwxr--r--+ 1 AD+testuser AD+domain users 0 Jul 28 12:01 test.txt
>     user:AD+administrator:rwxpdDaARWcCos:--:allow
>     group:AD+records:rwxpd-aARWc--s:--:allow
>     owner@:---A-W-Co-:--:allow
>     group@:--:--:deny
>     everyone@:---A-W-Co-:--:deny
>     owner@:--:--:deny
>     owner@:rwxp---A-W-Co-:--:allow
>     group@:-wxp--:--:deny
>     group@:r-:--:allow
>     everyone@:-wxp---A-W-Co-:--:deny
>     everyone@:r-a-R-c--s:--:allow
>
> I need group AD+release to have read-only access to only specific files within records. I could set that up, but any new files or folders that are created would be viewable by AD+release, and that would not be acceptable.
>
> Do any of you know how to set the Samba file/folder creation ACLs on ZFS file systems? Or do you have something I could try?

The following setup works quite well for us with a self-compiled Samba 3.0.34 taken from the SFW source tree. The only problem we ran into was that Microsoft Office sometimes seems to set permissions on files in an, at least for me, unpredictable way.

smb.conf:
...
[data]
    ;
    ; public fileserver share
    ;
    path = /smb/data
    comment = user and group directories
    public = no
    writable = yes
    browseable = yes
    vfs objects = zfsacl
    inherit permissions = yes
    inherit acls = yes
    store dos attributes = yes
    hide dot files = no
    nfs4: mode = simple
    nfs4: acedup = merge
    zfsacl: acesort = dontcare
    ; delete readonly = yes
    ;
    ; set to no else Microsoft Excel/Word cause permission problems
    ;
    map archive = no
    map hidden = no
    map read only = no
    map system = no

Some ZFS properties of the top-level zfs, which get inherited by the children:

NAME  PROPERTY         VALUE       SOURCE
smb   snapdir          visible     local
smb   aclmode          groupmask   default
smb   aclinherit       restricted  default
smb   casesensitivity  sensitive   -

Now, for every group directory reflecting a particular department, such as kizinfra, we set permissions as

# ls -ldV kizinfra
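For experimenting with the inheritable-ACL approach from point 3, the Solaris NFSv4 ACL syntax on chmod looks like this (the group name and permission set here are illustrative only):

  # grant a group read-only access that new files and directories inherit
  chmod A+group:AD+release:read_data/read_attributes/read_acl:file_inherit/dir_inherit:allow records
  # inspect the resulting ACL
  ls -dV records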
Re: [zfs-discuss] zpool import hangs up forever...
On 29.07.09 14:42, Pavel Kovalenko wrote:
> fortunately, after several hours terminal went back --
> # zdb -e data1
> [uberblock and dataset summary]
>                     capacity   operations   bandwidth  ---- errors ----
> description       used avail  read  write  read write  read write cksum
> data1            5.74T 6.99T   772      0 96.0M     0     0     0    91
> /dev/dsk/c14t0d0 5.74T 6.99T   772      0 96.0M     0     0     0   223

So we know that there are some checksum errors there, but at least zdb was able to open the pool in read-only mode.

> i've tried to run zdb -e -t 2682807 data1 and
> # echo 0t::pid2proc|::walk thread|::findstack -v | mdb -k

This is wrong: you need to put the PID of the 'zpool import data1' process right after '0t'.

> and fmdump -eV shows checksum errors, such as
> Jul 28 2009 11:17:35.386268381 ereport.fs.zfs.checksum
> [...]
>         zio_err = 50
>         zio_offset = 0x2313d58000
>         zio_size = 0x4000
>         zio_objset = 0x0
>         zio_object = 0xc
>         zio_level = 0
>         zio_blkid = 0x0

This tells us that object 0xc in the meta objset (objset 0x0) is corrupted. To get more details you can do the following:

zdb -e - data1
zdb -e -bbcs data1

victor
Re: [zfs-discuss] zpool import hangs up forever...
I recently noticed that importing larger pools occupied by large amounts of data can keep zpool import busy for several hours, with zpool iostat showing only some random reads now and then and iostat -xen showing quite busy disk usage. It's almost as if it goes through every bit in the pool before it finishes. Somebody said that zpool import got faster in snv_118, but I don't have real information on that yet.

Yours
Markus Kovero

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Victor Latushkin
Sent: 29 July 2009 14:05
To: Pavel Kovalenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zpool import hangs up forever...

On 29.07.09 14:42, Pavel Kovalenko wrote:
> fortunately, after several hours terminal went back [...]

So we know that there are some checksum errors there, but at least zdb was able to open the pool in read-only mode.
[...]
This tells us that object 0xc in the meta objset (objset 0x0) is corrupted. To get more details you can do the following:

zdb -e - data1
zdb -e -bbcs data1

victor
Re: [zfs-discuss] zpool import hangs up forever...
Victor, after

# ps -ef | grep zdb | grep -v grep
    root  3281  1683   1 14:22:09 pts/2  8:57 zdb -e -t 2682807 data1

I inserted the PID after 0t:

# echo 0t3281::pid2proc|::walk thread|::findstack -v | mdb -k

and got a couple of records:

stack pointer for thread ff02017ad700: ff0008ce8a50
[ ff0008ce8a50 _resume_from_idle+0xf1() ]
  ff0008ce8a80 swtch+0x147()
  ff0008ce8ac0 sema_p+0x1d9(ff01df7a9b70)
  ff0008ce8af0 biowait+0x76(ff01df7a9ab0)
  ff0008ce8bf0 default_physio+0x3d3(f7a148f0, 0, d70207, 40, f7a14130, ff0008ce8e80)
  ff0008ce8c30 physio+0x25(f7a148f0, 0, d70207, 40, f7a14130, ff0008ce8e80)
  ff0008ce8c80 sdread+0x150(d70207, ff0008ce8e80, ff01e94927c8)
  ff0008ce8cb0 cdev_read+0x3d(d70207, ff0008ce8e80, ff01e94927c8)
  ff0008ce8d30 spec_read+0x270(ff020de0aa00, ff0008ce8e80, 0, ff01e94927c8, 0)
  ff0008ce8da0 fop_read+0x6b(ff020de0aa00, ff0008ce8e80, 0, ff01e94927c8, 0)
  ff0008ce8f00 pread+0x22c(8, 3e98000, 2, 291cf9a)
  ff0008ce8f10 sys_syscall+0x17b()

stack pointer for thread ff01e92dd8a0: ff000904cd00
[ ff000904cd00 _resume_from_idle+0xf1() ]
  ff000904cd30 swtch+0x147()
  ff000904cd90 cv_wait_sig_swap_core+0x170(ff01e92dda76, ff01e92dda78, 0)
  ff000904cdb0 cv_wait_sig_swap+0x18(ff01e92dda76, ff01e92dda78)
  ff000904ce20 cv_waituntil_sig+0x135(ff01e92dda76, ff01e92dda78, 0, 0)
  ff000904cec0 lwp_park+0x157(0, 0)
  ff000904cf00 syslwp_park+0x31(0, 0, 0)
  ff000904cf10 sys_syscall+0x17b()

stack pointer for thread ff01e92c4e60: ff0008ceed00
[ ff0008ceed00 _resume_from_idle+0xf1() ]
  ff0008ceed30 swtch+0x147()
  ff0008ceed90 cv_wait_sig_swap_core+0x170(ff01e92c5036, ff01e92c5038, 0)
  ff0008ceedb0 cv_wait_sig_swap+0x18(ff01e92c5036, ff01e92c5038)
  ff0008ceee20 cv_waituntil_sig+0x135(ff01e92c5036, ff01e92c5038, 0, 0)
  ff0008ceeec0 lwp_park+0x157(0, 0)
  ff0008ceef00 syslwp_park+0x31(0, 0, 0)
  ff0008ceef10 sys_syscall+0x17b()

and

# zdb -e - data1 > zdb-e-_list.txt

lists a lot of the data objects that were on the pool:

# ls -la zdb-e-_list.txt
-rw-r--r-- 1 root root 28863781 Jul 28 21:41 zdb-e-_list.txt

I can provide more detailed information by email (pkovalenko at mtv.ru), as I don't know any ZFS specialist who can help me recover the pool.
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
On Tue, 28 Jul 2009, Glen Gunselman wrote:
> # zpool list
> NAME     SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> zpool1  40.8T   176K  40.8T   0%  ONLINE  -
> # zfs list
> NAME    USED  AVAIL  REFER  MOUNTPOINT
> zpool1  364K  32.1T  28.8K  /zpool1

This is normal, and admittedly somewhat confusing (see CR 6308817). Even if you had not created the additional zfs datasets, it still would have listed 40T and 32T.

Here's an example using five 1G disks in a raidz:

-bash-3.2# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  4.97G   132K  4.97G   0%  ONLINE  -
-bash-3.2# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  98.3K  3.91G  28.8K  /tank

The AVAIL column in the zpool output shows 5G, whereas it shows 4G in the zfs list. The difference is the 1G of parity. If we use raidz2, we'd expect 2G to be used for parity, and this is borne out in a quick test using the same disks:

-bash-3.2# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  4.97G   189K  4.97G   0%  ONLINE  -
-bash-3.2# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   105K  2.91G  32.2K  /tank

Contrast that with a five-way mirror:

-bash-3.2# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  1016M  73.5K  1016M   0%  ONLINE  -
-bash-3.2# zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
tank   69K   984M    18K  /tank

Now they both show the pool capacity to be around 1G.

Regards,
markm
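The pool-creation step isn't shown above; a throwaway way to reproduce the comparison with file-backed vdevs (scratch files only, not for real data) would be:

  # five 1G scratch "disks"
  for i in 1 2 3 4 5; do mkfile 1g /var/tmp/disk$i; done
  zpool create tank raidz /var/tmp/disk[1-5]
  zpool list tank ; zfs list tank
  zpool destroy tank
  # repeat with 'raidz2' or 'mirror' in place of 'raidz' for the other cases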
Re: [zfs-discuss] [indiana-discuss] zfs issues?
On 29/07/2009, at 12:00 AM, James Lever wrote:
> CR 6865661 *HOT* Created, P1 opensolaris/triage-queue zfs scrub rpool causes zpool hang

This bug I logged has been marked as related to CR 6843235, which is fixed in snv_119.

cheers,
James
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
IIRC zpool list includes the parity drives in the disk space calculation and zfs list doesn't. Terabyte drives are really 900-something-GB drives, thanks to that base-2 vs. base-10 confusion the HD manufacturers introduced. Using that 900GB figure I get to both 40TB and 32TB, with and without parity drives. Spares aren't counted.

I see format/verify shows the disk size as 931GB:

Volume name        = <        >
ascii name         = <ATA-HITACHI HUA7210S-A90A-931.51GB>
bytes/sector       = 512
sectors            = 1953525166
accessible sectors = 1953525133

Part      Tag    Flag     First Sector       Size       Last Sector
  0        usr    wm             256      931.51GB      1953508749
  1 unassigned    wm               0         0                   0
  2 unassigned    wm               0         0                   0
  3 unassigned    wm               0         0                   0
  4 unassigned    wm               0         0                   0
  5 unassigned    wm               0         0                   0
  6 unassigned    wm               0         0                   0
  8   reserved    wm      1953508750        8.00MB      1953525133

I totally overlooked the count-the-spares/don't-count-the-spares issue. When they (the manufacturers) round up and then multiply by 48, the difference between what the sales brochure shows and what you end up with becomes significant. There was a time when manufacturers knew about base-2, but those days are long gone.

Thanks for the reply,
Glen
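As a sanity check on that 931.51GB figure, the sector arithmetic works out like this:

  # 1953525166 sectors * 512 bytes = 1,000,204,884,992 bytes, the brochure "1TB"
  echo '1953525166 * 512 / (1024^3)' | bc    # prints 931, i.e. ~931 binary GB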
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
> Here is the output from my J4500 with 48 x 1TB disks. It is almost exactly the same configuration as yours. This is used for NetBackup. As Mario just pointed out, zpool list includes the parity drives in the space calculation whereas zfs list doesn't.
> [r...@xxx /]# zpool status

Scott,

Thanks for the sample zpool status output. I will be using the storage for NetBackup also. (I am booting the X4500 from a SAN, a 6140, and using an SL48 w/2 LTO4 drives.)

Glen
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
> This is normal, and admittedly somewhat confusing (see CR 6308817). Even if you had not created the additional zfs datasets, it still would have listed 40T and 32T.

Mark,

Thanks for the examples. Where would I see CR 6308817? My usual search tools aren't finding it.

Glen
[zfs-discuss] feature proposal
What do you think about the following feature?

A "subdirectory is automatically a new filesystem" property: an administrator turns on this magic property of a filesystem, and after that every mkdir *in the root* of that filesystem creates a new filesystem. The new filesystems have default/inherited properties, except for the magic property, which is off.

Right now I see this as being mostly useful for /home. The main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user. But I am sure that there could be other interesting uses for this.

-- 
Andriy Gapon
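In other words, the property would automate what an administrator types by hand today; a sketch with illustrative dataset names:

  # today: a separate filesystem per user has to be created explicitly
  zfs create -o quota=10g rpool/export/home/alice
  # with the proposed property set on rpool/export/home,
  # 'mkdir /export/home/alice' would do the equivalent automatically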
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, Andriy Gapon wrote:
> Subdirectory is automatically a new filesystem property - an administrator turns on this magic property of a filesystem, after that every mkdir *in the root* of that filesystem creates a new filesystem.

It's a nice idea, but zfs filesystems consume memory and have overhead. This would make it trivial for a non-root user (assuming they have permissions) to crush the host under the weight of .. mkdir.

$ mkdir -p waste/resources/now/waste/resources/now/waste/resources/now

(now make that much longer and put it in a loop)

Also, will rmdir call zfs destroy? Snapshots interacting with that could be somewhat unpredictable. What about rm -rf?

It'd either require major surgery to userland tools, including every single program that might want to create a directory, or major surgery to the kernel. The former is unworkable, the latter .. scary.

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
On Wed, 29 Jul 2009, Glen Gunselman wrote:
> Where would I see CR 6308817? My usual search tools aren't finding it.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6308817

Regards,
markm
Re: [zfs-discuss] feature proposal
Andriy Gapon wrote:
> What do you think about the following feature?
> Subdirectory is automatically a new filesystem property - an administrator turns on this magic property of a filesystem, after that every mkdir *in the root* of that filesystem creates a new filesystem. The new filesystems have default/inherited properties except for the magic property which is off.

This has been brought up before, and I thought there was an open CR for it, but I can't find it.

> Right now I see this as being mostly useful for /home. Main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user. But I am sure that there could be other interesting uses for this.

A good use case. Another good one is a shared build machine, which is similar to the home dir case.

-- 
Darren J Moffat
Re: [zfs-discuss] feature proposal
On Wed, July 29, 2009 10:24, Andre van Eyssen wrote:
> It'd either require major surgery to userland tools, including every single program that might want to create a directory, or major surgery to the kernel. The former is unworkable, the latter .. scary.

How about: add a flag (-Z?) to useradd(1M) and usermod(1M) so that if base_dir is on ZFS, then the user's homedir is created as a new file system (assuming -m).

Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?
Re: [zfs-discuss] feature proposal
David Magda wrote:
> Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?

The st_fstype field of struct stat.

-- 
Darren J Moffat
Re: [zfs-discuss] feature proposal
on 29/07/2009 17:24 Andre van Eyssen said the following:
> It's a nice idea, but zfs filesystems consume memory and have overhead. This would make it trivial for a non-root user (assuming they have permissions) to crush the host under the weight of .. mkdir.

Well, I specifically stated that this property should not be recursive, i.e. it should work only in the root of a filesystem. When setting this property on a filesystem, an administrator should carefully set permissions to make sure that only trusted entities can create directories there.

The 'rmdir' question requires some thinking; my first reaction is that it should do a zfs destroy...

-- 
Andriy Gapon
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, David Magda wrote:
> Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?

statvfs(2)

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, David Magda wrote:
> Which makes me wonder: is there a programmatic way to determine if a path is on ZFS?

Yes, if it's local. Just use df -n $path and it'll spit out the filesystem type. If it's mounted over NFS, it'll just say something like nfs or autofs, though.

Regards,
markm
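With the stock Solaris df, that should look roughly like the following (sample output, not captured from a real box):

  $ /usr/bin/df -n /tank
  /tank              : zfs
  $ /usr/bin/df -n /home/joe
  /home/joe          : nfs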
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, Andriy Gapon wrote:
> Well, I specifically stated that this property should not be recursive, i.e. it should work only in the root of a filesystem. When setting this property on a filesystem, an administrator should carefully set permissions to make sure that only trusted entities can create directories there.

Even limited to the root of a filesystem, it still gives a user the ability to consume resources rapidly. While I appreciate that it would be restricted by permissions, I can think of a number of usage cases where it could suddenly tank a host. One that might pop up, for example, would be cache spools, which often contain *many* directories. One runaway and kaboom.

We generally use hosts now with plenty of RAM, and the per-filesystem overhead for ZFS doesn't cause much concern. However, on a scratch box, try creating a big stack of filesystems: you can end up with a pool that consumes so much memory you can't import it!

> 'rmdir' question requires some thinking, my first reaction is it should do zfs destroy...

.. which will fail if there's a snapshot, for example. The problem seems to be reasonably complex, compounded by the fact that many programs that create or remove directories do so directly, not by calling externals that would be ZFS-aware.

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
[zfs-discuss] Strange errors in zpool scrub, Solaris 10u6 x86_64
I did a zpool scrub recently, and while it was running it reported errors and warned about restoring from backup. When the scrub completes, though, it reports finishing with 0 errors. On the next scrub, other errors are reported, in different files. iostat -xne does report a few errors (1 s/w error on each of the 2 mirrored drives, and 2 h/w errors on one of the drives).

Any ideas? Is it a cosmetic problem, or a creeping, hidden bug in my hardware, meaning I should go about replacing something somewhere? I don't see such behavior on any other servers around...

Thanks for ideas,
//Jim

zpool status -v while the scrub is running:

  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h1m, 29.45% done, 0h2m to go
config:

        NAME          STATE   READ WRITE CKSUM
        rpool         ONLINE     0     0     0
          mirror      ONLINE     0     0     0
            c3t2d0s0  ONLINE     0     0     0
            c3t3d0s0  ONLINE     0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk/c3t2d0s0
        //dev/dsk/c3t3d0s0

It has previously complained about //dev/dsk having problems, although both the directory and the device files are perfectly accessible.

zpool status -v when the scrub has finished:

  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h4m with 0 errors on Wed Jul 29 18:23:43 2009
config:

        NAME          STATE   READ WRITE CKSUM
        rpool         ONLINE     0     0     0
          mirror      ONLINE     0     0     0
            c3t2d0s0  ONLINE     0     0     0
            c3t3d0s0  ONLINE     0     0     0

errors: No known data errors
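No resolution appears in this message; a generic triage sequence for this kind of situation (standard commands, nothing specific to Jim's machine) would be:

  # look for driver/transport errors logged around the scrub window
  fmdump -eV | more
  iostat -xne
  # clear the counters, scrub again, and see whether the errors move or persist
  zpool clear rpool
  zpool scrub rpool
  zpool status -v rpool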
Re: [zfs-discuss] feature proposal
On Wed, 29 Jul 2009, Mark J Musante wrote:
> Yes, if it's local. Just use df -n $path and it'll spit out the filesystem type. If it's mounted over NFS, it'll just say something like nfs or autofs, though.

$ df -n /opt
Filesystem             kbytes     used      avail  capacity  Mounted on
/dev/md/dsk/d24      33563061 11252547   21974884       34%  /opt
$ df -n /sata750
Filesystem             kbytes     used      avail  capacity  Mounted on
sata750             2873622528       77  322671575        1%  /sata750

Not giving the filesystem type. It's easy to spot the zfs by the lack of a recognisable device path, though.

--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix  http://pix.purplecow.org
[zfs-discuss] LVM and ZFS
I'm curious whether there are any potential problems with using LVM metadevices as ZFS zpool targets. I have a couple of situations where using a device directly with ZFS causes errors on the console (shown below) and lots of stalled I/O, but as soon as I wrap that device inside an LVM metadevice and then use it in the ZFS zpool, things work perfectly fine and smoothly (no stalls).

Situation 1 is when trying to use Intel X25-E or X25-M SSD disks in a Sun X4240 server with the LSI SAS controller. I never could get things to run without errors, no matter what (I tried multiple LSI controllers and multiple SSD disks):

Jul 8 09:43:31 merope scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@f/pci1000,3...@0 (mpt0):
Jul 8 09:43:31 merope    Log info 31126000 received for target 15.
Jul 8 09:43:31 merope    scsi_status=0, ioc_status=804b, scsi_state=c
Jul 8 09:43:31 merope scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@f/pci1000,3...@0 (mpt0):
Jul 8 09:43:31 merope    Log info 31126000 received for target 15.
Jul 8 09:43:31 merope    scsi_status=0, ioc_status=804b, scsi_state=c
Jul 8 09:43:31 merope scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@f/pci1000,3...@0/s...@f,0 (sd32):
Jul 8 09:43:31 merope    Error for Command: write    Error Level: Retryable
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  Requested Block: 64256  Error Block: 64256
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  Vendor: ATA  Serial Number: CVEM8493 00BM
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
Jul 8 09:43:31 merope scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Situation 2 is when I installed an X25-E in an X4500 Thumper. Here I didn't see any errors on the console of the server, but performance would at regular intervals drop to zero (it felt the same as the LSI case above, just without the console errors). (In situation 1, things would work perfectly fine when I was using an Adaptec controller instead.)

Anyway, when I put the 4GB partition of the SSD disk that I was using for testing inside a simple LVM metadevice, all errors vanished and performance increased many times over, with no hiccups.

But I wonder... Is there anything in a setup like this that might be dangerous, something that might come back and bite me in the future? LVM (disksuite) is really mature technology and something I've been using without problems on many servers for many years, so I think it can be trusted, but anyway...?

(I use that SSD-partition-in-an-LVM-metadevice as a SLOG device for the ZFS zpools on those servers, and performance is now really *really* good.)
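For anyone wanting to try the same wrapping trick, the SVM/disksuite steps are roughly as follows (slice names are placeholders, and state database replicas must already exist or be created first):

  # state database replicas, once per host
  metadb -a -f -c 3 c0t0d0s7
  # wrap the SSD slice in a simple one-way concat/stripe metadevice
  metainit d100 1 1 c1t15d0s0
  # hand the metadevice to ZFS as the slog
  zpool add tank log /dev/md/dsk/d100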
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
On 29.07.09 16:59, Mark J Musante wrote:
> [...]
> Contrast that with a five-way mirror:
> -bash-3.2# zpool list
> NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> tank  1016M  73.5K  1016M   0%  ONLINE  -
> -bash-3.2# zfs list
> NAME  USED  AVAIL  REFER  MOUNTPOINT
> tank   69K   984M    18K  /tank

The mirror case shows one more thing worth mentioning: here the difference between the available space reported by zpool and by zfs is explained by a reservation that ZFS sets aside for internal purposes. It is 32MB or 1/64 of the pool capacity, whichever is bigger (32MB in this example). The same reservation applies in the RAID-Z case as well, though there it is harder to see ;-)

victor
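Checking that against the mirror numbers above, as a quick worked example:

  echo '1016 / 64' | bc    # prints 15 (MB), under the 32MB floor, so 32MB is reserved
  echo '1016 - 32' | bc    # prints 984 (MB), matching the AVAIL that zfs list reports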
Re: [zfs-discuss] feature proposal
Andriy Gapon wrote:
> Right now I see this as being mostly useful for /home. Main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user.

But now that quotas are working properly, why would you want to continue the hack of one FS per user? I'm seriously curious here. In my view it's just more work, a more cluttered zfs list and share output, and a lot less straightforward and simple, too.

Why bother? What's the benefit?

-Kyle
Re: [zfs-discuss] feature proposal
Andre van Eyssen wrote:
> Even limited to the root of a filesystem, it still gives a user the ability to consume resources rapidly. While I appreciate that it would be restricted by permissions, I can think of a number of usage cases where it could suddenly tank a host. One use that might pop up, for example, would be cache spools, which often contain *many* directories. One runaway and kaboom.

No worse than any other use case; if you can create datasets, you can do that anyway. If you aren't running with restrictive resource controls, you can tank the host in so many easier ways. Note that the proposal is that this be off by default, something you have to explicitly enable.

> The problem seems to be reasonably complex, compounded by the fact that many programs that create or remove directories do so directly, not by calling externals that would be ZFS-aware.

I don't understand how you came to that conclusion. This wouldn't be implemented in /usr/bin/mkdir but in the ZFS implementation of the mkdir(2) syscall.

-- 
Darren J Moffat
Re: [zfs-discuss] feature proposal
Kyle McDonald wrote:
> But now that quotas are working properly, why would you want to continue the hack of one FS per user?

A hack? They are different usage cases!

> Why bother? What's the benefit?

The benefit is that users can control their own snapshot policy; they can create and destroy their own sub-datasets, send and recv them, etc. We can also delegate specific properties to users if we want. This is exactly how I have the builds area set up on our ONNV build machines for the Solaris security team. Sure, the output of zfs list is long, but I don't care about that.

When encryption comes along, having a separate filesystem per user is a useful deployment case, because it means we can deploy with separate keys for each user (granted, maybe less interesting if they only access their home dir over NFS/CIFS, but still useful). I have a prototype PAM module that uses the user's login password as the ZFS dataset wrapping key and keeps that in sync with the user's login password on password change.

-- 
Darren J Moffat
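The delegation Darren mentions is done with zfs allow; a plausible per-user setup (user and dataset names are illustrative) looks like:

  # let a user manage snapshots and child datasets under her home
  zfs allow alice create,destroy,mount,snapshot,send,receive rpool/home/alice
  # individual properties can be delegated too
  zfs allow alice compression,quota rpool/home/alice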
Re: [zfs-discuss] zpool export taking hours
fyleow wrote:
> fyleow wrote:
>> I have a raidz1 tank of 5x 640GB hard drives on my newly installed OpenSolaris 2009.06 system. I did a zpool export tank, and the process has been running for 3 hours now, taking 100% CPU. When I do a zfs list, tank is still shown as mounted. What's going on here? Should it really be taking this long?
>>
>> $ zfs list tank
>> NAME   USED  AVAIL  REFER  MOUNTPOINT
>> tank  1.10T  1.19T  36.7K  /tank
>> $ zpool status tank
>>   pool: tank
>>  state: ONLINE
>>  scrub: none requested
>> config:
>>         NAME        STATE   READ WRITE CKSUM
>>         tank        ONLINE     0     0     0
>>           raidz1    ONLINE     0     0     0
>>             c7t0d0  ONLINE     0     0     0
>>             c7t1d0  ONLINE     0     0     0
>>             c7t2d0  ONLINE     0     0     0
>>             c7t3d0  ONLINE     0     0     0
>>             c7t4d0  ONLINE     0     0     0
>> errors: No known data errors
>
>> Can you run the following command and post the output:
>> # echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
>> Thanks, George
>
> Here's what I get:
>
> # echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
> stack pointer for thread ff00f717b020: ff0003684cf0
>   ff0003684d60 restore_mstate+0x129(fb8568ee)

It might be best to generate a live crash dump so we can see what might be hanging. You can also try running the command above multiple times, and even run 'pstack <pid of zpool>' to see if we get additional information.

Thanks,
George
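On Solaris, a live crash dump is captured with savecore (assuming dumpadm already has a dump device configured):

  # capture a kernel dump from the running system, without panicking it
  savecore -L
  # the dump directory and device are shown by:
  dumpadm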
Re: [zfs-discuss] feature proposal
On Wed, Jul 29, 2009 at 03:35:06PM +0100, Darren J Moffat wrote:
> This has been brought up before and I thought there was an open CR for it but I can't find it.

I'd want this to be something one could set per-directory, and I'd want it to not be inheritable (or to have control over whether it is inheritable).

Nico
--
Re: [zfs-discuss] feature proposal
Darren J Moffat wrote:
> The benefit is that users can control their own snapshot policy; they can create and destroy their own sub-datasets, send and recv them, etc. We can also delegate specific properties to users if we want. This is exactly how I have the builds area set up on our ONNV build machines for the Solaris security team. Sure, the output of zfs list is long, but I don't care about that.

I can imagine a use for builds. One FS per build? I don't know. But why link it to mkdir? Why not make the build scripts do the zfs create outright?

> When encryption comes along, having a separate filesystem per user is a useful deployment case, because it means we can deploy with separate keys for each user (granted, maybe less interesting if they only access their home dir over NFS/CIFS, but still useful). I have a prototype PAM module that uses the user's login password as the ZFS dataset wrapping key and keeps that in sync with the user's login password on password change.

Encryption is an interesting case. User snapshots I'd need to think about more. Couldn't the other properties be delegated on directories?

Maybe I'm just getting old. ;) I still think having the zpool not automatically include a filesystem, and having ZFS containers, was a useful concept. And I still use share (and now sharemgr) to manage my shares, not ZFS share. Oh well. :)

-Kyle
Re: [zfs-discuss] feature proposal
I can think of a different feature where this would be useful - storing virtual machines. With an automatic 1 fs per folder, each virtual machine would be stored in its own filesystem, allowing for rapid snapshots and instant restores of any machine. One big limitation of zfs for me is that although I can restore an entire filesystem in seconds, restoring any individual folder takes much, much longer, as it's treated as a standard copy.
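A sketch of what per-VM filesystems make possible; the dataset names here are made up:

# zfs snapshot tank/vm/guest01@pre-upgrade       # near-instant, per machine
# zfs rollback tank/vm/guest01@pre-upgrade       # restore just this VM in seconds
# zfs clone tank/vm/guest01@pre-upgrade tank/vm/guest02   # or spin up a copy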
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Wed, 29 Jul 2009, Jorgen Lundman wrote:
> So, it is slower than the CF test. This is disappointing. Everyone else seems to use Intel X25-M, which have a write-speed of 170MB/s (2nd generation) so perhaps that is why it works better for them. It is curious that it is slower than the CF card. Perhaps because it shares with so many other SATA devices?

Something to be aware of is that not all SSDs are the same. In fact, some "faster" SSDs may use a RAM write cache (they all do) and then ignore a cache sync request, while not including the hardware/firmware support to ensure that the data is persisted if there is power loss. Perhaps your fast CF device does that. If so, that would be really bad for zfs if your server was to spontaneously reboot or lose power. This is why you really want a true enterprise-capable SSD device for your slog.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs send/recv syntax
> I apologize for replying in the middle of this thread, but I never saw the initial snapshot syntax of mypool2, which needs to be recursive (zfs snapshot -r mypo...@snap) to snapshot all the datasets in mypool2. Then, use zfs send -R to pick up and restore all the dataset properties. What was the original snapshot syntax?

Cindy,

You figured it out! I forgot the -r :) I don't have the room to try the send locally, so I reran the ssh and it's showing what would get transferred with Ian's syntax. I just ran the following:

zfs send -vR mypo...@snap | ssh j...@host pfexec /usr/sbin/zfs recv -Fdnv mypool/zfsname

Looking at the man page, it doesn't explicitly state the behavior I am noticing, but looking at the switches I can see a _lot_ of traffic going from the sending host to the receiving host. Does the -n just not write it, but still allow it to be sent? The command has not returned...

Thanks everyone!
jlc
Re: [zfs-discuss] feature proposal
On 29.07.09 07:56, Andre van Eyssen wrote:
> On Wed, 29 Jul 2009, Mark J Musante wrote:
>> Yes, if it's local. Just use df -n $path and it'll spit out the filesystem type. If it's mounted over NFS, it'll just say something like nfs or autofs, though.
>
> $ df -n /opt
> Filesystem            kbytes     used      avail  capacity  Mounted on
> /dev/md/dsk/d24     33563061  11252547  21974884       34%  /opt
>
> $ df -n /sata750
> Filesystem            kbytes     used      avail  capacity  Mounted on
> sata750           2873622528        77  322671575        1%  /sata750
>
> Not giving the filesystem type. It's easy to spot the zfs with the lack of a recognisable device path, though.

which df are you using?

Michael
--
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
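Michael's question is presumably because the output above is what plain df prints. For comparison, a sketch of what Solaris /usr/bin/df -n is documented to print - just the mount point and vfstype - with example paths:

$ /usr/bin/df -n /opt
/opt               : ufs
$ /usr/bin/df -n /sata750
/sata750           : zfs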
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
On Jul 28, 2009, at 6:34 PM, Eric D. Mudama wrote:
> On Mon, Jul 27 at 13:50, Richard Elling wrote:
>> On Jul 27, 2009, at 10:27 AM, Eric D. Mudama wrote:
>>> Can *someone* please name a single drive+firmware or RAID controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT commands? Or worse, responds ok when the flush hasn't occurred?
>>
>> two seconds with google shows
>> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=183771&NewLang=en&Hilite=cache+flush
>> Give it up. These things happen. Not much you can do about it, other than design around it.
>> -- richard
>
> That example is windows-specific, and is a software driver, where the data integrity feature must be manually disabled by the end user. The default behavior was always maximum data protection.

I don't think you read the post. It specifically says, "Previous versions of the Promise drivers ignored the flush cache command until system power down." Promise makes RAID controllers and has a firmware fix for this. This is the kind of thing we face: some performance engineer tries to get an edge by assuming there is only one case where cache flush matters. Another 2 seconds with google shows:
http://sunsolve.sun.com/search/document.do?assetkey=1-66-27-1
(interestingly, for this one, fsck also fails)
http://sunsolve.sun.com/search/document.do?assetkey=1-21-103622-06-1
http://forums.seagate.com/stx/board/message?board.id=freeagent&message.id=5060&query.id=3999#M5060

But they also get cache flush code wrong in the opposite direction. A good example of that is the notorious Seagate 1.5 TB disk "stutter" problem. NB, for the most part, vendors do not air their dirty laundry (eg bug reports) on the internet for those without support contracts. If you have a support contract, your search may show many more cases.

> While perhaps analogous at some level, the perpetual "your hardware must be crappy/cheap/not-as-expensive-as-mine" doesn't seem to be a sufficient explanation when things go wrong, like complete loss of a pool.

As I said before, it is a systems engineering problem. If you do your own systems engineering, then you should make sure the components you select work as you expect.
-- richard
[zfs-discuss] Tunable iSCSI timeouts - ZFS over iSCSI fix
Anyone (Ross?) creating ZFS pools over iSCSI connections will want to pay attention to snv_121, which fixes the 3-minute hang after iSCSI disk problems: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=649

Yay!
Re: [zfs-discuss] Tunable iSCSI timeouts - ZFS over iSCSI fix
Yup, somebody pointed that out to me last week and I can't wait :-)

On Wed, Jul 29, 2009 at 7:48 PM, Dave <dave-...@dubkat.com> wrote:
> Anyone (Ross?) creating ZFS pools over iSCSI connections will want to pay attention to snv_121, which fixes the 3-minute hang after iSCSI disk problems: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=649
>
> Yay!
Re: [zfs-discuss] zfs send/recv syntax
Joseph L. Casale wrote:
>> I apologize for replying in the middle of this thread, but I never saw the initial snapshot syntax of mypool2, which needs to be recursive (zfs snapshot -r mypo...@snap) to snapshot all the datasets in mypool2. Then, use zfs send -R to pick up and restore all the dataset properties. What was the original snapshot syntax?
>
> Cindy,
>
> You figured it out! I forgot the -r :) I don't have the room to try the send locally, so I reran the ssh and it's showing what would get transferred with Ian's syntax. I just ran the following:
>
> zfs send -vR mypo...@snap | ssh j...@host pfexec /usr/sbin/zfs recv -Fdnv mypool/zfsname
>
> Looking at the man page, it doesn't explicitly state the behavior I am noticing, but looking at the switches I can see a _lot_ of traffic going from the sending host to the receiving host. Does the -n just not write it, but still allow it to be sent? The command has not returned...

Correct, the sending side will be happily sending into a void. Kill it and re-run without the -n.

-- Ian.
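In other words, recv -n is a dry run on the receiving side only: the full stream is still generated and shipped over ssh, it just never gets written. A sketch of the final form, with made-up names standing in for the obfuscated ones above:

# on the sending host (pool, user, and host names are hypothetical)
zfs snapshot -r mypool2@snap
zfs send -vR mypool2@snap | ssh user@host pfexec /usr/sbin/zfs recv -Fdv mypool/zfsname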
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
Glen Gunselman wrote:
>> Here is the output from my J4500 with 48 x 1 TB disks. It is almost the exact same configuration as yours. This is used for Netbackup. As Mario just pointed out, zpool list includes the parity drive in the space calculation whereas zfs list doesn't.
>>
>> [r...@xxx /]# zpool status
>
> Scott,
>
> Thanks for the sample zpool status output. I will be using the storage for NetBackup, also. (I am booting the X4500 from a SAN - 6140 - and using a SL48 w/2 LTO4 drives.)
>
> Glen

Glen,

If you want any more info about our configuration, drop me a line. It works very well and we have had no issues at all. This system is a T5220 (323 GB RAM) with the 48 TB J4500 connected via SAS. The system also has 3 dual-port fibre channel HBAs feeding 6 LTO4 drives in a 540-slot SL500. The server is 10 gig attached straight to our network core routers and, needless to say, achieves very high throughput. I have seen it pushing the full capacity of the SAS link to the J4500 quite commonly. This is probably the choke point for this system.

/Scott
--
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006
Manukau City
Auckland
New Zealand

Phone : +64 09 968 7611
Fax : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
[zfs-discuss] Install and boot from USB stick?
Hello,

I've tried to find any hard information on how to install, and boot, opensolaris from a USB stick. I've seen a few people write successful stories about this, but I can't seem to get it to work.

The procedure: Boot from LiveCD, insert USB drive, find it using `format', start installer. The USB stick is not found (the installer just sits at "Finding disks"). Remove USB stick, hit back in installer, insert USB stick again, USB stick found, start installing. At 19%, it just stands there. Have no idea why.

Suggestions?
Re: [zfs-discuss] feature proposal
on 29/07/2009 17:52 Andre van Eyssen said the following:
> On Wed, 29 Jul 2009, Andriy Gapon wrote:
>> Well, I specifically stated that this property should not be recursive, i.e. it should work only in the root of a filesystem. When setting this property on a filesystem, an administrator should carefully set permissions to make sure that only trusted entities can create directories there.
>
> Even limited to the root of a filesystem, it still gives a user the ability to consume resources rapidly. While I appreciate the fact that it would be restricted by permissions, I can think of a number of usage cases where it could suddenly tank a host. One use that might pop up, for example, would be cache spools - which often contain *many* directories. One runaway and kaboom.

Well, the feature would not be on by default. So careful evaluation and planning should prevent abuses.

> We generally use hosts now with plenty of RAM, and the per-filesystem overhead for ZFS doesn't cause much concern. However, on a scratch box, try creating a big stack of filesystems - you can end up with a pool that consumes so much memory you can't import it!
>
>> The 'rmdir' question requires some thinking; my first reaction is that it should do a zfs destroy...
>
> .. which will fail if there's a snapshot, for example. The problem seems to be reasonably complex - compounded by the fact that many programs that create or remove directories do so directly - not by calling externals that would be ZFS aware.

Well, snapshots could be destroyed too; nothing stops us from doing that. BTW, I am not proposing to implement this feature in the mkdir/rmdir userland utilities; I am proposing to implement the feature in the ZFS kernel code responsible for directory creation/removal.

--
Andriy Gapon
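For what it's worth, removing a filesystem together with its snapshots is already a single operation today; a sketch, with a made-up dataset name:

# zfs destroy -r tank/spool/dir042   # destroys the fs, its snapshots, and any children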
Re: [zfs-discuss] feature proposal
On Wed, 2009-07-29 at 15:06 +0300, Andriy Gapon wrote:
> What do you think about the following feature? [proposal snipped - quoted in full earlier in the thread]

This feature request touches upon a very generic observation that my group made a long time ago: ZFS is a wonderful filesystem; the only trouble is that (almost) all the cool features have to be asked for using non-filesystem (POSIX) APIs. Basically, every time you have to do anything with ZFS, you have to do it on a host where ZFS runs.

The sole exception to this rule is the .zfs subdirectory, which lets you have access to snapshots without explicit calls to zfs(1M). Basically, the .zfs subdirectory is your POSIX FS way to request two bits of ZFS functionality. In general, however, we all want more.

On the read-only front: wouldn't it be cool to *not* run zfs sends explicitly, but have:
    .zfs/send/<snap-name>
    .zfs/sendr/<from-snap-name>-to-<snap-name>
give you the same data automagically?

On the read-write front: wouldn't it be cool to be able to snapshot things by:
    $ mkdir .zfs/snapshot/<snap-name>
?

The list goes on...

Thanks,
Roman.
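A purely hypothetical sketch of what Roman's proposed read-only interface could look like in use - none of these paths exist today, and the names are made up:

$ cat /tank/home/.zfs/send/mysnap > /backup/home-mysnap.zfs              # hypothetical full stream
$ cat /tank/home/.zfs/sendr/snap1-to-snap2 | ssh host zfs recv -d pool   # hypothetical incremental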
[zfs-discuss] cleaning up cloned zones
I created a couple of zones. I have a zone path like this:

r...@vps1:~# zfs list -r zones/fans
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zones/fans          1.22G  3.78G    22K  /zones/fans
zones/fans/ROOT     1.22G  3.78G    19K  legacy
zones/fans/ROOT/zbe 1.22G  3.78G  1.22G  legacy

I then upgraded the global zone; this creates the zfs clones/snapshots for the zones:

r...@vps1:~# zfs list -r zones/fans
NAME                   USED  AVAIL  REFER  MOUNTPOINT
zones/fans            4.78G  5.22G    22K  /zones/fans
zones/fans/ROOT       4.78G  5.22G    19K  legacy
zones/fans/ROOT/zbe   2.64G  5.22G  2.64G  legacy
zones/fans/ROOT/zbe-1 2.13G  5.22G  3.99G  legacy

I then created a couple of new zones; the mounted zfs tree looks like this:

r...@vps1:~# zfs list -r zones/cars
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zones/cars          1.22G  3.78G    22K  /zones/cars
zones/cars/ROOT     1.22G  3.78G    19K  legacy
zones/cars/ROOT/zbe 1.22G  3.78G  1.22G  legacy

So, now the problem is, I have some zones that have a zbe-1, and some that have a zfs clone with just the zbe name. After making sure everything has worked for a month now, I want to clean that up. I want to promote all of them to be just zbe. I understand I won't be able to revert back to the original zone bits, but I could have 40+ zones on this system, and I prefer them all to be consistent looking. Here is a full hierarchy now:

r...@vps1:~# zfs get -r mounted,origin,mountpoint zones/fans
NAME                        PROPERTY    VALUE                       SOURCE
zones/fans                  mounted     yes                         -
zones/fans                  origin      -                           -
zones/fans                  mountpoint  /zones/fans                 default
zones/fans/ROOT             mounted     no                          -
zones/fans/ROOT             origin      -                           -
zones/fans/ROOT             mountpoint  legacy                      local
zones/fans/ROOT/zbe         mounted     no                          -
zones/fans/ROOT/zbe         origin      -                           -
zones/fans/ROOT/zbe         mountpoint  legacy                      local
zones/fans/ROOT/z...@zbe-1  mounted     -                           -
zones/fans/ROOT/z...@zbe-1  origin      -                           -
zones/fans/ROOT/z...@zbe-1  mountpoint  -                           -
zones/fans/ROOT/zbe-1       mounted     yes                         -
zones/fans/ROOT/zbe-1       origin      zones/fans/ROOT/z...@zbe-1  -
zones/fans/ROOT/zbe-1       mountpoint  legacy                      local

How do I go about renaming and destroying the original zbe fs? I believe this will involve promoting the zbe-1 and then destroying zbe, followed by renaming zbe-1 to zbe. But this is a live system; I don't have something to play with first. Any tips?

Thanks!
Re: [zfs-discuss] cleaning up cloned zones
hey anil,

given that things work, i'd recommend leaving them alone. if you really want to insist on cleaning things up aesthetically, then you need to do multiple zfs operations and you'll need to shut down the zones. assuming you haven't cloned any zones (because if you did, that complicates things), you could do:

- shutdown your zones
- zfs promote the latest zbe
- destroy all of the new snapshots of the promoted zbe (and the old zbe filesystems, which are now dependants of those snapshots)
- rename the promoted zbe to whatever name you want to standardize on

(in theory, something like the sketch below.)

note that i haven't tested any of this, but in theory it should work. it may be the case that some of the zfs operations above may fail due to the "zoned" bit being set for zbes. if this is the case, then you'll need to clear the zoned bit, do the operations, and then reset the zoned bit. please don't come crying to me if this doesn't work. ;)

ed

On Wed, Jul 29, 2009 at 07:44:37PM -0700, Anil wrote:
> [original message quoted in full; snipped - see above]
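A sketch of ed's sequence for one zone, using the dataset names from Anil's mail - untested, exactly as ed warns:

# zoneadm -z fans halt                       # zone must be down first
# zfs promote zones/fans/ROOT/zbe-1          # the @zbe-1 snapshot migrates to zbe-1
# zfs destroy zones/fans/ROOT/zbe            # the old BE is now a clone of zbe-1@zbe-1
# zfs destroy zones/fans/ROOT/zbe-1@zbe-1    # then drop the snapshot itself
# zfs rename zones/fans/ROOT/zbe-1 zones/fans/ROOT/zbe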
[zfs-discuss] Solaris10+ and Online Media services
It seems like a lot of media services are starting to catch on about ZFS. I knew Last.fm makes use of it, and I also found out about grooveshark (see this post: http://www.facebook.com/notes.php?id=7354446700&start=200&hash=fb219332a992a64f12d200435b3d24f2 ). Grooveshark looks nice for end users as well (which is why I sent it to desktop-discuss); I may start using it also for songs I own that I cannot listen to on Last.fm. Unfortunately, I couldn't upload mp3s or play the available songs in OpenSolaris (the software is a java applet). Did anyone have better luck? Do you know of other media services that have big investments in Solaris or ZFS?
Re: [zfs-discuss] feature proposal
On Wed, Jul 29, 2009 at 05:34:53PM -0700, Roman V Shaposhnik wrote:
> [...]
> On the read-write front: wouldn't it be cool to be able to snapshot things by:
>     $ mkdir .zfs/snapshot/<snap-name>
> ?

Are you sure this doesn't work on Solaris/OpenSolaris? From looking at the code, you should be able to do exactly that, as well as destroy a snapshot by rmdir'ing this entry.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
p...@freebsd.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
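If Pawel is right, it would look something like this sketch - the filesystem name is made up, and suitable privileges (or zfs allow delegation) are assumed:

$ cd /tank/home/alice
$ mkdir .zfs/snapshot/mysnap      # creates snapshot tank/home/alice@mysnap
$ ls .zfs/snapshot
mysnap
$ rmdir .zfs/snapshot/mysnap      # destroys the snapshot again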