[zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?

2010-09-14 Thread Stephan Ferraro
I can't edit my /etc/system file right now because the system is not booting.
Is there a way to force these parameters into the Solaris kernel at boot time with GRUB?


Re: [zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?

2010-09-14 Thread Stephan Ferraro
If I launch OpenSolaris with -kd I'm able to do this:
aok/W 1

but if I type:
zfs_recover/W 1

then I get an "unknown symbol name" error.

Any idea how I could force these variables?


Re: [zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?

2010-09-14 Thread Stephan Ferraro
When I execute
::load zfs

I get a kernel panic because of this space_map_add problem.


Re: [zfs-discuss] resilver = defrag?

2010-09-14 Thread Edward Ned Harvey
 From: Haudy Kazemi [mailto:kaze0...@umn.edu]
 
 With regard to multiuser systems and how that negates the need to
 defragment, I think that is only partially true.  As long as the files
 are defragmented enough so that each particular read request only
 requires one seek before it is time to service the next read request,
 further defragmentation may offer only marginal benefit.  On the other

Here's a great way to quantify how much fragmentation is acceptable:

Suppose you want to ensure at least 99% efficiency of the drive: at most 1% of
the time wasted by seeking.
Suppose you're talking about 7200rpm SATA drives, which sustain 500 Mbit/s
transfer and have an average seek time of 8ms.

8ms is 1% of 800ms.
In 800ms, the drive could read 400 Mbit of sequential data.
That's 50 MB.

So as long as the fragment size of your files is approx 50 MB or larger,
fragmentation has a negligible effect on performance.  One seek per
every 50 MB read/written will yield less than a 1% performance impact.

For the heck of it, let's see how that would have computed with 15krpm SAS
drives.
Sustained transfer 1Gbit/s, and average seek 3.5ms
3.5ms is 1% of 350ms
In 350ms, the drive could read 350 Mbit (call it 43MB)

That's certainly in the same ballpark.
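
A rough sketch of the same arithmetic as a script, for anyone who wants to plug
in their own drive numbers (the seek time and transfer rate below are the
assumed figures from above, not measurements):

  #!/bin/sh
  # Minimum fragment size so that one seek per fragment costs at most 1% of the time.
  seek_ms=8         # assumed average seek time, milliseconds
  rate_mbit=500     # assumed sustained sequential transfer, Mbit/s
  awk -v s="$seek_ms" -v r="$rate_mbit" 'BEGIN {
      window_ms = s * 100                 # window in which one seek is 1% of the time
      mbytes = r * window_ms / 1000 / 8   # Mbit/s * seconds / 8 bits per byte
      printf("minimum fragment size: about %.0f MB\n", mbytes)
  }'

For the numbers above it prints about 50 MB; plug in the 15krpm SAS figures
above and you get the mid-40s.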



Re: [zfs-discuss] resilver = defrag?

2010-09-14 Thread Edward Ned Harvey
 From: Richard Elling [mailto:rich...@nexenta.com]
  With appropriate write caching and grouping or re-ordering of writes
 algorithms, it should be possible to minimize the amount of file
 interleaving and fragmentation on write that takes place.
 
 To some degree, ZFS already does this.  The dynamic block sizing tries
 to ensure
 that a file is written into the largest block[1]

Yes, but the block sizes in question are typically at most 128K.
As computed in my email a minute ago, the fragment size needs to be on
the order of 50 MB in order to effectively eliminate the performance loss from
fragmentation.
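
(For reference, the block size in question is the dataset's recordsize
property; a quick way to check or set it, with a made-up dataset name:

  zfs get recordsize tank/data           # 128K is the default and the maximum here
  zfs set recordsize=128K tank/data      # per-dataset; affects newly written blocks only

so even the largest block is far smaller than the ~50 MB figure above.)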


 Also, ZFS has an intelligent prefetch algorithm that can hide some
 performance
 aspects of defragmentation on HDDs.

Unfortunately, prefetch can only hide fragmentation on systems that have
idle disk time.  Prefetch isn't going to help you if you actually need to
transfer a whole file as fast as possible.



Re: [zfs-discuss] resilver = defrag?

2010-09-14 Thread Marty Scholes
Richard Elling wrote:
 Define fragmentation?

Maybe this is the wrong thread.  I have noticed that an old pool can take 4
hours to scrub, with a large portion of the time spent reading from the pool
disks at 150+ MB/s while zpool iostat reports a 2 MB/s read rate.  My naive
interpretation is that the data the scrub is looking for has become fragmented.

If I refresh the pool by zfs sending it to another pool and then zfs receiving
the data back again, the same scrub can take less than an hour, with zpool
iostat reporting more sane throughput.

On an old pool which has had lots of snapshots come and go, the scrub throughput
is awful.  On that same data, refreshed via zfs send/receive, the throughput is
much better.

It would appear to me that this is an artifact of fragmentation, although I
have nothing scientific on which to base this.  Additional unscientific
observations lead me to believe these same refreshed pools also perform
better for non-scrub activities.
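
For the record, the refresh described above is nothing more exotic than a
recursive send into a scratch pool; roughly (pool and snapshot names are made
up for illustration):

  zfs snapshot -r tank@refresh
  zfs send -R tank@refresh | zfs receive -u -d tank2   # tank2 is an empty scratch pool
  zpool scrub tank2                                    # compare scrub time with the old pool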


[zfs-discuss] dedicated ZIL/L2ARC

2010-09-14 Thread Wolfraider
We are looking into the possibility of adding dedicated ZIL and/or L2ARC
devices to our pool. We are looking at getting four 32 GB Intel X25-E SSD
drives. Would this be a good solution to slow write speeds? We are currently
sharing out different slices of the pool to Windows servers using COMSTAR and
Fibre Channel. We are currently getting around 300 MB/sec with the disks
70-100% busy.

OpenSolaris snv_134
Dual 3.2 GHz quad-cores with hyperthreading
16 GB RAM
Pool_1 – 18 raidz2 groups with 5 drives apiece and 2 hot spares
Disks are around 30% full
No dedup


Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-14 Thread Ray Van Dolson
On Tue, Sep 14, 2010 at 06:59:07AM -0700, Wolfraider wrote:
 We are looking into the possibility of adding a dedicated ZIL and/or
 L2ARC devices to our pool. We are looking into getting 4 – 32GB
 Intel X25-E SSD drives. Would this be a good solution to slow write
 speeds? We are currently sharing out different slices of the pool to
 windows servers using comstar and fibrechannel. We are currently
 getting around 300MB/sec performance with 70-100% disk busy.
 
 Opensolaris snv_134
 Dual 3.2GHz quadcores with hyperthreading
 16GB ram
 Pool_1 – 18 raidz2 groups with 5 drives a piece and 2 hot spares
 Disks are around 30% full
 No dedup

It'll probably help.

I'd get two X25-E's for the ZIL (and mirror them) and one or two of Intel's
lower-end X25-M for L2ARC.

There are some SSD devices out there with a super-capacitor and
significantly higher IOPS ratings than the X25-E that might be a better
choice for a ZIL device, but the X25-E is a solid drive and we have
many of them deployed as ZIL devices here.
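
A sketch of what that layout looks like when added to an existing pool (the
device names are placeholders; Pool_1 is the pool from the original post):

  zpool add Pool_1 log mirror c7t0d0 c7t1d0    # two X25-E as a mirrored slog
  zpool add Pool_1 cache c7t2d0 c7t3d0         # one or two X25-M as L2ARC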

Ray


Re: [zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?

2010-09-14 Thread Stephan Ferraro
Here is the solution (thanks to Gavin Maltby from the mdb forum):

Boot with the -kd option to enter kmdb and type the following commands:
aok/W 1
::bp zfs`zfs_panic_recover
:c

Wait until it stops at the breakpoint, then type this:
zfs_recover/W 1
:z
:c
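
For completeness: the -kd flag can be added by pressing 'e' on the GRUB menu
entry and appending it to the kernel$ line, or by editing menu.lst.  The entry
below is only a sketch (paths and entry names vary per install):

  title OpenSolaris (kmdb)
  findroot (pool_rpool,0,a)
  bootfs rpool/ROOT/opensolaris
  kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -kd
  module$ /platform/i86pc/$ISADIR/boot_archive

Once the system boots again, the equivalent permanent settings go in /etc/system:

  set aok=1
  set zfs:zfs_recover=1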


Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-14 Thread Wolfraider
Cool, we can get the Intel X25-E's for around $300 apiece from HP with the
sled. I don't see the X25-M available, so we will look at 4 of the X25-E's.

Thanks :)


Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-14 Thread Pasi Kärkkäinen
On Tue, Sep 14, 2010 at 08:08:42AM -0700, Ray Van Dolson wrote:
 On Tue, Sep 14, 2010 at 06:59:07AM -0700, Wolfraider wrote:
  We are looking into the possibility of adding a dedicated ZIL and/or
  L2ARC devices to our pool. We are looking into getting 4 – 32GB
  Intel X25-E SSD drives. Would this be a good solution to slow write
  speeds? We are currently sharing out different slices of the pool to
  windows servers using comstar and fibrechannel. We are currently
  getting around 300MB/sec performance with 70-100% disk busy.
  
  Opensolaris snv_134
  Dual 3.2GHz quadcores with hyperthreading
  16GB ram
  Pool_1 – 18 raidz2 groups with 5 drives a piece and 2 hot spares
  Disks are around 30% full
  No dedup
 
 It'll probably help.
 
 I'd get two X-25E's for ZIL (and mirror them) and one or two of Intel's
 lower end X-25M for L2ARC.
 
 There are some SSD devices out there with a super-capacitor and
 significantly higher IOPs ratings than the X-25E that might be a better
 choice for a ZIL device, but the X-25E is a solid drive and we have
 many of them deployed as ZIL devices here.
 

I thought Intel SSDs didn't respect the CACHE FLUSH command and were thus
subject to ZIL corruption if the server crashes or loses power?

-- Pasi



[zfs-discuss] What is the 1000 bit?

2010-09-14 Thread Linder, Doug
I recently created a test zpool (RAIDZ) on some iSCSI shares.  I made a few 
test directories and files.  When I do a listing, I see something I've never 
seen before:

[r...@hostname anewdir] # ls -la
total 6160
drwxr-xr-x   2 root     other          4 Sep 14 14:16 .
drwxr-xr-x   4 root     root           5 Sep 14 15:04 ..
-rw------T   1 root     other    2097152 Sep 14 14:16 barfile1
-rw------T   1 root     other    1048576 Sep 14 14:16 foofile1

I looked up the T bit in the man page for ls, and it says that T means "The
1000 bit is turned on, and execution is off (undefined bit-state)", which is
as clear as mud.

I've googled around a lot but still can't find any real info about what this
means.  I've been doing Unix for a long time and have never seen it.  Can
anyone explain, or at least tell me if I should worry?

Thanks.


Re: [zfs-discuss] What is the 1000 bit?

2010-09-14 Thread Nicolas Williams
On Tue, Sep 14, 2010 at 04:13:31PM -0400, Linder, Doug wrote:
 I recently created a test zpool (RAIDZ) on some iSCSI shares.  I made
 a few test directories and files.  When I do a listing, I see
 something I've never seen before:
 
 [r...@hostname anewdir] # ls -la
 total 6160
 drwxr-xr-x   2 root other  4 Sep 14 14:16 .
 drwxr-xr-x   4 root root   5 Sep 14 15:04 ..
 -rw------T   1 root     other    2097152 Sep 14 14:16 barfile1
 -rw------T   1 root     other    1048576 Sep 14 14:16 foofile1
 
 I looked up the T bit in the man page for ls, and it says that T
 means  The 1000 bit is turned on, and execution is off (undefined
 bit-state).  Which is as clear as mud.

It's the sticky bit.  Nowadays it's only useful on directories, and
really it's generally only used with 777 permissions.  The chmod(1) (man
-M/usr/man chmod) and chmod(2) (man -s 2 chmod)  manpages describe the
sticky bit.
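
A quick illustration, using the file names from the listing above and /tmp as
the classic directory case:

  chmod 1600 foofile1    # adds the 01000 (sticky) bit; ls shows -rw------T
  chmod 0600 foofile1    # clears it again
  chmod 1777 /tmp        # the usual use: world-writable dir, only owners may unlink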

Nico


Re: [zfs-discuss] resilver = defrag?

2010-09-14 Thread David Dyer-Bennet
The difference between multi-user thinking and single-user thinking is
really quite dramatic in this area.  I came up on the time-sharing side
(PDP-8, PDP-11, DECSYSTEM-20); TOPS-20 didn't have any sort of disk
defragmenter, and nobody thought one was particularly desirable, because
the normal access pattern of a busy system was spread all across the disk
packs anyway.

On a desktop workstation, it makes some sense to think about loading big
executable files fast -- that's something the user is sitting there
waiting for, and there's often nothing else going on at that exact moment.
 (There *could* be significant things happening in the background, but
quite often there aren't.)  Similarly, loading a big document
(single-file book manuscript, bitmap image, or whatever) happens at a
point where the user has requested it and is waiting for it right then,
and there's mostly nothing else going on.

But on really shared disk space (either on a timesharing system, or a
network file server serving a good-sized user base), the user is competing
for disk activity (either bandwidth or IOPs, depending on the access
pattern of the users).  Generally you don't get to load your big DLL in
one read -- and to the extent that you don't, it doesn't matter much how
it's spread around the disk, because the head won't be in the same spot
when you get your turn again.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



[zfs-discuss] Unwanted filesystem mounting when using send/recv

2010-09-14 Thread Peter Jeremy
I am looking at backing up my fileserver by replicating the
filesystems onto an external disk using send/recv with something
similar to:
  zfs send ... myp...@snapshot | zfs recv -d backup
but have run into a bit of a gotcha with the mountpoint property:
- If I use zfs send -R ... then the mountpoint gets replicated and
  the backup gets mounted over the top of my real filesystems.
- If I skip the '-R' then none of the properties get backed up.

Is there some way to have zfs recv not automatically mount filesystems
when it creates them?

-- 
Peter Jeremy




Re: [zfs-discuss] Unwanted filesystem mounting when using send/recv

2010-09-14 Thread Ian Collins

On 09/15/10 12:56 PM, Peter Jeremy wrote:

I am looking at backing up my fileserver by replicating the
filesystems onto an external disk using send/recv with something
similar to:
   zfs send ... myp...@snapshot | zfs recv -d backup
but have run into a bit of a gotcha with the mountpoint property:
- If I use zfs send -R ... then the mountpoint gets replicated and
   the backup gets mounted over the top of my real filesystems.
- If I skip the '-R' then none of the properties get backed up.

Is there some way to have zfs recv not automatically mount filesystems
when it creates them?

   

Use -u with zfs receive.

--
Ian.



Re: [zfs-discuss] Unwanted filesystem mounting when using send/recv

2010-09-14 Thread Xin LI

On 2010/09/14 17:56, Peter Jeremy wrote:
 I am looking at backing up my fileserver by replicating the
 filesystems onto an external disk using send/recv with something
 similar to:
   zfs send ... myp...@snapshot | zfs recv -d backup
 but have run into a bit of a gotcha with the mountpoint property:
 - If I use zfs send -R ... then the mountpoint gets replicated and
   the backup gets mounted over the top of my real filesystems.
 - If I skip the '-R' then none of the properties get backed up.
 
 Is there some way to have zfs recv not automatically mount filesystems
 when it creates them?

zfs receive has a '-u' option to specify that no mount should be done.

By the way, it might be a good idea not to set an explicit mountpoint on the
sending side, which makes replication easier (one way is to give the
topmost layer mountpoint=/ but canmount=off).
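
Putting the two replies together, a minimal sketch (pool and snapshot names are
made up, 'backup' being the external pool as in the original post):

  zfs snapshot -r mypool@backup1
  zfs send -R mypool@backup1 | zfs receive -u -d backup   # -u: create but do not mount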

Cheers,
-- 
Xin LI <delp...@delphij.net>    http://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-14 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Wolfraider
 
 We are looking into the possibility of adding a dedicated ZIL and/or
 L2ARC devices to our pool. We are looking into getting 4 – 32GB  Intel
 X25-E SSD drives. Would this be a good solution to slow write speeds?

If you have slow write speeds, a dedicated log device might help.  (log devices 
are for writes, not for reads.)

It sounds like your machine is a COMSTAR block target.  In that case, you're
certainly doing a lot of sync writes, and therefore hitting your ZIL hard.  So
it's all but certain that adding dedicated log devices will help.

One thing to be aware of:  Once you add a dedicated log, *all* of your sync
writes will hit that log device.  While a single SSD or pair of SSDs has fast
IOPS, they can easily become a new bottleneck with worse performance than
what you had before ... If you've got 90 spindle disks now and you happen to
perform sequential sync writes, a single pair of SSDs won't compete.
I'd suggest adding several SSDs as log devices, with no mirroring.  Perhaps
one SSD for every raidz2 vdev, or every other, or every third, depending on
what you can afford.

If you have slow reads, l2arc cache might help.  (cache devices are for read, 
not write.)


 We are currently sharing out different slices of the pool to windows
 servers using comstar and fibrechannel. We are currently getting around
 300MB/sec performance with 70-100% disk busy.

You may be facing some other problem besides the lack of cache/log devices.
I suggest giving us some more detail here.  Such as ...

Large sequential operations do well on raidz2, but random IO performs pretty
poorly on raidz2.

What sort of network are you using?  I know you said comstar and
fibrechannel, and sharing slices to windows ... I assume this means you're
exporting LUNs over FC, right?  Dual 4Gbit links per server?  You're getting
2.4 Gbit and you expect what?

You have a pool made up of 18 raidz2 vdevs with 5 drives each (the capacity of
3 disks each) ... Is each vdev on its own bus?  What type of bus is it?
(Generally speaking, it is preferable to spread vdevs across buses, instead of
putting one vdev on one bus, for reliability purposes.) ...  How many disks, of
what type, are on each bus?  What type of bus, at what speed?

What are the usage characteristics, and how are you making your measurement?
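
As a starting point for those measurements, something along these lines shows
where the time is going (substitute your pool name):

  zpool iostat -v Pool_1 5    # per-vdev bandwidth and operations, 5-second samples
  iostat -xnz 5               # per-device service times and %busy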

