[zfs-discuss] problem with zpool import - zil and cache drive are not displayed?

2010-08-03 Thread Darren Taylor
I'm at a loss; I've managed to get myself into a fix. I'm not sure where the 
problem is, but essentially I have a zpool I cannot import. This particular 
pool used to have two drives (not shown below), one for cache and another for 
log. I'm unsure why they are no longer detected on zpool import... the disks 
are still connected to the system and show up when running format to list the devices. 

dar...@lexx:~# zpool import
  pool: tank
id: 15136317365944618902
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank         UNAVAIL  missing device
          raidz1-0   ONLINE
            c6t4d0   ONLINE
            c6t5d0   ONLINE
            c6t6d0   ONLINE
            c6t7d0   ONLINE
          raidz1-1   ONLINE
            c6t0d0   ONLINE
            c6t1d0   ONLINE
            c6t2d0   ONLINE
            c6t3d0   ONLINE
dar...@lexx:~# 

The above disks are the data disks, which appear to be online without issue. I 
was running version 22 on this pool. 

Any help appreciated
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pool scrub clean, filesystem broken

2010-08-03 Thread Brian Merrell
Cindy,

Thanks for the quick response.  Consulting ZFS history I note the following
actions:

imported my three disk raid-z pool originally created on the most recent
version of OpenSolaris but now running NexantaStor 3.03
upgraded my pool
destroyed two file systems I was no longer using (neither of these were of
course the file system at issue)
destroyed a snapshot on another filesystem
played around with permissions (these were my only actions directly on the
file system)

None of these actions seemed to have a negative impact on the filesystem, and
it was working well when I gracefully shut down (to physically move the
computer).

I am a bit at a loss.  With copy-on-write and a clean pool how can I have
corruption?

-brian



On Mon, Aug 2, 2010 at 12:52 PM, Cindy Swearingen 
cindy.swearin...@oracle.com wrote:

 Brian,

 You might try using zpool history -il to see what ZFS operations,
 if any, might have lead up to this problem.

 If zpool history doesn't provide any clues, then what other
 operations might have occurred prior to this state?

 It looks like something trampled this file system...

 Thanks,

 Cindy

 On 08/02/10 10:26, Brian wrote:

 Thanks Preston.  I am actually using ZFS locally, connected directly to 3
 sata drives in a raid-z pool. The filesystem is ZFS and it mounts without
 complaint and the pool is clean.  I am at a loss as to what is happening.
 -brian




-- 
Brian Merrell, Director of Technology
Backstop LLP
1455 Pennsylvania Ave., N.W.
Suite 400
Washington, D.C.  20004
202-628-BACK (2225)
merre...@backstopllp.com
www.backstopllp.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-03 Thread Roch Bourbonnais

Le 27 mai 2010 à 07:03, Brent Jones a écrit :

 On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
 matt.connolly...@gmail.com wrote:
 I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
 
 sh-4.0# zfs create rpool/iscsi
 sh-4.0# zfs set shareiscsi=on rpool/iscsi
 sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 
 The underlying zpool is a mirror of two SATA drives. I'm connecting from a 
 Mac client with global SAN initiator software, connected via Gigabit LAN. It 
 connects fine, and I've initialiased a mac format volume on that iScsi 
 volume.
 
 Performance, however, is terribly slow, about 10 times slower than an SMB 
 share on the same pool. I expected it would be very similar, if not faster 
 than SMB.
 
 Here's my test results copying 3GB data:
 
 iScsi:  44m01s  1.185MB/s
 SMB share:  4m27s   11.73MB/s
 
 Reading (the same 3GB) is also worse than SMB, but only by a factor of about 
 3:
 
 iScsi:      4m36s   11.34MB/s
 SMB share:  1m45s   29.81MB/s
 

cleaning up some old mail 

Not unexpected. Filesystems have readahead code to prefetch enough to cover the 
latency of the read request. iSCSI only responds to the request.
Put a filesystem on top of iscsi and try again.

For writes, iSCSI is synchronous and SMB is not. 

-r



 
 Is there something obvious I've missed here?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 Try jumbo frames, and making sure flow control is enabled on your
 iSCSI switches and all network cards
 
 -- 
 Brent Jones
 br...@servuhome.net
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] When is the L2ARC refreshed if on a separate drive?

2010-08-03 Thread valrh...@gmail.com
I'm running a mirrored pair of 2 TB SATA drives as my data storage drives on my 
home workstation, a Core i7-based machine with 10 GB of RAM. I recently added a 
sandforce-based 60 GB SSD (OCZ Vertex 2, NOT the pro version) as an L2ARC to 
the single mirrored pair. I'm running B134, with ZFS pool version 22, with 
dedup enabled. If I understand correctly, the dedup table should be in the 
L2ARC on the SSD, and I should have enough RAM to keep the references to that 
table in memory, and that this is therefore a well-performing solution.

My question is what happens at power off. Does the cache device essentially get 
cleared, so the machine has to rebuild it when it boots? Or is it persistent? 
That is, should performance improve after a little while following a reboot, or 
is it always constant once the L2ARC has been built once? 

Rather informally, it sometimes seems that the hard drives are a bit slower the 
first time they load a program now, vs. when I didn't have the SSD installed as 
a cache device on the pool. But this is mainly an impression. Thanks for your 
help!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using multiple logs on single SSD devices

2010-08-03 Thread Roy Sigurd Karlsbakk





- Second question, how about this: partition the two X25E drives into two, and 
  then mirror each half of each drive as log devices for each pool. Am I missing 
  something with this scheme? On boot, will the GUID for each pool get found by 
  the system from the partitioned log drives? 

IIRC several posts in here, some by Cindy, have been about using devices shared 
among pools, and what's said is that this is not recommended because of 
potential deadlocks. If I were you, I'd get another couple of SSDs for the new 
pool. 

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum be presented intelligibly. It 
is an elementary imperative for all pedagogues to avoid excessive use of idioms 
of foreign origin. In most cases adequate and relevant synonyms exist in 
Norwegian. 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] LTFS and LTO-5 Tape Drives

2010-08-03 Thread valrh...@gmail.com
Has anyone looked into the new LTFS on LTO-5 for tape backups? Any idea how 
this would work with ZFS? I'm presuming ZFS send / receive are not going to 
work. But it seems rather appealing to have the metadata properly with the 
data, and being able to browse files directly instead of having to rely on 
backup software, however nice tar may be. Has anyone used this with 
OpenSolaris, or have an opinion on how this would work in practice? Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?

2010-08-03 Thread George Wilson

Darren,

It looks like you've lost your log device. The newly integrated missing 
log support will help once it's available. In the meantime, you should 
run 'zdb -l' on your log device to make sure the label is still intact.
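
For example, something along these lines (assuming the log was the whole disk 
c7d0 mentioned later in this thread, so the label would sit on slice 0; adjust 
the device path for your system):

   zdb -l /dev/rdsk/c7d0s0

A healthy device prints four copies of the vdev label; if they are missing or 
unreadable, the log device has effectively lost its identity.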


Thanks,
George

Darren Taylor wrote:
I'm at a loss; I've managed to get myself into a fix. I'm not sure where the problem is, but essentially I have a zpool I cannot import. This particular pool used to have two drives (not shown below), one for cache and another for log. I'm unsure why they are no longer detected on zpool import... the disks are still connected to the system and show up when running format to list the devices. 


dar...@lexx:~# zpool import
  pool: tank
id: 15136317365944618902
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank         UNAVAIL  missing device
          raidz1-0   ONLINE
            c6t4d0   ONLINE
            c6t5d0   ONLINE
            c6t6d0   ONLINE
            c6t7d0   ONLINE
          raidz1-1   ONLINE
            c6t0d0   ONLINE
            c6t1d0   ONLINE
            c6t2d0   ONLINE
            c6t3d0   ONLINE
dar...@lexx:~# 

The above disks are the data disks, which appear to be online without issue. I was running version 22 on this pool. 


Any help appreciated


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Which ZFS events/errors appear in FMA?

2010-08-03 Thread Andras Spitzer
Hi,

Is there a summary somewhere which describes exactly which ZFS-related 
events/errors appear in FMA today, and also some sort of roadmap for the 
events/errors that are planned to be reported via FMA in the future?

Regards,
sendai
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Restripe

2010-08-03 Thread Eduardo Bragatto

Hi,

I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1  
volumes (of 7 x 2TB disks each):


# zpool iostat -v | grep -v c4
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
backup       35.2T  15.3T    602    272  15.3M  11.1M
  raidz1     11.6T  1.06T    138     49  2.99M  2.33M
  raidz1     11.8T   845G    163     54  3.82M  2.57M
  raidz1     6.00T  6.62T    161     84  4.50M  3.16M
  raidz1     5.88T  6.75T    139     83  4.01M  3.09M
-----------  -----  -----  -----  -----  -----  -----

Originally there were only the first two raidz1 volumes, and the two  
from the bottom were added later.


You can notice that by the amount of used / free space. The first two  
volumes have ~11TB used and ~1TB free, while the other two have around  
~6TB used and ~6TB free.


I have hundreds of zfs'es storing backups from several servers. Each  
ZFS has about 7 snapshots of older backups.


I have the impression I'm getting degradation in performance due to  
the limited space in the first two volumes, especially the second,  
which has only 845GB free.


Is there any way to re-stripe the pool, so I can take advantage of all  
spindles across the raidz1 volumes? Right now it looks like the newer  
volumes are doing the heavy lifting while the other two just hold old data.


Thanks,
Eduardo Bragatto
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using multiple logs on single SSD devices

2010-08-03 Thread Jonathan Loran

On Aug 2, 2010, at 8:18 PM, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jonathan Loran
 
 Because you're at pool v15, it does not matter if the log device fails while
 you're running, or you're offline and trying to come online, or whatever.
 Simply if the log device fails, unmirrored, and the version is less than 19,
 the pool is simply lost.  There are supposedly techniques to recover, so
 it's not necessarily a data unrecoverable by any means situation, but you
 certainly couldn't recover without a server crash, or at least shutdown.
 And it would certainly be a nightmare, at best.  The system will not fall
 back to ZIL in the main pool.  That was a feature created in v19.

Yes, after sending my query yesterday, I found the ZFS best practices guide, 
which I hadn't read for a long time, with many updates w/r/t SSD devices (many by 
you, Ed, no?).  I also found the long thread on this list about SSD best 
practices, which somehow I missed on my first pass.  After reading it, I became 
much more nervous.  My previous assumption when I added the log was based upon 
the IOP rate I saw to the ZIL, and the number of IOPs an Intel X25-E could take, 
and it looked like the drive should last a few years, at least.  But of course, 
that assumes no other failure modes.  Given the high price of failure, now that 
I know the system will suddenly go south, I realized that action needed to be 
taken ASAP to mirror the log.

 I'm afraid it's too late for that, unless you're willing to destroy & 
 recreate your pool.  You cannot remove the existing log device.  You cannot
 shrink it.  You cannot replace it with a smaller one.  The only things you
 can do right now are:
 
 (a) Start mirroring that log device with another device of the same size or
 larger.
 or
 (b) Buy another SSD which is larger than the first.  Create a slice on the
 2nd which is equal to the size of the first.  Mirror the first onto the
 slice of the 2nd.  After resilver, detach the first drive, and replace it
 with another one of the larger drives.  Slice the 3rd drive just like the
 2nd, and mirror the 2nd drive slice onto it.  Now you've got a mirrored & 
 sliced device, without any downtime, but you had to buy 2x 2x larger drives
 in order to do it.
 or
 (c) Destroy & recreate your whole pool, but learn from your mistake.  This
 time, slice each SSD, and mirror the slices to form the log device.
 
 BTW, ask me how I know this in such detail?  It's cuz I made the same
 mistake last year.  There was one interesting possibility we considered, but
 didn't actually implement:
 
 We are running a stripe of mirrors.  We considered the possibility of
 breaking the mirrors, creating a new pool out of the other half using the
 SSD properly sliced.  Using zfs send to replicate all the snapshots over
 to the new pool, up to a very recent time. 
 
 Then, we'd be able to make a very short service window.  Shutdown briefly,
 send that one final snapshot to the new pool, destroy the old pool, rename
 the new pool to take the old name, and bring the system back up again.
 Instead of scheduling a long service window.  As soon as the system is up
 again, start mirroring and resilvering (er ... initial silvering), and of
 course, slice the SSD before attaching the mirror.
 
 Naturally there is some risk, running un-mirrored long enough to send the
 snaps... and so forth.
 
 Anyway, just an option to consider.
 

Destroying this pool is very much off the table.  It holds home directories for 
our whole lab, about 375 of them.  If I take the system offline, then no one 
works until it's back up.  You could say this machine is mission critical.  The 
host has been very reliable.  Everyone is now spoiled by how it never goes 
down, and I'm very proud of that fact.  The only way I could recreate the pool 
would be through some clever means like you give, or I thought perhaps using 
AVS to replicate one side of the mirror, then everything could be done through 
a quick reboot.

One other idea I had was using a sparse zvol for the log, but I think 
eventually, the sparse volume would fill up beyond its physical capacity.  On 
top of that, this would mean we would have a log that is a zvol from another 
zpool, which I think could cause a boot race condition.  

I think the real solution to my immediate problem is this:  Bite the bullet, 
and add storage to the existing pool.  It won't be as clean as I like, and it 
would disturb my nicely balanced mirror stripe with new large empty vdevs, 
which I fear could impact performance down the road when the original stripe 
fills up, and all writes go to the new vdevs.  Perhaps by the time that 
happens, the feature to rebalance the pool will be available, if that's even 
being worked on.  Maybe that's wishful thinking.  At any rate, if I don't have 
to add another pool, I can mirror the logs I have: problem solved. 

Finally, I'm told by my SE that ZFS in 

[zfs-discuss] snapshot space - miscalculation?

2010-08-03 Thread R. Nippes
zfs get all claims that I have 523G used by snapshots.
I want to get rid of it, 
but when I look at the space used by each snapshot I can't find the one that 
could occupy so much space. 

daten/backups                                          used                  959G   -
daten/backups                                          usedbysnapshots       523G   -
daten/backups                                          usedbydataset         437G   -
daten/backups                                          usedbychildren        0      -
daten/backups                                          usedbyrefreservation  0      -
daten/back...@zfs-auto-snap_hourly-2009-12-20-16_00    used                  228M   -
daten/back...@zfs-auto-snap_hourly-2009-12-20-17_00    used                  150K   -
daten/back...@zfs-auto-snap_monthly-2009-12-25-21_43   used                  7,94M  -
daten/back...@zfs-auto-snap_monthly-2010-02-01-00_00   used                  60,3M  -
daten/back...@zfs-auto-snap:daily-2010-03-01-19:20     used                  0      -
daten/back...@zfs-auto-snap:monthly-2010-03-01-19:20   used                  0      -
daten/back...@zfs-auto-snap:weekly-2010-03-01-19:20    used                  0      -
daten/back...@zfs-auto-snap:daily-2010-03-02-00:00     used                  0      -
daten/back...@zfs-auto-snap:daily-2010-03-03-00:00     used                  0      -
daten/back...@zfs-auto-snap:daily-2010-03-04-00:00     used                  0      -
daten/back...@zfs-auto-snap_monthly-2010-03-04-20_27   used                  0      -
daten/back...@zfs-auto-snap_monthly-2010-04-01-00_00   used                  57,4M  -
daten/back...@zfs-auto-snap_monthly-2010-05-01-00_00   used                  57,5M  -
daten/back...@zfs-auto-snap_monthly-2010-06-01-00_00   used                  57,4M  -
daten/back...@zfs-auto-snap_monthly-2010-07-01-00_00   used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-02-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-03-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-04-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-05-00_00     used                  2,92M  -
daten/back...@zfs-auto-snap_daily-2010-07-06-00_00     used                  132K   -
daten/back...@zfs-auto-snap_daily-2010-07-07-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-08-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-09-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-10-00_00     used                  0      -
daten/back...@zfs-auto-snap_daily-2010-07-11-00_00     used                  0

Re: [zfs-discuss] When is the L2ARC refreshed if on a separate drive?

2010-08-03 Thread Tomas Ögren
On 03 August, 2010 - valrh...@gmail.com sent me these 1,2K bytes:

 I'm running a mirrored pair of 2 TB SATA drives as my data storage drives on 
 my home workstation, a Core i7-based machine with 10 GB of RAM. I recently 
 added a sandforce-based 60 GB SSD (OCZ Vertex 2, NOT the pro version) as an 
 L2ARC to the single mirrored pair. I'm running B134, with ZFS pool version 
 22, with dedup enabled. If I understand correctly, the dedup table should be 
 in the L2ARC on the SSD, and I should have enough RAM to keep the references 
 to that table in memory, and that this is therefore a well-performing 
 solution.
 
 My question is what happens at power off. Does the cache device essentially 
 get cleared, and the machine has to rebuild it when it boots? Or is it 
 persistent. That is, should performance improve after a little while 
 following a reboot, or is it always constant once it builds the L2ARC once? 

L2ARC is currently cleared at boot. There is an RFE to make it
persistent.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using multiple logs on single SSD devices

2010-08-03 Thread Richard Elling
On Aug 3, 2010, at 9:29 AM, Roy Sigurd Karlsbakk wrote:
 
 - Second question, how about this: partition the two X25E drives into two, 
 and then mirror each half of each drive as log devices for each pool.  Am I 
 missing something with this scheme?  On boot, will the GUID for each pool get 
 found by the system from the partitioned log drives?
 IIRC several posts in here, some by Cindy, have been about using devices 
 shared among pools, and what's said is that this is not recommended because 
 of potential deadlocks.

No, you misunderstand. The potential deadlock condition occurs when you use 
ZFS in a single system to act as both the file system and a device.  For 
example, using a zvol on rpool as a ZIL for another pool.  For devices 
themselves, ZFS has absolutely no problem using block devices as presented by 
partitions or slices. This has been true for all file systems for all time.
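
A minimal sketch of what that looks like in practice (pool and device names 
here are hypothetical; slice each X25-E into two slices with format(1M) first):

   zpool add pool1 log mirror c1t0d0s0 c1t1d0s0
   zpool add pool2 log mirror c1t0d0s1 c1t1d0s1

Each pool then has a mirrored log made of one slice from each SSD, with no 
zvol layering involved.
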
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-03 Thread Ross Walker
On Aug 3, 2010, at 5:56 PM, Robert Milkowski mi...@task.gda.pl wrote:

 On 03/08/2010 22:49, Ross Walker wrote:
 On Aug 3, 2010, at 12:13 PM, Roch Bourbonnaisroch.bourbonn...@sun.com  
 wrote:
 
   
 Le 27 mai 2010 à 07:03, Brent Jones a écrit :
 
 
 On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
 matt.connolly...@gmail.com  wrote:
   
 I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
 
 sh-4.0# zfs create rpool/iscsi
 sh-4.0# zfs set shareiscsi=on rpool/iscsi
 sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 
 The underlying zpool is a mirror of two SATA drives. I'm connecting from 
 a Mac client with global SAN initiator software, connected via Gigabit 
 LAN. It connects fine, and I've initialiased a mac format volume on that 
 iScsi volume.
 
 Performance, however, is terribly slow, about 10 times slower than an SMB 
 share on the same pool. I expected it would be very similar, if not 
 faster than SMB.
 
 Here's my test results copying 3GB data:
 
 iScsi:  44m01s  1.185MB/s
 SMB share:  4m27s  11.73MB/s
 
 Reading (the same 3GB) is also worse than SMB, but only by a factor of 
 about 3:
 
 iScsi:  4m36s  11.34MB/s
 SMB share:  1m45s  29.81MB/s
 
 
 cleaning up some old mail
 
 Not unexpected. Filesystems have readahead code to prefetch enough to cover 
 the latency of the read request. iSCSI only responds to the request.
 Put a filesystem on top of iscsi and try again.
 
 For writes, iSCSI is synchronous and SMB is not.
 
 It may be with ZFS, but iSCSI itself is neither synchronous nor asynchronous; it 
 is simply SCSI over IP.
  
 It is the application using the iSCSI protocol that determines whether it is 
 synchronous (issue a flush after write) or asynchronous (wait until the target 
 flushes).
 
 I think the ZFS developers didn't quite understand that and wanted strict 
 guidelines like NFS has, but iSCSI doesn't have those, it is a lower level 
 protocol than NFS is, so they forced guidelines on it and violated the 
 standard.
 
   
 Nothing has been violated here.
 Look for WCE flag in COMSTAR where you can control how a given zvol  should 
 behave (synchronous or asynchronous). Additionally in recent build you have 
 zfs set sync={disabled|default|always} which also works with zvols.
 
 So you do have a control over how it is supposed to behave and to make it 
 nice it is even on per zvol basis.
 It is just that the default is synchronous.

Ah, ok, my experience has been with Solaris and the iscsitgt which, correct me 
if I am wrong, is still synchronous only.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-03 Thread Saxon, Will
 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org 
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of 
 Robert Milkowski
 Sent: Tuesday, August 03, 2010 5:57 PM
 To: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] iScsi slow
 
 On 03/08/2010 22:49, Ross Walker wrote:
  On Aug 3, 2010, at 12:13 PM, Roch 
 Bourbonnaisroch.bourbonn...@sun.com  wrote:
 
 
  Le 27 mai 2010 à 07:03, Brent Jones a écrit :
 
   
  On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
  matt.connolly...@gmail.com  wrote:
 
  I've set up an iScsi volume on OpenSolaris (snv_134) 
 with these commands:
 
  sh-4.0# zfs create rpool/iscsi
  sh-4.0# zfs set shareiscsi=on rpool/iscsi
  sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 
  The underlying zpool is a mirror of two SATA drives. I'm 
 connecting from a Mac client with global SAN initiator 
 software, connected via Gigabit LAN. It connects fine, and 
 I've initialiased a mac format volume on that iScsi volume.
 
  Performance, however, is terribly slow, about 10 times 
 slower than an SMB share on the same pool. I expected it 
 would be very similar, if not faster than SMB.
 
  Here's my test results copying 3GB data:
 
  iScsi:  44m01s  1.185MB/s
   SMB share:  4m27s  11.73MB/s
 
  Reading (the same 3GB) is also worse than SMB, but only 
 by a factor of about 3:
 
   iScsi:  4m36s  11.34MB/s
   SMB share:  1m45s  29.81MB/s
 
   
  cleaning up some old mail
 
  Not unexpected. Filesystems have readahead code to 
 prefetch enough to cover the latency of the read request. 
 iSCSI only responds to the request.
  Put a filesystem on top of iscsi and try again.
 
  For writes, iSCSI is synchronous and SMB is not.
   
   It may be with ZFS, but iSCSI itself is neither synchronous nor 
  asynchronous; it is simply SCSI over IP.
  
   It is the application using the iSCSI protocol that 
  determines whether it is synchronous (issue a flush after 
  write) or asynchronous (wait until the target flushes).
 
  I think the ZFS developers didn't quite understand that and 
 wanted strict guidelines like NFS has, but iSCSI doesn't have 
 those, it is a lower level protocol than NFS is, so they 
 forced guidelines on it and violated the standard.
 
 
 Nothing has been violated here.
 Look for WCE flag in COMSTAR where you can control how a given zvol  
 should behave (synchronous or asynchronous). Additionally in recent 
 build you have zfs set sync={disabled|default|always} which 
 also works 
 with zvols.
 
 So you do have a control over how it is supposed to behave 
 and to make 
 it nice it is even on per zvol basis.
 It is just that the default is synchronous.

And if it's synchronous, you can still accelerate performance by using L2ARC 
and SLOG devices, just like you can with NFS, correct?

-Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-03 Thread Ross Walker
On Aug 3, 2010, at 5:56 PM, Robert Milkowski mi...@task.gda.pl wrote:

 On 03/08/2010 22:49, Ross Walker wrote:
 On Aug 3, 2010, at 12:13 PM, Roch Bourbonnaisroch.bourbonn...@sun.com  
 wrote:
 
   
 Le 27 mai 2010 à 07:03, Brent Jones a écrit :
 
 
 On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
 matt.connolly...@gmail.com  wrote:
   
 I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
 
 sh-4.0# zfs create rpool/iscsi
 sh-4.0# zfs set shareiscsi=on rpool/iscsi
 sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 
 The underlying zpool is a mirror of two SATA drives. I'm connecting from 
 a Mac client with global SAN initiator software, connected via Gigabit 
 LAN. It connects fine, and I've initialiased a mac format volume on that 
 iScsi volume.
 
 Performance, however, is terribly slow, about 10 times slower than an SMB 
 share on the same pool. I expected it would be very similar, if not 
 faster than SMB.
 
 Here's my test results copying 3GB data:
 
 iScsi:  44m01s  1.185MB/s
  SMB share:  4m27s  11.73MB/s
 
 Reading (the same 3GB) is also worse than SMB, but only by a factor of 
 about 3:
 
  iScsi:  4m36s  11.34MB/s
  SMB share:  1m45s  29.81MB/s
 
 
 cleaning up some old mail
 
 Not unexpected. Filesystems have readahead code to prefetch enough to cover 
 the latency of the read request. iSCSI only responds to the request.
 Put a filesystem on top of iscsi and try again.
 
 For writes, iSCSI is synchronous and SMB is not.
 
  It may be with ZFS, but iSCSI itself is neither synchronous nor asynchronous; 
  it is simply SCSI over IP.
  
  It is the application using the iSCSI protocol that determines whether it is 
  synchronous (issue a flush after write) or asynchronous (wait until the target 
  flushes).
 
 I think the ZFS developers didn't quite understand that and wanted strict 
 guidelines like NFS has, but iSCSI doesn't have those, it is a lower level 
 protocol than NFS is, so they forced guidelines on it and violated the 
 standard.
 
   
 Nothing has been violated here.
 Look for WCE flag in COMSTAR where you can control how a given zvol  should 
 behave (synchronous or asynchronous). Additionally in recent build you have 
 zfs set sync={disabled|default|always} which also works with zvols.
 
 So you do have a control over how it is supposed to behave and to make it 
 nice it is even on per zvol basis.
 It is just that the default is synchronous.

I forgot to ask: if the ZVOL is set async with WCE, will it still honor a flush 
command from the initiator and flush those TXGs held for the ZVOL?

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When is the L2ARC refreshed if on a separate drive?

2010-08-03 Thread valrh...@gmail.com
Thanks for the info!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-03 Thread Robert Milkowski

On 03/08/2010 23:20, Ross Walker wrote:



Nothing has been violated here.
Look for WCE flag in COMSTAR where you can control how a given zvol  should 
behave (synchronous or asynchronous). Additionally in recent build you have zfs 
set sync={disabled|default|always} which also works with zvols.

So you do have a control over how it is supposed to behave and to make it nice 
it is even on per zvol basis.
It is just that the default is synchronous.
 

Ah, ok, my experience has been with Solaris and the iscsitgt which, correct me 
if I am wrong, is still synchronous only.

   


I don't remember whether it offered the ability to manipulate a zvol's WCE 
flag, but even if it didn't you can do it anyway, as it is a zvol property. For 
an example see 
http://milek.blogspot.com/2010/02/zvols-write-cache.html
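
For reference, on builds that have the per-dataset sync property mentioned 
earlier in this thread, checking and changing the behaviour of the example zvol 
would look roughly like this (a sketch, not specific to iscsitgt or COMSTAR):

   zfs get sync rpool/iscsi/test
   zfs set sync=always rpool/iscsi/test     (force synchronous semantics)
   zfs set sync=disabled rpool/iscsi/test   (never wait for the ZIL)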


--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?

2010-08-03 Thread Victor Latushkin

On Aug 4, 2010, at 12:23 AM, Darren Taylor wrote:

 Hi George, 
 
 I think you are right. The log device looks to have suffered a complete loss; 
 there is no data on the disk at all. The log device was an ACard RAM drive 
 (with battery backup), but somehow it has faulted, clearing all data. 
 
 --victor gave me this advice, and queried about the zpool.cache--
 Looks like there's a hardware problem with c7d0 as it appears to contain 
 garbage. Do you have zpool.cache with this pool configuration available?

Besides containing garbage, the former log device now appears to have a 
different geometry and is not able to read in the higher LBA ranges. So I'd say 
it is broken.

 c7d0 was the log device. I'm unsure what the next step is, but I'm assuming 
 there is a way to grab the drive's original config from the zpool.cache file 
 and apply it back to the drive?

I mocked up log device in a file, and that made zpool import more happy:

bash-4.0# zpool import
  pool: tank
id: 15136317365944618902
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        DEGRADED
          raidz1-0  ONLINE
            c6t4d0  ONLINE
            c6t5d0  ONLINE
            c6t6d0  ONLINE
            c6t7d0  ONLINE
          raidz1-1  ONLINE
            c6t0d0  ONLINE
            c6t1d0  ONLINE
            c6t2d0  ONLINE
            c6t3d0  ONLINE
        cache
          c8d1
        logs
          c13d1s0   UNAVAIL  cannot open



bash-4.0# zpool import -fR / tank
cannot import 'tank': one or more devices is currently unavailable
Recovery is possible, but will result in some data loss.
Returning the pool to its state as of July 21, 2010 03:49:50 AM NZST
should correct the problem.  Approximately 91 seconds of data
must be discarded, irreversibly.  After rewind, several
persistent user-data errors will remain.  Recovery can be attempted
by executing 'zpool import -F tank'.  A scrub of the pool
is strongly recommended after recovery.
bash-4.0#

So if you are happy with the results, you can perform the actual import with

zpool import -fF -R / tank

You should then be able to remove the log device completely.
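
Once the pool is imported, removing the dead log device should be a matter of 
(assuming it still shows up as c13d1s0 in zpool status; otherwise use the GUID 
shown there):

   zpool remove tank c13d1s0

Log device removal is supported from pool version 19 onwards, and this pool is 
at version 22.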

regards
victor

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Khyron
Short answer: No.

Long answer: Not without rewriting the previously written data.  Data
is being striped over all of the top level VDEVs, or at least it should
be.  But there is no way, at least not built into ZFS, to re-allocate the
storage to perform I/O balancing.  You would basically have to do
this manually.

Either way, I'm guessing this isn't the answer you wanted but hey, you
get what you get.

On Tue, Aug 3, 2010 at 13:52, Eduardo Bragatto edua...@bragatto.com wrote:

 Hi,

 I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1
 volumes (of 7 x 2TB disks each):

 # zpool iostat -v | grep -v c4
                 capacity     operations    bandwidth
 pool          used  avail   read  write   read  write
 -----------  -----  -----  -----  -----  -----  -----
 backup       35.2T  15.3T    602    272  15.3M  11.1M
   raidz1     11.6T  1.06T    138     49  2.99M  2.33M
   raidz1     11.8T   845G    163     54  3.82M  2.57M
   raidz1     6.00T  6.62T    161     84  4.50M  3.16M
   raidz1     5.88T  6.75T    139     83  4.01M  3.09M
 -----------  -----  -----  -----  -----  -----  -----

 Originally there were only the first two raidz1 volumes, and the two from
 the bottom were added later.

 You can notice that by the amount of used / free space. The first two
 volumes have ~11TB used and ~1TB free, while the other two have around ~6TB
 used and ~6TB free.

 I have hundreds of zfs'es storing backups from several servers. Each ZFS
 has about 7 snapshots of older backups.

 I have the impression I'm getting degradation in performance due to the
 limited space in the first two volumes, specially the second, which has only
 845GB free.

 Is there any way to re-stripe the pool, so I can take advantage of all
 spindles across the raidz1 volumes? Right now it looks like the newer
 volumes are doing the heavy while the other two just hold old data.

 Thanks,
 Eduardo Bragatto
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
You can choose your friends, you can choose the deals. - Equity Private

If Linux is faster, it's a Solaris bug. - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Eduardo Bragatto

On Aug 3, 2010, at 10:08 PM, Khyron wrote:


Long answer: Not without rewriting the previously written data.  Data
is being striped over all of the top level VDEVs, or at least it should
be.  But there is no way, at least not built into ZFS, to re-allocate the
storage to perform I/O balancing.  You would basically have to do
this manually.

Either way, I'm guessing this isn't the answer you wanted but hey, you
get what you get.


Actually, that was the answer I was expecting, yes. The real question, 
then, is: what data should I rewrite? I want to rewrite the data that's 
written on the nearly full volumes so it gets spread to the volumes 
with more space available.


Should I simply do a zfs send | zfs receive on all ZFSes I have? 
(we are talking about 400 ZFSes with about 7 snapshots each, here)... 
Or is there a way to specifically rearrange the data from the nearly 
full volumes?


Thanks,
Eduardo Bragatto
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] Installing on alternate hardware

2010-08-03 Thread Emily Grettel

Wow! Thanks for the information James, after consulting with my manager we're 
going to install the text-install version.

 

I'm going to try that as we're installing it on a new disk. Just curious, if I 
do an export of about 3 zvols and reimport them, the mounts will be there but 
will I have to reconfigure CIFS, permissions and users etc?

 

Sorry, I'm but a n00b.

 

Thanks,

Em
 
 Date: Tue, 3 Aug 2010 22:48:36 +1000
 From: j...@opensolaris.org
 To: emilygrettelis...@hotmail.com
 CC: carls...@workingcode.com; install-disc...@opensolaris.org
 Subject: Re: [install-discuss] Installing on alternate hardware
 
 On 3/08/10 10:20 PM, Emily Grettel wrote:
  Thanks for the reply James,
 
   If it were my system, I'd export the ZFS volumes containing my data,
   reinstall on the new motherboard, and then reimport ZFS.
 
  I was thinking that too, but unfortunately I've created quite a few
  zones and there are quite a few users on the system.
 
  Redoing the entire server will take a week :(
 
  Thanks though, I shall try driver-discuss too!
 
 The essential problem is that your new motherboard will have
 different paths to each device.
 
 As James mentioned, you could change the first line of
 /etc/path_to_inst, or.
 
 here's the _unsupported_ totally ugly hack way of getting a
 new motherboard up and running.
 
 Before you start, BE VERY GRATEFUL you're running ZFS. (I'll
 explain why a little later).
 
 
 
 * touch /reconfigure
 * poweroff
 * replace motherboard
 
 * turn system on
 * do whatever bios futzing is needed in order to find your
 primary boot device
 
 * at the grub boot menu, select your desired BE, navigate to
 the kernel$ line and hit 'e'
 
 * go to the end of this line, and hit 'a' (to add), then add
  -arvs (ie, a space, then -arvs) and hit escape
 
 * hit 'b' to boot
 
 * Unless you're prompted for where /etc/path_to_inst is,
 hit enter each time you're prompted during the boot process.
 
 * When you're asked for a username for single-user mode, type
 root and enter your root password.
 
 * Run these operations to test:
 
 format < /dev/null
 zpool status -v
 zpool import -a
 zfs list
 dladm show-link
 dladm show-ether
 
 
 
 The format test will print out the device paths for the
 devices which the kernel has probed. Note these for later.
 
 The zpool status -v test will show you the paths to each
 vdev in your pools.
 
 The zpool import -a test will attempt to import as many
 pools as can be found. This should work seamlessly, and
 you should then see all your datasets in the zfs list test.
 
 The dladm tests will show you what NICs you have installed.
 Note the instance numbers - they almost certainly will have
 changed from what you have configured with /etc/hostname.$nic$inst.
 Change the /etc/hostname file to reflect the new instance
 number(s).
 
 Also, if you are running a graphics head on this system, and
 you've got a customised /etc/X11/xorg.conf, make sure you
 check the BusID settings to make sure that they're correct.
 Use the /usr/bin/scanpci utility for this.
 
 
 Now, why should be grateful for ZFS? Because ZFS uses the
 cXtYdZ number as a fallback for detecting and opening
 devices. What it uses as a primary method is the device id,
 or devid. This is closely related to the GUID aka Globally
 Unique IDentifier. If you want more info about devids and
 guids, you can review a presentation I wrote about them a
 while back:
 
 http://www.slideshare.net/JamesCMcPherson/what-is-a-guid
 
 
 James C. McPherson
 --
 Oracle
 http://www.jmcp.homeunix.com/blog
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] Installing on alternate hardware

2010-08-03 Thread James C. McPherson

On  4/08/10 12:55 PM, Emily Grettel wrote:

Wow! Thanks for the information James, after consulting with my manager
we're going to install the text-install version.


Better to stick with the supportable methods, imho :-)


I'm going to try that as we're installing it on a new disk. Just
curious, if I do an export of about 3 zvols and reimport them, the
mounts will be there but will I have to reconfigure CIFS, permissions
and users etc?


I would not expect so.
You export zpools, not the datasets within them.
ZVols are datasets within pools, and whatever properties
you have configured for them within the pool should
stick around over an export/import operation. If they
don't, I would be very, very surprised.


[note: everybody was a noob at some point]

James C. McPherson
--
Oracle
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?

2010-08-03 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Darren Taylor
 
 I'm not sure
 where the problem is, but essentially i have a zpool i cannot import.
 This particular pool used to have a two drives (not shown below), one
 for cache and another for log. I'm unsure why they are no longer
 detected on zpool import...  the disks are still connected to the
 system and show up when running format for a list.

Perhaps the log & cache were not using entire devices, but rather, just
slices?  I could be wrong, but I don't think zpool import will scan slices
by default.  

If slices exist on the cache & log devices, I might suggest using the -d
option of zpool import.
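
Something along these lines (a sketch; point -d at whichever directory actually 
holds the slice device nodes):

   zpool import -d /dev/dsk
   zpool import -d /dev/dsk tank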

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Eduardo Bragatto

On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:

Unfortunately, zpool iostat is completely useless at describing performance.
The only thing it can do is show device bandwidth, and everyone here knows
that bandwidth is not performance, right?  Nod along, thank you.


I totally understand that, I only used the output to show the space  
utilization per raidz1 volume.


Yes, and you also notice that the writes are biased towards the raidz1 sets
that are less full.  This is exactly what you want :-)  Eventually, when the
less empty sets become more empty, the writes will rebalance.


Actually, if we are going to consider the values from zpool iostat, 
they are only slightly biased towards the volumes I would want -- for 
example, in the first post I made, the volume with the least free space 
had 845GB free; that same volume now has 833GB -- I really would like 
to just stop writing to that volume at this point, as I've experienced 
very bad performance in the past when a volume gets nearly full.


As a reference, here's the information I posted less than 12 hours ago:

# zpool iostat -v | grep -v c4
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
backup       35.2T  15.3T    602    272  15.3M  11.1M
  raidz1     11.6T  1.06T    138     49  2.99M  2.33M
  raidz1     11.8T   845G    163     54  3.82M  2.57M
  raidz1     6.00T  6.62T    161     84  4.50M  3.16M
  raidz1     5.88T  6.75T    139     83  4.01M  3.09M
-----------  -----  -----  -----  -----  -----  -----

And here's the info from the same system, as I write now:

# zpool iostat -v | grep -v c4
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
backup       35.3T  15.2T    541    208  9.90M  6.45M
  raidz1     11.6T  1.06T    116     38  2.16M  1.41M
  raidz1     11.8T   833G    122     39  2.28M  1.49M
  raidz1     6.02T  6.61T    152     64  2.72M  1.78M
  raidz1     5.89T  6.73T    149     66  2.73M  1.77M
-----------  -----  -----  -----  -----  -----  -----

As you can see, the second raidz1 volume is not being spared and has 
been providing almost as much space as the others (and even more 
compared to the first volume).


I have the impression I'm getting degradation in performance due to  
the limited space in the first two volumes, specially the second,  
which has only 845GB free.


Impressions work well for dating, but not so well for performance.
Does your application run faster or slower?


You're a funny guy. :)

Let me re-phrase it: I'm sure I'm getting degradation in performance, 
as my applications are waiting more on I/O now than they used to 
(based on CPU utilization graphs I have). The impression part is that 
the reason is the limited space in those two volumes -- as I said, I 
have already experienced bad performance on ZFS systems running nearly 
out of space before.


Is there any way to re-stripe the pool, so I can take advantage of  
all spindles across the raidz1 volumes? Right now it looks like the  
newer volumes are doing the heavy while the other two just hold old  
data.


Yes, of course.  But it requires copying the data, which probably  
isn't feasible.


I'm willing to copy data around to get this accomplished; I'm really 
just looking for the best method -- I have more than 10TB free, so I 
have some space to play with if I have to duplicate some data and 
erase the old copy, for example.


Thanks,
Eduardo Bragatto
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?

2010-08-03 Thread Richard Elling
On Aug 3, 2010, at 8:39 PM, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Darren Taylor
 
 I'm not sure
 where the problem is, but essentially i have a zpool i cannot import.
 This particular pool used to have a two drives (not shown below), one
 for cache and another for log. I'm unsure why they are no longer
 detected on zpool import...  the disks are still connected to the
 system and show up when running format for a list.
 
 Perhaps the log & cache were not using entire devices, but rather, just
 slices?  I could be wrong, but I don't think zpool import will scan slices
 by default.  

Entire devices do not exist, only slices.

 If slices exist on the cache & log devices, I might suggest using the -d
 option of zpool import.

The -d option allows searching in another directory.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Khyron
I notice you use the word volume which really isn't accurate or
appropriate here.

If all of these VDEVs are part of the same pool, which as I recall you
said they are, then writes are striped across all of them (with bias for
the more empty aka less full VDEVs).

You probably want to zfs send the oldest dataset (ZFS terminology
for a file system) into a new dataset.  That oldest dataset was created
when there were only 2 top level VDEVs, most likely.  If you have
multiple datasets created when you had only 2 VDEVs, then send/receive
them both (in serial fashion, one after the other).  If you have room for
the snapshots too, then send all of it and then delete the source dataset
when done.  I think this will achieve what you want.

You may want to get a bit more specific and choose from the oldest
datasets THEN find the smallest of those oldest datasets and
send/receive it first.  That way, the send/receive completes in less
time, and when you delete the source dataset, you've now created
more free space on the entire pool but without the risk of a single
dataset exceeding your 10 TiB of workspace.
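
A minimal sketch of one such pass (the dataset names here are hypothetical; the 
pool name is taken from the iostat output):

   zfs snapshot -r backup/oldest_fs@migrate
   zfs send -R backup/oldest_fs@migrate | zfs receive backup/oldest_fs.new
   (verify the copy, then)
   zfs destroy -r backup/oldest_fs
   zfs rename backup/oldest_fs.new backup/oldest_fs

The -R send carries the existing snapshots along, and the freshly written copy 
gets striped across all four raidz1 VDEVs, including the newer, emptier ones.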

ZFS' copy-on-write nature really wants no less than 20% free because
you never update data in place; a new copy is always written to disk.

You might want to consider turning on compression on your new datasets
too, especially if you have free CPU cycles to spare.  I don't know how
compressible your data is, but if it's fairly compressible, say lots of
text,
then you might get some added benefit when you copy the old data into
the new datasets.  Saving more space, then deleting the source dataset,
should help your pool have more free space, and thus influence your
writes for better I/O balancing when you do the next (and the next) dataset
copies.

HTH.

On Tue, Aug 3, 2010 at 22:48, Eduardo Bragatto edua...@bragatto.com wrote:

 On Aug 3, 2010, at 10:08 PM, Khyron wrote:

  Long answer: Not without rewriting the previously written data.  Data
 is being striped over all of the top level VDEVs, or at least it should
 be.  But there is no way, at least not built into ZFS, to re-allocate the
 storage to perform I/O balancing.  You would basically have to do
 this manually.

 Either way, I'm guessing this isn't the answer you wanted but hey, you
 get what you get.


 Actually, that was the answer I was expecting, yes. The real question,
 then, is: what data should I rewrite? I want to rewrite data that's written
 on the nearly full volumes so they get spread to the volumes with more space
 available.

 Should I simply do a  zfs send | zfs receive on all ZFSes I have? (we are
 talking about 400 ZFSes with about 7 snapshots each, here)... Or is there a
 way to rearrange specifically the data from the nearly full volumes?


 Thanks,
 Eduardo Bragatto
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
You can choose your friends, you can choose the deals. - Equity Private

If Linux is faster, it's a Solaris bug. - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Richard Elling
On Aug 3, 2010, at 8:55 PM, Eduardo Bragatto wrote:

 On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:
 
 Unfortunately, zpool iostat is completely useless at describing performance.
 The only thing it can do is show device bandwidth, and everyone here knows
 that bandwidth is not performance, right?  Nod along, thank you.
 
 I totally understand that, I only used the output to show the space 
 utilization per raidz1 volume.
 
 Yes, and you also notice that the writes are biased towards the raidz1 sets
 that are less full.  This is exactly what you want :-)  Eventually, when the 
 less
 empty sets become more empty, the writes will rebalance.
 
 Actually, if we are going to consider the values from zpool iostats, they are 
 just slightly biased towards the volumes I would want -- for example, on the 
 first post I've made, the volume with less free space had 845GB free.. that 
 same volume now has 833GB -- I really would like to just stop writing to that 
 volume at this point as I've experience very bad performance in the past when 
 a volume gets nearly full.

The tipping point for the change in the first fit/best fit allocation algorithm 
is
now 96%. Previously, it was 70%. Since you don't specify which OS, build, 
or zpool version, I'll assume you are on something modern.

NB, zdb -m will show the pool's metaslab allocations. If there are no 100%
free metaslabs, then it is a clue that the allocator might be working extra 
hard.
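
For example (pool name taken from the iostat output above):

   zdb -m backup

That should list the free space in each metaslab; if none of them are anywhere 
near 100% free, the allocator is having to work with fragmented space.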

 As a reference, here's the information I posted less than 12 hours ago:
 
 # zpool iostat -v | grep -v c4
                 capacity     operations    bandwidth
 pool          used  avail   read  write   read  write
 -----------  -----  -----  -----  -----  -----  -----
 backup       35.2T  15.3T    602    272  15.3M  11.1M
   raidz1     11.6T  1.06T    138     49  2.99M  2.33M
   raidz1     11.8T   845G    163     54  3.82M  2.57M
   raidz1     6.00T  6.62T    161     84  4.50M  3.16M
   raidz1     5.88T  6.75T    139     83  4.01M  3.09M
 -----------  -----  -----  -----  -----  -----  -----
 
 And here's the info from the same system, as I write now:
 
 # zpool iostat -v | grep -v c4
                 capacity     operations    bandwidth
 pool          used  avail   read  write   read  write
 -----------  -----  -----  -----  -----  -----  -----
 backup       35.3T  15.2T    541    208  9.90M  6.45M
   raidz1     11.6T  1.06T    116     38  2.16M  1.41M
   raidz1     11.8T   833G    122     39  2.28M  1.49M
   raidz1     6.02T  6.61T    152     64  2.72M  1.78M
   raidz1     5.89T  6.73T    149     66  2.73M  1.77M
 -----------  -----  -----  -----  -----  -----  -----
 
 As you can see, the second raidz1 volume is not being spared and has been 
 providing with almost as much space as the others (and even more compared to 
 the first volume).

Yes, perhaps 1.5-2x data written to the less full raidz1 sets.  The exact 
amount of data is not shown, because zpool iostat doesn't show how 
much data is written, it shows the bandwidth.

 I have the impression I'm getting degradation in performance due to the 
 limited space in the first two volumes, specially the second, which has 
 only 845GB free.
 
 Impressions work well for dating, but not so well for performance.
 Does your application run faster or slower?
 
 You're a funny guy. :)
 
 Let me re-phrase it: I'm sure I'm getting degradation in performance as my 
 applications are waiting more on I/O now than they used to do (based on CPU 
 utilization graphs I have). The impression part, is that the reason is the 
 limited space in those two volumes -- as I said, I already experienced bad 
 performance on zfs systems running nearly out of space before.

OK, so how long are they waiting?  Try iostat -zxCn and look at the
asvc_t column.  This will show how the disk is performing, though it 
won't show the performance delivered by the file system to the 
application.  To measure the latter, try fsstat zfs (assuming you are
on a Solaris distro)
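
For example, sampling every 10 seconds while the backups are running (a 
sketch):

   iostat -zxCn 10
   fsstat zfs 10

Consistently high asvc_t values on the busier disks would point at the disks 
themselves, rather than the pool layout, as the immediate bottleneck.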

Also, if these are HDDs, the media bandwidth decreases and seeks 
increase as they fill. ZFS tries to favor the outer cylinders (lower numbered
metaslabs) to take this into account.

 Is there any way to re-stripe the pool, so I can take advantage of all 
 spindles across the raidz1 volumes? Right now it looks like the newer 
 volumes are doing the heavy while the other two just hold old data.
 
 Yes, of course.  But it requires copying the data, which probably isn't 
 feasible.
 
 I'm willing to copy data around to get this accomplish, I'm really just 
 looking for the best method -- I have more than 10TB free, so I have some 
 space to play with if I have to duplicate some data and erase the old copy, 
 for example.

zfs send/receive is usually the best method.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org