Re: [zfs-discuss] zpool resilver - error history
Marcel Gschwandl wrote:
> Hi all! I'm running a Solaris 10 Update 6 (10/08) system and had to resilver a zpool. It's now showing
> <snip> scrub: resilver completed after 9h0m with 21 errors on Wed Nov 4 22:07:49 2009 </snip>
> but I haven't found an option to see what files were affected. Is there any way to do that?
> Thanks in advance, Marcel

Try zpool status -v poolname
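When the errors do map to damaged files, the -v output lists them at the end. A hedged illustration of the shape of that output (the pool name and path are examples, not from this system):

  # zpool status -v tank
  ...
  errors: Permanent errors have been detected in the following files:
          tank/home:/export/home/user/file.dat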
Re: [zfs-discuss] sparc + zfs + nfs + mac osX = fail ?
hi folks, i'm seeing an odd problem and wondered whether others had encountered it. when i try to write to a nevada NFS share from a mac os X (10.5) client via the mac's GUI, i get a permissions error - the file is 0 bytes, date set to jan 1, 1970, and perms set to 000. writing to the share via the command line works fine, so it's not a normal permissions problem. here's the weird thing:

a) mac os X 10.4, sparc snv_84, zfs filesystem shared via NFS = no problem
b) mac os X 10.5, sparc snv_84, zfs = doesn't work
c) mac os X 10.5, sparc snv_84, UFS = no problem
d) mac os X 10.5, x86 snv_115, ZFS = no problem
e) mac os X 10.5, sparc snv_125, ZFS = doesn't work

i haven't yet tried sparc snv_125 + UFS, but i'm wondering if there's anyone out here with a working combination of: mac os X 10.5, sparc snv_120+, ZFS, NFS? i thought at first it was a problem with the mac 10.5 nfs client, but then i'd expect (c) and (d) to fail, too. i've tried existing shares, new shares, zfs-based sharing, straight share(1m) sharing, root=mac_client - none have made much difference. it's all been NFSv3. snooping the communication hasn't yielded much that's obvious. any thoughts/suggestions/wisdom?

following up on this, i can confirm the following:

x64, zfs, snv_126 = ok
sparc, zfs, snv_125 = fail
sparc, ufs, snv_125 = ok

thanks for the help offered so far; i've been shown the following by macko:

[...] OK, it looks like e_nfs_gui has the failure case and it is failing because the server for some reason is not correctly setting the mode of the new file when it is created exclusively. At Packet #87, we see an exclusive CREATE of pwl_standard.jpg followed immediately with a SETATTR call (#94) that specifies the new file's mode should be 0644. In packet #95, the server reports that the SETATTR call succeeded, but the new attributes returned for the file show that the mode is still . And subsequent ACCESS requests on that file report that no access is allowed. The same thing happens again starting at packet #188, but in that case the mode being set is 0666. But the result is the same: the server reports that the SETATTR succeeds, but the attributes show that the mode is still set to .

In e_nfs_cli and n_nfs_cli, the CREATE attempt is not done exclusively and the problem does not happen. The non-exclusive (unchecked) create succeeds and the mode is set to the mode passed in the CREATE call. In n_nfs_gui, the CREATE attempt is done exclusively like in e_nfs_gui; however, the client's attempt to set the mode via SETATTR does succeed and the new attributes show the new mode.

The difference between e_nfs_gui and n_nfs_gui appears to be in the NFS server's handling of the SETATTR request that follows the exclusive CREATE request. The client attempts to set mode=0644, uid=36493, and gid=0. The gid=0 is because the directory's gid=0 and the Mac VFS layer considers the default behavior to be to copy the directory's gid to the child. Some servers may balk at this if the credential isn't a member of the group, but the Mac NFS client will then attempt the SETATTR again without setting the uid/gid. This is what happens in n_nfs_gui. Strangely, in the e_nfs_gui trace the SETATTR request does appear to set the gid=0 successfully even though the mode seems unchanged. [...]
so - something weird is happening when an nfs call is made on a zfs filesystem to do an exclusive create + setattr. given the places where this succeeds and fails, this is starting to look like a zfs (sparc) bug - but i'm happy to be shown that it's some sort of global settings problem instead. any further suggestions? i can provide snoops/tcpdumps if anyone is interested.
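For anyone who wants to try reproducing this without the Finder, a minimal hedged sketch. The mount point is an example, and it assumes the shell's noclobber redirection opens with O_CREAT|O_EXCL (bash and ksh93 do), which the NFSv3 client should turn into an exclusive CREATE followed by a SETATTR, the same sequence macko describes:

  set -C                           # noclobber: create with O_EXCL
  > /mnt/share/testfile            # exclusive create over NFS
  chmod 644 /mnt/share/testfile    # the SETATTR that appears to get lost
  ls -l /mnt/share/testfile        # mode should be 0644, not 000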
Re: [zfs-discuss] sparc + zfs + nfs + mac osX = fail ?
i wonder whether this is related to an itunes update - it tends to fiddle about in the library for a bit when you install? (??)
Re: [zfs-discuss] Quick drive slicing madness question
Darren J Moffat darr...@opensolaris.org writes:
> Mauricio Tavares wrote:
>> If I have a machine with two drives, could I create equal size slices on the two disks, set them up as boot pool (mirror) and then use the remaining space as a striped pool for other more wasteful applications?
> You could, but why bother? Why not just create one mirrored pool?

you get half the space available... even if you don't forgo redundancy and use mirroring on both slices, you can't extend the data pool later.

> Having two pools on the same disk (or mirroring to the same disk) is asking for performance pain if both are being written to heavily.

not too common with heavy writing to rpool, is it? the main source of writing is syslog, I guess.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game
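A sketch of the layout being discussed, assuming hypothetical device names and that slice 0/1 on each disk were already laid out with format(1m). Note the root pool is normally built by the installer, so this is illustrative only:

  zpool create rpool mirror c0t0d0s0 c0t1d0s0   # mirrored boot pool
  zpool create data c0t0d0s1 c0t1d0s1           # striped pool on the remaining slices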
Re: [zfs-discuss] RAID-Z and virtualization
Tim Cook wrote:
> On Sun, Nov 8, 2009 at 2:03 AM, besson3c j...@netmusician.org wrote:
>
>> I'm entertaining something which might be a little wacky; I'm wondering what your general reaction to this scheme might be :)
>>
>> I would like to invest in some sort of storage appliance, and I like the idea of something I can grow over time, something that isn't tethered to my servers (i.e. not direct attach), as I'd like to keep this storage appliance beyond the life of my servers. Therefore, a RAID 5 or higher type setup in a separate 2U chassis is attractive to me.
>>
>> I do a lot of virtualization on my servers, and currently my VM host is running VMWare Server. It seems like the way forward is software-based RAID with a sophisticated file system such as ZFS or BTRFS, rather than a hardware RAID card and a dumber file system. I really like what ZFS brings to the table in terms of RAID-Z and more, so I'm thinking that it might be smart to skip getting a hardware RAID card and jump into using ZFS. The obvious problem at this point is that ZFS is not available for Linux yet, and BTRFS is not yet ready for production usage. So, I'm exploring some options. One option is to just get that RAID card and reassess all of this when BTRFS is ready, but the other option is the following...
>>
>> What if I were to run a FreeBSD VM, present it several vdisks, format these as ZFS, and serve up ZFS shares through this VM? I realize that I'm only getting the sort of userland conveniences of ZFS this way, since the host would still be writing to an EXT3/4 volume, but on the other hand perhaps these conveniences and other benefits would be worthwhile? What would I be missing out on, despite no assurances of the same integrity given the underlying EXT3/4 volume? What do you think, would setting up a VM solely for hosting ZFS shares be worth my while as a sort of bridge to BTRFS? I realize that I'd have to allocate a lot of RAM to this VM, and I'm prepared to do that.
>>
>> Is this idea retarded? Something you would recommend or do yourself? All of this convenience is pointless if there will be significant problems; I would like to eventually serve production servers this way. Fairly low volume ones, but still important to me.
>
> Why not just convert the VM's to run in virtualbox and run Solaris directly on the hardware?

That's another possibility, but it depends on how Virtualbox stacks up against VMWare Server. At this point a lot of planning would be necessary to switch to something else, although this is a possibility.

How would Virtualbox stack up against VMWare Server? Last I checked it doesn't have a remote console of any sort, which would be a deal breaker. Can I disable allocating virtual memory to Virtualbox VMs? Can I get my VMs to auto boot in a specific order at runlevel 3? Can I control my VMs via the command line? I thought Virtualbox was GUI only, designed primarily for Desktop use?

This switch will only make sense if all of this points to a net positive.

-- 
Joe Auty
NetMusician: web publishing software for musicians
http://www.netmusician.org
j...@netmusician.org
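For what the FreeBSD-VM idea would look like mechanically, a hedged sketch (the da1/da2/da3 device names are whatever the hypervisor happens to present to the guest; this shows the mechanics, not an endorsement of ZFS on virtual disks):

  # inside the FreeBSD guest, over three virtual disks:
  zpool create tank raidz da1 da2 da3
  zfs create tank/shares
  zfs set sharenfs=on tank/shares   # FreeBSD wires this through mountd/exports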
Re: [zfs-discuss] RAID-Z and virtualization
Erik Ableson wrote:
> Uhhh - for an unmanaged server you can use ESXi for free. Identical server functionality; it just requires licenses if you need multiserver features (i.e. vMotion).
> Regards, Erik Ableson

How does ESXi w/o vMotion, vSphere, and vCenter server stack up against VMWare Server? My impression was that you need these other pieces to make such an infrastructure useful?

On 8 Nov 2009, at 19:12, Tim Cook t...@cook.ms wrote:
> On Sun, Nov 8, 2009 at 11:48 AM, Joe Auty j...@netmusician.org wrote:
>> It appears that one can get more in the way of features out of VMWare Server for free than with ESX, which is seemingly a hook into buying more VMWare stuff. I've never looked at Sun xVM, in fact I didn't know it even existed, but I do now. Thank you, I will research this some more! The only other variable, I guess, is the future of said technologies given the Oracle takeover? There has been much discussion on how this impacts ZFS, but I'll have to learn how xVM might be affected, if at all.
>
> Quite frankly, I wouldn't let that stop you. Even if Oracle were to pull the plug on xVM entirely (not likely), you could very easily just move the VM's back over to *insert your favorite flavor of Linux* or Citrix Xen. Including Unbreakable Linux (Oracle's version of RHEL).

I remember now why Xen was a no-go when I last tested it. I rely on the 64 bit version of FreeBSD for most of my VM guest machines, and FreeBSD only supports running as domU on i386 systems. This is a monkey wrench! Sorry, just thinking out loud here...

> I have no idea what it supports right now. I can't even find a decent support matrix. Quite frankly, I would (and do) just use a separate server for the fileserver than the vm box. You can get 64bit cpu's with 4GB of ram for awfully cheap nowadays. That should be more than enough for most home workloads. --Tim

-- 
Joe Auty
NetMusician: web publishing software for musicians
http://www.netmusician.org
j...@netmusician.org
Re: [zfs-discuss] zpool resilver - error history
On 09/11/2009 09:57, Thomas Maier-Komor tho...@maier-komor.de wrote:
> Try zpool status -v poolname

I already tried that, it only gives me
<snip> errors: No known data errors </snip>
During the resilver it showed me some files, but not after finishing it. Thanks anyway
Re: [zfs-discuss] Accidentally mixed-up disks in RAIDZ
No, Chris, I didn't export the pool because I didn't expect this to happen. It's an excellent suggestion, so I'll try it when I get my hands on the machine. Thank you.
Leandro.

From: Chris Murray chrismurra...@gmail.com
To: Leandro Vanden Bosch l_vbo...@yahoo.com.ar
Sent: Saturday, 7 November 2009 19:13:33
Subject: RE: [zfs-discuss] Accidentally mixed-up disks in RAIDZ

> Did you export the pool before unplugging the drives? I've had occasions in the past where ZFS does get mixed-up if the machine is powered up with the drives of a currently imported pool in the wrong order. The solution in the end was to power up without the drives, export the pool, and then import. Chris

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Leandro Vanden Bosch
Sent: 07 November 2009 20:28
To: Tim Cook
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Accidentally mixed-up disks in RAIDZ

Thanks Tim for your answer! I'll try it in a few hours and post the outcome.
Regards, Leandro.

From: Tim Cook t...@cook.ms
To: Leandro Vanden Bosch l_vbo...@yahoo.com.ar
CC: zfs-discuss@opensolaris.org
Sent: Saturday, 7 November 2009 16:40:55
Subject: Re: [zfs-discuss] Accidentally mixed-up disks in RAIDZ

> On Sat, Nov 7, 2009 at 1:38 PM, Leandro Vanden Bosch l_vbo...@yahoo.com.ar wrote:
>> Hello to you all. Here's the situation: while doing a case replacement in my home storage server, I accidentally removed the post-it with the disk number from my three 1TB disks before connecting them back to the corresponding SATA connector. The issue now is that I don't know in which order they should be connected. Do any of you know how I can _safely_ bring the zpool on-line? I haven't plugged them in yet because I'm afraid of losing some valuable personal information. Thanks in advance. Leandro.
>
> Of course, it doesn't matter which drive is plugged in where. When you import a pool, zfs scans the headers of each disk to verify if they're part of a pool or not, and if they are, does the import. --Tim
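If the pool still comes up confused after the drives are reshuffled, the sequence Chris describes looks roughly like this (hedged; 'tank' is a stand-in for the real pool name):

  zpool export tank    # forget the cached device paths
  zpool import         # with no argument: scan all disks and list importable pools
  zpool import tank    # import by name, in whatever order the disks now sit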
[zfs-discuss] PSARC recover files?
This new PSARC putback that allows rolling back to an earlier valid uberblock is good. This immediately raises a question: could we use this PSARC functionality to recover deleted files? Or some variation? I don't need that functionality now, but I am just curious...
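For reference, my understanding is that the recovery support surfaces as new zpool import options (hedged, based on the snv_128-era putback; 'tank' is an example name):

  zpool import -nF tank   # dry run: report how far back a rewind would go
  zpool import -F tank    # actually rewind to the last usable txg/uberblock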
Re: [zfs-discuss] zpool resilver - error history
On Mon, 9 Nov 2009, Gschwandl Marcel HSLU TA wrote:
>> zpool status -v poolname
> I already tried that, it only gives me
> <snip> errors: No known data errors </snip>

Errors do not necessarily cause data loss. For example, there may have been sufficient redundancy that the error was able to be automatically repaired, and so there was no data loss. Metadata always has a redundant copy, and if you are using something like raidz2, then your data still has a redundant copy while resilvering a disk.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] can't delete a zpool
OpenSolaris 2009.06. I have a ST2540 Fiber Array directly attached to a X4150. There is a zpool on the fiber device. The zpool went into a faulted state, but I can't seem to get it back via scrub, or even delete it. Do I have to re-install the entire OS if I want to use that device again?

Thanks, Mike

# zpool list
NAME        SIZE   USED  AVAIL   CAP  HEALTH   ALTROOT
fc-disk       -      -      -     -   FAULTED  -
rpool        68G  23.3G  44.7G   34%  ONLINE   -
scsi-disk   544G    97K   544G    0%  ONLINE   -

# zpool status fc-disk
  pool: fc-disk
 state: UNAVAIL
status: One or more devices could not be used because the label is missing or
        invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME                                 STATE     READ WRITE CKSUM
        fc-disk                              UNAVAIL      0     0     0  insufficient replicas
          c0t600A0B8000389BC904524A7AF4BAd0  UNAVAIL      0     0     0  corrupted data

# zpool destroy fc-disk
internal error: Invalid argument
Abort (core dumped)
r...@vdi-storage:~#
Re: [zfs-discuss] PSARC recover files?
frequent snapshots offer outstanding oops protection. Rob
Re: [zfs-discuss] can't delete a zpool
I had the same problem recently on b125. I had a one-disc zpool "Movies" and shut down the computer, removed the "Movies" disc and inserted another one-disc zpool, "Misc". I booted and imported the "Misc" zpool, but the "Movies" zpool showed exactly the same behaviour as you report: it would not be imported, nor destroyed. I don't remember how I solved the problem, but I think I inserted the "Movies" zpool disc again and then exported it before removing the disc. Or something similar. Maybe you could try to dd the disc with zeroes and then create a new zpool?
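If it comes to that, a hedged sketch of the dd route (destructive, and the device name is an example). ZFS keeps two labels at the front of the device and two at the end, so the front has to be zeroed, and ideally the tail as well:

  # wipe the front of the disk (covers labels L0/L1):
  dd if=/dev/zero of=/dev/rdsk/c1t1d0p0 bs=1024k count=100
  # labels L2/L3 live in the last 512KB, so seek near the end too
  # (compute the offset from the device size before running that pass)
  zpool create Movies c1t1d0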
Re: [zfs-discuss] PSARC recover files?
Maybe to create snapshots after the fact as a part of some larger disaster recovery effort. (What did my pool/file-system look like at 10am?... Say 30 minutes before the database barfed on itself...)

With some enhancements, might this functionality be extendable into a poor man's CDP offering that won't protect against (non-redundant) hardware failures, but can provide some relief against app/human creativity? Seems like one of those things you never really need... until you need it that one time, at which point nothing else will do.

One would think that using zdb and friends it might be possible to walk the chain of tx-logs backwards, and each good/whole one could be a valid recover/reset-point.

--

This raises a more fundamental question that perhaps someone can comment on. Does ZFS's COW follow a fairly strict last-released-block, last-overwritten model (keeping a maximum buffer of intact data), or do previously used blocks get overwritten largely based on block/physical location, fragmentation/best-fit, etc.? In cases of blank disks/LUNs, does for instance a 1TB drive get completely COW-ed onto its blank space, or does zfs re-use previously used (and freed) space before burning through the entire disk space?

Thanks,
-- MikeE
Re: [zfs-discuss] MPxIO and removing physical devices
I'm not sure if this is exactly what you're looking for, but check out the workaround in this bug:
http://bugs.opensolaris.org/view_bug.do;jsessionid=9011b9dacffa0b615db182bbcd7b?bug_id=6559281

Basically, look through cfgadm -al and run the following command on the unusable attachment points. Example:

cfgadm -o unusable_FCP_dev -c unconfigure c2::5005076801400525

You might also try the Storage-Discuss list.
-Alex

-Original Message-
From: Karl Katzke
Sent: Tuesday, November 03, 2009 3:11 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] MPxIO and removing physical devices

I am a bit of a Solaris newbie. I have a brand spankin' new Solaris 10u8 machine (x4250) that is running an attached J4400 and some internal drives. We're using multipathed SAS I/O (enabled via stmsboot), so the device mount points have been moved off from their normal c0t5d0 to long strings -- in the case of c0t5d0, it's now /dev/rdsk/c6t5000CCA00A274EDCd0. (I can see the cross-referenced devices with stmsboot -L.)

Normally, when replacing a disk on a Solaris system, I would run cfgadm -c unconfigure c0::dsk/c0t5d0. However, cfgadm -l does not list c6, nor does it list any disks. In fact, running cfgadm against the places where I think things are supposed to live gets me the following:

bash# cfgadm -l /dev/rdsk/c0t5d0
Ap_Id                Type   Receptacle   Occupant   Condition
/dev/rdsk/c0t5d0: No matching library found
bash# cfgadm -l /dev/rdsk/c6t5000CCA00A274EDCd0
cfgadm: Attachment point not found
bash# cfgadm -l /dev/dsk/c6t5000CCA00A274EDCd0
Ap_Id                Type   Receptacle   Occupant   Condition
/dev/dsk/c6t5000CCA00A274EDCd0: No matching library found
bash# cfgadm -l c6t5000CCA00A274EDCd0
Ap_Id                Type   Receptacle   Occupant   Condition
c6t5000CCA00A274EDCd0: No matching library found

I ran devfsadm -C -v and it removed all of the old attachment points for the /dev/dsk/c0t5d0 devices and created some for the c6 devices. Running cfgadm -al shows a c0, c4, and c5 -- these correspond to the actual controllers, but no devices are attached to the controllers.

I found an old email on this list about MPxIO that said the solution was basically to yank the physical device after making sure that no I/O was happening to it. While this worked and allowed us to return the device to service as a spare in the zpool it inhabits, more concerning was what happened when we ran mpathadm list lu after yanking the device and returning it to service:

bash# mpathadm list lu
        /dev/rdsk/c6t5000CCA00A2A9398d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A29EE2Cd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A2BDBFCd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A2A8E68d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A0537ECd0s2
                Total Path Count: 1
                Operational Path Count: 1
mpathadm: Error: Unable to get configuration information.
mpathadm: Unable to complete operation

(Side note: Some of the disks are single path via an internal controller, and some of them are multi path in the J4400 via two external controllers.)

A reboot fixed the 'issue' with mpathadm and it now outputs complete data. So -- how do I administer and remove physical devices that are in multipath-managed controllers on Solaris 10u8 without breaking multipath and causing configuration changes that interfere with the services and devices attached via mpathadm and the other voodoo and black magic inside?
I can't seem to find this documented anywhere, even if the instructions to enable multipathing with stmsboot -e were quite complete and worked well!

Thanks,
Karl Katzke
-- 
Karl Katzke
Systems Analyst II
TAMU - RGS
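For reference, mapping the MPxIO names back to the pre-multipath ones is the stmsboot -L Karl mentions; a hedged illustration of the shape of its output (the values below are examples):

  # stmsboot -L
  non-STMS device name            STMS device name
  ------------------------------------------------
  /dev/rdsk/c0t5d0                /dev/rdsk/c6t5000CCA00A274EDCd0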
Re: [zfs-discuss] PSARC recover files?
> Maybe to create snapshots after the fact

how does one quiesce a drive after the fact?
Re: [zfs-discuss] RAID-Z and virtualization
On 8-Nov-09, at 12:20 PM, Joe Auty wrote:
> Tim Cook wrote:
>> Why not just convert the VM's to run in virtualbox and run Solaris directly on the hardware?
>
> That's another possibility, but it depends on how Virtualbox stacks up against VMWare Server. [...] Can I disable allocating virtual memory to Virtualbox VMs? Can I get my VMs to auto boot in a specific order at runlevel 3? Can I control my VMs via the command line?

Yes, you certainly can. It works well, even for GUI-based guests, as there is vm-level VRDP (VNC/Remote Desktop) access as well as whatever remote access the guest provides.

> I thought Virtualbox was GUI only, designed for Desktop use primarily?

Not at all. Read up on VBoxHeadless.

--Toby
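A hedged taste of the headless workflow (the VM name is an example, and exact flags vary a bit between VirtualBox releases):

  VBoxHeadless --startvm "myguest" &               # run the VM with no local GUI, remote console exposed
  VBoxManage controlvm "myguest" acpipowerbutton   # clean shutdown from the command line
  VBoxManage list runningvms                       # scriptable inventory, handy for ordered boots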
Re: [zfs-discuss] PSARC recover files?
+--
| On 2009-11-09 12:18:04, Ellis, Mike wrote:
|
| Maybe to create snapshots after the fact as a part of some larger disaster recovery effort.
| (What did my pool/file-system look like at 10am?... Say 30 minutes before the database barfed on itself...)
|
| With some enhancements might this functionality be extendable into a poor man's CDP offering that won't protect against (non-redundant) hardware failures, but can provide some relief against app/human creativity.

Alternatively, you can write a cronjob/service that takes snapshots of your important filesystems. I take hourly snaps of all our homedirs, and five-minute snaps of our database volumes (InnoDB and Postgres both recover adequately; I have used these snaps to build recovery zones to pull accidentally deleted data from before; good times).

Look at OpenSolaris' Time Slider service, although writing something that does this is pretty trivial (we use a Perl program with YAML configs launched by cron every minute). My one suggestion would be to ensure the automatically taken snaps have a unique name (@auto, or whatever), so you can do bulk expiry tomorrow or next week without worry.

Cheers.
-- 
bda
cyberpunk is dead. long live cyberpunk.
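A minimal sketch of the cron approach (dataset name, schedule, and retention are examples; the 'head -n -48' trick in the expiry half needs GNU coreutils, so treat that part as pseudo-portable):

  # crontab: hourly snapshot with a greppable @auto- prefix (% must be escaped in crontab)
  0 * * * * /usr/sbin/zfs snapshot tank/home@auto-`date +\%Y\%m\%d\%H\%M`

  # expiry: keep the newest 48 @auto- snaps, destroy the rest
  zfs list -H -t snapshot -o name -s creation | grep '^tank/home@auto-' \
      | head -n -48 | xargs -n 1 zfs destroy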
[zfs-discuss] ..and now ZFS send dedupe
More ZFS goodness putback before close of play for snv_128.
http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010768.html
http://hg.genunix.org/onnv-gate.hg/rev/216d8396182e
Regards
Nigel Smith
Re: [zfs-discuss] marvell88sx2 driver build126
Hi, I can't find any bug-related issues with marvell88sx2 in b126. I looked over Dave Hollister's shoulder while he searched for marvell in his webrevs of this putback and nothing came up:

> driver change with build 126?

Not for the SATA framework, but for HBAs there is:
http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001

I will find a thumper, load build 125, create a raidz pool, and upgrade to b126. I'll also send the error messages that Tim provided to someone who works in the driver group.

Thanks, Cindy

On 11/07/09 14:33, Orvar Korvar wrote:
> I saw the same checksum error problem when I booted into b126. I haven't dared try b126 again; I use b125 now, without problems. Here is my hardware: Intel Q9450 + P45 Gigabyte EP45-DS3P motherboard + Ati 4850. I have the same AOC SATA controller card, and some Samsung Spinpoint F1 1TB drives. Brand new.
Re: [zfs-discuss] ..and now ZFS send dedupe
On Mon, Nov 9, 2009 at 12:45 PM, Nigel Smith nwsm...@wilusa.freeserve.co.uk wrote:
> More ZFS goodness putback before close of play for snv_128.
> http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010768.html
> http://hg.genunix.org/onnv-gate.hg/rev/216d8396182e
> Regards Nigel Smith

Are these recent developments due to help/support from Oracle? Or is it business as usual for ZFS developments?

-- 
Brent Jones
br...@servuhome.net
Re: [zfs-discuss] ..and now ZFS send dedupe
On 11/09/09 12:58, Brent Jones wrote:
> Are these recent developments due to help/support from Oracle?

No.

> Or is it business as usual for ZFS developments?

Yes.

- Eric
-- 
Eric Schrock, Fishworks
http://blogs.sun.com/eschrock
Re: [zfs-discuss] ..and now ZFS send dedupe
Interesting stuff. By the way, is there a place to watch the latest news like this on zfs/opensolaris? rss maybe?

-- 
Roman
Re: [zfs-discuss] ..and now ZFS send dedupe
Roman Naumenko wrote:
> Interesting stuff. By the way, is there a place to watch the latest news like this on zfs/opensolaris? rss maybe?

You could subscribe to onnv-not...@opensolaris.org...

James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] zfs inotify?
I'd hoped this script would work for me as a snapshot diff script, but it seems that bart doesn't play well with large filesystems (don't know the cutoff, but my zfs pools (other than rpool) are all well over 4TB). 'bart create' fails immediately with a "Value too large for defined data type" error, and this is in fact mentioned in the Solaris 10 10/09 release notes:

  Possible Error With 32-bit Applications Getting File System State on Large File Systems (6468905)
  When run on large file systems, for example ZFS, applications using statvfs(2) or statfs(2) to get
  information about the state of the file system exhibit an error. The following error message is displayed:
  Value too large for defined data type
  Workaround: Applications should use statvfs64() instead.

from http://docs.sun.com/app/docs/doc/821-0381/gdzmr?l=ena=view

and in fact, if I invoke bart via truss, I see it calls statvfs() and fails. Way to keep up with the times, Sun! Is there a 64-bit version of bart, or a better recommendation for comparing snapshots? My current backup strategy uses rsync, which I'd like to replace with zfs send/receive, but I need a way to see what changed in the past day.

Thanks,
Andrew Daugherity
Systems Analyst
Division of Research & Graduate Studies
Texas A&M University

>>> Trevor Pretty trevor_pre...@eagle.co.nz 10/26/2009 5:16 PM >>>

Paul

Being a script hacker like you, the only kludge I can think of is a script that does something like:

  ls > /tmp/foo
  sleep
  ls > /tmp/foo.new
  diff /tmp/foo /tmp/foo.new > /tmp/files_that_have_changed
  mv /tmp/foo.new /tmp/foo

Or you might be able to knock something up with bart and zfs snapshots. I did write this, which may help?

#!/bin/sh
#set -x
# Note: No implied warranty etc. applies.
# Don't cry if it does not work. I'm an SE not a programmer!
#
####################################################
#
# Version 29th Jan. 2009
#
# GOAL: Show what files have changed between snapshots
#
# But of course it could be any two directories!!
#
####################################################

## Set some variables
#
SCRIPT_NAME=$0
FILESYSTEM=$1
SNAPSHOT=$2
FILESYSTEM_BART_FILE=/tmp/filesystem.$$
SNAPSHOT_BART_FILE=/tmp/snapshot.$$
CHANGED_FILES=/tmp/changes.$$

## Declare some commands (just in case PATH is wrong, like cron)
#
BART=/bin/bart

## Usage
#
Usage()
{
        echo ""
        echo ""
        echo "Usage: $SCRIPT_NAME -q filesystem snapshot"
        echo ""
        echo "-q will stop all echos and just list the changes"
        echo ""
        echo "Examples"
        echo "  $SCRIPT_NAME /home/fred /home/.zfs/snapshot/fred"
        echo "  $SCRIPT_NAME . /home/.zfs/snapshot/fred"
        echo ""
        echo ""
        exit 1
}

### Main Part ###

## Check Usage
#
if [ $# -ne 2 ]; then
        Usage
fi

## Check we have different directories
#
if [ "$1" = "$2" ]; then
        Usage
fi

## Handle dot
#
if [ "$FILESYSTEM" = "." ]; then
        cd "$FILESYSTEM" ; FILESYSTEM=`pwd`
fi
if [ "$SNAPSHOT" = "." ]; then
        cd "$SNAPSHOT" ; SNAPSHOT=`pwd`
fi

## Check the filesystems exist. Each should be a directory
# and it should have some files
#
for FS in $FILESYSTEM $SNAPSHOT
do
        if [ ! -d $FS ]; then
                echo ""
                echo "ERROR: file system $FS does not exist"
                echo ""
                exit 1
        fi
        if [ "X`/bin/ls $FS`" = "X" ]; then
                echo ""
                echo "ERROR: file system $FS seems to be empty"
                echo ""
                exit 1
        fi
done

## Create the bart files
#
echo ""
echo "Creating bart file for $FILESYSTEM - can take a while.."
cd $FILESYSTEM ; $BART create -R . > $FILESYSTEM_BART_FILE
echo ""
echo "Creating bart file for $SNAPSHOT - can take a while.."
cd $SNAPSHOT ; $BART create -R . > $SNAPSHOT_BART_FILE

## Compare them and report the diff
#
echo ""
echo "Changes"
echo ""
$BART compare -p $FILESYSTEM_BART_FILE $SNAPSHOT_BART_FILE | awk '{print $1}' > $CHANGED_FILES
/bin/more $CHANGED_FILES
echo ""
echo ""
echo ""

## Tidy kiwi
#
/bin/rm $FILESYSTEM_BART_FILE
/bin/rm $SNAPSHOT_BART_FILE
/bin/rm $CHANGED_FILES

exit 0
Re: [zfs-discuss] ..and now ZFS send dedupe
Roman, I like to check here for recent putbacks:
http://hg.genunix.org/onnv-gate.hg/shortlog

To see new cases:
http://arc.opensolaris.org/caselog/PSARC/

Also, to see what should appear in upcoming builds (although not recently updated):
http://hub.opensolaris.org/bin/view/Community+Group+on/flag-days

Enjoy... -cheers, CSB
Re: [zfs-discuss] zfs inotify?
Andrew Daugherity wrote:
> if I invoke bart via truss, I see it calls statvfs() and fails. Way to keep up with the times, Sun!

% file /bin/truss /bin/amd64/truss
/bin/truss:       ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
/bin/amd64/truss: ELF 64-bit LSB executable AMD64 Version 1 [SSE2 SSE FXSR CMOV FPU], dynamically linked, not stripped, no debugging information available

Rob T
Re: [zfs-discuss] ..and now ZFS send dedupe
Craig S. Bell wrote:
> Roman, I like to check here for recent putbacks: http://hg.genunix.org/onnv-gate.hg/shortlog
> To see new cases: http://arc.opensolaris.org/caselog/PSARC/
> Also, to see what should appear in upcoming builds (although not recently updated): http://hub.opensolaris.org/bin/view/Community+Group+on/flag-days

The flag days page has not been updated since the switch to XWiki; it's on my todo list, but I don't have an ETA for when it'll be done.

James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] ..and now ZFS send dedupe
Roman Naumenko wrote:
> James C. McPherson wrote, On 09-11-09 04:40 PM:
>> You could subscribe to onnv-not...@opensolaris.org...
>
> Thanks, James. What is the subscription process? Just to send email?

http://mail.opensolaris.org/mailman/listinfo/onnv-notify covers what's necessary (and I see you found it already).

cheers,
James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients
On Fri, 6 Nov 2009, James Andrewartha wrote:
> How about attacking it the other way? Sign the SCA, get a sponsor and put the fix into OpenSolaris, then sustaining just has to backport it. http://hub.opensolaris.org/bin/view/Main/participate

Do you mean the samba bug or the NFS bug? For the samba bug, I've already submitted a patch to fix the problem. For the NFS bug, while I have in the past pursued such options with open-source software, considering Solaris 10 is a commercial product for which we're paying a fairly substantial cost for support, I'd really prefer they fix it themselves...

> Also, since you know it's a NFS server issue now, have you tried asking on nfs-discuss?

Yup: http://opensolaris.org/jive/thread.jspa?messageID=430745
No responses...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] zfs inotify?
On Mon, Nov 09, 2009 at 03:25:02PM -0700, Robert Thurlow wrote:
> Andrew Daugherity wrote:
>> if I invoke bart via truss, I see it calls statvfs() and fails. Way to keep up with the times, Sun!
>
> % file /bin/truss /bin/amd64/truss
> /bin/truss:       ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
> /bin/amd64/truss: ELF 64-bit LSB executable AMD64 Version 1 [SSE2 SSE FXSR CMOV FPU], dynamically linked, not stripped, no debugging information available

I'm pretty sure he means that 'bart' is failing, not truss. /bin/truss is just a link to /usr/lib/isaexec, which will run the 64-bit version when appropriate.

-- 
Darren
Re: [zfs-discuss] zfs inotify?
On Nov 9, 2009, at 2:06 PM, Andrew Daugherity wrote:
> Is there a 64-bit version of bart, or a better recommendation for comparing snapshots? My current backup strategy uses rsync, which I'd like to replace with zfs send/receive, but I need a way to see what changed in the past day.

find /filesystem -mtime -1
-- richard
Re: [zfs-discuss] ZFS + fsck
On Thu Nov 5 14:38:13 PST 2009, Gary Mills wrote:
> It would be nice to see this information at:
> http://hub.opensolaris.org/bin/view/Community+Group+on/126-130
> but it hasn't changed since 23 October.

Well it seems we have an answer:
http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033672.html

On Mon Nov 9 14:26:54 PST 2009, James C. McPherson wrote:
> The flag days page has not been updated since the switch to XWiki, it's on my todo list but I don't have an ETA for when it'll be done.

Perhaps anyone interested in seeing the flag days page resurrected can petition James to raise the priority on his todo list.
Thanks
Nigel Smith
Re: [zfs-discuss] ZFS + fsck
Nigel Smith wrote:
> On Mon Nov 9 14:26:54 PST 2009, James C. McPherson wrote:
>> The flag days page has not been updated since the switch to XWiki, it's on my todo list but I don't have an ETA for when it'll be done.
>
> Perhaps anyone interested in seeing the flag days page resurrected can petition James to raise the priority on his todo list.

Nigel, *everybody* is interested in the flag days page. Including me. Asking me to raise the priority is not helpful.

James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] zfs inotify?
>>> Robert Thurlow robert.thur...@sun.com 11/9/2009 4:25 PM >>>
> % file /bin/truss /bin/amd64/truss
> /bin/truss:       ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
> /bin/amd64/truss: ELF 64-bit LSB executable AMD64 Version 1 [SSE2 SSE FXSR CMOV FPU], dynamically linked, not stripped, no debugging information available

It doesn't make any difference if I invoke it with the amd64 truss. The only bart binary I can find on the system (Sol 10u8) is /usr/bin/bart, and it definitely calls statvfs(). Truss log follows at the end.

I know all about 'find -mtime ...', but that doesn't show which files have been deleted, whereas 'rsync -av --delete --backup-dir=`date +%Y%m%d`' does. (When users delete files and then need them restored a week later, it's very helpful to know which day they were deleted, as I can avoid running a find that could take quite a while. I think incremental zfs snapshots are a better strategy, but there are little hurdles like this to be crossed.)

bart (or something faster than running 'gdiff -qr snap1 snap2' on snapshots of a 2.1TB-and-growing FS) seems like a great idea, if I could find a working tool. It looks like dircmp(1) might be a possibility, but I'm open to suggestions. I suppose I could use something like AIDE or tripwire, although that seems a bit like swatting a fly with a sledgehammer.

Thanks,
Andrew

and...@imsfs-new:~$ /usr/bin/amd64/truss bart create -R /export/ims > /tmp/bart-ims
execve(/usr/bin/bart, 0x08047D6C, 0x08047D80)  argc = 4
mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF
resolvepath(/usr/lib/ld.so.1, /lib/ld.so.1, 1023) = 12
resolvepath(/usr/bin/bart, /usr/bin/bart, 1023) = 13
sysconfig(_CONFIG_PAGESIZE) = 4096
stat64(/usr/bin/bart, 0x08047B00) = 0
open(/var/ld/ld.config, O_RDONLY) Err#2 ENOENT
stat64(/lib/libsec.so.1, 0x080473A0) = 0
resolvepath(/lib/libsec.so.1, /lib/libsec.so.1, 1023) = 16
open(/lib/libsec.so.1, O_RDONLY) = 3
mmap(0x0001, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB
mmap(0x0001, 143360, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF8
mmap(0xFEF8, 50487, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEF8
mmap(0xFEF9D000, 11909, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 53248) = 0xFEF9D000
mmap(0xFEFA, 8296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEFA
munmap(0xFEF8D000, 65536) = 0
memcntl(0xFEF8, 8844, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
stat64(/lib/libmd.so.1, 0x080473A0) = 0
resolvepath(/lib/libmd.so.1, /lib/libmd.so.1, 1023) = 15
open(/lib/libmd.so.1, O_RDONLY) = 3
mmap(0xFEFB, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEFB
mmap(0x0001, 126976, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF6
mmap(0xFEF6, 56424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEF6
mmap(0xFEF7E000, 552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 57344) = 0xFEF7E000
munmap(0xFEF6E000, 65536) = 0
memcntl(0xFEF6, 1464, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
stat64(/lib/libc.so.1, 0x080473A0) = 0
resolvepath(/lib/libc.so.1, /lib/libc.so.1, 1023) = 14
open(/lib/libc.so.1, O_RDONLY) = 3
mmap(0xFEFB, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEFB
mmap(0x0001, 1208320, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE3
mmap(0xFEE3, 1099077, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE3
mmap(0xFEF4D000, 30183, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1101824) = 0xFEF4D000
mmap(0xFEF55000, 4240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF55000
munmap(0xFEF3D000, 65536) = 0
memcntl(0xFEE3, 124080, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
stat64(/lib/libavl.so.1, 0x080473A0) = 0
resolvepath(/lib/libavl.so.1, /lib/libavl.so.1, 1023) = 16
open(/lib/libavl.so.1, O_RDONLY) = 3
mmap(0xFEFB, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEFB
mmap(0x0001, 73728, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE1
mmap(0xFEE1, 2788, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE1
mmap(0xFEE21000, 204, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 4096) = 0xFEE21000
munmap(0xFEE11000, 65536) = 0
mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0)
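Since dircmp(1) came up: a hedged sketch of pointing it directly at two snapshots (the dataset and snapshot names are examples; -s suppresses the "same file" chatter, so what remains is the differing and only-in-one-side entries, i.e. changed, created, and deleted files):

  dircmp -s /export/ims/.zfs/snapshot/daily-20091108 /export/ims/.zfs/snapshot/daily-20091109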
Re: [zfs-discuss] zfs inotify?
Seems to me that you really want auditing. You can configure the audit system to only record the events you are interested in.
http://docs.sun.com/app/docs/doc/816-4557/auditov-1?l=ena=view
-- richard

On Nov 9, 2009, at 4:55 PM, Andrew Daugherity wrote:
> I know all about 'find -mtime ...', but that doesn't show which files have been deleted, whereas 'rsync -av --delete --backup-dir=`date +%Y%m%d`' does. (When users delete files and then need them restored a week later, it's very helpful to know which day they were deleted...)
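If auditing is already enabled, pulling out one day's file-write events would look roughly like this (a hedged sketch; the date is an example, and the 'fw' class must be in your audit flags for the records to exist at all):

  auditreduce -d 20091108 -c fw | praudit -s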
[zfs-discuss] Couple questions about ZFS writes and fragmentation
1. Is it true that because block sizes vary (in powers of 2, of course) on each write, there will be very little internal fragmentation?

2. I came upon this statement in a forum post: "ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks." How is this true? I mean, if you have a 128k default block size and you store a 4k file within that block, then you will have a ton of slack space to clear up.

3. Another statement from a post: "the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data." All this is saying is that when you are reading off of one physical device you will already be seeking for the blocks that you need from the other device, so the seek time will no longer be an issue, right?

4. In terms of where ZFS chooses to write data, is it always going to pick one metaslab and write to only free blocks within that metaslab? Or will it go all over the place?

5. When ZFS looks for a place to write data, does it look somewhere to intelligently see that there are some number of free blocks available within this particular metaslab, and if so, where is this located?

6. Could anyone clarify this post: "ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not such a big problem as it appears to be since such files were already typically accessed using random access."

7. An aside question... I was reading a paper about ZFS and it stated that offsets are something like 8 bytes from the first vdev label. Is there any reason why the storage pool is after 2 vdev labels?

Thanks guys
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
On Mon, 9 Nov 2009, Ilya wrote:
> 2. I came upon this statement in a forum post: "ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks." How is this true? I mean, if you have a 128k default block size and you store a 4k file within that block, then you will have a ton of slack space to clear up.

Short files are given a short block. Files larger than 128K are diced into 128K blocks, but the last block may be shorter. The fragmentation discussed is fragmentation at the file level.

> 3. Another statement from a post: "the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data." All this is saying is that when you are reading off of one physical device you will already be seeking for the blocks that you need from the other device, so the seek time will no longer be an issue, right?

The seek time becomes less of an issue for sequential reads if blocks are read from different disks, and the reads are scheduled in advance. It still consumes drive IOPS if the disk needs to seek.

> 6. Could anyone clarify this post: "ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated..."

The point here is that zfs buffers unwritten data in memory for up to 30 seconds. With a large amount of buffered data, zfs is able to write the data in a more sequential and better-optimized fashion, while wasting fewer IOPS. Databases usually use random I/O and synchronous writes, which tends to scramble the data layout on disk with a copy-on-write model. Zfs is not optimized for database performance. On the other hand, the copy-on-write model reduces the chance of database corruption if there is a power failure or system crash.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
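A quick way to see the "short files get short blocks" behavior for yourself; a hedged sketch (pool/dataset paths are examples, and the exact figure will vary with compression and metadata overhead):

  dd if=/dev/urandom of=/tank/fs/small bs=4k count=1   # write one 4k file
  sync
  du -h /tank/fs/small    # reports on the order of 4-5K allocated, not 128K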
[zfs-discuss] CIFS crashes when accessed with Adobe Photoshop Elements 6.0 via Vista
I have a repeatable test case for this incident. Every time I access my ZFS CIFS-shared file system with Adobe Photoshop Elements 6.0 from my Vista workstation, the OpenSolaris server stops serving CIFS. The share functions as expected for all other CIFS operations.

-Begin Configuration Data-

scotts:zelda# cat /etc/release
  OpenSolaris 2009.06 snv_111b X86
  Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
  Use is subject to license terms.
  Assembled 07 May 2009

scotts:zelda# uname -a
SunOS zelda 5.11 snv_111b i86pc i386 i86pc

scotts:zelda# prtdiag
System Configuration: IBM IBM eServer 325 -[8835W11]-
BIOS Configuration: IBM IBM BIOS Version 1.36 -[M1E136AUS-1.36]- 01/19/05
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

Processor Sockets
  Version   Location Tag
  -------   ------------
  Opteron   CPU0-Socket 940
  Opteron   CPU1-Socket 940

Memory Device Sockets
  Type  Status   Set  Device Locator  Bank Locator
  ----  ------   ---  --------------  ------------
  DRAM  in use   1    DDR1            Bank 0
  DRAM  in use   1    DDR2            Bank 0
  DRAM  in use   2    DDR3            Bank 1
  DRAM  in use   2    DDR4            Bank 1
  DRAM  in use   3    DDR5            Bank 2
  DRAM  in use   3    DDR6            Bank 2

On-Board Devices

Upgradeable Slots
  ID  Status     Type   Description
  --  ---------  -----  ------------
  1   in use     PCI-X  PCI-X Slot 1
  2   available  PCI-X  PCI-X Slot 2

scotts:zelda# zpool status
  pool: ary01
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ary01       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t8d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t8d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
        spares
          c6t1d0    AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c3d0s0  ONLINE       0     0     0

errors: No known data errors

scotts:zelda# zfs get all ary01/media
NAME         PROPERTY       VALUE                  SOURCE
ary01/media  type           filesystem             -
ary01/media  creation       Fri Jul 11 23:24 2008  -
ary01/media  used           347G                   -
ary01/media  available      1.09T                  -
ary01/media  referenced     344G                   -
ary01/media  compressratio  1.00x                  -
ary01/media  mounted        yes                    -
ary01/media  quota          none                   default
ary01/media  reservation    none                   default
ary01/media  recordsize     128K                   default
ary01/media  mountpoint     /shared_media          local
ary01/media  sharenfs       on                     local
ary01/media  checksum       on                     default
ary01/media  compression    off                    default
ary01/media  atime          on                     default
ary01/media  devices        on                     default
ary01/media  exec           on                     default
ary01/media  setuid         on                     default
ary01/media  readonly       off                    default
ary01/media  zoned          off                    local
ary01/media  snapdir        visible                local
ary01/media  aclmode        groupmask              default
ary01/media  aclinherit     restricted             default
ary01/media  canmount       on                     default
ary01/media  shareiscsi     off                    default
ary01/media  xattr          on                     default
ary01/media  copies         1                      default
ary01/media  version        3                      -
ary01/media  utf8only
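When the server stops serving CIFS like this, one first diagnostic step (a hedged suggestion, not part of the original report; it assumes the stock OpenSolaris in-kernel SMB service) is to check whether the SMF service dropped into maintenance and what it logged around the failure:

    # show the SMB server service state and the reason it is offline, if any
    svcs -xv smb/server
    # look for smbsrv errors logged around the time of the failure
    tail -50 /var/adm/messages
    # clear a maintenance state and restart once the share is wedged
    svcadm clear smb/server
    svcadm restart smb/server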
[zfs-discuss] How to purge bad data from snapshots
So, I had a fun ZFS learning experience a few months ago. A server of mine suddenly dropped off the network, or so it seemed. It was an OpenSolaris 2008.05 box serving up Samba shares from a ZFS pool, but it noticed too many checksum errors and so decided it was time to take the pool down, so as to save the (apparently) dying disk from further damage. Seemed inconvenient at the time, but in hindsight that's a cool feature. I haven't actually found any problems with the drive (an SSD), which has worked fine ever since. Bit rot? Power failure (we had a lot of those for a while)? Who knows. At first I was afraid my ZFS pool had corrupted itself, until I realized that it was a unique feature of ZFS actually protecting me from further damage rather than ZFS itself being the problem.

At any rate, in this case the corruption managed to make it over to my backup server, replicated with SNDR. One of the corrupted blocks happened to be referenced by every single one of my daily snapshots going back nearly a year. I had no mirrored storage and copies set to 1. Arguably a bad setup, I'm sure, but that's why I had a replicated server. I didn't care about the file referencing the corrupt block; I would just as well have deleted it, but it was still referenced by all the snapshots. It was a crisis at the time, so I just switched over to my replicated server (in case the drive on the primary server actually was bad), deleted the files containing corrupt blocks, and then deleted all the snapshots so ZFS would quit unmounting the pool, just to get going again.

Things have been fine ever since, but I still wonder - is there something different I could have done to get rid of the corrupt blocks without losing all my snapshots? (I could have restored them from backup, but it would have taken forever.) I guess I could just do clones and then have the capability of deleting stuff, but then I don't believe I'd be able to back the thing up - if I don't do incremental zfs send/recv, the backup takes over 24 hours since there are so many snapshots, and I wouldn't think clones work with incremental zfs send/recv (especially if you start deleting files willy-nilly). Am I just missing something altogether, or is restoring from backup the only option?
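For reference, a minimal sketch of the recovery path described above (pool, dataset, and snapshot names here are hypothetical):

    # list the files that reference damaged blocks
    zpool status -v tank
    # remove the affected file from the live filesystem
    rm /tank/data/corrupt_file
    # the bad blocks remain referenced by snapshots, so those go too
    zfs list -t snapshot -r tank/data
    zfs destroy tank/data@2009-01-15
    # ...repeat for each snapshot that still references the bad block

-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss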
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
On Nov 9, 2009, at 6:42 PM, Ilya wrote:
> 1. Is it true that, because block sizes vary (in powers of 2, of course) on each write, there will be very little internal fragmentation?

The block size limit (aka recordsize) is in powers of 2. Actual block sizes are as needed.

> 2. I came upon this statement in a forum post: [i]ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks.[/i] How is this true? I mean, if you have a 128k default block size and you store a 4k file within that block, then you will have a ton of slack space to clear up.

If a file only uses 4 KB, ZFS only allocates 4 KB to the file.

> 3. Another statement from a post: [i]the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data.[/i] All this is saying is that when you are reading off of one physical device, you will already be seeking for the blocks you need from the other device, so the seek time will no longer be an issue, right?

This comment makes no sense to me. By the time the I/O request is handled by the disk, the relationship to a user is long gone. Also, seeks only apply to HDDs. Either side of a mirror can be used for reading... that part makes sense.

> 4. In terms of where ZFS chooses to write data, is it always going to pick one metaslab and write only to free blocks within that metaslab? Or will it go all over the place?

Yes :-)

> 5. When ZFS looks for a place to write data, does it look somewhere to intelligently see that there are some number of free blocks available within a particular metaslab, and if so, where is this information located?

Yes, of course.

> 6. Could anyone clarify this post: [i]ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not such a big problem as it appears to be since such files were already typically accessed using random access.[/i]

YMMV. Allan Packer and Neel did a study on the effect of this on MySQL. But some databases COW themselves, so it is not a given that the application will read data sequentially.
Video: http://www.youtube.com/watch?v=a31NhwzlAxs
Slides: http://blogs.sun.com/realneel/resource/MySQL_Conference_2009_ZFS_MySQL.pdf

> 7. An aside question... I was reading a paper about ZFS and it stated that offsets are something like 8 bytes from the first vdev label. Is there any reason why the storage pool is after 2 vdev labels?

Historically, the first 8 KB of a slice was used to store the disk label. In the bad old days, people writing applications often did not know this and would clobber the label. So the first 8 KB of the ZFS label is left unused, to preserve any existing disk label. The storage pool data starts at an offset of 4 MB, 3.5 MB past the second label. This area is reserved for a boot block. Where did you see it documented as starting after the first two labels?
-- richard
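For the curious, the labels Richard describes can be dumped directly with zdb (a sketch; the device name below is just an example):

    # dump the four vdev labels ZFS keeps on each device
    # (two at the front of the device, two at the end)
    zdb -l /dev/rdsk/c5t0d0s0

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss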
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
Wow, this forum is great and uber-fast in response - appreciate the answers, makes sense. Only, what does ZFS do to write data? Let's say that you want to write x blocks somewhere: is ZFS going to find a pointer to the space map of some metaslab and then write there? Is it going to find a metaslab closest to the outside of the HDD for higher bandwidth?

And the label thing, heh, I made a mistake in what I read, you are right. Within the vdev array though, after the storage pool location, it also showed more vdev labels coming after it (vdev 1, vdev 2, boot block, storage space, vdev 3, vdev 4). Would there be more vdev labels after #4, or more storage space?

Thanks again
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
On Nov 9, 2009, at 9:15 PM, Ilya wrote:
> Wow, this forum is great and uber-fast in response - appreciate the answers, makes sense.

Nothing on TV tonight and all of my stress tests are passing :-)

> Only, what does ZFS do to write data? Let's say that you want to write x blocks somewhere: is ZFS going to find a pointer to the space map of some metaslab and then write there? Is it going to find a metaslab closest to the outside of the HDD for higher bandwidth?

By default, it does start with the metaslabs on the outer cylinders. But it may also decide to skip to another metaslab. For example, the redundant metadata is spread further away. Similarly, if you have copies=2 or 3, then those will be spatially diverse as well.

> And the label thing, heh, I made a mistake in what I read, you are right. Within the vdev array though, after the storage pool location, it also showed more vdev labels coming after it (vdev 1, vdev 2, boot block, storage space, vdev 3, vdev 4). Would there be more vdev labels after #4, or more storage space?

The 4th label (label 3) is at the end, modulo 256 KB.
-- richard
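To make the layout concrete, a sketch of the label offsets implied above (two 256 KB labels at the front, two at the end, aligned to the device size rounded down to a 256 KB boundary; the device size here is a made-up example):

    # hypothetical device size in bytes
    DEVSIZE=$((500107862016))
    LABEL=$((256 * 1024))
    ALIGNED=$(( (DEVSIZE / LABEL) * LABEL ))
    echo "L0 offset: 0"
    echo "L1 offset: $LABEL"
    echo "boot block region: $((2 * LABEL)) .. $((4 * 1024 * 1024))"
    echo "L2 offset: $((ALIGNED - 2 * LABEL))"
    echo "L3 offset: $((ALIGNED - LABEL))"

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss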