[zfs-discuss] Re: Motley group of discs?

2007-05-04 Thread MC
That's a lot of talking without an answer :)

> internal EIDE 320GB (boot drive), internal
250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.

> So, what's the best zfs configuration in this situation? 

RAIDZ uses disk space like RAID5, and each member contributes only as much as
the smallest device in the vdev.  So the best you could do here for redundant
space is a raidz of four or five devices each limited to 160 GB, i.e.
(160 * 4 or 5) - 160 usable, and then use the remaining space as non-redundant
or mirrored.
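
For example, a minimal sketch (the device names here are hypothetical, not the
OP's actual devices):

  # raidz capacity is limited by the smallest member, so the 250, 200 and
  # 160 GB drives each contribute ~160 GB; parity costs one member's worth
  zpool create tank raidz c1t1d0 c1t2d0 c1t3d0   # ~(3 x 160) - 160 = 320 GB usable
  # an odd-sized device (e.g. the 600 GB USB drive) could go in a separate,
  # non-redundant pool instead:
  zpool create scratch c1t4d0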

If you want to play with OpenSolaris and ZFS, you can do so easily in a VMware
or Parallels virtual machine.  It sounds like that is all you want to do right
now.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: tape-backup software (was: Very Large Filesystems)

2007-05-04 Thread Andrew Chace
The software that we use for our production backups is compatible with ZFS. I 
cannot comment on its stability with ZFS or its ability to handle multi-TB 
filesystems, as we do not have any ZFS systems in production yet. I can say 
that overall, the software is solid, stable and very well documented. Check out 
the vendor's website: 

http://www.commvault.com
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Darren . Reed

Al Hopper wrote:

> On Fri, 4 May 2007, mike wrote:
>
>> Isn't the benefit of ZFS that it will allow you to use even the most
>> unreliable disks and be able to inform you when they are attempting to
>> corrupt your data?
>
> Yes - I won't argue that ZFS can be applied exactly as you state above.
> However, ZFS is no substitute for bad practices that include:
>
> - not proactively replacing mechanical components *before* they fail

There's a nice side benefit from this one:
- the piece of hardware you retire becomes a backup of "old data"

When I ran lots of older SPARC boxes, I made a point of upgrading
the disks, from 1GB to 4GB to 9GB... it wasn't for disk space but to
put in place newer, quieter, faster, less power-hungry drives.  It
had the added benefit of ensuring that in 2004 the SCA SCSI drive
in the SPARC 5 was made maybe 1 or 2 years earlier, not 10, and thus
was also less likely to fail.  I still try to do this with PC hard drives today,
but sometimes they fail inside my replacement window :-(

Darren


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recovered state after system crash

2007-05-04 Thread Neil Perrin

kyusun Chang wrote On 05/04/07 19:34,:

> If the system crashes some time after the last commit of a transaction group
> (TxG), what happens to the file system transactions since the last commit of
> the TxG?

They are lost, unless they were synchronous (see below).

> (I presume the last commit of a TxG represents the last on-disk consistency.)

Correct.

> Does ZFS recover all file system transactions which it returned with success
> since the last commit of the TxG, which implies that the ZIL must flush log
> records for each successful file system transaction before it returns to the
> caller, so that it can replay the filesystem transactions?

Only synchronous transactions (those forced by O_DSYNC or fsync()) are
written to the intent log.

> Blogs on the ZIL state (I hope I read them right) that log records are
> maintained in memory and flushed to disk only
> 1) at a synchronous write request (does that mean the in-memory log is freed
> after that),

Yes, they are then freed in memory.

> 2) when the TxG is committed (and the in-memory log is freed).
>
> Thank you for your time.
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] recovered state after system crash

2007-05-04 Thread kyusun Chang
If the system crashes some time after the last commit of a transaction group
(TxG), what happens to the file system transactions since the last commit of
the TxG (I presume the last commit of a TxG represents the last on-disk
consistency)?
Does ZFS recover all file system transactions which it returned with success
since the last commit of the TxG?  That would imply that the ZIL must flush
log records for each successful file system transaction before it returns to
the caller, so that it can replay the filesystem transactions.

Blogs on the ZIL state (I hope I read them right) that log records are
maintained in memory and flushed to disk only
1) at a synchronous write request (does that mean the in-memory log is freed
after that), or
2) when the TxG is committed (and the in-memory log is freed).

Thank you for your time.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Ian Collins
Lee Fyock wrote:
> I didn't mean to kick up a fuss.
>
> I'm reasonably zfs-savvy in that I've been reading about it for a year
> or more. I'm a Mac developer and general geek; I'm excited about zfs
> because it's new and cool.
>
> At some point I'll replace my old desktop machine with something new
> and better -- probably when Unreal Tournament 2007 arrives,
> necessitating a faster processor and better graphics card. :-)
>
> In the mean time, I'd like to hang out with the system and drives I
> have. As "mike" said, my understanding is that zfs would provide error
> correction until a disc fails, if the setup is properly done. That's
> the setup for which I'm requesting a recommendation.
>
> I won't even be able to use zfs until Leopard arrives in October, but
> I want to bone up so I'll be ready when it does.
>
> Money isn't an issue here, but neither is creating an optimal zfs
> system. I'm curious what the right zfs configuration is for the system
> I have.
>
Given the odd sizes of your drives, there might not be one, unless you
are willing to sacrifice capacity.

Ian

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Lee Fyock

I didn't mean to kick up a fuss.

I'm reasonably zfs-savvy in that I've been reading about it for a  
year or more. I'm a Mac developer and general geek; I'm excited about  
zfs because it's new and cool.


At some point I'll replace my old desktop machine with something new  
and better -- probably when Unreal Tournament 2007 arrives,  
necessitating a faster processor and better graphics card. :-)


In the mean time, I'd like to hang out with the system and drives I  
have. As "mike" said, my understanding is that zfs would provide  
error correction until a disc fails, if the setup is properly done.  
That's the setup for which I'm requesting a recommendation.


I won't even be able to use zfs until Leopard arrives in October, but  
I want to bone up so I'll be ready when it does.


Money isn't an issue here, but neither is creating an optimal zfs  
system. I'm curious what the right zfs configuration is for the  
system I have.


Thanks!
Lee

On May 4, 2007, at 7:41 PM, Al Hopper wrote:


On Fri, 4 May 2007, mike wrote:


Isn't the benefit of ZFS that it will allow you to use even the most
unreliable disks and be able to inform you when they are attempting to
corrupt your data?


Yes - I won't argue that ZFS can be applied exactly as you state above.

However, ZFS is no substitute for bad practices that include:

- not proactively replacing mechanical components *before* they fail
- not having maintenance policies in place

To me it sounds like he is a SOHO user; may not have a lot of funds to
go out and swap hardware on a whim like a company might.


You may be right - but you're simply guessing.  The original system
probably cost around $3k (?? I could be wrong).  So what I'm suggesting,
that he spend ~ $300, represents ~ 10% of the original system cost.

Since the OP asked for advice, I've given him the best advice I can come
up with.  I've also encountered many users who don't keep up to date with
current computer hardware capabilities and pricing, and who may be
completely unaware that you can purchase two 500Gb disk drives, with a 5
year warranty, for around $300.  And possibly less if you check out Fry's
weekly bargain disk drive offers.

Now consider the total cost of ownership solution I recommended: 500
gigabytes of storage, coupled with ZFS, which translates into $60/year for
5 years of error-free storage capability.  Can life get any better than
this! :)

Now contrast my recommendation with what you propose - re-targeting a
bunch of older disk drives, which incorporate older, less reliable
technology, with a view to saving money.  How much is your time worth?
How many hours will it take you to recover from a failure of one of these
older drives and the accompanying increased risk of data loss?

If the ZFS-savvy OP comes back to this list and says "Al's solution is too
expensive" I'm perfectly willing to rethink my recommendation.  For now, I
believe it to be the best recommendation I can devise.


ZFS in my opinion is well-suited for those without access to
continuously upgraded hardware and expensive fault-tolerant
hardware-based solutions. It is ideal for home installations where
people think their data is safe until the disk completely dies. I
don't know how many non-savvy people I have helped over the years who
have no data protection, and ZFS could offer them at least some
fault-tolerance and protection against corruption, and could help
notify them when it is time to shut off their computer and call
someone to come swap out their disk and move their data to a fresh
drive before it's completely failed...


Agreed.

One piece-of-the-puzzle that's missing right now IMHO, is a reliable,
two port, low-cost PCI SATA disk controller.  A solid/de-bugged 3124
driver would go a long way to ZFS-enabling a bunch of cost-constrained ZFS
users.

And, while I'm working this hardware wish list, please ... a PCI-Express
based version of the SuperMicro AOC-SAT2-MV8 8-port Marvell based disk
controller card.  Sun ... are you listening?



- mike


On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:

On Fri, 4 May 2007, Lee Fyock wrote:


Hi--

I'm looking forward to using zfs on my Mac at some point. My desktop
server (a dual-1.25GHz G4) has a motley collection of discs that has
accreted over the years: internal EIDE 320GB (boot drive), internal
250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.

My guess is that I won't be able to use zfs on the boot 320 GB drive,
at least this year. I'd like to favor available space over
performance, and be able to swap out a failed drive without losing
any data.

So, what's the best zfs configuration in this situation? The FAQs
I've read are usually related to matched (in size) drives.


Seriously, the best solution here is to discard any drive that is 3 years
(or more) old[1] and purchase two new SATA 500Gb drives.  Set up the new
drives as a zfs mirror.  Being a believer in diversity, I'd recommend the
fol

Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread mike

On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:

Yes - I won't argue that ZFS can be applied exactly as you state above.
However, ZFS is no substitute for bad practices that include:

- not proactively replacing mechanical components *before* they fail
- not having maintenance policies in place


I mainly was speaking on behalf of the home users. If any data is
important you obviously get what you pay for. However I think ZFS can
help improve the integrity - perhaps you don't know the disk is
starting to fail until it has corrupted some data. If ZFS was in
place, some if not all of the data would still have been safe. I
replace my disks when they start to get corrupt, and I am still always
nervous and have high-stress data moves off failing disks to the new
ones/temporary storage. ZFS in my opinion is a proactive way to
minimize data loss. It's obviously not an excuse to let your hardware
rot for years.


And, while I'm working this hardware wish list, please ... a PCI-Express
based version of the SuperMicro AOC-SAT2-MV8 8-port Marvell based disk
controller card.  Sun ... are you listening?


Yeah - I've got a wishlist too; port-multiplier friendly PCI-e
adapters... Marvell or SI or anything as long as it's PCI-e and has 4
or 5 eSATA ports that can work with a port multiplier (for 4-5 disks
per port) ... I don't think there is a clear fully supported option
yet or I'd be using it right now.

- mike
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Al Hopper
On Fri, 4 May 2007, mike wrote:

> Isn't the benefit of ZFS that it will allow you to use even the most
> unreliable disks and be able to inform you when they are attempting to
> corrupt your data?

Yes - I won't argue that ZFS can be applied exactly as you state above.
However, ZFS is no substitute for bad practices that include:

- not proactively replacing mechanical components *before* they fail
- not having maintenance policies in place

> To me it sounds like he is a SOHO user; may not have a lot of funds to
> go out and swap hardware on a whim like a company might.

You may be right - but you're simply guessing.  The original system
probably cost around $3k (?? I could be wrong).  So what I'm suggesting,
that he spend ~ $300, represents ~ 10% of the original system cost.

Since the OP asked for advice, I've given him the best advice I can come
up with.  I've also encountered many users who don't keep up to date with
current computer hardware capabilities and pricing, and who may be
completely unaware that you can purchase two 500Gb disk drives, with a 5
year warranty, for around $300.  And possibly less if you check out Fry's
weekly bargain disk drive offers.

Now consider the total cost of ownership solution I recommended: 500
gigabytes of storage, coupled with ZFS, which translates into $60/year for
5 years of error-free storage capability.  Can life get any better than
this! :)

Now contrast my recommendation with what you propose - re-targeting a
bunch of older disk drives, which incorporate older, less reliable
technology, with a view to saving money.  How much is your time worth?
How many hours will it take you to recover from a failure of one of these
older drives and the accompanying increased risk of data loss?

If the ZFS-savvy OP comes back to this list and says "Al's solution is too
expensive" I'm perfectly willing to rethink my recommendation.  For now, I
believe it to be the best recommendation I can devise.

> ZFS in my opinion is well-suited for those without access to
> continuously upgraded hardware and expensive fault-tolerant
> hardware-based solutions. It is ideal for home installations where
> people think their data is safe until the disk completely dies. I
> don't know how many non-savvy people I have helped over the years who
> have no data protection, and ZFS could offer them at least some
> fault-tolerance and protection against corruption, and could help
> notify them when it is time to shut off their computer and call
> someone to come swap out their disk and move their data to a fresh
> drive before it's completely failed...

Agreed.

One piece-of-the-puzzle that's missing right now IMHO, is a reliable,
two port, low-cost PCI SATA disk controller.  A solid/de-bugged 3124
driver would go a long way to ZFS-enabling a bunch of cost-constrained ZFS
users.

And, while I'm working this hardware wish list, please ... a PCI-Express
based version of the SuperMicro AOC-SAT2-MV8 8-port Marvell based disk
controller card.  Sun ... are you listening?


> - mike
>
>
> On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:
> > On Fri, 4 May 2007, Lee Fyock wrote:
> >
> > > Hi--
> > >
> > > I'm looking forward to using zfs on my Mac at some point. My desktop
> > > server (a dual-1.25GHz G4) has a motley collection of discs that has
> > > accreted over the years: internal EIDE 320GB (boot drive), internal
> > > 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.
> > >
> > > My guess is that I won't be able to use zfs on the boot 320 GB drive,
> > > at least this year. I'd like to favor available space over
> > > performance, and be able to swap out a failed drive without losing
> > > any data.
> > >
> > > So, what's the best zfs configuration in this situation? The FAQs
> > > I've read are usually related to matched (in size) drives.
> >
> > Seriously, the best solution here is to discard any drive that is 3 years
> > (or more) old[1] and purchase two new SATA 500Gb drives.  Set up the new
> > drives as a zfs mirror.  Being a believer in diversity, I'd recommend the
> > following two products (one of each):
> >
> > - Western Digital Caviar RE2 WD5000YS 500GB 7200 RPM 16MB Cache SATA
> > 3.0Gb/s Hard Drive [2]
> > - Seagate Barracuda 7200.10 (Perpendicular Recording) ST3500630AS 500GB
> > 7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive
> >
> > Not being familiar with Macs - I'm not sure about your availability of
> > SATA ports on the motherboard.
> >
> > [1] it continues to amaze me that many sites, large or small, don't have a
> > (written) policy for mechanical component replacement - whether disk
> > drives or fans.
> > [2] $151 at zipzoomfly.com
> > [3] $130 at newegg.com
> >
> > Regards,
> >
> > Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
> >   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
> > OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
> > http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
> > ___

Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Ian Collins
mike wrote:
> Isn't the benefit of ZFS that it will allow you to use even the most
> unreliable disks and be able to inform you when they are attempting to
> corrupt your data?
>
> To me it sounds like he is a SOHO user; may not have a lot of funds to
> go out and swap hardware on a whim like a company might.
>
There's a limit to how much even ZFS can do with bad disks.  Sure, it can
manage a failing mirror better than SVM or low-end hardware RAID, but
given the motley collection of drives in the OP's system, there aren't
that many options.  Given the silly prices of new drives (320GB are
about the best $/GB), replacement is the best long-term option.
Otherwise, mirroring the largest two drives and discarding the smaller ones
might be a good option.

Ian



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread mike

Isn't the benefit of ZFS that it will allow you to use even the most
unreliable disks and be able to inform you when they are attempting to
corrupt your data?

To me it sounds like he is a SOHO user; may not have a lot of funds to
go out and swap hardware on a whim like a company might.

ZFS in my opinion is well-suited for those without access to
continuously upgraded hardware and expensive fault-tolerant
hardware-based solutions. It is ideal for home installations where
people think their data is safe until the disk completely dies. I
don't know how many non-savvy people I have helped over the years who
have no data protection, and ZFS could offer them at least some
fault-tolerance and protection against corruption, and could help
notify them when it is time to shut off their computer and call
someone to come swap out their disk and move their data to a fresh
drive before it's completely failed...

- mike


On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:

On Fri, 4 May 2007, Lee Fyock wrote:

> Hi--
>
> I'm looking forward to using zfs on my Mac at some point. My desktop
> server (a dual-1.25GHz G4) has a motley collection of discs that has
> accreted over the years: internal EIDE 320GB (boot drive), internal
> 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.
>
> My guess is that I won't be able to use zfs on the boot 320 GB drive,
> at least this year. I'd like to favor available space over
> performance, and be able to swap out a failed drive without losing
> any data.
>
> So, what's the best zfs configuration in this situation? The FAQs
> I've read are usually related to matched (in size) drives.

Seriously, the best solution here is to discard any drive that is 3 years
(or more) old[1] and purchase two new SATA 500Gb drives.  Set up the new
drives as a zfs mirror.  Being a believer in diversity, I'd recommend the
following two products (one of each):

- Western Digital Caviar RE2 WD5000YS 500GB 7200 RPM 16MB Cache SATA
3.0Gb/s Hard Drive [2]
- Seagate Barracuda 7200.10 (Perpendicular Recording) ST3500630AS 500GB
7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive

Not being familiar with Macs - I'm not sure about your availability of
SATA ports on the motherboard.

[1] it continues to amaze me that many sites, large or small, don't have a
(written) policy for mechanical component replacement - whether disk
drives or fans.
[2] $151 at zipzoomfly.com
[3] $130 at newegg.com

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
  Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Toby Thain


On 4-May-07, at 6:53 PM, Al Hopper wrote:


...
[1] it continues to amaze me that many sites, large or small, don't have a
(written) policy for mechanical component replacement - whether disk
drives or fans.


You're not the only one. In fact, while I'm not exactly talking  
"enterprise" level here - more usually "IT" and we know what that  
means - I've seen many RAID systems purchased and set up without any  
spare disks on hand, or any thought given to what happens next when  
one fails. Likely this is a combination of low expectations (you can  
usually blame Windows and everyone will believe it) from the  
computing services department combined with a lack of feedback  
("you're fired") when massive data loss occurs.


--Toby



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motley group of discs?

2007-05-04 Thread Al Hopper
On Fri, 4 May 2007, Lee Fyock wrote:

> Hi--
>
> I'm looking forward to using zfs on my Mac at some point. My desktop
> server (a dual-1.25GHz G4) has a motley collection of discs that has
> accreted over the years: internal EIDE 320GB (boot drive), internal
> 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.
>
> My guess is that I won't be able to use zfs on the boot 320 GB drive,
> at least this year. I'd like to favor available space over
> performance, and be able to swap out a failed drive without losing
> any data.
>
> So, what's the best zfs configuration in this situation? The FAQs
> I've read are usually related to matched (in size) drives.

Seriously, the best solution here is to discard any drive that is 3 years
(or more) old[1] and purchase two new SATA 500Gb drives.  Set up the new
drives as a zfs mirror.  Being a believer in diversity, I'd recommend the
following two products (one of each):

- Western Digital Caviar RE2 WD5000YS 500GB 7200 RPM 16MB Cache SATA
3.0Gb/s Hard Drive [2]
- Seagate Barracuda 7200.10 (Perpendicular Recording) ST3500630AS 500GB
7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive

Not being familiar with Macs - I'm not sure about your availability of
SATA ports on the motherboard.
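
As a point of reference, setting up such a mirror is a one-liner; a minimal
sketch (the device names are hypothetical):

  zpool create tank mirror c2t0d0 c2t1d0   # two-way mirror of the new 500GB drives
  zpool status tank                        # verify both halves show ONLINE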

[1] it continues to amaze me that many sites, large or small, don't have a
(written) policy for mechanical component replacement - whether disk
drives or fans.
[2] $151 at zipzoomfly.com
[3] $130 at newegg.com

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] thoughts on ZFS copies

2007-05-04 Thread Richard Elling

I've put together some thoughts on the ZFS copies property.
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection

I hope that you might find this useful.  I tried to use simplified drawings
to illustrate the important points.  Feedback appreciated.
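
(For anyone who wants to experiment, copies is just a per-dataset property;
a minimal sketch with a hypothetical dataset name:

  zfs set copies=2 tank/home    # keep two copies of every newly written block
  zfs get copies tank/home

Note it only affects data written after the property is set.)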

There is more work to be done to understand all of the implications presented
by the copies feature, so if you find something confusing or have questions,
then please speak up and I'll add it to the list of things to do.  I've already
got performance characterization and modeling on the list :-)
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARC, mmap, pagecache...

2007-05-04 Thread Matty

On 5/4/07, Roch - PAE <[EMAIL PROTECTED]> wrote:


Manoj Joseph writes:
 > Hi,
 >
 > I was wondering about the ARC and its interaction with the VM
 > pagecache... When a file on a ZFS filesystem is mmaped, does the ARC
 > cache get mapped to the process' virtual memory? Or is there another copy?
 >

My understanding is,

The ARC does not get mapped to user space. The data ends up in the ARC
(recordsize chunks) and  in the page cache (in page chunks).
Both copies are updated on writes.


If that is the case, are there any plans to unify the ARC and the page cache?

Thanks,
- Ryan
--
UNIX Administrator
http://prefetch.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARC, mmap, pagecache...

2007-05-04 Thread Roch - PAE

Manoj Joseph writes:
 > Hi,
 > 
 > I was wondering about the ARC and its interaction with the VM 
 > pagecache... When a file on a ZFS filesystem is mmaped, does the ARC 
 > cache get mapped to the process' virtual memory? Or is there another copy?
 > 

My understanding is,

The ARC does not get mapped to user space. The data ends up in the ARC
(recordsize chunks) and  in the page cache (in page chunks).
Both copies are updated on writes.

-r

 > -Manoj
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Motley group of discs?

2007-05-04 Thread Lee Fyock

Hi--

I'm looking forward to using zfs on my Mac at some point. My desktop  
server (a dual-1.25GHz G4) has a motley collection of discs that has  
accreted over the years: internal EIDE 320GB (boot drive), internal  
250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.


My guess is that I won't be able to use zfs on the boot 320 GB drive,  
at least this year. I'd like to favor available space over  
performance, and be able to swap out a failed drive without losing  
any data.


So, what's the best zfs configuration in this situation? The FAQs  
I've read are usually related to matched (in size) drives.


Thanks!
Lee

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-04 Thread Jürgen Keil
> A couple more questions here.
...
> You still have idle time in this lockstat (and mpstat).
> 
> What do you get for a lockstat -A -D 20 sleep 30?
> 
> Do you see anyone with long lock hold times, long
> sleeps, or excessive spinning?

Hmm, I ran a series of "lockstat -A -l ph_mutex -s 16 -D 20 sleep 5"
commands while writing to the gzip compressed zpool, and noticed
these high mutex block times:


Adaptive mutex block: 8 events in 5.100 seconds (2 events/sec)

---
Count indv cuml rcnt nsec Lock   Caller  
5  62%  62% 0.00 317300109 ph_mutex+0x1380    page_create_va+0x334

  nsec -- Time Distribution -- count Stack   
 536870912 |@@ 5 segkmem_page_create+0x89
 segkmem_xalloc+0xbc 
 segkmem_alloc_vn+0xcd   
 segkmem_alloc+0x20  
 vmem_xalloc+0x4fc   
 vmem_alloc+0x159
 kmem_alloc+0x4f 
 kobj_alloc+0x7e 
 kobj_zalloc+0x1c
 zcalloc+0x2d
 z_deflateInit2_+0x1b8   
 z_deflateInit_+0x32 
 z_compress_level+0x77   
 gzip_compress+0x4b  
 zio_compress_data+0xbc  
---
Count indv cuml rcnt nsec Lock   Caller  
1  12%  75% 0.00 260247717 ph_mutex+0x1a40    page_create_va+0x334

  nsec -- Time Distribution -- count Stack   
 268435456 |@@ 1 segkmem_page_create+0x89
 segkmem_xalloc+0xbc 
 segkmem_alloc_vn+0xcd   
 segkmem_alloc+0x20  
 vmem_xalloc+0x4fc   
 vmem_alloc+0x159
 kmem_alloc+0x4f 
 kobj_alloc+0x7e 
 kobj_zalloc+0x1c
 zcalloc+0x2d
 z_deflateInit2_+0x1de   
 z_deflateInit_+0x32 
 z_compress_level+0x77   
 gzip_compress+0x4b  
 zio_compress_data+0xbc  
---
Count indv cuml rcnt nsec Lock   Caller  
1  12%  88% 0.00 348135263 ph_mutex+0x1380    page_create_va+0x334

  nsec -- Time Distribution -- count Stack   
 536870912 |@@ 1 segkmem_page_create+0x89
 segkmem_xalloc+0xbc 
 segkmem_alloc_vn+0xcd   
 segkmem_alloc+0x20  
 vmem_xalloc+0x4fc   
 vmem_alloc+0x159
 kmem_alloc+0x4f 
 kobj_alloc+0x7e 
 kobj_zalloc+0x1c
 zcalloc+0x2d
 z_deflateInit2_+0x1a1   
 z_deflateInit_+0x32 
 z_compress_level+0x77   
 gzip_compress+0x4b  
 zio_compress_data+0xbc  
-

Re: [zfs-discuss] Force rewriting of all data, to push stripes onto newly added devices?

2007-05-04 Thread Richard Elling

Mario Goebbels wrote:

I'm just in sort of a scenario where I've added devices to a pool and would
now like the existing data to be spread across the new drives, to increase the
performance. Is there a way to do it, like a scrub? Or would I have to have all
files copy over themselves, or similar hacks?


for the short term, cp works (or any other process which would result in a new
write of the files).
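
A minimal sketch of that workaround (the path is hypothetical): rewrite each
file in place so its new blocks get striped across all of the top-level vdevs,
for example

  cp /tank/data/bigfile /tank/data/bigfile.tmp && \
      mv /tank/data/bigfile.tmp /tank/data/bigfile

(Snapshots will still hold the old, unbalanced copies of the blocks.)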
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Force rewriting of all data, to push stripes onto newly added devices?

2007-05-04 Thread Bart Smaalders

Mario Goebbels wrote:
I'm just in sort of a scenario, where I've added devices 
to a pool and would now like the existing data to be spread 
across the new drives, to increase the performance. Is 
there a way to do it, like a scrub? Or would I have to
have all files copy over themselves, or similar hacks?


Thanks,
-mg
 



This requires rewriting the block pointers; it's the same
problem as supporting vdev removal.  I would guess that
they'll be solved at the same time.

- Bart


--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-04 Thread Roch - PAE

Ian Collins writes:
 > Roch Bourbonnais wrote:
 > >
 > > with recent bits ZFS compression is now handled concurrently with many
 > > CPUs working on different records.
 > > So this load will burn more CPUs and achieve its results
 > > (compression) faster.
 > >
 > Would changing (selecting a smaller) filesystem record size have any effect?
 > 

If the problem is that we just have a high kernel load
compressing blocks, then probably not. If anything small
records might be a tad less efficient (thus needing more CPU).

 > > So the observed pauses should be consistent with that of a load
 > > generating high system time.
 > > The assumption is that compression now goes faster than when is was
 > > single threaded.
 > >
 > > Is this undesirable ? We might seek a way to slow down compression in
 > > order to limit the system load.
 > >
 > I think you should, otherwise we have a performance throttle that scales
 > with the number of cores!
 > 

Again I wonder to what extent the issue becomes painful due 
to lack of write throttling. Once we have that in, we should 
revisit this. 

-r

 > Ian
 > 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-04 Thread Jürgen Keil
Roch Bourbonnais wrote

> with recent bits ZFS compression is now handled concurrently with  
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.

Is this done using the taskq's, created in spa_activate()?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

These threads seem to be running the gzip compression code,
and are apparently started with a priority of maxclsyspri == 99.

> So the observed pauses should be consistent with that of a load  
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
> 
> Is this undesirable ? We might seek a way to slow
> down compression in  order to limit the system load.

Hmm, I see that the USB device drivers are also using taskq's,
see file usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c,
function usba_init_pipe_handle().  The USB device driver is
using a priority of minclsyspri == 60 (or "maxclsyspri - 5" == 94,
in the case of isochronous usb pipes):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c#427


Could this be a problem?  That is, when zfs' taskq is filled with
lots of compression requests, is there no time left to run the USB
taskqs that have a lower priority than zfs'?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iscsitadm local_name in ZFS

2007-05-04 Thread cedric briner

cedric briner wrote:

hello dear community,

Is there a way to have a ``local_name'' as defined in iscsitadm(1M) when
you shareiscsi a zvol?  This way, it would give an even easier
way to identify a device through its IQN.


Ced.



Okay, no reply from you, so... maybe I didn't make myself well understood.

Let me try to re-explain what I mean:
when you use a zvol and enable shareiscsi, could you add a suffix to the
IQN (iSCSI Qualified Name)?  This suffix would be supplied by me and would
help me identify which IQN corresponds to which zvol: it is just a
more human-readable tag on an IQN.


Similarly, this tag is also given when you use iscsitadm.  In the
man page of iscsitadm it is called a local_name.


iscsitadm create target -b /dev/dsk/c0d0s5 tiger
or
iscsitadm create target -b /dev/dsk/c0d0s5 hd-1

tiger and hd-1 are local_names.
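
For comparison, the zvol path today gives no place to put such a tag; a
minimal sketch (pool and volume names are hypothetical):

  zfs create -V 10g tank/vol1
  zfs set shareiscsi=on tank/vol1
  iscsitadm list target    # the IQN is generated automatically; no way to add your own tag to it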

Ced.

--

Cedric BRINER
Geneva - Switzerland
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Force rewriting of all data, to push stripes onto newly added devices?

2007-05-04 Thread Mario Goebbels
I'm just in sort of a scenario where I've added devices to a pool and would
now like the existing data to be spread across the new drives, to increase the
performance.  Is there a way to do it, like a scrub?  Or would I have to have
all files copy over themselves, or similar hacks?

Thanks,
-mg
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: gzip compression throttles system?

2007-05-04 Thread Jürgen Keil
> A couple more questions here.
... 
> What do you have zfs compresison set to?  The gzip level is
> tunable, according to zfs set, anyway:
> 
> PROPERTY   EDIT  INHERIT   VALUES
> compression YES  YES   on | off | lzjb | gzip | gzip-[1-9]

I've used the "default" gzip compression level, that is I used

zfs set compression=gzip gzip_pool
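
For reference, plain compression=gzip corresponds to gzip-6; a lighter or
heavier level can be requested explicitly, e.g.

  zfs set compression=gzip-1 gzip_pool   # faster, lower compression ratio
  zfs set compression=gzip-9 gzip_pool   # slowest, highest compression ratio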

> You still have idle time in this lockstat (and mpstat).
> 
> What do you get for a lockstat -A -D 20 sleep 30?

# lockstat -A -D 20 /usr/tmp/fill /gzip_pool/junk
lockstat: warning: 723388 aggregation drops on CPU 0
lockstat: warning: 239335 aggregation drops on CPU 1
lockstat: warning: 62366 aggregation drops on CPU 0
lockstat: warning: 51856 aggregation drops on CPU 1
lockstat: warning: 45187 aggregation drops on CPU 0
lockstat: warning: 46536 aggregation drops on CPU 1
lockstat: warning: 687832 aggregation drops on CPU 0
lockstat: warning: 575675 aggregation drops on CPU 1
lockstat: warning: 46504 aggregation drops on CPU 0
lockstat: warning: 40874 aggregation drops on CPU 1
lockstat: warning: 45571 aggregation drops on CPU 0
lockstat: warning: 33422 aggregation drops on CPU 1
lockstat: warning: 501063 aggregation drops on CPU 0
lockstat: warning: 361041 aggregation drops on CPU 1
lockstat: warning: 651 aggregation drops on CPU 0
lockstat: warning: 7011 aggregation drops on CPU 1
lockstat: warning: 61600 aggregation drops on CPU 0
lockstat: warning: 19386 aggregation drops on CPU 1
lockstat: warning: 566156 aggregation drops on CPU 0
lockstat: warning: 105502 aggregation drops on CPU 1
lockstat: warning: 25362 aggregation drops on CPU 0
lockstat: warning: 8700 aggregation drops on CPU 1
lockstat: warning: 585002 aggregation drops on CPU 0
lockstat: warning: 645299 aggregation drops on CPU 1
lockstat: warning: 237841 aggregation drops on CPU 0
lockstat: warning: 20931 aggregation drops on CPU 1
lockstat: warning: 320102 aggregation drops on CPU 0
lockstat: warning: 435898 aggregation drops on CPU 1
lockstat: warning: 115 dynamic variable drops with non-empty dirty list
lockstat: warning: 385192 aggregation drops on CPU 0
lockstat: warning: 81833 aggregation drops on CPU 1
lockstat: warning: 259105 aggregation drops on CPU 0
lockstat: warning: 255812 aggregation drops on CPU 1
lockstat: warning: 486712 aggregation drops on CPU 0
lockstat: warning: 61607 aggregation drops on CPU 1
lockstat: warning: 1865 dynamic variable drops with non-empty dirty list
lockstat: warning: 250425 aggregation drops on CPU 0
lockstat: warning: 171415 aggregation drops on CPU 1
lockstat: warning: 166277 aggregation drops on CPU 0
lockstat: warning: 74819 aggregation drops on CPU 1
lockstat: warning: 39342 aggregation drops on CPU 0
lockstat: warning: 3556 aggregation drops on CPU 1
lockstat: warning: ran out of data records (use -n for more)

Adaptive mutex spin: 4701 events in 64.812 seconds (73 events/sec)

Count indv cuml rcnt spin Lock   Caller  
---
 1726  37%  37% 0.002 vph_mutex+0x17e8   pvn_write_done+0x10c
 1518  32%  69% 0.001 vph_mutex+0x17e8   hat_page_setattr+0x70   
  264   6%  75% 0.002 vph_mutex+0x2000   page_hashin+0xad
  194   4%  79% 0.004 0xfffed2ee0a88 cv_wait+0x69
  106   2%  81% 0.002 vph_mutex+0x2000   page_hashout+0xdd   
   91   2%  83% 0.004 0xfffed2ee0a88 taskq_dispatch+0x2c9
   83   2%  85% 0.004 0xfffed2ee0a88 taskq_thread+0x1cb  
   83   2%  86% 0.001 0xfffec17a56b0 ufs_iodone+0x3d 
   47   1%  87% 0.004 0xfffec1e4ce98 vdev_queue_io+0x85  
   43   1%  88% 0.006 0xfffec139a2c0 trap+0xf66  
   38   1%  89% 0.006 0xfffecb5f8cd0 cv_wait+0x69
   37   1%  90% 0.004 0xfffec143ee90 dmult_deque+0x36
   26   1%  91% 0.002 htable_mutex+0x108 htable_release+0x79 
   26   1%  91% 0.001 0xfffec17a56b0 ufs_putpage+0xa4
   18   0%  91% 0.004 0xfffec00dca48 ghd_intr+0xa8   
   17   0%  92% 0.002 0xfffec00dca48 ghd_waitq_delete+0x35   
   12   0%  92% 0.002 htable_mutex+0x248 htable_release+0x79 
   11   0%  92% 0.008 0xfffec1e4ce98 vdev_queue_io_done+0x3b 
   10   0%  93% 0.003 0xfffec00dca48 ghd_transport+0x71  
   10   0%  93% 0.002 0xff00077dc138 page_get_mnode_freelist+0xdb
---

Adaptive mutex block: 167 events in 64.812 seconds (3 events/sec)

Count indv cuml rcnt nsec Lock   Caller  
---
   78  47%  47% 0.00    31623 vph_mutex+0x17e8   pvn_write_done+0x10c
 

Re: [zfs-discuss] gzip compression throttles system?

2007-05-04 Thread Erblichs
Darren Moffat,

Yes and no. An earlier statement within this discussion
was whether gzip is appropriate for .wav files. This just
gets a relative time to compress, and relative
sizes of the files after compression.

My assumption is that gzip will run as a user app
in one environment. The normal r/w sys calls then take
a user buffer. So, it would be hard to believe that the
.wav file won't be read one user buffer at a time. Yes,
it could be mmap'ed, but then it would have to be
unmapped. Too many sys calls, I think, for the app.
Sorry, haven't looked at it for a while...

Overall, I am just trying to guess at the read-ahead
delay versus the user buffer versus the internal FS.
The internal FS should take it basically one FS block
at a time (or do multiple blocks in parallel),
and the user app takes it anywhere from
one buffer to one page size, 8k, at a time. Reading one
buffer at a time in a loop incurs a context switch from
kernel to user each time; thus, I would expect the gzip
app to be slower.

So, my first step is to keep it simple (KISS) and tell
the group "what happens if" we do this simple
comparison?  How many bytes/sec are compressed?
Are they approximately the same speed?  Do you end up
with the same size file?

Mitchell Erblich
--


Darren J Moffat wrote:
> 
> Erblichs wrote:
> >   So, my first order would be to take 1GB or 10GB .wav files
> >   AND time both the kernel implementation of Gzip and the
> >   user application. Approx the same times MAY indicate
> >   that the kernel implementation gzip funcs should
> >   be treated maybe more as interactive scheduling
> >   threads and that it is too high and blocks other
> >   threads or processes from executing.
> 
> If you just run gzip(1) against the files you are operating on the whole
> file so you only incur startup costs once and are thus doing quite a
> different compression to operating on a block level.  A fairer
> comparison would be to build a userland program that compresses and then
> writes to disk in ZFS blocksize chunks, that way you are compressing the
> same sizes of data and doing the startup every time just like zio has to do.
> 
> --
> Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: Re: ZFS improvements

2007-05-04 Thread Gino
> On Mon, Apr 23, 2007 at 09:38:47AM -0700, Gino wrote:
> >
> > we had 5 corrupted zpool (on different servers and
> different SANs) !
> > With Solaris up to S10U3 and Nevada up to snv59 we
> are able to corrupt
> > easily a zpool only disconnecting a few times one
> or more luns of a
> > zpool under high i/o load.
> > 
> > We are testing now snv60.
> 
> As I've mentioned before, I believe you were tripping
> over the space map
> bug (6458218) which was fixed in build 60 and will
> appear in S10u4.  Let
> us know if you are able to reproduce the problem on
> build 60 or later.

Eric,
We've done our first test with snv60.
We moved over 40TB of data between 4 zpools and in the meantime we've done
about 10 snapshots and forced 50 panics by disabling ports on the FC switches.
None of the pools have been corrupted!
Also, we found snv60 MUCH more stable than S10U3.

gino
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs UFS2 overhead and may be a bug?

2007-05-04 Thread Pawel Jakub Dawidek
On Thu, May 03, 2007 at 02:15:45PM -0700, Bakul Shah wrote:
> [originally reported for ZFS on FreeBSD but Pawel Jakub Dawidek
>  says this problem also exists on Solaris hence this email.]

Thanks!

> Summary: on ZFS, overhead for reading a hole seems far worse
> than actual reading from a disk.  Small buffers are used to
> make this overhead more visible.
> 
> I ran the following script on both ZFS and UFS2 filesystems.
> 
> [Note that on FreeBSD cat uses a 4k buffer and md5 uses a 1k
>  buffer. On Solaris you can replace them with dd with
>  respective buffer sizes for this test and you should see
>  similar results.]
> 
> $ dd </dev/zero >SPACY bs=1m count=10240  # 10G zero bytes allocated
> $ truncate -s 10G HOLEY   # no space allocated
> 
> $ time dd <SPACY >/dev/null bs=1m # A1
> $ time dd <HOLEY >/dev/null bs=1m # A2
> $ time cat SPACY >/dev/null   # B1
> $ time cat HOLEY >/dev/null   # B2
> $ time md5 SPACY  # C1
> $ time md5 HOLEY  # C2
> 
> I have summarized the results below.
> 
>                      ZFS               UFS2
>                  Elapsed  System   Elapsed  System   Test
> dd SPACY bs=1m    110.26   22.52    340.38   19.11    A1
> dd HOLEY bs=1m     22.44   22.41     24.24   24.13    A2
> 
> cat SPACY         119.64   33.04    342.77   17.30    B1
> cat HOLEY         222.85  222.08     22.91   22.41    B2
> 
> md5 SPACY         210.01   77.46    337.51   25.54    C1
> md5 HOLEY         856.39  801.21     82.11   28.31    C2

This is what I see on Solaris (hole is 4GB):

# /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
real   23.7
# /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
real   21.2

# /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
real   31.4
# /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
real 7:32.2
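
(If anyone wants to reproduce this on Solaris, a sketch with hypothetical
paths: a hole-only test file can be created without allocating blocks via

  mkfile -n 4g /zfs/hole      # size is recorded, no blocks allocated
  mkfile -n 4g /ufs/hole

and then read back with dd at the different block sizes shown above.)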

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


pgpHFXMS6aW7i.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Filesystem full not reported in /var/adm/messages

2007-05-04 Thread C-EDGE
Hello,

Is someone able to explain to me why ZFS does not report a full filesystem in
/var/adm/messages?  Did I miss something, or is it expected behaviour?

Tested on Solaris 11/06 (ZFS version 3)

Thank you for your feedback !
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] gzip compression throttles system?

2007-05-04 Thread Darren J Moffat

Erblichs wrote:

So, my first order would be to take 1GB or 10GB .wav files
AND time both the kernel implementation of Gzip and the
user application. Approx the same times MAY indicate
that the kernel implementation gzip funcs should
be treated maybe more as interactive scheduling
threads and that it is too high and blocks other
threads or processes from executing.


If you just run gzip(1) against the files you are operating on the whole 
file so you only incur startup costs once and are thus doing quite a 
different compression to operating on a block level.  A fairer 
comparison would be to build a userland program that compresses and then 
writes to disk in ZFS blocksize chunks, that way you are compressing the 
same sizes of data and doing the startup every time just like zio has to do.
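
A rough sketch of such a comparison (file names are hypothetical): chop the
.wav into 128K pieces - the default ZFS recordsize - and gzip each piece
separately, so every piece pays the deflate startup cost much like zio does
per block:

  split -a 4 -b 128k input.wav chunk.
  time sh -c 'for f in chunk.*; do gzip "$f"; done'
  du -sk .      # compare total compressed size with gzip(1) on the whole file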


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss