Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-18 Thread Freddie Cash
On Thu, Jan 17, 2013 at 4:48 PM, Peter Blajev  wrote:

> Right on Tim. Thanks. I didn't know that. I'm sure it's documented
> somewhere and I should have read it so double thanks for explaining it.
>

When in doubt, always check the man page first:
man zpool

It's listed in the section on the "iostat" sub-command:
 zpool iostat [-T d|u] [-v] [pool] ... [interval [count]]

     Displays I/O statistics for the given pools. When given an interval,
     the statistics are printed every interval seconds until Ctrl-C is
     pressed. If no pools are specified, statistics for every pool in the
     system is shown. If count is specified, the command exits after count
     reports are printed.
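
For example, something like this prints per-vdev statistics every 5 seconds,
10 times (the pool name is just an example):

 zpool iostat -v tank 5 10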

:D

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] any more efficient way to transfer snapshot between two hosts than ssh tunnel?

2012-12-13 Thread Freddie Cash
On Dec 13, 2012 8:02 PM, "Fred Liu"  wrote:
>
> Assuming in a secure and trusted env, we want to get the maximum transfer
speed without the overhead from ssh.

Add the HPN patches to OpenSSH and enable the NONE cipher.  We can saturate
a gigabit link (980 Mbps) between two FreeBSD hosts using that.

Without it, we were only able to hit ~480 Mbps on a good day.

If you want 0 overhead, there's always netcat. :)
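
As a rough sketch of the netcat route (host name, port, and dataset are made
up; start the receiver first):

 # on the receiving host:
 nc -l 3333 | zfs recv -F backup/data
 # on the sending host:
 zfs send tank/data@today | nc recvhost 3333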
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] S11 vs illumos zfs compatibility

2012-12-13 Thread Freddie Cash
Oracle effectively forked ZFS with the release of Solaris 11 by not
open-sourcing any of the ZFS code.

Solaris 11 includes ZFSv31 or higher.

The last open-source release of ZFS was ZFSv28.

Thus, if you create a pool on Solaris 11+ that you want to import on other
systems, you have to manually tell it to create a ZFSv28 pool, and not use
the defaults.
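
If I remember right, that's just a matter of setting the version property at
creation time, something like (pool and disk names are placeholders):

 zpool create -o version=28 tank mirror c0t0d0 c0t1d0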


On Thu, Dec 13, 2012 at 8:14 AM, sol  wrote:

> Hi
>
> I've just tried to use illumos (151a5)  import a pool created on solaris
> (11.1) but it failed with an error about the pool being incompatible.
>
> Are we now at the stage where the two prongs of the zfs fork are pointing
> in incompatible directions?
>
>   --
> *From:* Matthew Ahrens 
> On Thu, Jan 5, 2012 at 6:53 AM, sol  wrote:
>
>
> I would have liked to think that there was some good-will between the ex-
> and current-members of the zfs team, in the sense that the people who
> created zfs but then left Oracle still care about it enough to want the
> Oracle version to be as bug-free as possible.
>
>
> There is plenty of good will between everyone who's worked on ZFS --
> current Oracle employees, former employees, and those never employed by
> Oracle.  We would all like to see all implementations of ZFS be the highest
> quality possible.  I'd like to think that we all try to achieve that to the
> extent that it is possible within our corporate priorities.
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove disk

2012-12-06 Thread Freddie Cash
On Thu, Dec 6, 2012 at 12:35 AM, Albert Shih  wrote:

>  On 2012-12-01 at 08:33:31 -0700, Jan Owoc wrote:
>
> > 2) replace the disks with larger ones one-by-one, waiting for a
> > resilver in between
>
> This is the point where I don't see how to do it. I currently have 48 disks, from
> /dev/da0 -> /dev/da47 (I'm under FreeBSD 9.0), let's say 3 TB each.
>
> I have 4 raidz2 vdevs, the first from /dev/da0 -> /dev/da11, etc.
>
> So I physically add a new enclosure with 12 new disks, for example 4 TB disks.
>
> I'm going to have new /dev/da48 --> /dev/da59.
>
> Say I want to remove /dev/da0 -> /dev/da11. First I pull out /dev/da0.
> The first raidz2 is going to be in a «degraded state». So I'm going to tell the
> pool that the new disk is /dev/da48.
>

zpool replace <poolname> da0 da48



> Repeat this process until /dev/da11 is replaced by /dev/da59.
>
> But at the end, how much space am I going to use on those /dev/da48 -->
> /dev/da51? Am I going to have 3 TB or 4 TB? Because each time, before
> completion, ZFS is only going to use 3 TB, so how, at the end, is it going to
> magically use 4 TB?
>

The first disk you replace, it will use 3 TB, the size of the disk it
replaced.

The second disk you replace, it will use 3 TB, the size of the disk it
replaced.

...

The 12th disk you replace, it will use 3 TB, the size of the disk it
replaced.

However, now that all of the disks in the raidz vdev have been replaced,
the overall size of the vdev will increase to use the full 4 TB of each
disk.  This either happens automatically (if the autoexpand property is on) or
manually, by exporting and re-importing the pool.
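
Roughly, one or the other of (pool name is just an example):

 zpool set autoexpand=on tank    # grows automatically once the last replace finishes
 # or, with autoexpand off:
 zpool export tank
 zpool import tank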

> Second question: when I pull out the first enclosure, meaning the
> old /dev/da0 --> /dev/da11, and reboot the server, the kernel is going to give
> those disks new numbers, meaning
>
> old /dev/da12 --> /dev/da0
> old /dev/da13 --> /dev/da1
> etc...
> old /dev/da59 --> /dev/da47
>
> how is zfs going to manage that?
>

Every disk that is part of a ZFS pool has metadata on it that includes
which pool it's part of, which vdev it's part of, etc.  Thus, if you do an
export followed by an import, ZFS will read the metadata off the disks
and sort things out automatically.
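
So, roughly (pool name is just an example):

 zpool export tank
 zpool import tank    # or: zpool import -d /dev tank, to force a device scan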

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about degraded drive

2012-11-27 Thread Freddie Cash
And you can try 'zpool online' on the failed drive to see if it comes back
online.
On Nov 27, 2012 6:08 PM, "Freddie Cash"  wrote:

> You don't use replace on mirror vdevs.
>
> 'zpool detach' the failed drive. Then 'zpool attach' the new drive.
> On Nov 27, 2012 6:00 PM, "Chris Dunbar - Earthside, LLC" <
> cdun...@earthside.net> wrote:
>
>> Hello,
>>
>> I have a degraded mirror set and this has happened a few times (not
>> always the same drive) over the last two years. In the past I replaced the
>> drive and ran zpool replace and all was well. I am wondering, however,
>> if it is safe to run zpool replace without replacing the drive to see if it
>> is in fact failed. On traditional RAID systems I have had drives drop out
>> of an array, but be perfectly fine. Adding them back to the array returned
>> the drive to service and all was well. Does that approach work with ZFS? If
>> not, is there another way to test the drive before making the decision to
>> yank and replace?
>>
>> Thank you!
>> Chris
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about degraded drive

2012-11-27 Thread Freddie Cash
You don't use replace on mirror vdevs.

'zpool detach' the failed drive. Then 'zpool attach' the new drive.
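
Something like this, as a sketch (pool and device names are made up; da2 is
the surviving half of the mirror, da7 the new disk):

 zpool detach tank da3        # drop the failed half of the mirror
 zpool attach tank da2 da7    # attach the new disk to the surviving half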
On Nov 27, 2012 6:00 PM, "Chris Dunbar - Earthside, LLC" <
cdun...@earthside.net> wrote:

> Hello,
>
> I have a degraded mirror set and this has happened a few times (not
> always the same drive) over the last two years. In the past I replaced the
> drive and ran zpool replace and all was well. I am wondering, however,
> if it is safe to run zpool replace without replacing the drive to see if it
> is in fact failed. On traditional RAID systems I have had drives drop out
> of an array, but be perfectly fine. Adding them back to the array returned
> the drive to service and all was well. Does that approach work with ZFS? If
> not, is there another way to test the drive before making the decision to
> yank and replace?
>
> Thank you!
> Chris
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Freddie Cash
On Mon, Nov 19, 2012 at 9:03 AM, Peter Jeremy  wrote:
> On 2012-Nov-19 11:02:06 -0500, Ray Arachelian  wrote:
>>Is the pool importing properly at least?  Maybe you can create another
>>volume and transfer the data over for that volume, then destroy it?
>
> The pool is imported and passes all tests except "zfs diff".  Creating
> another pool _is_ an option but I'm not sure how to transfer the data
> across - using "zfs send | zfs recv" replicates the corruption and
> "tar -c | tar -x" loses all the snapshots.

Create new pool.
Create new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to new filesystem
Snapshot new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to new filesystem
Snapshot new filesystem

See if zfs diff works.

If it does, repeat the rsync/snapshot steps for the rest of the snapshots.
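
As a rough sketch of that loop (pool/filesystem names and rsync flags are
examples; it assumes the snapshot names sort in creation order):

 #!/bin/sh
 SRC=/oldpool/fs/.zfs/snapshot
 for snap in $(ls -1 ${SRC}); do
     # copy the contents of each old snapshot into the new filesystem
     rsync -aH --delete ${SRC}/${snap}/ /newpool/fs/
     # then take a snapshot of the new filesystem under the same name
     zfs snapshot newpool/fs@${snap}
 done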

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel DC S3700

2012-11-13 Thread Freddie Cash
Anandtech.com has a thorough review of it. Performance is consistent
(within 10-15% IOPS) across the lifetime of the drive, it has capacitors to
flush the RAM cache to disk, and it doesn't store user data in the cache. It's
also cheaper per GB than the 710 it replaces.
On 2012-11-13 3:32 PM, "Jim Klimov"  wrote:

> On 2012-11-13 22:56, Mauricio Tavares wrote:
>
>> Trying again:
>>
>> Intel just released those drives. Any thoughts on how nicely they will
>> play in a zfs/hardware raid setup?
>>
>
> Seems interesting - fast, assumed reliable and consistent in its IOPS
> (according to marketing talk), addresses power loss reliability (acc.
> to datasheet):
>
> * Endurance Rating - 10 drive writes/day over 5 years while running
> JESD218 standard
>
> * The Intel SSD DC S3700 supports testing of the power loss capacitor,
> which can be monitored using the following SMART attribute: (175, AFh).
>
> Somewhat affordably priced (at least in the volume market for shops
> that buy hardware in cubic meters ;)
>
> http://newsroom.intel.com/community/intel_newsroom/blog/2012/11/05/intel-announces-intel-ssd-dc-s3700-series--next-generation-data-center-solid-state-drive-ssd
>
> http://download.intel.com/newsroom/kits/ssd/pdfs/Intel_SSD_DC_S3700_Product_Specification.pdf
>
> All in all, I can't come up with anything offensive against it quickly ;)
> One possible nit regards the ratings being geared towards 4KB block
> (which is not unusual with SSDs), so it may be further from announced
> performance with other block sizes - i.e. when caching ZFS metadata.
>
> Thanks for bringing it into the spotlight, and I hope the more
> savvy posters will review it in more detail.
>
> //Jim
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-13 Thread Freddie Cash
Ah, okay, that makes sense. I wasn't offended, just confused. :)

Thanks for the clarification
On Oct 13, 2012 2:01 AM, "Jim Klimov"  wrote:

> 2012-10-12 19:34, Freddie Cash wrote:
>
>> On Fri, Oct 12, 2012 at 3:28 AM, Jim Klimov  wrote:
>>
>>> In fact, you can (although not recommended due to balancing reasons)
>>> have tlvdevs of mixed size (like in Freddie's example) and even of
>>> different structure (i.e. mixing raidz and mirrors or even single
>>> LUNs) by forcing the disk attachment.
>>>
>>
>> My example shows 4 raidz2 vdevs, with each vdev having 6 disks, along
>> with a log vdev, and a cache vdev.  Not sure where you're seeing an
>> imbalance.  Maybe it's because the pool is currently resilvering a
>> drive, thus making it look like one of the vdevs has 7 drives?
>>
>
> No, my comment was about this pool having an 8Tb TLVDEV and
> several 5.5Tb TLVDEVs - and that this kind of setup is quite
> valid for ZFS - and that while striping data across disks
> it can actually do better than round-robin, giving more data
> to the larger components. But more weight on one side is
> called imbalance ;)
>
> Sorry if my using your example offended you somehow.
>
> //Jim
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-12 Thread Freddie Cash
On Fri, Oct 12, 2012 at 3:28 AM, Jim Klimov  wrote:
> In fact, you can (although not recommended due to balancing reasons)
> have tlvdevs of mixed size (like in Freddie's example) and even of
> different structure (i.e. mixing raidz and mirrors or even single
> LUNs) by forcing the disk attachment.

My example shows 4 raidz2 vdevs, with each vdev having 6 disks, along
with a log vdev, and a cache vdev.  Not sure where you're seeing an
imbalance.  Maybe it's because the pool is currently resilvering a
drive, thus making it look like one of the vdevs has 7 drives?

My home file server ran with mixed vdevs for a while (a 2 IDE-disk
mirror vdev with a 3 SATA-disk raidz1 vdev) as it was built using
scrounged parts.

But all my work file servers have matched vdevs.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-11 Thread Freddie Cash
On Thu, Oct 11, 2012 at 2:47 PM, andy thomas  wrote:
> According to a Sun document called something like 'ZFS best practice' I read
> some time ago, best practice was to use the entire disk for ZFS and not to
> partition or slice it in any way. Does this advice hold good for FreeBSD as
> well?

Solaris disables the disk's write cache if the disk is partitioned, thus the
recommendation to always use the entire disk with ZFS.

FreeBSD's GEOM architecture allows the disk cache to be enabled
whether you use the full disk or partition it.

Personally, I find it nicer to use GPT partitions on the disk.  That
way, you can start the partition at 1 MB ("gpart add -b 2048" on 512B
disks, or "gpart add -b 512" on 4K disks), leave a little wiggle-room
at the end of the disk, and use GPT labels to identify the disk (using
gpt/label-name for the device when adding to the pool).
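
A rough sketch of what I mean (device, size, and label are just examples):

 gpart create -s gpt da0
 gpart add -t freebsd-zfs -b 2048 -s 3900G -l disk-a1 da0   # leaves a little slack at the end
 zpool create tank mirror gpt/disk-a1 gpt/disk-a2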

> Another point about the Sun ZFS paper - it mentioned optimum performance
> would be obtained with RAIDz pools if the number of disks was between 3 and
> 9. So I've always limited my pools to a maximum of 9 active disks plus
> spares but the other day someone here was talking of seeing hundreds of
> disks in a single pool! So what is the current advice for ZFS in Solaris and
> FreeBSD?

You can have multiple disks in a vdev.  And you can have multiple vdevs in
a pool.  Thus, you can have hundreds of disks in a pool.  :)  Just
split the disks up into multiple vdevs, where each vdev is under 9
disks.  :)  For example, we have 25 disks in the following pool,
but only 6 disks in each vdev (plus log/cache):


[root@alphadrive ~]# zpool list -v
NAME                        SIZE  ALLOC   FREE    CAP  DEDUP    HEALTH  ALTROOT
storage                    24.5T  20.7T  3.76T    84%  3.88x  DEGRADED  -
  raidz2                   8.12T  6.78T  1.34T      -
    gpt/disk-a1                -      -      -      -
    gpt/disk-a2                -      -      -      -
    gpt/disk-a3                -      -      -      -
    gpt/disk-a4                -      -      -      -
    gpt/disk-a5                -      -      -      -
    gpt/disk-a6                -      -      -      -
  raidz2                   5.44T  4.57T   888G      -
    gpt/disk-b1                -      -      -      -
    gpt/disk-b2                -      -      -      -
    gpt/disk-b3                -      -      -      -
    gpt/disk-b4                -      -      -      -
    gpt/disk-b5                -      -      -      -
    gpt/disk-b6                -      -      -      -
  raidz2                   5.44T  4.60T   863G      -
    gpt/disk-c1                -      -      -      -
    replacing                  -      -      -   932G
      6255083481182904200      -      -      -      -
      gpt/disk-c2              -      -      -      -
    gpt/disk-c3                -      -      -      -
    gpt/disk-c4                -      -      -      -
    gpt/disk-c5                -      -      -      -
    gpt/disk-c6                -      -      -      -
  raidz2                   5.45T  4.75T   720G      -
    gpt/disk-d1                -      -      -      -
    gpt/disk-d2                -      -      -      -
    gpt/disk-d3                -      -      -      -
    gpt/disk-d4                -      -      -      -
    gpt/disk-d5                -      -      -      -
    gpt/disk-d6                -      -      -      -
  gpt/log                  1.98G   460K  1.98G      -
cache                          -      -      -      -      -  -
  gpt/cache1               32.0G  32.0G     8M      -

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] removing upgrade notice from 'zpool status -x'

2012-10-04 Thread Freddie Cash
On Thu, Oct 4, 2012 at 9:45 AM, Jim Klimov  wrote:
> 2012-10-04 20:36, Freddie Cash wrote:
>>
>> On Thu, Oct 4, 2012 at 9:14 AM, Richard Elling 
>> wrote:
>>>
>>> On Oct 4, 2012, at 8:58 AM, Jan Owoc  wrote:
>>> The return code for zpool is ambiguous. Do not rely upon it to determine
>>> if the pool is healthy. You should check the health property instead.
>>
>>
>> Huh.  Learn something new everyday.  You just simplified my "pool
>> health check" script immensely.  Thank you!
>>
>> pstatus=$( zpool get health storage | grep "health" | awk '{ print $3 }' )
>> if [ "${pstatus}" != "ONLINE" ]; then
>
>
> Simplify that too with "zpool list":
>
> # zpool list -H -o health rpool
> ONLINE

Thanks!  Was trying to figure out how to remove the heading as I use
-H a lot with zfs commands, but it didn't work for "zpool get" and I
didn't bother reading to figure it out.  :)
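
For the archives, the whole check now boils down to something like this
(pool name and the mail bit are just examples):

 #!/bin/sh
 pstatus=$( zpool list -H -o health storage )
 if [ "${pstatus}" != "ONLINE" ]; then
     echo "pool storage is ${pstatus}" | mail -s "zpool health alert" root
 fi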


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] removing upgrade notice from 'zpool status -x'

2012-10-04 Thread Freddie Cash
On Thu, Oct 4, 2012 at 9:14 AM, Richard Elling  wrote:
> On Oct 4, 2012, at 8:58 AM, Jan Owoc  wrote:
> The return code for zpool is ambiguous. Do not rely upon it to determine
> if the pool is healthy. You should check the health property instead.

Huh.  Learn something new everyday.  You just simplified my "pool
health check" script immensely.  Thank you!

pstatus=$( zpool get health storage | grep "health" | awk '{ print $3 }' )
if [ "${pstatus}" != "ONLINE" ]; then

Much simpler than the nested ifs and grep pipelines I was using before.

Not sure why I didn't see "health" in the list of pool properties all
the times I've read the zpool man page.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] vm server storage mirror

2012-09-26 Thread Freddie Cash
If you're willing to try FreeBSD, there's HAST (aka Highly Available
Storage) for this very purpose.

You use hast to create mirror pairs using 1 disk from each box, thus
creating /dev/hast/* nodes. Then you use those to create the zpool on the
'primary' box.

All writes to the pool on the primary box are mirrored over the network to
the secondary box.

When the primary box goes down, the secondary imports the pool and carries
on. When the primary box comes online, it syncs the data back from the
secondary, and then either takes over as primary or becomes the new
secondary.
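
The setup looks roughly like this (resource and pool names are made up; each
resource has to be defined in /etc/hast.conf on both boxes, and hastd must be
running on both):

 # on both boxes, once per resource:
 hastctl create disk0
 hastctl create disk1
 # on the primary box:
 hastctl role primary disk0
 hastctl role primary disk1
 zpool create tank /dev/hast/disk0 /dev/hast/disk1   # each hast device is already mirrored across the boxes
 # on the secondary box:
 hastctl role secondary disk0
 hastctl role secondary disk1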
 On Sep 26, 2012 10:54 AM, "Edward Ned Harvey
(opensolarisisdeadlongliveopensolaris)" <
opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>  Here's another one.
>
>
>
> Two identical servers are sitting side by side.  They could be connected
> to each other via anything (presently using crossover ethernet cable.)  And
> obviously they both connect to the regular LAN.  You want to serve VM's
> from at least one of them, and even if the VM's aren't fault tolerant, you
> want at least the storage to be live synced.  The first obvious thing to
> do is simply cron a zfs send | zfs receive at a very frequent interval.  But
> there are a lot of downsides to that - besides the fact that you have to
> settle for some granularity, you also have a script on one system that will
> clobber the other system.  So in the event of a failure, you might
> promote the backup into production, and you have to be careful not to let
> it get clobbered when the main server comes up again.
>
>
>
> I like much better, the idea of using a zfs mirror between the two
> systems.  Even if it comes with a performance penalty, as a result of
> bottlenecking the storage onto Ethernet.  But there are several ways to
> possibly do that, and I'm wondering which will be best.
>
>
>
> Option 1:  Each system creates a big zpool of the local storage.  Then,
> create a zvol within the zpool, and export it iscsi to the other system.  Now
> both systems can see a local zvol, and a remote zvol, which it can use to
> create a zpool mirror.  The reasons I don't like this idea are because
> it's a zpool within a zpool, including the double-checksumming and
> everything.  But the double-checksumming isn't such a concern to me - I'm
> mostly afraid some horrible performance or reliability problem might be
> resultant.  Naturally, you would only zpool import the nested zpool on
> one system.  The other system would basically just ignore it.  But in the
> event of a primary failure, you could force import the nested zpool on
> the secondary system.
>
>
>
> Option 2:  At present, both systems are using local mirroring, 3 mirror
> pairs of 6 disks.  I could break these mirrors, and export one side over
> to the other system...  And vice versa.  So neither server will be doing
> local mirroring; they will both be mirroring across iscsi to targets on
> the other host.  Once again, each zpool will only be imported on one
> host, but in the event of a failure, you could force import it on the other
> host.
>
>
>
> Can anybody think of a reason why Option 2 would be stupid, or can you
> think of a better solution?
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] finding smallest drive that can be used to replace

2012-09-05 Thread Freddie Cash
Query the size of the other drives in the vdev, obviously. ;) So long as
the replacement is at least as large as the smallest remaining drive, it'll work.
On Sep 5, 2012 8:57 AM, "Yaverot"  wrote:

>
>
> --- skiselkov...@gmail.com wrote:
> >On 09/05/2012 05:06 AM, Yaverot wrote:
> > "What is the smallest sized drive I may use to replace this dead drive?"
> >
> > That information has to be someplace because ZFS will say that drive Q
> is too small.  Is there an easy way to query that information?
>
> >I use fdisk to find this out. For instance say your drive you want to
> find the size of is c2t4d0, then do:
>
> ># fdisk /dev/rdsk/c2t4d0p0
>
> I guess that'll teach me yet again to ask the right question.
>
> Scenario:
> machine has multiple pools (rpool, tank)
> pool tank has multiple vdevs all raidz2, the drives in the pool vary from
> 500G to 3T in capacity.
> c64t0d0 has failed and no longer responds to I/O, so it can't be queried
> with the fdisk trick above.
> I know the failed disk is 1.5T, but before I go buy the replacement, I
> want to know if I can replace it with a 1T, or if it needs to be 2T (since
> 1.5T was a stopgap size instead of a longterm not-too-big-not-too-small
> size I saw it).
> ZFS knows that a device I'm about to stick in is too small, so how do I
> query that information for required minimum size?
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL devices and fragmentation

2012-07-30 Thread Freddie Cash
On Mon, Jul 30, 2012 at 10:20 AM, Roy Sigurd Karlsbakk
 wrote:
> On 151a2, man page just says 'use this or that mountpoint' with import -m, 
> but the fact was zpool refused to import the pool at boot when 2 SLOG devices 
> (mirrored) and 10 L2ARC devices were offline. Should OI/Illumos be able to 
> boot cleanly without manual action with the SLOG devices gone?

From FreeBSD 9-STABLE, which includes ZFSv28:

 zpool import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
 [-D] [-f] [-m] [-N] [-R root] [-F [-n]] -a

 Imports all pools found in the search directories. Identical to the
 previous command, except that all pools with a sufficient number of
 devices available are imported. Destroyed pools, pools that were
 previously destroyed with the "zpool destroy" command, will not be
 imported unless the -D option is specified.

 -o mntopts
 Comma-separated list of mount options to use when mounting
 datasets within the pool. See zfs(8) for a description of
 dataset properties and mount options.

 -o property=value
 Sets the specified property on the imported pool. See the
 "Properties" section for more information on the available
 pool properties.

 -c cachefile
 Reads configuration from the given cachefile that was created
 with the "cachefile" pool property. This cachefile is used
 instead of searching for devices.

 -d dir  Searches for devices or files in dir.  The -d option can be
 specified multiple times. This option is incompatible with
 the -c option.

 -D  Imports destroyed pools only. The -f option is also required.

 -f  Forces import, even if the pool appears to be potentially
 active.

 -m  Enables import with missing log devices.
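
So, roughly, recovering from the missing SLOG is just (pool name is an example):

 zpool import -m tank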


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-30 Thread Freddie Cash
On Mon, Jul 30, 2012 at 5:20 AM, Tristan Klocke
 wrote:
> I want to switch to ZFS, but still want to encrypt my data. Native
> Encryption for ZFS was added in "ZFS Pool Version Number 30", but I'm using
> ZFS on FreeBSD with Version 28. My question is how would encfs (fuse
> encryption) affect zfs specific features like data Integrity and
> deduplication?

If you are using FreeBSD, why not use GELI to provide the block
devices used for the ZFS vdevs?  That's the "standard" way to get
encryption and ZFS working on FreeBSD.
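
A rough sketch of that setup (devices, sector size, and key handling are just
examples; geli will prompt for a passphrase unless you set up key files):

 geli init -s 4096 /dev/da0
 geli init -s 4096 /dev/da1
 geli attach /dev/da0
 geli attach /dev/da1
 zpool create tank mirror da0.eli da1.eli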

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL devices and fragmentation

2012-07-30 Thread Freddie Cash
On Mon, Jul 30, 2012 at 9:38 AM, Roy Sigurd Karlsbakk  
wrote:
>> > Also keep in mind that if you have an SLOG (ZIL on a separate
>> > device), and then lose this SLOG (disk crash etc), you will probably
>> > lose the pool. So if you want/need SLOG, you probably want two of
>> > them in a mirror…
>>
>> That's only true on older versions of ZFS. ZFSv19 (or 20?) includes
>> the ability to import a pool with a failed/missing log device. You
>> lose any data that is in the log and not in the pool, but the pool is
>> importable.
>
> Are you sure? I booted this v28 pool a couple of months back, and found it 
> didn't recognize its pool, apparently because of a missing SLOG. It turned 
> out the cache shelf was disconnected, after re-connecting it, things worked 
> as planned. I didn't try to force a new import, though, but it didn't boot up 
> normally, and told me it couldn't import its pool due to lack of SLOG devices.

Positive.  :)  I tested it with ZFSv28 on FreeBSD 9-STABLE a month or
two ago.  See the updated man page for zpool, especially the bit about
"import -m".  :)

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL devices and fragmentation

2012-07-30 Thread Freddie Cash
On Mon, Jul 30, 2012 at 8:58 AM, Roy Sigurd Karlsbakk  
wrote:
>> >  For several times now I've seen statements on this list implying
>> > that a dedicated ZIL/SLOG device catching sync writes for the log,
>> > also allows for more streamlined writes to the pool during normal
>> > healthy TXG syncs, than is the case with the default ZIL located
>> > within the pool.
>>
>> After reading what some others have posted, I should remind that zfs
>> always has a ZIL (unless it is specifically disabled for testing).
>> If it does not have a dedicated ZIL, then it uses the disks in the
>> main pool to construct the ZIL. Dedicating a device to the ZIL should
>> not improve the pool storage layout because the pool already had a
>> ZIL.
>
> Also keep in mind that if you have an SLOG (ZIL on a separate device), and 
> then lose this SLOG (disk crash etc), you will probably lose the pool. So if 
> you want/need SLOG, you probably want two of them in a mirror…

That's only true on older versions of ZFS.  ZFSv19 (or 20?) includes
the ability to import a pool with a failed/missing log device.  You
lose any data that is in the log and not in the pool, but the pool is
importable.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question on 4k sectors

2012-07-19 Thread Freddie Cash
On Thu, Jul 19, 2012 at 5:29 AM, Hans J. Albertsson
 wrote:
> I think the problem is with disks that are 4k organised, but report their
> blocksize as 512.
>
> If the disk reports it's blocksize correctly as 4096, then ZFS should not
> have a problem.
> At least my 2TB Seagate Barracuda disks seemed to report their blocksizes as
> 4096, and my zpools on those machines have ashift set to 12, which is
> correct, since 2¹² = 4096
>
> You cannot mix 512 and 4096 byte blocksize disks in one pool, at least not
> in a mirror. All disks in a single pool should have the same blocksize.
>
> There is a hacked version of zpool for OpenIndiana that has a blocksize
> option to the create subcommand. I don't know if other OSes have similar
> fixes.

FreeBSD includes the gnop(8) command which can be used to create
pseudo-devices that declare any size of sectors you want.  This can be
used to create ashift=12 vdevs on top of 512B, pseudo-512B, or 4K
drives.

# gnop create -S 4096 da{0,1,2,3,4,5,6,7}
# zpool create pool raidz2 da{0,1,2,3,4,5,6,7}.nop
# zpool export pool
# gnop destroy da{0,1,2,3,4,5,6,7}.nop
# zpool import -d /dev pool

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Broken ZFS filesystem

2012-05-08 Thread Freddie Cash
On Tue, May 8, 2012 at 10:24 AM, Freddie Cash  wrote:
> I have an interesting issue with one single ZFS filesystem in a pool.
> All the other filesystems are fine, and can be mounted, snapshoted,
> destroyed, etc.  But this one filesystem, if I try to do any operation
> on it (zfs mount, zfs snapshot, zfs destroy, zfs set ), it
> spins the system until all RAM is used up (wired), and then hangs the
> box.  The zfs process sits in tx -> tx_sync_done_cv state until the
> box locks up.  CTRL+T of the process only ever shows this:
>    load: 0.46  cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r 0.00u 0.00s 0% 
> 2440k
>
> Anyone come across anything similar?  And found a way to fix it, or to
> destroy the filesystem?  Any suggestions on how to go about debugging
> this?  Any magical zdb commands to use?
>
> The filesystem only has 5 MB of data in it (log files), compressed via
> LZJB for a compressratio of ~6x.  There are no snapshots for this
> filesystem.
>
> Dedupe is enabled on the pool and all filesystems.

After more fiddling, testing, and experimenting, it all came down to
not enough RAM in the box to mount the 5 MB filesystem.  After
installing an extra 8 GB of RAM (32 GB total), everything mounted
correctly.  Took 27 GB of wired kernel memory (guessing ARC space) to
do it.

Unmount, mount, export, import, change properties all completed
successfully.  And the box is running correctly with 24 GB of RAM
again.

We'll be ordering more RAM for our ZFS boxes, now.  :)
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Broken ZFS filesystem

2012-05-08 Thread Freddie Cash
I have an interesting issue with one single ZFS filesystem in a pool.
All the other filesystems are fine, and can be mounted, snapshoted,
destroyed, etc.  But this one filesystem, if I try to do any operation
on it (zfs mount, zfs snapshot, zfs destroy, zfs set ), it
spins the system until all RAM is used up (wired), and then hangs the
box.  The zfs process sits in tx -> tx_sync_done_cv state until the
box locks up.  CTRL+T of the process only ever shows this:
load: 0.46  cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r 0.00u 0.00s 0% 2440k

Anyone come across anything similar?  And found a way to fix it, or to
destroy the filesystem?  Any suggestions on how to go about debugging
this?  Any magical zdb commands to use?

The filesystem only has 5 MB of data in it (log files), compressed via
LZJB for a compressratio of ~6x.  There are no snapshots for this
filesystem.

Dedupe is enabled on the pool and all filesystems.

System is running 64-bit FreeBSD 9.0:
FreeBSD alphadrive.sd73.bc.ca 9.0-RELEASE FreeBSD 9.0-RELEASE #0
r229803: Sun Jan  8 00:43:00 PST 2012
r...@alphadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST90  amd64

Hardware is fairly generic:
  - SuperMicro H8DGi-F motherboard
  - AMD Opteron 6128 CPU (8 cores)
  - 24 GB of DDR3 RAM
  - 3x SuperMicro AOC-USAS-L8i SATA controllers
  - 24x harddrives ranging from 500 GB to 2.0 TB (6 of each kind in
raidz2 vdevs)
  - 64 GB SSD partitioned for OS, swap, with 32 GB for L2ARC

Filesystem properties:
# zfs get all storage/logs/rsync
NAME                PROPERTY              VALUE                  SOURCE
storage/logs/rsync  type                  filesystem             -
storage/logs/rsync  creation              Tue May 10  9:55 2011  -
storage/logs/rsync  used                  5.48M                  -
storage/logs/rsync  available             4.61T                  -
storage/logs/rsync  referenced            5.48M                  -
storage/logs/rsync  compressratio         5.93x                  -
storage/logs/rsync  mounted               no                     -
storage/logs/rsync  quota                 none                   default
storage/logs/rsync  reservation           none                   default
storage/logs/rsync  recordsize            128K                   default
storage/logs/rsync  mountpoint            /var/log/rsync         local
storage/logs/rsync  sharenfs              off                    default
storage/logs/rsync  checksum              sha256                 inherited from storage
storage/logs/rsync  compression           lzjb                   inherited from storage
storage/logs/rsync  atime                 off                    inherited from storage
storage/logs/rsync  devices               on                     default
storage/logs/rsync  exec                  on                     default
storage/logs/rsync  setuid                on                     default
storage/logs/rsync  readonly              off                    default
storage/logs/rsync  jailed                off                    default
storage/logs/rsync  snapdir               visible                inherited from storage
storage/logs/rsync  aclmode               discard                default
storage/logs/rsync  aclinherit            restricted             default
storage/logs/rsync  canmount              on                     default
storage/logs/rsync  xattr                 on                     default
storage/logs/rsync  copies                1                      default
storage/logs/rsync  version               5                      -
storage/logs/rsync  utf8only              off                    -
storage/logs/rsync  normalization         none                   -
storage/logs/rsync  casesensitivity       sensitive              -
storage/logs/rsync  vscan                 off                    default
storage/logs/rsync  nbmand                off                    default
storage/logs/rsync  sharesmb              off                    default
storage/logs/rsync  refquota              none                   default
storage/logs/rsync  refreservation        none                   default
storage/logs/rsync  primarycache          all                    inherited from storage
storage/logs/rsync  secondarycache        metadata               inherited from storage
storage/logs/rsync  usedbysnapshots       0                      -
storage/logs/rsync  usedbydataset         5.48M                  -
storage/logs/rsync  usedbychildren        0                      -
storage/logs/rsync  usedbyrefreservation  0                      -
storage/logs/rsync  logbias               latency                default
storage/logs/rsync  dedup                 sha256                 inherited from storage
storage/logs/rsync  mlslabel                                     -
storage/logs/rsync  sync                  standard               default
storage/logs/rsync  refcompressratio      5.93x                  -

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-26 Thread Freddie Cash
On Thu, Apr 26, 2012 at 4:34 AM, Deepak Honnalli
 wrote:
>    cachefs is present in Solaris 10. It is EOL'd in S11.

And for those who need/want to use Linux, the equivalent is FSCache.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Aaron Toponce: Install ZFS on Debian GNU/Linux

2012-04-18 Thread Freddie Cash
On Wed, Apr 18, 2012 at 7:54 AM, Cindy Swearingen
 wrote:
>>Hmmm, how come they have encryption and we don't?
>
> As in Solaris releases, or some other "we"?

I would guess he means Illumos, since it's mentioned in the very next
sentence.  :)

"Hmmm, how come they have encryption and we don't?
Can it be backported to illumos ..."

It's too bad Oracle hasn't followed through (yet?) with their promise
to open-source the ZFS (and other CDDL-licensed?) code in Solaris 11.
:(
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Freddie Cash
On Fri, Apr 13, 2012 at 9:30 AM, Tim Cook  wrote:
> You will however have an issue replacing them if one should fail.  You need
> to have the same block count to replace a device, which is why I asked for a
> "right-sizing" years ago.  Deaf ears :/

I thought ZFSv20-something added an "if the blockcount is within 10%,
then allow the replace to succeed" feature, to work around this issue?

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple's ZFS-alike - Re: Does raidzN actually protect against bitrot? If yes - how?

2012-01-16 Thread Freddie Cash
On Mon, Jan 16, 2012 at 8:22 AM, Bob Friesenhahn
 wrote:
> On Mon, 16 Jan 2012, David Magda wrote:
>> http://mail.opensolaris.org/pipermail/zfs-discuss/2009-October/033125.html
>>
>> Perhaps Apple can come to an agreement with Oracle when they couldn't with
>> Sun.
>
> This seems very unlikely since the future needs of Apple show little
> requirement for zfs.  Apple only offers one computer model which provides
> ECC and a disk drive configuration which is marginally useful for zfs.  This
> computer model has a very limited user-base which is primarily people in the
> video and desktop imaging/publishing world. Apple already exited the server
> market, for which they only ever offered single limited-use model (Xserve).

As an FS for their TimeMachine NAS boxes (Time Capsule, I think),
though, ZFS would be a good fit.  Similar to how the Time Slider works
in Sun/Oracle's version of Nautilus/GNOME2.  Especially if they expand
the boxes to use 4 drives (2x mirror), and have the pool
pre-configured.

As a desktop/laptop FS, though, ZFS (in its current incarnation) is
overkill and unwieldy.  Especially since most of these machines only
have room for a single HD.

> There would likely be a market if someone was to sell pre-packaged zfs for
> Apple OS-X at a much higher price than the operating system itself.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS HBA's with No Raid

2011-12-06 Thread Freddie Cash
On Tue, Dec 6, 2011 at 12:57 PM, Karl Rossing  wrote:

> I'm thinking of getting LSI 9212-4i4e(4 internal and 4 external ports) to
> replace a SUN Storagetek raid card.
>
> The StorageTek raid card seems to want to have it's drives initialized and
> volumes created on it before they are presented to zfs. I can't find a way
> of telling it just to be an HBA.
>
> I'd prefer to have the ability to move the sas/sata drives to another
> system and have that system's sas/sata controller read the drives if there is a
> problem with the card.
>
> Is it possible to disable the raid on an LSI 9212-4i4e and have the drives
> read by a simple sas/sata card? I'm open to other brands but I need
> internal and external ports.
>

If you only need 3 Gbps SAS/SATA, there's the SuperMicro AOC-USAS-L4i.  It
uses the older LSI1068 chipset, so it's limited to 2 TB harddrives:
http://www.supermicro.com/products/accessories/addon/AOC-USAS-L4i_R.cfm

You could always check if there's an IT-mode firmware for the 9212-4i4e
card available on the LSI website, and flash that onto the card.  That
"disables"/removes the RAID functionality from the card, turning it into
just an HBA.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS not starting

2011-12-01 Thread Freddie Cash
>
> The system has 6GB of RAM and a 10GB swap partition. I added a 30GB
> swap file but this hasn't helped.
>

ZFS doesn't use swap for the ARC (it's wired aka unswappable memory).  And
ZFS uses the ARC for dedupe support.

You will need to find a lot of extra RAM to stuff into that machine in
order for it to boot correctly, load the dedupe tables into ARC, process
the intent log, and then import the pool.

And, you'll need that extra RAM in order to destroy the ZFS filesystem that
has dedupe enabled.

Basically, your DDT (dedupe table) is running you out of ARC space and
livelocking (or is it deadlocking, never can keep those terms straight) the
box.

You can remove the RAM once you have things working again.  Just don't
re-enable dedupe until you have at least 16 GB of RAM in the box that can
be dedicated to ZFS.  And be sure to add a cache device to the pool.
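
Once the RAM is in and the pool is healthy again, adding the cache device is
a one-liner, something like (pool and device names are just examples):

 zpool add tank cache gpt/cache0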

I just went through something similar with an 8 GB ZFS box (RAM is on
order, but the purchasing dept ordered from the wrong supplier so we're stuck
waiting for it to arrive) where I tried to destroy a dedupe'd filesystem.
 Exact same results as you.  Stole RAM out of a different server
temporarily to get things working on this box again.


> # sysctl hw.physmem
> hw.physmem: 6363394048
>
> # sysctl vfs.zfs.arc_max
> vfs.zfs.arc_max: 5045088256
>
> (I lowered arc_max to 1GB but hasn't helped)
>

DO NOT LOWER THE ARC WHEN DEDUPE IS ENABLED!!

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove corrupt files from snapshot

2011-11-15 Thread Freddie Cash
On Tue, Nov 15, 2011 at 8:07 AM,  wrote:

> Thanks anyone for the help, finally I removed corrupt files from the
> "current view" of the file system and left the snapshots as they were. This
> way at least the incremental backup continues. (It is sad that snapshots
> are so rigid that even corruption is permanent. What more interesting is
> that, if snapshots are read only, how can they become corrupted?)
>

The snapshot is read-only, meaning users cannot modify the data in the
snapshots.  However, there's nothing to prevent random bit flips in the
underlying storage.  Maybe the physical harddrive has a bad block and
gmirror copied the bad data to both disks, which flipped a bit or two in
the file you are using to back the ZFS pool.  Since ZFS only sees a single
device, it has no internal redundancy and can't fix the corrupted bits,
only report that it found a block where the on-disk checksum doesn't match
the computed checksum of the block.

This is why you need to let ZFS handle redundancy via mirror vdevs, raidz
vdevs, or (at the very least) copies=2 property on the ZFS filesystem.  If
there's redundancy in the pool, then ZFS can correct the corruption.
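
A rough sketch of what I mean (pool/filesystem and device names are examples):

 # redundancy at the pool level:
 zpool create tank mirror ada0 ada1
 # or, at the very least, extra copies within an existing filesystem:
 zfs set copies=2 tank/important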


> Would it make sense to do "zfs scrub" regularly and have a report sent,
> i.e. once a day, so discrepancy would be noticed beforehand? Is there
> anything readily available in the Freebsd ZFS package for this?
>

Without any redundancy in the pool, all a scrub will do is let you know
there is corrupted data in the pool.  It can't fix it.  Neither can gmirror
below the pool fix it.  All you can do is delete the corrupted file and
restore that file from backups.

You really should get rid of the gmirror setup, dedicate the entire disks
to ZFS, and create a pool using a mirror vdev.

File-backed ZFS vdevs really should only be used for testing purposes.
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-17 Thread Freddie Cash
On Mon, Oct 17, 2011 at 10:50 AM, Harry Putnam  wrote:

> Freddie Cash  writes:
>
> > If you only want RAID0 or RAID1, then btrfs is okay.  There's no support
> for
> > RAID5+ as yet, and it's been "in development" for a couple of years now.
>
> [...] snipped excellent information
>
> Thanks much, I've very appreciative of the good information.  Much
> better to hear from actual users than pouring thru webpages to get a
> picture.
>
> I'm googling on the citations you posted:
>
> FreeNAS and freebsd.
>
> Maybe you can give a little synopsis of those too.  I mean when it
> comes to utilizing zfs; is it much the same as if running it on
> solaris?
>
FreeBSD 8-STABLE (what will become 8.3) and 9.0-RELEASE (will be released
hopefully this month) both include ZFSv28, the latest open-source version of
ZFS.  This includes raidz3 and dedupe support, same as OpenSolaris, Illumos,
and other OSol-based distros.  Not sure what the latest version of ZFS is in
Solaris 10.

The ZFS bits work the same as on Solaris with only 2 small differences:
  - sharenfs property just writes data to /etc/zfs/exports, which is read by
the standard NFS daemons (it's easier to just use /etc/exports to share ZFS
filesystems)
  - sharesmb property doesn't do anything; you have to use Samba to share
ZFS filesystems

The only real differences are how the OSes themselves work.  If you are
fluent in Solaris, then FreeBSD will seem strange (and vice-versa).  If you
are fluent in Linux, then FreeBSD will be similar (but a lot more cohesive
and "put-together").


> I knew freebsd had a port, but assumed it would stack up kind of sorry
> compared to Solaris zfs.
>
> Maybe something on the order of the linux fuse/zfs adaptation in usability.
>
> Is that assumption wrong?
>
Absolutely, completely, and utterly false.  :)  The FreeBSD port of ZFS is
pretty much on par with ZFS on OpenSolaris.  The Linux port of ZFS is just
barely usable.  No comparison at all.  :)


> I actually have some experience with Freebsd, (long before there was a
> zfs port), and it is very linux like in many ways.
>
That's like saying that OpenIndiana is very Linux-like in many ways.  :)


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-17 Thread Freddie Cash
On Mon, Oct 17, 2011 at 8:29 AM, Harry Putnam  wrote:

> This subject may have been ridden to death... I missed it if so.
>
> Not wanting to start a flame fest or whatever but
>
> As a common slob who isn't very skilled, I like to see some commentary
> from some of the pros here as to any comparison of zfs against btrfs.
>
> I realize btrfs is a lot less `finished' but I see it is starting to
> show up as an option on some linux install routines... Debian an
> ubuntu I noticed and probably many others.
>
> My main reasons for using zfs are pretty basic compared to some here
> and I wondered how btrfs stacks up on the basic qualities.
>

If you only want RAID0 or RAID1, then btrfs is okay.  There's no support for
RAID5+ as yet, and it's been "in development" for a couple of years now.

There's no working fsck tool for btrfs.  It's been "in development" and
"released in two weeks" for over a year now.  Don't put any data you need
onto btrfs.  It's extremely brittle in the face of power loss.

My biggest gripe with btrfs is that they have come up with all-new
terminology that only applies to them.  "Filesystem" now means "a collection
of block devices grouped together", while "sub-volume" is what we'd
normally call a "filesystem".  And there's a few other weird terms thrown in
as well.

From all that I've read on the btrfs mailing list, and news sites around the
web, btrfs is not ready for production use on any system with data that you
can't afford to lose.

If you absolutely must run Linux on your storage server, for whatever
reason, then you probably won't be running ZFS.  For the next year or two,
it would probably be safer to run software RAID (md), with LVM on top, with
XFS or Ext4 on top.  It's not the easiest setup to manage, but it would be
safer than btrfs.

If you don't need to run Linux on your storage server, then definitely give
ZFS a try.  There are many options, depending on your level of expertise:
 FreeNAS for plug-n-play simplicity with a web GUI, FreeBSD for a simpler OS
that runs well on x86/amd64 systems, any of the OpenSolaris-based distros,
or even Solaris if you have the money.

With ZFS you get:
  - working single, dual, triple parity raidz (RAID5, RAID6, "RAID7"
equivalence)
  - n-way mirroring
  - end-to-end checksums for all data/metadata blocks
  - unlimited snapshots
  - pooled storage
  - unlimited filesystems
  - send/recv capabilities
  - built-in compression
  - built-in dedupe
  - built-in encryption (in ZFSv31, which is currently only in Solaris 11)
  - built-in CIFS/NFS sharing (on Solaris-based systems; FreeBSD uses normal
nfsd and Samba for this)
  - automatic hot-spares (on Solaris-based systems; FreeBSD only supports
manual spares)
  - and more

Maybe in another 5 years or so, Btrfs will be up to the point of ZFS today.
 Just imagine where ZFS will be in 5 years or so.  :)

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Freddie Cash
Thanks for the replies everyone.  That was along the lines of what I was
thinking (-D is a "win" for network usage savings, if it works) but wanted
to double-check before I started playing with out new boxes.

Will be interesting to see whether or not -D works with ZFSv28 in FreeBSD
8-STABLE/9-BETA.  And whether or not "zfs send" is faster/better/easier/more
reliable than rsyncing snapshots (which is what we do currently).

Thanks for the info.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs send and dedupe

2011-09-06 Thread Freddie Cash
Just curious if anyone has looked into the relationship between zpool
dedupe, zfs send dedupe, memory use, and network throughput.

For example, does 'zfs send -D' use the same DDT as the pool? Or does it
require more memory for its own DDT, thus impacting performance of both?

If you have a deduped pool on both ends of the send, does -D make any
difference?

If neither pool is deduped, does -D make a difference?

We're waiting on a replacement backplane for our newest zfs-based storage
box, so won't be able to look into this ourselves until next week at the
earliest. Thought i'd check if anyone else has already done some comparisons
or benchmarks.

Cheers,
Freddie
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Space usage

2011-08-14 Thread Freddie Cash
On Sun, Aug 14, 2011 at 10:32 AM, Lanky Doodle wrote:

> I'm just uploading all my data to my server and the space used is much more
> than what i'm uploading;
>
> Documents = 147MB
> Videos = 11G
> Software = 1.4G
>
> By my calculations, that equals 12.547G, yet zpool list is showing 21G as
> being allocated;
>
> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> dpool  27.2T  21.2G  27.2T     0%  1.00x  ONLINE  -
>
> It doesn't look like any snapshots have been taken, according to zfs list
> -t snapshot. I've read about the 'copies' parameter but I didn't specify
> this when creating filesystems and I guess the default is 1?
>
> Any ideas?
>
> "zpool list" output show raw disk usage, including all redundant copies of
metadata, all redundant copies of data blocks, all redundancy accounted for
(mirror, raidz), etc.  This is the total number of physical bytes used on
the disk.

"zfs list" output shows the amount of usable pool storage allocated to the
data.  This is more indicative of what the end-user believes they are
using.  This doesn't include redundancy or anything like that, but does
include some compression and other info (I believe).

There's an excellent post in the archives that shows how "ls -l", du, df,
"zfs list", and "zpool list" work, and what each sees as "disk usage".
Don't remember exactly who wrote it, though.  It should definitely be added
to the ZFS Admin Guide.  :)
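
For a quick comparison on your own pool, something like (pool name is just an
example):

 zpool list -o name,size,allocated,free dpool
 zfs list -o name,used,available,referenced,compressratio -r dpool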


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question: adding a single drive to a mirrored zpool

2011-06-24 Thread Freddie Cash
On Fri, Jun 24, 2011 at 2:25 PM, alex stun  wrote:

> I have a zpool consisting of several mirrored vdevs. I was in the middle of
> adding another mirrored vdev today, but found out one of the new drives is
> bad. I will be receiving the replacement drive in a few days. In the mean
> time, I need the additional storage on my zpool.
>
> Is the command to add a single drive to a mirrored zpool:
> zpool add -f tank drive1?
>
> Does the -f command cause any issues?
> I realize that there will be no redundancy on that drive for a few days,
> and I can live with that as long as the rest of my zpool remains intact.
>

Note:  you will have 0 redundancy on the ENTIRE POOL, not just that one
vdev.  If that non-redundant vdev dies, you lose the entire pool.

Are you willing to take that risk, if one of the new drives is already DoA?

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Freddie Cash
The OpenVMS filesystem is what you are looking for.

On Thu, Jun 16, 2011 at 12:09 AM, Simon Walter  wrote:

> On 06/16/2011 09:09 AM, Erik Trimble wrote:
>
>> We had a similar discussion a couple of years ago here, under the title "A
>> Versioning FS". Look through the archives for the full discussion.
>>
>> The jist is that application-level versioning (and consistency) is
>> completely orthogonal to filesystem-level snapshots and consistency.  IMHO,
>> they should never be mixed together - there are way too many corner cases
>> and application-specific memes for a filesystem to ever fully handle
>> file-level versioning and *application*-level data consistency.  Don't
>> mistake one for the other, and, don't try to *use* one for the other.
>>  They're completely different creatures.
>>
>>
> I guess that is true of the current FSs available. Though it would be nice
> to essentially have a versioning FS in the kernel rather than an application
> in userspace. But I digress. I'll use SVN and webdav.
>
> Thanks for the advice everyone.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] changing vdev types

2011-06-01 Thread Freddie Cash
On Wed, Jun 1, 2011 at 2:34 PM, Freddie Cash  wrote:

> On Wed, Jun 1, 2011 at 12:45 PM, Eric Sproul  wrote:
>
>> On Wed, Jun 1, 2011 at 2:54 PM, Matt Harrison
>>  wrote:
>> > Hi list,
>> >
>> > I've got a pool that's got a single raidz1 vdev. I've just some more
>> disks in
>> > and I want to replace that raidz1 with a three-way mirror. I was
>> thinking
>> > I'd just make a new pool and copy everything across, but then of course
>> I've
>> > got to deal with the name change.
>> >
>> > Basically, what is the most efficient way to migrate the pool to a
>> > completely different vdev?
>>
>> Since you can't mix vdev types in a single pool
>
>
> Side note:  you most certainly can mix vdevs in a pool.  It's not
> recommended, nor even suggested, but it's most certainly possible.  You just
> have to add -f to the "zpool add" command.
>
> I use this at home, where my pool has a 3-disk raidz1 vdev (SATA1) and a
> 2-disk mirror vdev (IDE).  I can't connect to my home machine right now, so
> can't post the "zpool status" output.  But if you want to see it in action,
> I'll post when I get home.
>
[fcash@rogue /home/fcash]$ zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
pool   ONLINE   0 0 0
  raidz1   ONLINE   0 0 0
ad8ONLINE   0 0 0
ad10   ONLINE   0 0 0
ad9ONLINE   0 0 0
  mirror   ONLINE   0 0 0
ad4ONLINE   0 0 0
ad6ONLINE   0 0 0
cache
  label/cache  ONLINE   0 0 0

errors: No known data errors

[fcash@rogue /home/fcash]$ zpool get version pool
NAME  PROPERTY  VALUESOURCE
pool  version   14   default

[fcash@rogue /home/fcash]$ zfs get version pool
NAME  PROPERTY  VALUESOURCE
pool  version   3-

[fcash@rogue /home/fcash]$ uname -a
FreeBSD rogue.ashesofthe.net 8.1-RELEASE FreeBSD 8.1-RELEASE #0 r211388: Sun
Aug 22 15:18:36 PDT 2010
r...@rogue.ashesofthe.net:/usr/obj/usr/src-8/sys/ROGUE
i386

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] changing vdev types

2011-06-01 Thread Freddie Cash
On Wed, Jun 1, 2011 at 12:45 PM, Eric Sproul  wrote:

> On Wed, Jun 1, 2011 at 2:54 PM, Matt Harrison
>  wrote:
> > Hi list,
> >
> > I've got a pool thats got a single raidz1 vdev. I've just some more disks
> in
> > and I want to replace that raidz1 with a three-way mirror. I was thinking
> > I'd just make a new pool and copy everything across, but then of course
> I've
> > got to deal with the name change.
> >
> > Basically, what is the most efficient way to migrate the pool to a
> > completely different vdev?
>
> Since you can't mix vdev types in a single pool


Side note:  you most certainly can mix vdevs in a pool.  It's not
recommended, nor even suggested, but it's most certainly possible.  You just
have to add -f to the "zpool add" command.

I use this at home, where my pool has a 3-disk raidz1 vdev (SATA1) and a
2-disk mirror vdev (IDE).  I can't connect to my home machine right now, so
can't post the "zpool status" output.  But if you want to see it in action,
I'll post when I get home.

That said, you can't remove top-level vdevs (raidz*, mirror, single) from a
pool, so you can't "add" a new vdev and "remove" the old vdev to convert
between vdev types.

The only solution to the OP's question is to create a new pool, transfer the
data, and destroy the old pool.  There are several ways to do this.
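
One rough sketch of the send/recv route (pool names are placeholders;
check the man pages for the exact send/recv flags in your ZFS version):

  # zpool create newpool mirror disk1 disk2 disk3
  # zfs snapshot -r oldpool@migrate
  # zfs send -R oldpool@migrate | zfs recv -F -d newpool
  # zpool destroy oldpool
  # zpool export newpool
  # zpool import newpool oldpool

The export/import at the end is what lets you re-use the original pool
name.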

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Compatibility between Sun-Oracle Fishworks appliance zfs and other zfs implementations

2011-05-26 Thread Freddie Cash
On Wed, May 25, 2011 at 9:30 PM, Matthew Ahrens  wrote:

> On Wed, May 25, 2011 at 8:01 PM, Matt Weatherford 
> wrote:
>
>> pike# zpool get version internal
>> NAME  PROPERTY  VALUESOURCE
>> internal  version   28   default
>> pike# zpool get version external-J4400-12x1TB
>> NAME   PROPERTY  VALUESOURCE
>> external-J4400-12x1TB  version   28   default
>> pike#
>>
>> Can I expect to move my JBOD over to a different OS such as FreeBSD,
>> Illuminos, or Solaris  and be able to get my data off still?  (by this i
>> mean perform a zpool import on another platform)
>
>
> Yes, because zpool version 28 is supported in Illumos.  I'm sure Oracle
> Solaris does or will soon support it too.  According to Wikipedia, "the
> 9-current development branch [of FreeBSD] uses ZFS Pool version 28".
>

Correct.  FreeBSD 9-CURRENT (dev branch that will be released as 9.0 at some
point) as of March or April includes support for ZFSv28.

And FreeBSD 8-STABLE (dev branch that will be released as 8.3 at some point)
has patches available to support ZFSv28 here:
http://people.freebsd.org/~mm/patches/zfs/v28/

ZFS-on-FUSE for Linux currently only supports ZFSv23.

So you can "safely" use Illumos, Nexenta, FreeBSD, etc with ZFSv28.  You can
also use Solaris 11 Express, so long as you don't upgrade the pool version
(SolE includes ZFSv31).
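
If you do create a pool on Solaris 11 Express and want it to stay
portable, you should be able to pin the pool version at creation time
instead of taking the default (pool/device names are placeholders):

  # zpool create -o version=28 tank mirror c0t0d0 c0t1d0

and then simply never run "zpool upgrade" on it.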

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Freddie Cash
On Wed, May 18, 2011 at 5:47 AM, Paul Kraus  wrote:
>    Over the past few months I have seen mention of FreeBSD a couple
> time in regards to ZFS. My question is how stable (reliable) is ZFS on
> this platform ?

ZFSv15, as shipped with FreeBSD 8.2, is rock stable in our uses.  We
have two servers running without any issues.  These are our backups
servers, doing rsync backups every night for ~130 remote Linux and
FreeBSD systems.  These are 5U rackmount boxes with:
  - Chenbro 5U storage chassis with 3-way redundant PSUs
  - Tyan h2000M motherboard
  - 2x AMD Opteron 2000-series CPUs (dual-core)
  - 8 GB ECC DDR2-SDRAM
  - 2x 8 GB CompactFlash (mirrored for OS install)
  - 2x 3Ware RAID controllers (12-port multi-lane)
  - 24x SATA harddrives (various sizes, configured in 3x 8-drive raidz2 vdevs)
  - FreeBSD 8.2 on both servers

ZFSv28, as shipped in FreeBSD -CURRENT (the development version that
will eventually become 9.0), is a little rough around the edges, but
is getting better over time.  There are also patches floating around
that allow you to use ZFSv28 with 8-STABLE (the development version
that will eventually become 8.4).  These are a little rougher around
the edges.

We have only been testing ZFS in storage servers for backups, but have
plans to start testing it NFS servers with an eye toward creating
NAS/SAN setups for virtual machines.

I also run it on my home media server, which is nowhere near "server
quality", without issues:
  - generic Intel motherboard
  - 2.8 GHz P4 CPU
  - 3 SATA1 harddrives connected to motherboard, in a raidz1 vdev
  - 2 IDE harddrives connected to a Promise PCI controller, in a mirror vdev
  - 2 GB non-ECC SDRAM
  - 2 GB USB stick for the OS install
  - FreeBSD 8.2

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-05-16 Thread Freddie Cash
On Fri, Apr 29, 2011 at 5:17 PM, Brandon High  wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
>> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> I'd suggest trying to import the pool into snv_151a (Solaris 11
> Express), which is the reference and development platform for ZFS.

Would not import in Solaris 11 Express.  :(  Could not even find any
pools to import.  Even when using "zpool import -d /dev/dsk" or any
other import commands.  Most likely due to using a FreeBSD-specific
method of labelling the disks.

I've since rebuilt the pool (a third time), using GPT partitions,
labels on the partitions, and using the labels in the pool
configuration.  That should make it importable across OSes (FreeBSD,
Solaris, Linux, etc).
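
For reference, the FreeBSD side of that looks roughly like this for
each disk (device and label names are just examples):

  # gpart create -s gpt da0
  # gpart add -t freebsd-zfs -l disk01 da0
  # zpool create tank raidz2 gpt/disk01 gpt/disk02 ...

The labels show up under /dev/gpt/, so the pool configuration
references those instead of the raw device nodes.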

It's just frustrating that it's still possible to corrupt a pool in
such a way that "nuke and pave" is the only solution.  Especially when
this same assertion was discussed in 2007 ... with no workaround or
fix or whatnot implemented, four years later.

What's most frustrating is that this is the third time I've built this
pool due to corruption like this, within three months.  :(

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-05-03 Thread Freddie Cash
On Tue, May 3, 2011 at 12:36 PM, Erik Trimble  wrote:
> On 5/3/2011 8:55 AM, Brandon High wrote:
>>
>> On Tue, May 3, 2011 at 5:47 AM, Joerg Schilling
>>   wrote:
>>>
>>> But this is most likely slower than star and does rsync support sparse
>>> files?
>>
>> 'rsync -ASHXavP'
>>
>> -A: ACLs
>> -S: Sparse files
>> -H: Hard links
>> -X: Xattrs
>> -a: archive mode; equals -rlptgoD (no -H,-A,-X)
>>
>> You don't need to specify --whole-file, it's implied when copying on
>> the same system. --inplace can play badly with hard links and
>> shouldn't be used.
>>
>> It probably will be slower than other options but it may be more
>> accurate, especially with -H
>>
>> -B
>
> rsync is indeed slower than star; so far as I can tell, this is due almost
> exclusively to the fact that rsync needs to build an in-memory table of all
> work being done *before* it starts to copy.

rsync 2.x works that way, building a complete list of
files/directories to copy before starting the copy.

rsync 3.x doesn't.  3.x builds an initial file list for the first
directory and then starts copying files while continuing to build the
list of files, so there's only a small pause at the beginning.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Freddie Cash
On Fri, Apr 29, 2011 at 5:00 PM, Alexander J. Maidak  wrote:
> On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
>> > Is there anyway, yet, to import a pool with corrupted space_map
>> > errors, or "zio-io_type != ZIO_TYPE_WRITE" assertions?
>
>>...
>> Well, by commenting out the VERIFY line for zio->io_type !=
>> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
>> only with -F and -o readonly=on.  :(
>>
>> It's great that I can get the pool to import read-only, so the data is
>> still available.  But that really doesn't help when I've already
>> rebuilt this pool twice due to this issue.
>>
>
> Just curious, did you try an import or recovery with Solaris 11 Express
> build 151a?  I expect it wouldn't have made a difference, but I'd be
> curious to know.

No, that's on the menu for next week, trying a couple OpenSolaris,
Solaris Express, Nexenta LiveCDs to see if they make a difference.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Freddie Cash
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash  wrote:
> Is there anyway, yet, to import a pool with corrupted space_map
> errors, or "zio-io_type != ZIO_TYPE_WRITE" assertions?
>
> I have a pool comprised of 4 raidz2 vdevs of 6 drives each.  I have
> almost 10 TB of data in the pool (3 TB actual disk space used due to
> dedup and compression).  While testing various failure modes, I have
> managed to corrupt the pool to the point where it won't import.  So
> much for being bulletproof.  :(
>
> If I try to import the pool normally, it gives corrupted space_map errors.
>
> If I try to "import -F" the pool, it complains that "zio->io_type !=
> ZIO_TYPE_WRITE".
>
> I've also tried the above with "-o readonly=on" and "-R
> some/other/root" variations.
>
> There's also no zfs.cache file anywhere to be found, and creating a
> blank file doesn't help.
>
> Does this mean that a 10 TB pool can be lost due to a single file
> being corrupted, or a single piece of pool metadata being corrupted?
> And that there's *still* no recovery tools for situations like this?
>
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> For the curious, the failure mode that causes this?  Rebooting while 8
> simultaneous rsyncs were running, which were not killed by the
> shutdown process for some reason, which prevented 8 ZFS filesystems
> from being unmounted, which prevented the pool from being exported
> (even though I have a "zfs unmount -f" and "zpool export -f"
> fail-safe), which locked up the shutdown process requiring a power
> reset.

Well, by commenting out the VERIFY line for zio->io_type !=
ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
only with -F and -o readonly=on.  :(
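
For anyone following along, the read-only import looks something like
this (pool name and altroot are placeholders):

  # zpool import -F -o readonly=on -R /recovery tank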

Trying to import it read-write gives dmu_free_range errors and panics
the system.

Compiling a kernel with that assertion commented out allows the pool
to be imported read-only.  Importing it read-write gives a bunch of
other dmu panics.  :( :( :(

How can it be that after 28 pool format revisions and 5+ years of
development, ZFS is still this brittle?  I've found lots of threads
from 2007 about this very issue, with "don't do that" and "it's not an
issue" and "there's no need for a pool consistency checker" and other
similar "head in the sand" responses.  :(  But still no way to prevent
or fix this form of corruption.

It's great that I can get the pool to import read-only, so the data is
still available.  But that really doesn't help when I've already
rebuilt this pool twice due to this issue.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Still no way to recover a "corrupted" pool

2011-04-29 Thread Freddie Cash
Is there anyway, yet, to import a pool with corrupted space_map
errors, or "zio-io_type != ZIO_TYPE_WRITE" assertions?

I have a pool comprised of 4 raidz2 vdevs of 6 drives each.  I have
almost 10 TB of data in the pool (3 TB actual disk space used due to
dedup and compression).  While testing various failure modes, I have
managed to corrupt the pool to the point where it won't import.  So
much for being bulletproof.  :(

If I try to import the pool normally, it gives corrupted space_map errors.

If I try to "import -F" the pool, it complains that "zio->io_type !=
ZIO_TYPE_WRITE".

I've also tried the above with "-o readonly=on" and "-R
some/other/root" variations.

There's also no zfs.cache file anywhere to be found, and creating a
blank file doesn't help.

Does this mean that a 10 TB pool can be lost due to a single file
being corrupted, or a single piece of pool metadata being corrupted?
And that there's *still* no recovery tools for situations like this?

Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

For the curious, the failure mode that causes this?  Rebooting while 8
simultaneous rsyncs were running, which were not killed by the
shutdown process for some reason, which prevented 8 ZFS filesystems
from being unmounted, which prevented the pool from being exported
(even though I have a "zfs unmount -f" and "zpool export -f"
fail-safe), which locked up the shutdown process requiring a power
reset.

:(

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-04-29 Thread Freddie Cash
On Fri, Apr 29, 2011 at 10:53 AM, Dan Shelton  wrote:
> Is anyone aware of any freeware program that can speed up copying tons of
> data (2 TB) from UFS to ZFS on same server?

rsync, with --whole-file --inplace (and other options), works well for
the initial copy.

rsync, with --no-whole-file --inplace (and other options), works
extremely fast for updates.
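
For example, something along these lines (paths are placeholders,
other flags to taste):

  # rsync -aS --whole-file --inplace /ufs/data/ /tank/data/
  # rsync -aS --no-whole-file --inplace /ufs/data/ /tank/data/

The first form for the initial copy, the second for the update passes.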

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-25 Thread Freddie Cash
On Mon, Apr 25, 2011 at 10:55 AM, Erik Trimble  wrote:
> Min block size is 512 bytes.

Technically, isn't the minimum block size 2^(ashift value)?  Thus, on
4 KB disks where the vdevs have an ashift=12, the minimum block size
will be 4 KB.
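
If you want to check what a given pool is actually using, something
like this should show the per-vdev value (pool name is a placeholder):

  # zdb -C tank | grep ashift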

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-21 Thread Freddie Cash
On Sun, Mar 20, 2011 at 12:57 AM, Ian Collins  wrote:
>  Has anyone seen a resilver longer than this for a 500G drive in a riadz2
> vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37
> 2011
>              c0t0d0  ONLINE       0     0     0  769G resilvered
>
> and I told the client it would take 3 to 4 days!

Our main backups storage server has 3x 8-drive raidz2 vdevs.  Was
replacing the 500 GB drives in one vdev with 1 TB drives.  The last 2
drives took just under 300 hours each.  :(  The first couple drives
took approx 150 hours each, and then it just started taking longer and
longer for each drive.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] "Invisible" snapshot/clone

2011-03-16 Thread Freddie Cash
On Wed, Mar 16, 2011 at 7:23 PM, Edward Ned Harvey
 wrote:
> P.S.  If your primary goal is to use ZFS, you would probably be better
> switching to nexenta or openindiana or solaris 11 express, because they all
> support ZFS much better than freebsd.  If instead, your primary goal is to
> do something free-bsd-ish, and it's just coincidence that an old version of
> ZFS happens to be the best filesystem available in freebsd, so be it.
> Freebsd is good in its own ways...  Even an old version of ZFS is better
> than EXT3 or UFS.   ;-)

FreeBSD 9-CURRENT supports ZFSv28.

And there are patches available for testing ZFSv28 on FreeBSD 8-STABLE.

Let's keep the OS pot shots to a minimum, eh?
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedup question

2011-01-28 Thread Freddie Cash
On Fri, Jan 28, 2011 at 1:38 PM, Igor P  wrote:
> I created a zfs pool with dedup with the following settings:
> zpool create data c8t1d0
> zfs create data/shared
> zfs set dedup=on data/shared
>
> The thing I was wondering about was it seems like ZFS only dedup at the file 
> level and not the block. When I make multiple copies of a file to the store I 
> see an increase in the deup ratio, but when I copy similar files the ratio 
> stays at 1.00x.

Easiest way to test it is to create a 10 MB file full of random data:
  $ dd if=/dev/random of=random.10M bs=1M count=10

Copy that to the pool a few times under different names to watch the
dedupe ratio increase, basically linearly.

Then open the file in a text editor and change the last few lines of
the files.  Copy that to the pool a few times under new names.  Watch
the dedupe ratio increase, but not linearly as the last block or three
of the file will be different.

Repeat changing different lines in the file, and watch as disk usage
only increases a little, since the files still "share" (or have in
common) a lot of blocks.

ZFS dedupe happens at the block layer, not the file layer.
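
You can watch the ratio as you go with either of these (using the pool
name from your example):

  $ zpool list data
  $ zpool get dedupratio data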


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reliable, enterprise worthy JBODs?

2011-01-25 Thread Freddie Cash
On Tue, Jan 25, 2011 at 10:04 AM, Philip Brown  wrote:
> So, another hardware question :)
>
> ZFS has been touted as taking maximal advantage of disk hardware, to the 
> point where it can be used efficiently and cost-effectively on JBODs, rather 
> than having to throw more expensive RAID arrays at it.
>
> Only trouble is.. JBODs seem to have disappeared :(
> Sun/Oracle has discontinued its j4000 line, with no replacement that I can 
> see.
>
> IBM seems to have some nice looking hardware in the form of its EXP3500 
> "expansion trays"... but they only support it connected to an IBM (SAS) 
> controller... which is only supported when plugged into IBM server hardware :(
>
> Any other suggestions for (large-)enterprise-grade, supported JBOD hardware 
> for ZFS these days?
> Either fibre or SAS would be okay.

Define "enterprise-grade".  :)  Are you talking about price,
performance, warranty, service, support, fancy name, etc?

For example, SuperMicro has several rackmount chassis (2U, 4U, 5U)
that can act as either storage servers (motherboard inside the case)
or storage trays (no motherboard, extra drive bays, SAS connectors).
Some consider those enterprise-grade (after all, it's 6 Gbps SAS,
multilaned, multipathed, but not multi-), some don't (it's not
IBM/Oracle/HP/etc, oh noes!!).

Chenbro also has similar setups to SuperMicro.  Again, it's not
"big-name storage company" nor "uber-expensive", but the technology is
the same.  Is that enterprise-grade?

:D

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mixing drive sizes within a pool

2011-01-13 Thread Freddie Cash
On Wed, Jan 12, 2011 at 5:45 PM, Wim van den Berge
 wrote:
> I have a pile of aging Dell MD-1000's laying around that have been replaced 
> by new primary storage. I've been thinking of using them to create some 
> archive/backup storage for my primary ZFS systems.
>
> Unfortunately they do not all contain identical drives. Some of the older 
> MD-1000's have 15x500GB drives, some have all 750's some all 1TB's. Since 
> size and integrity matters here, not speed. I was thinking of creating one 
> large pool containing multiple RAIDZ2's. Each RAIDZ2 would be one MD-1000 and 
> would have 14 drives, reserving one drive per shelf as a spare.
>
> The question is: The final pool would have spares of 500GB, 750GB and 1TB. Is 
> ZFS smart enough to pick the right one if a drive fails? If not, is there a 
> way to make this scenario work and still combine all available storage in a 
> single pool?

While it may not be recommended as a best practise, there's nothing
"wrong" with using vdevs of different sizes.  You can even use vdevs
of different types (mirror + raidz1 + raidz2 + raidz3) in the same
pool, although you do have to force (-f) the add command.

My home ZFS box uses a 3-drive raidz1 vdev and a 2-drive mirror vdev
in the same pool, using 160 GB SATA and 120 GB IDE drives.

My work storage boxes use 8-drive raidz2 vdevs, mixed between 0.5 TB
SATA, 1.0 TB SATA, and 1.5 TB SATA.

Performance won't be as good as it could be due to the uneven
striping, especially when the smaller vdevs get to be full.  But it
works.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2010-12-16 Thread Freddie Cash
On Thu, Dec 16, 2010 at 12:59 AM, Lanky Doodle  wrote:
> I have been playing with ZFS for a few days now on a test PC, and I plan to 
> use it for my home media server after being very impressed!

Works great for that.  Have a similar setup at home, using FreeBSD.

> Also, at present I have 5x 1TB drives to use in my home server so I plan to 
> create a RAID-Z1 pool which will have my shares on it (Movies, Music, 
> Pictures etc). I then plan to increase this in sets of 5 (so another 5x 1TB 
> drives in Jan and nother 5 in Feb/March so that I can avoid all disks being 
> from the same batch). I did plan on creating seperate zpoolz with each set of 
> 5 drives;

No no no.  Create 1 pool.

Create the pool initially with a single 5-drive raidz vdev.

Later, add the next five drives to the system, and create a new raidz
vdev *in the same pool*.  Voila.  You now have the equivalent of a
RAID50, as ZFS will stripe writes to both vdevs, increasing the
overall size *and* speed of the pool.

Later, add the next five drives to the system, and create a new raidz
vdev in the same pool.  Voila.  You now have a pool with 3 vdevs, with
read/writes being striped across all three.

You can still lose 3 drives (1 per vdev) before losing the pool.

The commands to do this are along the lines of:

# zpool create mypool raidz disk1 disk2 disk3 disk4 disk5

# zpool add mypool raidz disk6 disk7 disk8 disk9 disk10

# zpool add mypool raidz disk11 disk12 disk13 disk14 disk15

Creating 1 pool gives you the best performance and the most
flexibility.  Use separate filesystems on top of that pool if you want
to tweak all the different properties.

Going with 1 pool also increases your chances for dedupe, as dedupe is
done at the pool level.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ... open source moving forward?

2010-12-10 Thread Freddie Cash
On Fri, Dec 10, 2010 at 5:31 AM, Edward Ned Harvey
 wrote:
> It's been a while since I last heard anybody say anything about this.
> What's the latest version of publicly released ZFS?  Has oracle made it
> closed-source moving forward?
>
> Nexenta ... openindiana ... etc ... Are they all screwed?

ZFSv28 is available for FreeBSD 9-CURRENT.

We won't know until after Oracle releases Solaris 11 whether or not
they'll live up to their promise to open the source to ZFSv31.  Until
Solaris 11 is released, there's really not much point in debating it.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] accidentally added a drive?

2010-12-06 Thread Freddie Cash
On Mon, Dec 6, 2010 at 7:49 AM, Tomas Ögren  wrote:
> On 05 December, 2010 - Chris Gerhard sent me these 0,3K bytes:
>
>> Alas you are hosed.  There is at the moment no way to shrink a pool which is 
>> what you now need to be able to do.
>>
>> back up and restore I am afraid.
>
> .. or add a mirror to that drive, to keep some redundancy.

And to ad4s1d as well, since it's also a stand-alone, non-redundant vdev.

Since there are two drives that are non-redundant, it would probably
be best to re-do the pool.
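
If you'd rather keep the pool as-is, the mirroring route would look
roughly like this (pool and new device names are placeholders):

  # zpool attach <pool> ad4s1d <newdisk1>
  # zpool attach <pool> <accidentally-added-drive> <newdisk2>

One attach per stand-alone vdev, and each becomes a 2-way mirror after
the resilver.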

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to quiesce and unquiesc zfs and zpool for array/hardware snapshots ?

2010-11-15 Thread Freddie Cash
On Sun, Nov 14, 2010 at 11:45 PM, sridhar surampudi
 wrote:
> Thank you for the details. I am aware of export/import of zpool, but with
> zpool export the pool is not available for writes.
>
> Is there a way I can freeze a zfs file system at the file system level?
> As an example, for the JFS file system there is the "chfs -a freeze ..." option.
> So if I am taking a hardware snapshot, I will run chfs at the file system (jfs)
> level, then fire commands to take the snapshot at the hardware level (or for array
> LUNs) to get a consistent backup. In this case, no down time is required for
> the file system.
>
> Once the snapshot is done, I will un-quiesce / un-freeze the file system.
>
> Looking for how to do similar freeze for zfs file system.

You would need to do it at the *pool* level, not the filesystem level.
 And the only way to guarantee that no writes will be done to a pool
is to take the pool offline via zpool export.
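
So the sequence for an array-level snapshot ends up being something
like this (pool name is a placeholder):

  # zpool export tank
  ... take the hardware/array snapshot ...
  # zpool import tank

with the pool unavailable for the duration of the export.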

One more reason to stop using hardware storage systems and just let
ZFS handle the drives directly.  :)
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovering from corrupt ZIL

2010-10-24 Thread Freddie Cash
On Sun, Oct 24, 2010 at 6:51 AM, Roy Sigurd Karlsbakk  
wrote:
> he's using fbsd, so his pool is on v11 or something

FreeBSD before 7.3 is ZFSv6.

FreeBSD 7.3, 8.0, and 8.1 are ZFSv14.

FreeBSD 8-STABLE (which will become 8.2) is ZFSv15.

FreeBSD 9-CURRENT (which will become 9.0) has experimental patches
available for ZFSv28.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] vdev failure -> pool loss ?

2010-10-18 Thread Freddie Cash
On Mon, Oct 18, 2010 at 8:51 AM, Darren J Moffat
 wrote:
> On 18/10/2010 16:48, Freddie Cash wrote:
>>
>> On Mon, Oct 18, 2010 at 6:34 AM, Edward Ned Harvey
>>  wrote:
>>>>
>>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>>>> boun...@opensolaris.org] On Behalf Of Freddie Cash
>>>>
>>>> If you lose 1 vdev, you lose the pool.
>>>
>>> As long as 1 vdev is striped and not mirrored, that's true.
>>> You can only afford to lose a vdev, if your vdev itself is mirrored.
>>>
>>> You could, for example, create 2 vdev's of raidz2, and instead of
>>> striping
>>> them together, mirror them.  Then you could lose one of the raidz2's, and
>>> the other half of the mirror is still up.
>>
>> How does one do that?  Is it as simple as using "attach" instead of
>> "add" when creating the new vdev?
>>
>> For example:
>>
>> zpool create mypool raidz2 disk0 disk1 disk2 disk3  disk4  disk5  disk6
>> zpool attach mypool raidz2 disk7 disk8 disk9 disk10 disk11 disk12 disk13
>>
>> Would it work for adding a third raidz2 vdev?
>
> There is no support in ZFS for nested vdevs like this.

That's what I though, but wanted to make sure.  Thanks.  :)

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] vdev failure -> pool loss ?

2010-10-18 Thread Freddie Cash
On Mon, Oct 18, 2010 at 6:34 AM, Edward Ned Harvey  wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Freddie Cash
>>
>> If you lose 1 vdev, you lose the pool.
>
> As long as 1 vdev is striped and not mirrored, that's true.
> You can only afford to lose a vdev, if your vdev itself is mirrored.
>
> You could, for example, create 2 vdev's of raidz2, and instead of striping
> them together, mirror them.  Then you could lose one of the raidz2's, and
> the other half of the mirror is still up.

How does one do that?  Is it as simple as using "attach" instead of
"add" when creating the new vdev?

For example:

zpool create mypool raidz2 disk0 disk1 disk2 disk3  disk4  disk5  disk6
zpool attach mypool raidz2 disk7 disk8 disk9 disk10 disk11 disk12 disk13

Would it work for adding a third raidz2 vdev?

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] vdev failure -> pool loss ?

2010-10-17 Thread Freddie Cash
On Sun, Oct 17, 2010 at 12:31 PM, Simon Breden  wrote:
> OK, thanks Ian.
>
> Another example:
>
> Would you lose all pool data if you had two vdevs: (1) a RAID-Z2 vdev and (2) 
> a two drive mirror vdev, and three drives in the RAID-Z2 vdev failed?

If you lose 1 vdev, you lose the pool.

Doesn't matter what the redundancy level used in the vdev, if you lose
more drives than the vdev can handle, you lose the whole pool.

An easy way to remember this:
  - think of vdevs as harddrives
  - think of the pool as a RAID-0 stripeset

Thus, if 1 drive dies, everything in the RAID-0 is lost.

Similar for the pool.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-15 Thread Freddie Cash
On Fri, Oct 15, 2010 at 3:16 PM, Marty Scholes  wrote:
> My home server's main storage is a 22 (19 + 3) disk RAIDZ3 pool backed up 
> hourly to a 14 (11+3) RAIDZ3 backup pool.

How long does it take to resilver a disk in that pool?  And how long
does it take to run a scrub?

When I initially setup a 24-disk raidz2 vdev, it died trying to
resilver a single 500 GB SATA disk.  I/O under 1 MBps, all 24 drives
thrashing like crazy, could barely even login to the system and type
onscreen.  It was a nightmare.

That, and normal (no scrub, no resilver) disk I/O was abysmal.

Since then, I've avoided any vdev with more than 8 drives in it.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of 2-way mirror

2010-10-06 Thread Freddie Cash
On Wed, Oct 6, 2010 at 12:14 PM, Tony MacDoodle  wrote:
> Is it possible to add 2 disks to increase the size of the pool below?

Yes.  zpool add testpool mirror devname1 devname2

That will add a third mirror vdev to the pool.

> NAME         STATE     READ WRITE CKSUM
> testpool     ONLINE       0     0     0
>   mirror-0   ONLINE       0     0     0
>     c1t2d0   ONLINE       0     0     0
>     c1t3d0   ONLINE       0     0     0
>   mirror-1   ONLINE       0     0     0
>     c1t4d0   ONLINE       0     0     0
>     c1t5d0   ONLINE       0     0     0

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there a way to limit ZFS File Data but maintain room for the ARC to cache metadata

2010-10-01 Thread Freddie Cash
On Fri, Oct 1, 2010 at 11:46 AM, David Blasingame Oracle
 wrote:
> I'm working on this scenario in which file system activity appears to cause
> the arc cache to evict meta data.  I would like to have a preference to keep
> the metadata in cache over ZFS File Data
>
> What I've notice on import of a zpool the arc_meta_used goes up
> significantly.  ZFS meta data operations usually run pretty good.  However
> over time with IO Operations the cache get's evicted and arc_no_grow get
> set.



> So, I would like to limit the amount of ZFS File Data that can be used and
> keep the arc cache warm with metadata.  Any suggestions?

Would adding a cache device (L2ARC) and setting primarycache=metadata
and secondarycache=all on the root dataset do what you need?

That way ARC is used strictly for metadata, and L2ARC is used for metadata+data.
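
The setup would be something along these lines (pool and device names
are placeholders):

  # zpool add tank cache c2t0d0
  # zfs set primarycache=metadata tank
  # zfs set secondarycache=all tank

Child datasets inherit the two cache properties unless you override
them lower down.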

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any zfs fault injection tools?

2010-09-24 Thread Freddie Cash
On Fri, Sep 24, 2010 at 10:33 AM, Peter Taps  wrote:
> Command "zpool status" reports disk status that includes read errors, write 
> errors, and checksum errors. These values have always been 0 in our test 
> environment. Is there any tool out there that can corrupt the state? At the 
> very least, we should be able to write to the disk directly and mess up the 
> checksum.

The following works well:
  dd if=/dev/random of=/dev/disk-node bs=1M count=1 seek=whatever

Change bs (block size) and count (number of blocks) to change the
amount of corruption you cause.  Change seek (number of blocks to
skip) to change the location of the corruption on the disk.  And check
disk-node to select the disk to corrupt.

If you have long enough cables, you can move a disk outside the case
and run a magnet over it to cause random errors.

Plugging/unplugging the SATA/SAS cable from a disk while doing normal
reads/writes is also fun.

Using the controller software (if a RAID controller) to delete
LUNs/disks is also fun.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] create mirror copy of existing zfs stack

2010-09-20 Thread Freddie Cash
On Mon, Sep 20, 2010 at 11:03 AM, sridhar surampudi
 wrote:
> I have a mirror pool tank having two devices underneath. Created in this way
>
> #zpool create tank mirror  c3t500507630E020CEAd1  c3t500507630E020CEAd0
>
> Created file system tank/home
> #zfs create tank/home
>
> Created another file system tank/home/sridhar
> #zfs create tank/home/sridhar
> After that I have created files and directories under tank/home and 
> tank/home/sridhar.
>
> Now I detached 2nd device i.e c3t500507630E020CEAd0
>
> Since the above device is part of mirror pool, my guess it will have copy of 
> data which is there in other device till detach and will have metadata with 
> same pool name and file systems created.
>
> The question is: is there any way I can create a new, renamed stack by 
> providing a new pool name to this detached device, and still access the 
> same data on c3t500507630E020CEAd0 that was created while it was in the 
> mirrored pool under tank?

If your ZFS version is new enough, there is a "zpool split" command
you can use, for just this purpose.  It splits a mirror vdev in half
and assigns a new pool name to the drive you are removing.  You can
then use that drive to create a new pool, thus creating a duplicate of
the original pool.
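
Assuming the drive is still attached to the mirror (zpool split works
on an intact mirror), it looks roughly like this, with the new pool
name just an example:

  # zpool split tank tank2 c3t500507630E020CEAd0
  # zpool import tank2

After the import, tank2 is a separate single-disk pool holding a copy
of the data as of the split.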


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recordsize

2010-09-16 Thread Freddie Cash
On Thu, Sep 16, 2010 at 8:21 AM, Mike DeMarco  wrote:
> What are the ramifications to changing the recordsize of a zfs filesystem 
> that already has data on it?
>
> I want to tune down the recordsize to speed up very small reads to a size 
> that is more in line with the read size.
> Can I do this on a filesystem that has data already on it, and how does it 
> affect that data? The zpool consists of 8 SAN LUNs.

Changing any of the zfs properties only affects data written after the
change is made.  Thus, reducing the recordsize for a filesystem will
only affect newly written data.  Any existing data is not affected
until it is re-written or copied.
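
For example (dataset name and size are placeholders):

  # zfs set recordsize=8K tank/smallreads
  # zfs get recordsize tank/smallreads

and then re-copy (cp, rsync, send/recv into a new dataset, etc.) any
existing files you want rewritten at the new record size.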


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Freddie Cash
On Thu, Sep 9, 2010 at 1:26 PM, Freddie Cash  wrote:
> On Thu, Sep 9, 2010 at 1:04 PM, Orvar Korvar
>  wrote:
>> A) Resilver = Defrag. True/false?
>
> False.  Resilver just rebuilds a drive in a vdev based on the
> redundant data stored on the other drives in the vdev.  Similar to how
> replacing a dead drive works in a hardware RAID array.
>
>> B) If I buy larger drives and resilver, does defrag happen?
>
> No.

Actually, thinking about it ... since the resilver is writing new data
to an empty drive, in essence, the drive is defragmented.

>> C) Does zfs send zfs receive mean it will defrag?
>
> No.

Same here, but only if the receiving pool has never had any snapshots
deleted or files deleted, so that there are no holes in the pool.
Then the newly written data will be contiguous (not fragmented).

> ZFS doesn't currently have a defragmenter.  That will come when the
> legendary block pointer rewrite feature is committed.
>
>
> --
> Freddie Cash
> fjwc...@gmail.com
>



-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver = defrag?

2010-09-09 Thread Freddie Cash
On Thu, Sep 9, 2010 at 1:04 PM, Orvar Korvar
 wrote:
> A) Resilver = Defrag. True/false?

False.  Resilver just rebuilds a drive in a vdev based on the
redundant data stored on the other drives in the vdev.  Similar to how
replacing a dead drive works in a hardware RAID array.

> B) If I buy larger drives and resilver, does defrag happen?

No.

> C) Does zfs send zfs receive mean it will defrag?

No.

ZFS doesn't currently have a defragmenter.  That will come when the
legendary block pointer rewrite feature is committed.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-08 Thread Freddie Cash
On Wed, Sep 8, 2010 at 6:27 AM, Edward Ned Harvey  wrote:
> Both of the above situations resilver in equal time, unless there is a bus
> bottleneck.  21 disks in a single raidz3 will resilver just as fast as 7
> disks in a raidz1, as long as you are avoiding the bus bottleneck.  But 21
> disks in a single raidz3 provides better redundancy than 3 vdev's each
> containing a 7 disk raidz1.

No, it (21-disk raidz3 vdev) most certainly will not resilver in the
same amount of time.  In fact, I highly doubt it would resilver at
all.

My first foray into ZFS resulted in a 24-disk raidz2 vdev using 500 GB
Seagate ES.2 and WD RE3 drives connected to 3Ware 9550SXU and 9650SE
multilane controllers.  Nice 10 TB storage pool.  Worked beautifully as
we filled it with data.  Had less than 50% usage when a disk died.

No problem, it's ZFS, it's meant to be easy to replace a drive, just
offline, swap, replace, wait for it to resilver.

Well, 3 days later, it was still under 10%, and every disk light was
still solid green.  SNMP showed over 100 MB/s of disk I/O continuously,
and the box was basically unusable (5 minutes to get the password line
to appear on the console).

Tried rebooting a few times, stopped all disk I/O to the machine (it
was our backups box, running rsync every night for - at the time - 50+
remote servers), let it do its thing.

After 3 weeks of trying to get the resilver to complete (or even reach
50%), we pulled the plug and destroyed the pool, rebuilding it using
3x 8-drive raidz2 vdevs.  Things have been a lot smoother ever since.
Have replaced 8 of the drives (1 vdev) with 1.5 TB drives.  Have
replaced multiple dead drives.  Resilvers, while running outgoing
rsync all day and incoming rsync all night, take 3 days for a 1.5 TB
drive (with SNMP showing 300 MB/s disk I/O).

You most definitely do not want to use a single super-wide raidz vdev.
 It just won't work.

> Instead of the Best Practices Guide saying "Don't put more than ___ disks
> into a single vdev," the BPG should say "Avoid the bus bandwidth bottleneck
> by constructing your vdev's using physical disks which are distributed
> across multiple buses, as necessary per the speed of your disks and buses."

Yeah, I still don't buy it.  Even spreading disks out such that you
have 4 SATA drives per PCI-X/PCIe bus, I don't think you'd be able to
get a 500 GB SATA disk to resilver in a 24-disk raidz vdev (even a
raidz1) in a 50% full pool.  Especially if you are using the pool for
anything at the same time.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog and TRIM support [SEC=UNCLASSIFIED]

2010-08-26 Thread Freddie Cash
On Wed, Aug 25, 2010 at 6:18 PM, Wilkinson, Alex
 wrote:
>    0n Wed, Aug 25, 2010 at 02:54:42PM -0400, LaoTsao ?? wrote:
>    >IMHO, U want -E for ZIL and -M for L2ARC
>
> Why ?

-E uses SLC flash, which is optimised for fast writes.  Ideal for a
ZIL which is (basically) write-only.
-M uses MLC flash, which is optimised for fast reads.  Ideal for an
L2ARC which is (basically) read-only.

-E tends to have smaller capacities, which is fine for ZIL.
-M tends to have larger capacities, which is perfect for L2ARC.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (preview) Whitepaper - ZFS Pools Explained - feedback welcome

2010-08-26 Thread Freddie Cash
On Wed, Aug 25, 2010 at 10:57 PM, StorageConcepts
 wrote:
> Thanks for the feedback, the idea of it is to give people new to ZFS a 
> understanding of the terms and mode of operations to avoid common problems 
> (wide stripe pools etc.). Also agreed that it is a little NexentaStor 
> "tweaked" :)
>
> I think I have to rework the zil section anyhow because of 
> http://opensolaris.org/jive/thread.jspa?threadID=133294&tstart=0 - have to do 
> some experiments here - and I will also do a "dual command strategy" showing 
> nexentastor commands AND opensolaris commands when a command is shown.

I haven't finished reading it yet (okay, barely read through the
contents list), but would you be interested in the FreeBSD equivalents
for the commands, if they differ?

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage server hardwae

2010-08-25 Thread Freddie Cash
On Wed, Aug 25, 2010 at 12:29 PM, Dr. Martin Mundschenk
 wrote:
> I'm running a OSOL box for quite a while and I think ZFS is an amazing 
> filesystem. As a computer I use a Apple MacMini with USB and FireWire devices 
> attached. Unfortunately the USB and sometimes the FW devices just die, 
> causing the whole system to stall, forcing me to do a hard reboot.
>
> I had the worst experience with an USB-SATA bridge running an Oxford chipset, 
> in a way that the four external devices stalled randomly within a day or so. 
> I switched to a four slot raid box, also with USB bridge, but with better 
> reliability.
>
> Well, I wonder what are the components to build a stable system without 
> having an enterprise solution: eSATA, USB, FireWire, FibreChannel?

If possible to get a card to fit into a MacMini, eSATA would be a lot
better than USB or FireWire.

If there's any way to run cables from inside the case, you can "make
do" with plain SATA and longer cables.

Otherwise, you'll need to look into something other than a MacMini for
your storage box.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] shrink zpool

2010-08-25 Thread Freddie Cash
On Wed, Aug 25, 2010 at 11:34 AM, Mike DeMarco  wrote:
> Is it currently or near future possible to shrink a zpool "remove a disk"

Short answer:  no.

Long answer:  search the archives for "block pointer rewrite" for all
the gory details.  :)


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New Supermicro SAS/SATA controller: AOC-USAS2-L8e in SOHO NAS and HD HTPC

2010-08-16 Thread Freddie Cash
On Mon, Aug 16, 2010 at 7:13 AM, Mike DeMarco  wrote:
> What I would really like to know is why do pci-e raid controller cards cost 
> more than an entire motherboard with processor. Some cards can cost over 
> $1,000 dollars, for what.

Because they include a motherboard and processor.  :)  The high-end
RAID controllers include their own CPUs and RAM for doing all the RAID
stuff in hardware.

The low-end RAID controllers (if you can even really call them RAID
controllers) do all the RAID stuff in software via a driver installed
in the OS, running on the host computer's CPU.

And the ones in the middle have "simple" XOR engines for doing the
RAID.stuff in hardware.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-14 Thread Freddie Cash
On Sat, Aug 14, 2010 at 5:58 AM, Russ Price  wrote:
> 4. FreeBSD. I could live with it if I had to, but I'm not fond of its
> packaging system; the last time I tried it I couldn't get the package tools
> to pull a quick binary update. Even IPS works better. I could go to the
> ports tree instead, but if I wanted to spend my time recompiling everything,
> I'd run Gentoo instead.

freebsd-update provides binary updates for the OS.
portmaster can do binary-only updates for ports (and can even run
without /usr/ports installed).  Same with portupgrade.  And if you
really don't want to use the ports tree, there's pkg_upgrade (part of
the bsdadminscripts port).

IOW, if you don't want to compile things on FreeBSD, you don't have to.  :)
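
For example (the port name is just an example):

  # freebsd-update fetch install
  # portmaster -PP www/apache22

freebsd-update handles the base OS, and -PP tells portmaster to use
pre-built packages only, with no compiling.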

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reconfigure zpool

2010-08-06 Thread Freddie Cash
On Fri, Aug 6, 2010 at 12:18 AM, Alxen4  wrote:
> I have zpool like that
>
>  pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>        NAME        STATE     READ WRITE CKSUM
>        tank        ONLINE       0     0     0
>          raidz3-0  ONLINE       0     0     0
>            c6t0d0  ONLINE       0     0     0
>            c6t1d0  ONLINE       0     0     0
>            c6t2d0  ONLINE       0     0     0
>            c6t3d0  ONLINE       0     0     0
>            c6t4d0  ONLINE       0     0     0
>            c6t5d0  ONLINE       0     0     0
>            c6t6d0  ONLINE       0     0     0
>            c6t7d0  ONLINE       0     0     0
>            c7t0d0  ONLINE       0     0     0
>            c7t1d0  ONLINE       0     0     0
>            c7t2d0  ONLINE       0     0     0
>            c7t3d0  ONLINE       0     0     0
>          c7t4d0    ONLINE       0     0     0
>          c7t5d0    ONLINE       0     0     0
>          c7t6d0    ONLINE       0     0     0
>          c7t7d0    ONLINE       0     0     0
>
>
> In my understanding the last 4 drives (c7t4d0, c7t5d0, c7t6d0, c7t7d0)
> are part of the tank zpool but not of the raidz3 array.
>
> How do I move them to be part of raidz3?

Backup the data in the pool, destroy the pool, create a new pool
(consider using multiple raidz vdevs instead of one giant raidz vdev),
copy the data back.

There's no other way.
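
For example, re-doing it as two 8-disk raidz2 vdevs would look
something like this (using the same device names as above):

  # zpool create tank \
      raidz2 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 \
      raidz2 c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0

which gives you two parity groups striped together instead of one wide
raidz3.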

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal Disk configuration

2010-07-22 Thread Freddie Cash
On Wed, Jul 21, 2010 at 7:43 PM, Edward Ned Harvey  wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of John Andrunas
>>
>> I know this is potentially a loaded question, but what is generally
>> considered the optimal disk configuration for ZFS.  I have 48 disks on
>> 2 RAID controllers (2x24).  The RAID controller can do RAID
>> 0/1/5/6/10/50/60 or JBOD.  What is generally considered the optimal
>> configuration for the disks.  JBOD, RAID zvols on both controllers.
>
> No matter what your goals are, you should JBOD the disks, and let ZFS manage
> the raid.  It not only performs better than hardware raid, it's also more
> reliable.  When you scrub, ZFS will be able to access all the bytes on all
> the disks.  But if you had something like a hardware mirror, the hardware
> would only present a single device to the OS, and therefore the OS wouldn't
> be able to scrub all the bits on both sides of the mirror.  If there's a
> data error encountered in hardware mirror, ZFS has no 2nd device to read
> from to correct the error.  All it can do is blindly retry the hardware, and
> hope for a different answer.  Which probably isn't going to happen.
>
> Generally speaking, your choices are:
> Stripe, Mirror, Stripe & Mirror, or Raid(z,z2,z3)

You forgot "stripe & raidz", aka a pool with multiple raidz vdevs.

> Generally speaking, raid has the capacity of n-1, n-2, or n-3 disks
> depending on which config you choose.  It performs just as fast (n-1, n-2,
> or n-3) for large sequential operations.  But the performance for small
> random operations is poor.  Maybe 1 disk or 2 disks performance.  Raid is
> usually what you use when you need high capacity, and you need redundancy,
> and you're budget constrained.  You are forced to sacrifice some speed.  You
> could have gotten better performance by striping mirrors, but then you'd
> have to spend more money on disks to have the same usable space.

And, adding multiple raidz vdevs (each with under 10 disks) to a
single pool (aka stripe of raidz) will give better performance than a
single large raidz vdev.
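
For example, with the 48 disks exported as JBOD, you could build
something like eight 6-disk raidz2 vdevs, pulling 3 disks from each
controller per vdev (device names are placeholders):

  # zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c2t0d0 c2t1d0 c2t2d0 \
      raidz2 c1t3d0 c1t4d0 c1t5d0 c2t3d0 c2t4d0 c2t5d0 \
      ...

and so on for the remaining disks.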

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused about consumer drives and zfs can someone help?

2010-07-21 Thread Freddie Cash
On Wed, Jul 21, 2010 at 1:38 PM, JavaWebDev  wrote:
> 1.  WD Caviar Black
> 
> Can they be used with in raidz or mirrors?

We use the 500 GB versions attached to 3Ware controllers (configured
as Single Disk arrays).  They work quite nicely.

> With the new models the firmware is locked and you can't change the TLER
> settings. When a drive detects an error it's going to hang a long time
> trying to correct the problem before it reports the error to ZFS right? What
> type of errors are we talking about, bad blocks or something more serious?

Haven't noticed any issues.   The RAID controller notes the odd (maybe
1 per fortnight) timeout, but then reconnects the drive without ZFS
noticing.  We have yet (2 years) to have a drive timeout and cause ZFS
to offline the drive.

> 2. WD Caviar Greens
> *
> I was hoping to use low power drives like the WD Caviar Greens. In addition
> to the TLER issue they have an issue with too many load/unload cycles that
> would wear out the drive faster when used in raid configs including zfs. WD
> put out a utility that could increase the time to reduce the load/unload
> cycles (wdidle.exe). Does this still work with the caviar green drives? Even
> if it does work, changing the idle behavior is going to make it use more
> energy right?
>
> Are Caviar Green drives (the new ones anyway) completely unsuitable for a
> ZFS based back-up or NAS server?

We use 8x 1.5 TB WD Green drives, since they were really inexpensive
at the time, and looked good on paper.  We've been regretting it ever
since.  I actively pray for these to die so that I can replace them
with Seagate 1.5 TB drives.

For a home setup where you don't care about disk throughput, these may
be workable.  Otherwise, avoid the entire Green/GP series of WD disks.
 They are, to put it nicely, crap.  Don't ask for an elaboration
unless you want a multiple page diatribe on the horrors of using these
drives.  Just don't use them outside of the home.

We also use WD Caviar Blue 500 GB, and RE2/RE3 500 GB drives attached
to 3Ware controllers without issues.

Finally, we also use Seagate Barracuda 7200.11 500 GB and 1.5 TB
drives with great success, as well as Seagate ES.2 500 GB drives.

Basically, anything other than the WD Green/GP drives works well.

> 5. Mirror vs raidz
> **
> Can any of the issues with consumer drives be reduced using one type of vdev
> over the other?  Will adding a seperate log or cache device help?

An L2ARC will help with reads, especially if using raidz vdevs.

> For a backup server, which would you choose, 4 drives in raidz, 4 drives in
> raidz2, 3 drives in raidz with hs, 4 drives with 2 mirrored pairs?

Only 4 drives?  Do they still make servers that only use 4 drives?  ;)

It all depends on which you prefer, raw throughput or redundancy.

For best performance, use 2x 2-drive mirrors.
For best redundancy, use 1x 4-drive raidz2.
For middle-of-the-road performance/redundancy, use 1x 4-drive raidz1.

Note:  newegg.ca has a sale on right now.  WD Caviar Black 1 TB drives
are only $85 CDN.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Ubuntu

2010-07-20 Thread Freddie Cash
On Mon, Jul 19, 2010 at 9:40 PM, devsk  wrote:
>> On Sat, Jun 26, 2010 at 12:20 AM, Ben Miles
>>  wrote:
>> > What supporting applications are there on Ubuntu
>> for RAIDZ?
>>
>> None.  Ubuntu doesn't officially support ZFS.
>>
>> You can kind of make it work using the ZFS-FUSE
>> project.  But it's not
>> stable, nor recommended.
>
> I have been using zfs-fuse on my home server for a long time now and its rock 
> solid. Its as stable and usable as opensolaris build 143 I am using on 
> opensolaris. Yeah, write performance sucks but I do not care about seq write 
> performance that much. There may be some inertial Linuxy/Fusy quirks around 
> it as well but most have been ironed out in 0.6.9.
>
> I have successfully exported and imported pools from/to Opensolaris/Linux 
> managed pools. No issues at all.
>
> Just curious: have you tried 0.6.9 release of zfs-fuse? You should join the 
> google group of zfs-fuse and someone can help u along.

(You need to fix your quoting as I'm not listed, and I'm the one who
made the remarks about zfs-fuse instability.)

zfs-fuse 0.6.0 compiled from source, running on 64-bit Debian 5.0,
using Linux kernel 2.6.26, using ZFS v22.

We were testing dedupe to see how it would affect our data storage
once it hits FreeBSD.  We could not keep the test server up and
running for more than 3-4 days at a time.

8 GB of RAM, 12x 500 GB SATA hard drives, 2x dual-core AMD CPUs.  All
the same hardware as our FreeBSD storage servers.

Running a single rsync stream from FreeBSD to Linux would wedge the
box.  Pulling a drive to see how the failure modes work would wedge
the box.  Booting without a drive would wedge the box.  Basically,
anything except slow writes would cause errors in ZFS and wedge the
box.  Definitely not a hardware problem as this box was used
previously as a VM host, and everything runs fine when zfs-fuse is
disabled.

We gave up on it after a couple of weeks.  Sure, the dedupe numbers
looked great (we can't wait for FreeBSD to get ZFSv20+).  But the
zfs-fuse system was just too unstable to be usable for even simple
testing.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to create a concat vdev.

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 9:06 AM, Max Levine  wrote:
> Is it possible in ZFS to do the following.
>
> I have an 800GB lun a single device in a pool and I want to migrate
> that to 8 100GB luns. Is it possible to create an 800GB concat out of
> the 8 devices, and mirror that to the original device, then detach the
> original device? It is possible to do this online in VxVM.

I don't think you can do that within a single pool.  But you can fake
it like so:

zpool create newpool lun1 lun2 lun3 lun4 lun5 lun6 lun7 lun8
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs recv -F newpool
zpool destroy oldpool
zpool export newpool
zpool import newpool oldpool

The commands may not be exact, so check the zfs and zpool man pages for
the send/recv syntax.  Note there is no "zpool rename"; you rename a
pool by exporting it and re-importing it under the new name.

However, doing so will make the pool extremely fragile.  Any issues
with any of the 8 LUNs, and the whole pool dies as there is no
redundancy.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?

2010-07-19 Thread Freddie Cash
On Mon, Jul 19, 2010 at 9:06 AM, Richard Jahnel  wrote:
> I've tried ssh blowfish and scp arcfour. both are CPU limited long before the 
> 10g link is.
>
> I'vw also tried mbuffer, but I get broken pipe errors part way through the 
> transfer.
>
> I'm open to ideas for faster ways to to either zfs send directly or through a 
> compressed file of the zfs send output.

If this is across a trusted link, have a look at the HPN patches to
OpenSSH.  There are three main benefits to these patches:
  - increased (and dynamic) buffers internal to SSH
  - adds a multi-threaded aes cipher
  - adds the NONE cipher for non-encrypted data transfers
(authentication is still encrypted)

If one end of the SSH connection is HPN-enabled, you can increase your
bulk data transfer around 10-15%, just by adjusting the buffer size in
ssh_config or sshd_config (depending on which side has HPN).

If both ends of the SSH connection are HPN-enabled, you can increase
your bulk data transfer rate around 30% just by adjusting the buffer
in the sshd_config.

Enabling the -mtr versions of the cipher will use multiple CPU cores
for encrypting/decrypting, improving throughput.

If you trust the link completely (private link), you can enable the
NONE cipher on the ssh commandline and via sshd_config, and the data
transfer will happen unencrypted, thus maxing out the bandwidth.

We can saturate a gigabit fibre link between two ZFS storage servers
using rsync.  You should be able to saturate a 10G link using zfs
send/recv, so long as both systems can read/write that fast.
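
As a rough sketch (host, pool, and snapshot names are made up; the
None* options need the HPN patches on both ends and "NoneEnabled yes"
in the remote sshd_config):

zfs snapshot tank/data@today
zfs send tank/data@today | \
  ssh -o NoneSwitch=yes -o NoneEnabled=yes backuphost zfs recv -d backup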

http://www.psc.edu/networking/projects/hpn-ssh/

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommended RAM for ZFS on various platforms

2010-07-16 Thread Freddie Cash
On Fri, Jul 16, 2010 at 10:24 AM, Michael Johnson
 wrote:
> I'm currently planning on running FreeBSD with ZFS, but I wanted to 
> double-check
> how much memory I'd need for it to be stable.  The ZFS wiki currently says you
> can go as low as 1 GB, but recommends 2 GB; however, elsewhere I've seen 
> someone
> claim that you need at least 4 GB.  Does anyone here know how much RAM FreeBSD
> would need in this case?

There's no such thing as "too much RAM" when it comes to ZFS.  The
more RAM you add to the system, the better it will perform.  ZFS will
use all the RAM you give it for the ARC, enabling it to cache more and
more data.

On the flip side, if you spend enough time tuning ZFS and FreeBSD, you
can use ZFS on a system with 512 MB of RAM (there are reports on the
FreeBSD mailing lists of various people doing this on single-drive
laptops).
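
For reference, that kind of tuning goes in /boot/loader.conf; the
values below are only an example for a machine with around 1 GB of RAM
and need to be sized for your own RAM and workload:

vm.kmem_size="512M"            # cap kernel memory on a small box
vm.kmem_size_max="512M"
vfs.zfs.arc_max="160M"         # keep the ARC small
vfs.zfs.prefetch_disable=1     # prefetch hurts more than helps with low RAM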

However, the "rule of thumb" for ZFS is 2 GB of RAM as a bare minimum,
using the 64-bit version of FreeBSD.  The "sweet spot" is 4 GB of RAM.

But, more is always better.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption?

2010-07-11 Thread Freddie Cash
On Sun, Jul 11, 2010 at 4:21 AM, Roy Sigurd Karlsbakk  
wrote:
>> I'm planning on running FreeBSD in VirtualBox (with a Linux host) and giving
>> it raw disk access to four drives, which I plan to configure as a raidz2
>> volume.
>
> Wouldn't it be better or just as good to use fuse-zfs for such a
> configuration? I/O from VirtualBox isn't really very good, but then, I
> haven't tested the linux/fbsd configuration...

ZFS-FUSE is horribly unstable, although that's more an indication of
the stability of the storage stack on Linux.  We've been testing it at
work to see how dedupe support will affect our FreeBSD+ZFS storage
servers.  We can't keep it (Linux+ZFS) running for more than a few
days.  Drives drop off at random, the pool locks up, resilvers never
complete.

When it does work, it works nicely.  It's just hard to keep it running.

You definitely want to do the ZFS bits from within FreeBSD.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Should i enable Write-Cache ?

2010-07-08 Thread Freddie Cash
On Thu, Jul 8, 2010 at 6:10 AM, Philippe Schwarz  wrote:
> With dual-Xeon, 4GB of Ram (will be 8GB in a couple of weeks), two PCI-X
> 3Ware cards 7 Sata disks (750G & 1T) over FreeBSD 8.0 (But i think it's
> OS independant), i made some tests.
>
> The disks are exported as JBOD, but i tried enabling/disabling write-cache .

Don't use JBOD, as that disables a lot of the advanced features of the
3Ware controllers.

Instead, create "Single Disk" arrays for each disk.  That way, you get
all the management features of the card, all the advanced features of
the card (StorSave policies, command queuing, separate read/write
cache policies, SMART monitoring, access to the onboard cache, etc).
With only the RAID hardware disabled on the controllers.

You should get better performance using Single Disk over JBOD (which
basically turns your expensive RAID controller into a "dumb" SATA
controller).
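
If memory serves, the 3Ware CLI does that with something like the
following (controller/unit/port numbers are examples; double-check the
syntax against the tw_cli manual before running it):

tw_cli /c0 add type=single disk=0    # one "Single Disk" unit per port
tw_cli /c0/u0 set cache=on           # per-unit write cache
tw_cli /c0/u0 set storsave=balance   # pick a StorSave policy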

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove non-redundant disk

2010-07-07 Thread Freddie Cash
On Wed, Jul 7, 2010 at 3:13 PM, Peter Jeremy
 wrote:
> On 2010-Jul-08 02:39:05 +0800, Garrett D'Amore  wrote:
>>I believe that long term folks are working on solving this problem.  I
>>believe bp_rewrite is needed for this work.
>
> Accepted.
>
>>Mid/short term, the solution to me at least seems to be to migrate your
>>data to a new zpool on the newly configured array, etc.
>
> IMHO, this isn't an acceptable solution.
>
> Note that (eg) DEC/Compaq/HP AdvFS has supported vdev removal from day
> 1 and (until a couple of years ago), I had an AdvFS pool that had,
> over a decade, grown from a mirrored pair of 4.3GB disks to six pairs
> of mirrored 36GB disks - without needing any downtime for disk
> expansion.  [Adding disks was done with mirror pairs because AdvFS
> didn't support any RAID5/6 style redundancy, the big win was being
> able to remove older vdevs so those disk slots could be reused].

None of that requires removing top-level vdevs, and is entirely
possible with ZFS today:

Create the initial pool:
zpool create poolname mirror disk01 disk02

Add another mirror:
zpool add poolname mirror disk03 disk04

Replace one of the drives with a larger one (attach the new disk as a
mirror of the old one, wait for the resilver to finish, then detach the
old disk; a one-step "zpool replace poolname disk01 disk05" does the
same thing):
zpool attach poolname disk01 disk05
zpool detach poolname disk01

Carry on with the add and replace methods as needed until you have
your 6-mirror pool.

No vdev removals required.
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Caviar Blue (Hard Drive Recommendations)

2010-06-30 Thread Freddie Cash
On Tue, Jun 29, 2010 at 11:25 AM, Patrick Donnelly  wrote:
> I googled around but couldn't find anything on whether someone has
> good or bad experiences with the Caviar *Blue* drives? I saw in the
> archives Caviar Blacks are *not* recommended for ZFS arrays (excluding
> apparently RE3 and RE4?). Specifically I'm looking to buy Western
> Digital Caviar Blue WD10EALS 1TB drives [1]. Does anyone have any
> experience with these drives?

We use a mix of WD Caviar Blue 500 GB, Caviar Black 500 GB, and RE2
500 GB drives in one of our storage servers without any issues.
Attached to 3Ware 9550SXU and 9650SE RAID controllers, configured as
Single Drive arrays.

There are also 8 WD Caviar Green 1.5 TB drives in there, which are not
very good (even after twiddling the idle timeout setting via wdidle3).
Definitely avoid the Green/GP line of drives.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Ubuntu

2010-06-26 Thread Freddie Cash
On Sat, Jun 26, 2010 at 12:20 AM, Ben Miles  wrote:
> What supporting applications are there on Ubuntu for RAIDZ?

None.  Ubuntu doesn't officially support ZFS.

You can kind of make it work using the ZFS-FUSE project, but it's
neither stable nor recommended.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Ubuntu

2010-06-25 Thread Freddie Cash
On Fri, Jun 25, 2010 at 6:31 PM, Ben Miles  wrote:
> How much of a difference is there in supporting applications in between 
> Ubuntu and OpenSolaris?
> I was not considering Ubuntu until OpenSOlaris would not load onto my 
> machine...
>
> Any info would be great. I have not been able to find any sort of comparison 
> of ZFS on Ubuntu and OS.
>
> Thanks.
>
> (My current OS install troubleshoot thread - 
> http://opensolaris.org/jive/thread.jspa?messageID=488193񷌁)

If you want ZFS, then go with FreeBSD instead of Ubuntu.  FreeBSD 8.1
includes ZFSv14, with patches available for ZFSv15 and ZFSv16.  You'll
get a more stable, better-performing system than trying to shoehorn
ZFS-FUSE into Ubuntu (we've tried it with Debian, and ZFS-FUSE is fine
for short-term testing, but not for production use).


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Erratic behavior on 24T zpool

2010-06-18 Thread Freddie Cash
On Fri, Jun 18, 2010 at 1:52 AM, Curtis E. Combs Jr.  wrote:
> I am new to zfs, so I am still learning. I'm using zpool iostat to
> measure performance. Would you say that smaller raidz2 sets would give
> me more reliable and better performance? I'm willing to give it a
> shot...

A ZFS pool is made up of vdevs.  ZFS stripes the vdevs together to
improve performance, similar in concept to how RAID0 works.  The more
vdevs in the pool, the better the performance will be.

A vdev is made up of one or more disks, depending on the type of vdev
and the redundancy level that you want (cache, log, mirror, raidz1,
raidz2, raidz3, etc).

Due to the algorithms used for raidz, the smaller your individual
raidz vdevs (the fewer disks), the better the performance.  IOW a
6-disk raidz2 vdev will perform better than an 11-disk raidz2 vdev.

So, you want your individual vdevs to be made up of as few physical
disks as possible (for your size and redundancy requirements), and
your pool to be made up of as many vdevs as possible.
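
For example, with 12 disks you will generally get better IOPS out of
two 6-disk raidz2 vdevs than out of a single 12-disk raidz2 vdev (disk
names below are just placeholders):

zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
                  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0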

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Complete Linux Noob

2010-06-16 Thread Freddie Cash
On Wed, Jun 16, 2010 at 3:03 AM, Orvar Korvar <
knatte_fnatte_tja...@yahoo.com> wrote:

> "You can't expand a normal RAID, either, anywhere I've ever seen."
> Is this true?
>
> A "vdev" can be a group of discs configured as raidz1/mirror/etc. An zfs
> raid consists of several vdev. You can add a new vdev whenever you want.
>

Close.

A vdev consists of one or more disks configured as stand-alone (no
redundancy), mirror, or raidz (along with a couple of special duty vdevs:
cache, log, or spare).

A pool consists of multiple vdevs, preferably of the same configuration (all
mirrors, all raidz1, all raidz2, etc).

You can add vdevs to the pool at anytime.

You cannot expand a raidz vdev by adding drives, though (convert a 4-drive
raidz1 to a 5-drive raidz1). Nor can you convert between raidz types
(4-drive raidz1 to a 6-drive raidz2).
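
So expansion is done by adding whole vdevs, along the lines of (pool
and device names are placeholders):

zpool add tank raidz1 da4 da5 da6 da7   # stripes in a second raidz1 vdev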

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Complete Linux Noob

2010-06-15 Thread Freddie Cash
Some of my terminology may not be 100% accurate, so apologies in advance to
the pedants on this list.  ;)

On Tue, Jun 15, 2010 at 12:13 PM, CarlPalmer  wrote:

> I have been researching different types of raids, and I happened across
> raidz, and I am blown away.  I have been trying to find resources to answer
> some of my questions, but many of them are either over my head in terms of
> details, or foreign to me as I am a linux noob, and I have to admit I have
> never even looked at Solaris.
>
> Are the Parity drives just that, a drive assigned to parity, or is the
> parity shared over several drives?
>

Separate parity drives are RAID3 setups.  raidz1 is similar to RAID5 in that
it uses distributed parity (parity blocks are written out to all the disks
as needed).  raidz2 is similar to RAID6.  raidz3 (triple-parity raid) is
similar to ... RAID7?  I don't think there are actually any formal
RAID levels above RAID6, are there?


> I understand that you can build a raidz2 that will have 2 parity disks.  So
> in theory I could lose 2 disks and still rebuild my array so long as they
> are not both the parity disks correct?
>

There are no "parity disks" in raidz.  With raidz2, you can lose any 2
drives in the vdev, without losing any data.  Lose a third drive, though,
and everything is gone.

With raidz3, you can lose any 3 drives in the vdev without losing any data.
 Lose a fourth drive, though, and everything is gone.


> I understand that you can have Spares assigned to the raid, so that if a
> drive fails, it will immediately grab the spare and rebuild the damaged
> drive.  Is this correct?
>

Depending on the version of ZFS being used, and whether or not you set the
property that controls this feature, yes.  Hot-spares will start rebuilding
a degraded vdev right away.
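
Adding a spare is straightforward (pool and device names are
placeholders):

zpool add tank spare da8   # da8 sits idle (AVAIL) until a disk fails
zpool status tank          # shows the spare and any resilver in progress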


> Now I can not find anything on how much space is taken up in the raidz1 or
> raidz2.  If all the drives are the same size, does a raidz2 take up the
> space of 2 of the drives for parity, or is the space calculation different?
>

Correct.  raidz1 loses 1 drive worth of space to parity.  raidz2 loses 2
drives worth of space.  raidz3 loses 3 drives worth of space.


> I get that you can not expand a raidz as you would a normal raid, by simply
> slapping on a drive.  Instead it seems that the preferred method is to
> create a new raidz.  Now Lets say that I want to add another raidz1 to my
> system, can I get the OS to present this as one big drive with the space
> from both raid pools?
>

Yes.  That is the whole point of pooled storage.  :)  As you add vdevs to
the pool, the available space increases.  There's no partitioning required,
you just create ZFS filesystems and volumes as needed.
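
For example (pool and dataset names are made up), once the pool exists
you just create datasets as you go, and they all draw from the same
free space:

zfs create tank/home
zfs create tank/backups
zfs create -V 100G tank/vm01   # a zvol (block device) for iSCSI and such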


> How do I share these types of raid pools across the network.  Or more
> specifically, how do I access them from Windows based systems?  Is there any
> special trick?
>

The same way you access any harddrive over the network:
  - NFS
  - SMB/CIFS
  - iSCSI
  - etc

It just depends on the level at which you want to access the storage
(files, shares, block devices, etc).
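
On OpenSolaris the first two are just dataset properties, so sharing is
along the lines of (dataset name is a placeholder):

zfs set sharenfs=on tank/home   # export over NFS
zfs set sharesmb=on tank/home   # export over CIFS (needs the SMB service)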

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool export / import discrepancy

2010-06-15 Thread Freddie Cash
On Tue, Jun 15, 2010 at 9:56 AM, Scott Squires  wrote:

> Is ZFS dependent on the order of the drives?  Will this cause any issue
> down the road?  Thank you all;
>

From what I can see so far, ZFS doesn't care about the order of the
drives as listed in zpool output, just that the drives are accessible.

I originally configured a server using 3x raidz2 vdevs of 8 drives each
spread across 2 12-port controllers, with 4 drives from each vdev on 1
controller, and the other 4 drives from each vdev on the other controller.

I've since moved the drives around so that all 8 drives of the first vdev
are on the first controller, all 8 drives of the third vdev are on the
second controller, with the second vdev being split across both controllers.

Everything is still running smoothly.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Please trim posts

2010-06-11 Thread Freddie Cash
On Fri, Jun 11, 2010 at 6:08 PM, Frank Cusack
wrote:

> This list is the worst one that I am on for that kind of behavior.  Makes
> me wonder how those folks can manage complex storage systems when they
> cannot even organize their thoughts efficiently.


As mentioned earlier, gmail may be behind a lot of this.  It's very good at
hiding quoted text from the person writing replies.  Many, many, many
screens of text can be hidden behind a single "show quoted text" link,
making it appear as if there's only a tiny bit of text.  It even does it
when reading messages if there are more than 3 or 4 levels of quoting (all
but the top 2 or 3 are hidden).

For reading, it's great.  For writing, it's handy.  For others, it's not
such a great feature.  :)

Like everything, it's all about the tools.  Some people have tools that hide
a lot of the complexity that makes everything super simple and easy for them
... and a royal pain for everyone else (kinda like Windows).  :)

In the end, it all comes down to user education.
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Native ZFS for Linux

2010-06-11 Thread Freddie Cash
On Fri, Jun 11, 2010 at 12:25 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Fri, 11 Jun 2010, Freddie Cash wrote:
>
>>
>>
For the record, the following paragraph was incorrectly quoted by Bob.  This
paragraph was originally written by Erik Trimble:

> I don't mean to be a PITA, but I'm assuming that someone lawyerly has had
>> the appropriate discussions with the porting team about how linking against
>> the GPL'd Linux kernel means your kernel module has to be GPL-compatible.
>>  It doesn't matter if you distribute it outside the general kernel source
>> tarball, what matters is that you're linking against a GPL program, and the
>> old GPL v2 doesn't allow for a non-GPL-compatibly-licensed module to do
>> that.
>>
>>
This is the start of the stuff that I wrote:

> GPL is a distribution license, not a usage license.  You can manually
>> download all the GPL and non-GPL code you want, so long as you do it
>> separately from each other.  Then you can compile them all into a single
>> binary on your own system, and use it all you want on that system.  The GPL
>> does not affect anything that happens on that system.  If you try to copy
>> those binaries off to use on another system, then the GPL kicks in and
>> everything breaks down.
>>
>> IOW, the GPL has absolutely no bearing on what you compile and run on your
>> system ... so long as you don't distribute the code and/or binaries
>> together.
>>
>
> I am really sad to hear you saying these things since if it was all
> actually true, then Linux, *BSD, and Solaris distributions could not
> legally exist.  Thankfully, only part of the above is true.


His complaint is about the mis-quoted paragraph from Erik, and not about the
stuff I wrote.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Native ZFS for Linux

2010-06-11 Thread Freddie Cash
On Thu, Jun 10, 2010 at 9:32 PM, Erik Trimble wrote:

> On 6/10/2010 9:04 PM, Rodrigo E. De León Plicet wrote:
>
>> On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwal
>>  wrote:
>>
>>
>>> We at KQInfotech, initially started on an independent port of ZFS to
>>> linux.
>>> When we posted our progress about port last year, then we came to know
>>> about
>>> the work on LLNL port. Since then we started working on to re-base our
>>> changing on top Brian's changes.
>>>
>>> We are working on porting ZPL on that code. Our current status is that
>>> mount/unmount is working. Most of the directory operations and read/write
>>> is
>>> also working. There is still lot more development work and testing that
>>> needs to be going in this. But we are committed to make this happen so
>>> please stay tuned.
>>>
>>>
>>
>> Good times ahead!
>>
>>
> I don't mean to be a PITA, but I'm assuming that someone lawyerly has had
> the appropriate discussions with the porting team about how linking against
> the GPL'd Linux kernel means your kernel module has to be GPL-compatible.
>  It doesn't matter if you distribute it outside the general kernel source
> tarball, what matters is that you're linking against a GPL program, and the
> old GPL v2 doesn't allow for a non-GPL-compatibly-licensed module to do
> that.
>
GPL is a distribution license, not a usage license.  You can manually
download all the GPL and non-GPL code you want, so long as you do it
separately from each other.  Then you can compile them all into a single
binary on your own system, and use it all you want on that system.  The GPL
does not affect anything that happens on that system.  If you try to copy
those binaries off to use on another system, then the GPL kicks in and
everything breaks down.

IOW, the GPL has absolutely no bearing on what you compile and run on your
system ... so long as you don't distribute the code and/or binaries
together.

This is how a lot of out-of-tree drivers and filesystems work in Linux.

There are even apps that make managing this easier.  For example, Debian
ships with module-assistant that handles the downloading of source,
compiling, and installing on your system.  All without being affected by the
GPL-ness of the kernel, or the non-GPL-ness of the external source code.


> As a workaround, take a look at what nVidia did for their X driver - it
> uses a GPL'd kernel module as a shim, which their codebase can then call
> from userland. Which is essentially what the ZFS FUSE folks have been
> reduced to doing.
>
The nvidia shim is only needed to be able to ship the non-GPL binary driver
with the GPL binary kernel.  If you don't use the binaries, you don't use
the shim.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs list sizes - newbie question

2010-06-04 Thread Freddie Cash
On Fri, Jun 4, 2010 at 11:41 AM, Andres Noriega
wrote:

> I understand now. So each vol's available space is reporting it's
> reservation and whatever is still available in the pool.
>
> I appreciate the explanation. Thank you!
>
>
If you want the available space to be a hard limit, have a look at the quota
property.

The reservation tells the pool to reserve that amount of space for the
dataset, meaning that space is no longer available to anything else in the
pool.

The quota tells the pool the max amount of storage the dataset can use, and
is reflected in the "space available" output of various tools (like zfs
list, df, etc).
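
For example (dataset name and sizes are placeholders):

zfs set reservation=100G tank/vol1   # guarantees 100G to this dataset
zfs set quota=100G tank/vol1         # caps the dataset at 100G total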

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Usage on drives

2010-06-04 Thread Freddie Cash
On Fri, Jun 4, 2010 at 6:36 AM, Andreas Iannou <
andreas_wants_the_w...@hotmail.com> wrote:

>  Hello again,
>
> I'm wondering if we can see the amount of usage for a drive in ZFS raidz
> mirror. I'm in the process of replacing some drives but I want to replace
> the less used drives first (maybe only 40-50% utilisation). Is there such a
> thing? I saw somewhere that a guy had 3 drives in a raidz, one drive only
> had to be resilvered 612Gb to replace.
>
> I'm hoping as theres quite a bit of free space that some drives only occupy
> a little and therefore only resilver 200-300Gb of data.
>

When in doubt, read the man page.  :)

zpool iostat -v

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Freddie Cash
On Wed, Jun 2, 2010 at 8:10 PM, Roman Naumenko  wrote:

> Well, I explained it not very clearly. I meant the size of a raidz array
> can't be changed.
> For sure zpool add can do the job with a pool. Not with a raidz
> configuration.
>

You can't increase the number of drives in a raidz vdev, no.  Going from a
4-drive raidz1 to a 5-drive raidz1 is currently impossible.  And going from
a raidz1 to a raidz2 vdev is currently impossible.  On the flip side, it's
rare to find a hardware RAID controller that allows this.

But you can increase the storage space available in a  raidz vdev, by
replacing each drive in the raidz vdev with a larger drive.  We just did
this, going from 8x 500 GB drives in a raidz2 vdev, to 8x 1.5 TB drives in a
raidz2 vdev.
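
For what it's worth, the mechanics are just a replace per drive,
something like this (device names are placeholders), waiting for each
resilver to finish before starting the next:

zpool replace tank c0t0d0 c0t8d0   # swap the old 500 GB for a new 1.5 TB
zpool status tank                  # watch until the resilver completes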

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Freddie Cash
On Wed, Jun 2, 2010 at 3:54 PM, Roman Naumenko  wrote:

> Recently I talked to a co-worker who manages NetApp storages. We discussed
> size changes for pools in zfs and aggregates in NetApp.
>
> And some time before I had suggested to a my buddy zfs for his new home
> storage server, but he turned it down since there is no expansion available
> for a pool.
>

There are two ways to increase the storage space available to a ZFS pool:
  1.  add more vdevs to the pool
  2.  replace each drive in a vdev with a larger drive

The first option "expands the width" of the pool, adds redundancy to the
pool, and (should) increase the performance of the pool.  This is very
simple to do, but requires having the drive bays and/or drive connectors
available.  (In fact, any time you add a vdev to a pool, including when you
first create it, you go through this process.)

The second option "increases the total storage" of the pool, without
changing any of the redundancy of the pool.  Performance may or may not
increase.  Once all the drives in a vdev are replaced, the storage space
becomes available to the pool (depending on the ZFS version, you may need to
export/import the pool for the space to become available).
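
In command form, the two options boil down to something like this (pool
and device names are placeholders):

# option 1: widen the pool by striping in another raidz2 vdev
zpool add tank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

# option 2: after every disk in a vdev has been swapped for a larger
# one, older pool versions need an export/import to see the new space
zpool export tank
zpool import tank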

We've used both of the above quite successfully, both at home and at work.

Not sure what your buddy was talking about.  :)

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

