Re: [zfs-discuss] Zpool Import Hanging

2011-01-18 Thread Richard Elling
On Jan 17, 2011, at 8:22 PM, Repetski, Stephen wrote:
 
 On Mon, Jan 17, 2011 at 22:08, Ian Collins i...@ianshome.com wrote:
  On 01/18/11 04:00 PM, Repetski, Stephen wrote:
 
 Hi All,
 
 I believe this has been asked before, but I wasn't able to find too much 
 information about the subject. Long story short, I was moving data around on 
 a storage zpool of mine and a 'zfs destroy <filesystem>' hung (or so I 
 thought). This pool had dedup turned on at times while imported as well; it's 
 running on a Nexenta Core 3.0.1 box (snv_134f).
 
 
 The first time the machine was rebooted, it hung at the "Loading ZFS 
 filesystems" line after loading the kernel; I booted the box with all drives 
 unplugged and exported the pool. The machine was rebooted, and now the pool 
 is hanging on import (zpool import -Fn Nalgene). I'm using 
 "0t2761::pid2proc | ::walk thread | ::findstack | mdb -k" to try and view what 
 the import process is doing. I'm not a hard-core ZFS/Solaris dev, so I don't 
 know if I'm reading the output correctly, but it appears that ZFS is 
 continuing to delete a snapshot/FS from before (reading from the top down):
 
 What does 'zpool iostat <pool> 10' show?
 
 If you have a lot of deduped data and not a lot of RAM (or a cache device), it 
 can take a very long time to destroy a filesystem.  You will see a lot of reads 
 and not many writes if this is happening.
 
 -- 
 Ian.
 
 
 Zpool iostat itself hangs, but iostat does show me one drive in particular 
 causing some issues - http://pastebin.com/6rJG3qV9 - %w and %b drop to ~50 
 and ~90, respectively, when mdb shows ZFS doing some deduplication work 
 (http://pastebin.com/EMPYy5Rr). As you said, the pool is mostly reading data 
 and not writing much. I should be able to switch up that drive to another 
 controller (currently on a PCI SATA adapter) and see what iostat reports 
 then. 

%w should be near 0 for most cases.  Until you solve that problem, everything 
will be slow.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] incorrect vdev added to pool

2011-01-18 Thread Gal Buki
Hi

I have a pool with a raidz2 vdev.
Today I accidentally added a single drive to the pool.

I now have a pool where part of the data has no redundancy, as this new vdev is a 
single drive.

Is there a way to remove the vdev and replace it with a new raidz2 vdev?
If not, what can I do for damage control, and how can I add some redundancy to the 
single-drive vdev?

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] configuration

2011-01-18 Thread Gal Buki
With two drives it makes more sense to use a mirror than a raidz configuration.
You will have the same amount of usable space, and mirroring gives you better 
performance, AFAIK.
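
A minimal sketch (the pool and device names are only placeholders -- use whatever 
format(1M) shows on your system):

  # two-way mirror from the two drives; usable space = one drive
  zpool create tank mirror c0t0d0 c0t1d0
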
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status of zpool remove in raidz and non-redundant stripes

2011-01-18 Thread Gal Buki
I second that.
This is exactly what happened to me.
There is a bug (ID 4852783) that is in state "6-Fix Understood", but it has been 
unchanged since February 2010.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] configuration

2011-01-18 Thread Piotr Tarnowski
You can also make 250 GB slices (partitions) and create a RAIDZ out of 3 x 250 GB 
plus a mirror of 2 x 1750 GB (one or more).

A mirror has better performance for write operations; raidz should be faster for 
reads.
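
Roughly, the slice-based layout above would look like this -- assuming you have 
already carved each 2 TB disk into a 250 GB s0 slice and an s1 slice with the 
remainder using format(1M), and with pool/device names that are only placeholders:

  # 3 x 250 GB raidz: the 250 GB disk plus one slice from each 2 TB disk
  zpool create fast raidz c0t2d0 c0t0d0s0 c0t1d0s0

  # mirror of the two large remaining slices
  zpool create big mirror c0t0d0s1 c0t1d0s1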

Regards
-- 
Piotr Tarnowski /DrFugazi/
http://www.drfugazi.eu.org/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Surprise Thread Preemptions

2011-01-18 Thread Kishore Kumar Pusukuri
Hi,
I would like to know which threads are preempted by which others on my 
OpenSolaris machine. 
To find out, I ran a multithreaded program, myprogram, with 32 threads on my 
24-core Solaris machine. I made sure that each thread of my program has the same 
priority (priority zero), so as to reduce priority inversions (and with them 
preemptions -- system overhead). I then ran the following script, whopreempt.d, to 
see who preempted myprogram's threads and got the output below. Unlike what I 
expected, myprogram threads are preempted (2796 times -- see the last line of the 
output) by threads of the same myprogram. 

Could anyone explain why this happens, please? 

DTrace script
==

#pragma D option quiet

sched:::preempt
{
        self->preempt = 1;
}

sched:::remain-cpu
/self->preempt/
{
        self->preempt = 0;
}

sched:::off-cpu
/self->preempt/
{
        /*
         * If we were told to preempt ourselves, see who we ended up
         * giving the CPU to.
         */
        @[stringof(args[1]->pr_fname), args[0]->pr_pri, execname,
            curlwpsinfo->pr_pri] = count();
        self->preempt = 0;
}

END
{
        printf("%30s %3s  ||  %30s %3s %5s\n", "PREEMPTOR", "PRI",
            "PREEMPTED", "PRI", "#");
        printa("%30s %3d  ||  %30s %3d %5@d\n", @);
}


Output:
===
    PREEMPTOR  PRI  ||  PREEMPTED  PRI     #
       dtrace    0  ||  myprogram    0     1
       dtrace   50  ||  myprogram    0     1
        sched   -1  ||  myprogram    0     1
    myprogram    0  ||     dtrace    0     1
          ...
         nscd   59  ||  myprogram    0     4
     sendmail   59  ||  myprogram    0     4
        sched   60  ||  myprogram    0    92
        sched   98  ||  myprogram    0   272
        sched   99  ||  myprogram    0  2110
    myprogram    0  ||  myprogram    0  2796
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Michael Armstrong
Hi guys, sorry in advance if this is somewhat a lowly question. I've recently 
built a zfs test box based on NexentaStor with 4x Samsung 2TB drives connected 
via SATA-II in a raidz1 configuration, with dedup enabled, compression off, and 
pool version 23. From running bonnie++ I get the following results:

Version 1.03b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nexentastor      4G 60582  54 20502   4 12385   3 53901  57 105290  10 429.8   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  7181  29 +++++ +++ +++++ +++ 21477  97 +++++ +++ +++++ +++
nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+++++,+++,+++++,+++,21477,97,+++++,+++,+++++,+++


I'd expect more than 105290K/s on a sequential read as a peak for a single 
drive, let alone a striped set. The system has a relatively decent CPU but only 
2GB of memory; do you think increasing this to 4GB would noticeably improve the 
performance of my zpool? The memory is only DDR1.

Thanks in advance.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] kernel panic on USB disk power loss

2011-01-18 Thread Reginald Beardsley
I was copying a filesystem using zfs send | zfs receive and inadvertently 
unplugged the power to the USB disk that was the destination.   Much to my 
horror this caused the system to panic.  I recovered fine on rebooting, but it 
*really* unnerved me.

I don't find anything about this online.  I would expect it would trash the 
copy operation, but the panic seemed a bit extreme.

It's an Ultra 20 running Solaris 10 Generic_137112-02.

I've got a copy of U8 that I'm planning to install, as the U9 license seems to 
prohibit my using it.

Suggestions?  I'd like to understand what happened and why the system went down.

Thanks,
Reg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] configuration

2011-01-18 Thread Trusty Twelve
Hello, I'm going to build a home server. The system is deployed on an 8 GB USB flash 
drive. I have two identical 2 TB HDDs and one 250 GB drive. Could you please 
recommend a ZFS configuration for this set of hard drives?
1)
pool1: mirror 2tb x 2
pool2: 250 gb (or maybe add this drive to pool1???)
2)
pool1: mirror 2tb x 2 + cache/log 250 gb
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HP ProLiant N36L

2011-01-18 Thread Trusty Twelve
I've successfully installed NexentaStor 3.0.4 on this microserver using PXE. 
Works like a charm.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] incorrect vdev added to pool

2011-01-18 Thread Andrew Gabriel

 On 01/15/11 11:32 PM, Gal Buki wrote:

Hi

I have a pool with a raidz2 vdev.
Today I accidentally added a single drive to the pool.

I now have a pool that partially has no redundancy as this vdev is a single 
drive.

Is there a way to remove the vdev


Not at the moment, as far as I know.


and replace it with a new raidz2 vdev?
If not what can I do to do damage control and add some redundancy to the single 
drive vdev?


I think you should be able to attach another disk to it to make them 
into a mirror. (Make sure you attach, and not add.)
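
Something along these lines (untested sketch; the pool and device names are 
placeholders -- the first disk is the one you accidentally added, the second is 
the new disk):

  # turn the stray single-disk vdev into a two-way mirror
  zpool attach tank c0t5d0 c0t6d0

'zpool status tank' should afterwards show c0t5d0 and c0t6d0 grouped as a mirror.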


--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Request for comments: L2ARC, ZIL, RAM, and slow storage

2011-01-18 Thread Karl Wagner
Hi all

This is just an off-the-cuff idea at the moment, but I would like to sound
it out.

Consider the situation where someone has a large amount of off-site data
storage (of the order of 100s of TB or more). They have a slow network link
to this storage.

My idea is that this could be used to build the main vdevs for a ZFS pool.
On top of this, an array of disks (of the order of TBs to 10s of TB) is
available locally, which can be used as L2ARC. There are also smaller,
faster arrays (of the order of 100s of GB) which, in my mind, could be used
as a ZIL.
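
(With today's zpool syntax that local tier would be expressed roughly as below -- 
just a sketch with made-up device names, and of course it doesn't yet give the 
archive behaviour I describe further down:)

  # slow off-site LUNs as the main vdevs, local arrays as cache and log
  zpool create wan mirror remote-lun0 remote-lun1 \
        cache local-big0 local-big1 \
        log mirror local-fast0 local-fast1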

Now, in this theoretical situation, in-play read data is kept on the L2ARC,
and can be accessed about as fast as if this array were just used as the main
pool vdevs. Written data goes to the ZIL, and is then sent down the slow link
to the offsite storage. Rarely used data is still available as if on site
(it shows up in the same file structure), but is effectively archived to the
offsite storage.

Now, here comes the problem. According to what I have read, the maximum size
for the ZIL is approx 50% of the physical memory in the system, which would
be too small for this particular situation. Also, you cannot mirror the
L2ARC, which would have dire performance consequences in the case of a disk
failure in the L2ARC. I also believe (correct me if I am wrong) that the
L2ARC is invalidated on reboot, so it would have to warm up again. And
finally, if the network link were to die, I am assuming the entire zpool
would become unavailable.

This is a setup which I can see many use cases for, but it introduces too
many failure modes.

What I would like to see is an extension to ZFS's hierarchical storage
environment, such that an additional layer can be put behind the main pool
vdevs as an archive store (i.e. it goes
[ARC] -> [L2ARC/ZIL] -> [main] -> [archive]). Infrequently used files/blocks could
be pushed into this storage, but appear to be available as normal. It would,
for example, allow old snapshot data to be pushed down, as this is very
rarely going to be used, or files which must be archived for legal reasons.
It would also utilise the bandwidth available more efficiently, as only data
being specifically sent to it would need transferring.

In the case where the archive storage becomes unavailable, there would be a
number of possible actions (e.g. error on access, block on access, make the
files disappear temporarily).

I know there are already solutions out there which do similar jobs. The
company I work for uses one, which pushes archive data to a tape stacker,
and pulls it back when accessed. But I think this is a ripe candidate for
becoming part of the ZFS stack.

So, what does everyone think?

Rgds
Karl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Orvar Korvar
...If this is a general rule, maybe it will be worth considering using
SHA512 truncated to 256 bits to get more speed...

Doesn't it need more investigation whether truncating a 512-bit hash to 256 bits 
gives security equivalent to a plain 256-bit hash? Maybe truncation will 
introduce some bias?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Richard Elling
On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote:

 Hi guys, sorry in advance if this is somewhat a lowly question, I've recently 
 built a zfs test box based on nexentastor with 4x samsung 2tb drives 
 connected via SATA-II in a raidz1 configuration with dedup enabled 
 compression off and pool version 23. From running bonnie++ I get the 
 following results:
 
 Version 1.03b   --Sequential Output-- --Sequential Input- 
 --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
 %CP
 nexentastor  4G 60582  54 20502   4 12385   3 53901  57 105290  10 429.8  
  1
--Sequential Create-- Random Create
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16  7181  29 + +++ + +++ 21477  97 + +++ + +++
 nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++
 
 
 I'd expect more than 105290K/s on a sequential read as a peak for a single 
 drive, let alone a striped set. The system has a relatively decent CPU, 
 however only 2GB memory, do you think increasing this to 4GB would noticeably 
 affect performance of my zpool? The memory is only DDR1.

2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor: turn off 
dedup and enable compression.
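
Something like (the dataset name is a placeholder):

  zfs set dedup=off tank
  zfs set compression=on tank

Note that neither change rewrites existing blocks; they only apply to data 
written from now on.
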
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Orvar Korvar
Totally Off Topic:
Very interesting. Did you produce some papers on this? Where do you work? It seems 
like a very fun place to work! 


BTW, I thought about this. What do you say?

Assume I want to compress data and I succeed in doing so. And then I transfer 
the compressed data. So all the information I transferred is the compressed 
data. But, then you don't count all the information: knowledge about which 
algorithm was used, which number system, laws of math, etc. So there are lots 
of other information that is implicit, when compress/decompress - not just the 
data.

So, if you add data and all implicit information you get a certain bit size X. 
Do this again on the same set of data, with another algorithm and you get 
another bit size Y. 

You compress the data, using lots of implicit information. If you use less 
implicit information (simple algorithm relying on simple math), will X be 
smaller than if you use lots of implicit information (advanced algorithm 
relying on a large body of advanced math)? What can you say about the numbers X 
and Y? Advanced math requires many math books that you need to transfer as well.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Surprise Thread Preemptions

2011-01-18 Thread Phil Harman

Big subject!

You haven't said what your 32 threads are doing, or how you gave them 
the same priority, or what scheduler class they are running in.


However, you only have 24 VCPUs, and (I assume) 32 active threads, so 
Solaris will try to share resources evenly, and yes, it will preempt one 
of your threads to run another.


The preemption behaviour, including the time a thread is allowed to run 
without interruption, will depend on the scheduling class and parameters 
of each thread.


If you want to reduce preemption, you can move threads to the FX class, 
set an absolute priority, and tune the time quantum.
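
For example, something like this (a sketch only -- the PID, priority and quantum 
are made-up values; see priocntl(1) for the details):

  # put process 12345 (all of its threads) in the FX class at fixed priority 30,
  # with a 200-millisecond time quantum
  priocntl -s -c FX -m 30 -p 30 -t 200 -i pid 12345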


What you are seeing is expected.

Hope this helps,
Phil

p.s. if you need any more help with this, please feel free to contact me 
offline.




On 18/01/2011 06:13, Kishore Kumar Pusukuri wrote:

Hi,
I would like to know which threads are preempted by which others on my 
OpenSolaris machine.
To find out, I ran a multithreaded program, myprogram, with 32 threads on my 
24-core Solaris machine. I made sure that each thread of my program has the same 
priority (priority zero), so as to reduce priority inversions (and with them 
preemptions -- system overhead). I then ran the following script, whopreempt.d, to 
see who preempted myprogram's threads and got the output below. Unlike what I 
expected, myprogram threads are preempted (2796 times -- see the last line of the 
output) by threads of the same myprogram.

Could anyone explain why this happens, please?

DTrace script
==

#pragma D option quiet

sched:::preempt
{
        self->preempt = 1;
}

sched:::remain-cpu
/self->preempt/
{
        self->preempt = 0;
}

sched:::off-cpu
/self->preempt/
{
        /*
         * If we were told to preempt ourselves, see who we ended up
         * giving the CPU to.
         */
        @[stringof(args[1]->pr_fname), args[0]->pr_pri, execname,
            curlwpsinfo->pr_pri] = count();
        self->preempt = 0;
}

END
{
        printf("%30s %3s  ||  %30s %3s %5s\n", "PREEMPTOR", "PRI",
            "PREEMPTED", "PRI", "#");
        printa("%30s %3d  ||  %30s %3d %5@d\n", @);
}


Output:
===
    PREEMPTOR  PRI  ||  PREEMPTED  PRI     #
       dtrace    0  ||  myprogram    0     1
       dtrace   50  ||  myprogram    0     1
        sched   -1  ||  myprogram    0     1
    myprogram    0  ||     dtrace    0     1
          ...
         nscd   59  ||  myprogram    0     4
     sendmail   59  ||  myprogram    0     4
        sched   60  ||  myprogram    0    92
        sched   98  ||  myprogram    0   272
        sched   99  ||  myprogram    0  2110
    myprogram    0  ||  myprogram    0  2796


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Michael Armstrong
I've since turned off dedup and added another 3 drives, and results have improved 
to around 148388K/sec on average. Would turning on compression make things more 
CPU-bound and improve performance further?

On 18 Jan 2011, at 15:07, Richard Elling wrote:

 On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote:
 
 Hi guys, sorry in advance if this is somewhat a lowly question, I've 
 recently built a zfs test box based on nexentastor with 4x samsung 2tb 
 drives connected via SATA-II in a raidz1 configuration with dedup enabled 
 compression off and pool version 23. From running bonnie++ I get the 
 following results:
 
 Version 1.03b   --Sequential Output-- --Sequential Input- 
 --Random-
   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
 %CP
 nexentastor  4G 60582  54 20502   4 12385   3 53901  57 105290  10 429.8 
   1
   --Sequential Create-- Random Create
   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
16  7181  29 + +++ + +++ 21477  97 + +++ + +++
 nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++
 
 
 I'd expect more than 105290K/s on a sequential read as a peak for a single 
 drive, let alone a striped set. The system has a relatively decent CPU, 
 however only 2GB memory, do you think increasing this to 4GB would 
 noticeably affect performance of my zpool? The memory is only DDR1.
 
 2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, turn off 
 dedup
 and enable compression.
 -- richard
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Nicolas Williams
On Tue, Jan 18, 2011 at 07:16:04AM -0800, Orvar Korvar wrote:
 BTW, I thought about this. What do you say?
 
 Assume I want to compress data and I succeed in doing so. And then I
 transfer the compressed data. So all the information I transferred is
 the compressed data. But, then you don't count all the information:
 knowledge about which algorithm was used, which number system, laws of
 math, etc. So there are lots of other information that is implicit,
 when compress/decompress - not just the data.
 
 So, if you add data and all implicit information you get a certain bit
 size X. Do this again on the same set of data, with another algorithm
 and you get another bit size Y. 
 
 You compress the data, using lots of implicit information. If you use
 less implicit information (simple algorithm relying on simple math),
 will X be smaller than if you use lots of implicit information
 (advanced algorithm relying on a large body of advanced math)? What
 can you say about the numbers X and Y? Advanced math requires many
 math books that you need to transfer as well.

Just as the laws of thermodynamics preclude perpetual motion machines,
so do they preclude infinite, loss-less data compression.  Yes,
thermodynamics and information theory are linked, amazingly enough.

Data compression algorithms work by identifying certain types of
patterns, then replacing the input with notes such as "pattern 1 is ...
and appears at offsets 12345 and 1234567" (I'm simplifying a lot).  Data
that has few or no observable patterns (observable by the compression
algorithm in question) will not compress, but will expand if you insist
on compressing -- randomly-generated data (e.g., the output of
/dev/urandom) will not compress at all and will expand if you insist.
Even just one bit needed to indicate whether a file is compressed or not
will mean expansion when you fail to compress and store the original
instead of the compressed version.  Data compression reduces
repetition, thus making it harder to further compress compressed data.

Try it yourself.  Try building a pipeline of all the compression tools
you have, see how many rounds of compression you can apply to typical
data before further compression fails.
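
For example (sizes will vary a little from run to run, but the trend won't):

  # random data: the first gzip gains nothing, further rounds only add overhead
  dd if=/dev/urandom of=/tmp/rand bs=1024 count=1024 2>/dev/null
  ls -l /tmp/rand                                   # ~1048576 bytes
  gzip -c /tmp/rand | wc -c                         # about the same, or larger
  gzip -c /tmp/rand | gzip -c | wc -c               # larger still
  gzip -c /tmp/rand | bzip2 -c | gzip -c | wc -c    # larger again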

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HP ProLiant N36L

2011-01-18 Thread Eugen Leitl
On Mon, Jan 17, 2011 at 02:19:23AM -0800, Trusty Twelve wrote:
 I've successfully installed NexentaStor 3.0.4 on this microserver using PXE. 
 Works like a charm.

I've got 5 of them today, and for some reason NexentaCore 3.0.1 b134
was unable to write to disks (whether internal USB or the 4x SATA).

Known problem? Should I go to stable, or try NexentaStor instead?
(I'd rather keep options open with Nexenta Core and napp-it).

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Erik Trimble
On Tue, 2011-01-18 at 15:11 +, Michael Armstrong wrote:
 I've since turned off dedup, added another 3 drives and results have improved 
 to around 148388K/sec on average, would turning on compression make things 
 more CPU bound and improve performance further?
 
 On 18 Jan 2011, at 15:07, Richard Elling wrote:
 
  On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote:
  
  Hi guys, sorry in advance if this is somewhat a lowly question, I've 
  recently built a zfs test box based on nexentastor with 4x samsung 2tb 
  drives connected via SATA-II in a raidz1 configuration with dedup enabled 
  compression off and pool version 23. From running bonnie++ I get the 
  following results:
  
  Version 1.03b   --Sequential Output-- --Sequential Input- 
  --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
  --Seeks--
  MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
  /sec %CP
  nexentastor  4G 60582  54 20502   4 12385   3 53901  57 105290  10 
  429.8   1
--Sequential Create-- Random 
  Create
-Create-- --Read--- -Delete-- -Create-- --Read--- 
  -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
  %CP
 16  7181  29 + +++ + +++ 21477  97 + +++ + 
  +++
  nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++
  
  
  I'd expect more than 105290K/s on a sequential read as a peak for a single 
  drive, let alone a striped set. The system has a relatively decent CPU, 
  however only 2GB memory, do you think increasing this to 4GB would 
  noticeably affect performance of my zpool? The memory is only DDR1.
  
  2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, turn 
  off dedup
  and enable compression.
  -- richard
  
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Compression will help speed things up (I/O, that is), presuming that
you're not already CPU-bound, which it doesn't seem you are.

If you want Dedup, you pretty much are required to buy an SSD for L2ARC,
*and* get more RAM.


These days, I really don't recommend running ZFS as a fileserver without
a bare minimum of 4GB of RAM (8GB for anything other than light use),
even with Dedup turned off. 


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Michael Armstrong
Thanks everyone, I think over time I'm gonna update the system to include an SSD 
for sure. Memory may come later though. Thanks for everyone's responses.

Erik Trimble erik.trim...@oracle.com wrote:

On Tue, 2011-01-18 at 15:11 +, Michael Armstrong wrote:
 I've since turned off dedup, added another 3 drives and results have 
 improved to around 148388K/sec on average, would turning on compression make 
 things more CPU bound and improve performance further?
 
 On 18 Jan 2011, at 15:07, Richard Elling wrote:
 
  On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote:
  
  Hi guys, sorry in advance if this is somewhat a lowly question, I've 
  recently built a zfs test box based on nexentastor with 4x samsung 2tb 
  drives connected via SATA-II in a raidz1 configuration with dedup enabled 
  compression off and pool version 23. From running bonnie++ I get the 
  following results:
  
  Version 1.03b   --Sequential Output-- --Sequential Input- 
  --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
  --Seeks--
  MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
  /sec %CP
  nexentastor  4G 60582  54 20502   4 12385   3 53901  57 105290  10 
  429.8   1
--Sequential Create-- Random 
  Create
-Create-- --Read--- -Delete-- -Create-- --Read--- 
  -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
  %CP
 16  7181  29 + +++ + +++ 21477  97 + +++ + 
  +++
  nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++
  
  
  I'd expect more than 105290K/s on a sequential read as a peak for a 
  single drive, let alone a striped set. The system has a relatively decent 
  CPU, however only 2GB memory, do you think increasing this to 4GB would 
  noticeably affect performance of my zpool? The memory is only DDR1.
  
  2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, turn 
  off dedup
  and enable compression.
  -- richard
  
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Compression will help speed things up (I/O, that is), presuming that
you're not already CPU-bound, which it doesn't seem you are.

If you want Dedup, you pretty much are required to buy an SSD for L2ARC,
*and* get more RAM.


These days, I really don't recommend running ZFS as a fileserver without
a bare minimum of 4GB of RAM (8GB for anything other than light use),
even with Dedup turned off. 


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Erik Trimble
You can't really do that.

Adding an SSD for L2ARC will help a bit, but L2ARC storage also consumes
RAM to maintain a cache table of what's in the L2ARC.  Using 2GB of RAM
with an SSD-based L2ARC (even without Dedup) likely won't help you too
much vs not having the SSD. 

If you're going to turn on Dedup, you need at least 8GB of RAM to go
with the SSD.
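
Rough back-of-the-envelope (both figures are assumptions -- roughly 300 bytes of 
metadata per dedup-table entry and a 128 KB average block size; your pool will 
differ):

  1 TB of unique data / 128 KB per block   ~=  8 million DDT entries
  8 million entries x ~300 bytes           ~=  ~2.4 GB of dedup table per TB

With a few TB of deduped data that table simply can't stay resident in 2-4 GB of 
RAM, which is why the SSD (to catch what spills out of ARC) plus 8GB or more of 
RAM is the practical floor.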

-Erik


On Tue, 2011-01-18 at 18:35 +, Michael Armstrong wrote:
 Thanks everyone, I think overtime I'm gonna update the system to include an 
 ssd for sure. Memory may come later though. Thanks for everyone's responses
 
 Erik Trimble erik.trim...@oracle.com wrote:
 
 On Tue, 2011-01-18 at 15:11 +, Michael Armstrong wrote:
  I've since turned off dedup, added another 3 drives and results have 
  improved to around 148388K/sec on average, would turning on compression 
  make things more CPU bound and improve performance further?
  
  On 18 Jan 2011, at 15:07, Richard Elling wrote:
  
   On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote:
   
   Hi guys, sorry in advance if this is somewhat a lowly question, I've 
   recently built a zfs test box based on nexentastor with 4x samsung 2tb 
   drives connected via SATA-II in a raidz1 configuration with dedup 
   enabled compression off and pool version 23. From running bonnie++ I 
   get the following results:
   
   Version 1.03b   --Sequential Output-- --Sequential Input- 
   --Random-
 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
   --Seeks--
   MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
   /sec %CP
   nexentastor  4G 60582  54 20502   4 12385   3 53901  57 105290  10 
   429.8   1
 --Sequential Create-- Random 
   Create
 -Create-- --Read--- -Delete-- -Create-- --Read--- 
   -Delete--
   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
   /sec %CP
  16  7181  29 + +++ + +++ 21477  97 + +++ 
   + +++
   nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++
   
   
   I'd expect more than 105290K/s on a sequential read as a peak for a 
   single drive, let alone a striped set. The system has a relatively 
   decent CPU, however only 2GB memory, do you think increasing this to 
   4GB would noticeably affect performance of my zpool? The memory is only 
   DDR1.
   
   2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, 
   turn off dedup
   and enable compression.
   -- richard
   
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 Compression will help speed things up (I/O, that is), presuming that
 you're not already CPU-bound, which it doesn't seem you are.
 
 If you want Dedup, you pretty much are required to buy an SSD for L2ARC,
 *and* get more RAM.
 
 
 These days, I really don't recommend running ZFS as a fileserver without
 a bare minimum of 4GB of RAM (8GB for anything other than light use),
 even with Dedup turned off. 
 
 
 -- 
 Erik Trimble
 Java System Support
 Mailstop:  usca22-317
 Phone:  x67195
 Santa Clara, CA
 Timezone: US/Pacific (GMT-0800)
 
-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] configuration

2011-01-18 Thread Brandon High
On Mon, Jan 17, 2011 at 6:19 AM, Piotr Tarnowski
drfug...@drfugazi.eu.org wrote:
 You can also make 250 GB slices (partitions) and create RAIDZ 3x250GB and 
 mirror 2x1750GB (one or more).

This configuration doesn't make a lot of sense for redundancy, since
it doesn't provide any. It will have poor performance caused by
excessive disk seeks as well. The only time it would make sense is if
you're planning on replacing each slice with a separate drive.

 Mirror has better performance for write operations, Raidz shoud be faster for 
 read.

ZFS mirrors will read off of both sides of the mirror, as in a stripe.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my bottleneck RAM?

2011-01-18 Thread Michael Armstrong
Ah OK, I won't be using dedup anyway, I just wanted to try it. I'll be adding more RAM 
though; I guess you can't have too much. Thanks

Erik Trimble erik.trim...@oracle.com wrote:

You can't really do that.

Adding an SSD for L2ARC will help a bit, but L2ARC storage also consumes
RAM to maintain a cache table of what's in the L2ARC.  Using 2GB of RAM
with an SSD-based L2ARC (even without Dedup) likely won't help you too
much vs not having the SSD. 

If you're going to turn on Dedup, you need at least 8GB of RAM to go
with the SSD.

-Erik


On Tue, 2011-01-18 at 18:35 +, Michael Armstrong wrote:
 Thanks everyone, I think overtime I'm gonna update the system to include an 
 ssd for sure. Memory may come later though. Thanks for everyone's responses
 
 Erik Trimble erik.trim...@oracle.com wrote:
 
 On Tue, 2011-01-18 at 15:11 +, Michael Armstrong wrote:
  I've since turned off dedup, added another 3 drives and results have 
  improved to around 148388K/sec on average, would turning on compression 
  make things more CPU bound and improve performance further?
  
  On 18 Jan 2011, at 15:07, Richard Elling wrote:
  
   On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote:
   
   Hi guys, sorry in advance if this is somewhat a lowly question, I've 
   recently built a zfs test box based on nexentastor with 4x samsung 2tb 
   drives connected via SATA-II in a raidz1 configuration with dedup 
   enabled compression off and pool version 23. From running bonnie++ I 
   get the following results:
   
   Version 1.03b   --Sequential Output-- --Sequential Input- 
   --Random-
 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
   --Seeks--
   MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
   /sec %CP
   nexentastor  4G 60582  54 20502   4 12385   3 53901  57 105290  10 
   429.8   1
 --Sequential Create-- Random 
   Create
 -Create-- --Read--- -Delete-- -Create-- --Read--- 
   -Delete--
   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
   /sec %CP
  16  7181  29 + +++ + +++ 21477  97 + +++ 
   + +++
   nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++
   
   
   I'd expect more than 105290K/s on a sequential read as a peak for a 
   single drive, let alone a striped set. The system has a relatively 
   decent CPU, however only 2GB memory, do you think increasing this to 
   4GB would noticeably affect performance of my zpool? The memory is 
   only DDR1.
   
   2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, 
   turn off dedup
   and enable compression.
   -- richard
   
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 Compression will help speed things up (I/O, that is), presuming that
 you're not already CPU-bound, which it doesn't seem you are.
 
 If you want Dedup, you pretty much are required to buy an SSD for L2ARC,
 *and* get more RAM.
 
 
 These days, I really don't recommend running ZFS as a fileserver without
 a bare minimum of 4GB of RAM (8GB for anything other than light use),
 even with Dedup turned off. 
 
 
 -- 
 Erik Trimble
 Java System Support
 Mailstop:  usca22-317
 Phone:  x67195
 Santa Clara, CA
 Timezone: US/Pacific (GMT-0800)
 
-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Philip Brown
Sorry if this is well known... I tried a bunch of Google searches, but didn't get 
anywhere useful. The closest I came was 
http://mail.opensolaris.org/pipermail/zfs-discuss/2009-April/028090.html, but 
that doesn't answer my question, below, regarding zfs mirror recovery.
Details of our needs follow.


We normally are very into redundancy. Pretty much all our SAN storage is dual 
ported, along with all our production hosts. Two completely redundant paths to 
storage. Two independant SANs.

However, now we are encountering a need for tier 3 storage, aka "not that 
important, we're going to go cheap on it" ;-)
That being said, we'd still like to make it as reliable and robust as possible. 
So I was wondering just how robust it would be to do ZFS mirroring across 2 
SANs.

My specific question is, how easily does ZFS handle *temporary* SAN 
disconnects, to one side of the mirror?
What if the outage is only 60 seconds?
3 minutes?
10 minutes?
an hour?

If we have 2 x 1TB drives in a simple zfs mirror and one side goes temporarily 
offline, will zfs attempt to resync **1 TB** when it comes back? Or does it have 
enough intelligence to say, "oh hey, I know this disk, and I know [these bits] 
are still good, so I just need to resync [that bit]"?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Torrey McMahon



On 1/18/2011 2:46 PM, Philip Brown wrote:

 My specific question is, how easily does ZFS handle *temporary* SAN 
disconnects, to one side of the mirror?
What if the outage is only 60 seconds?
3 minutes?
10 minutes?
an hour?


Depends on the multipath drivers and the failure mode. For example, if 
the link drops completely at the host HBA connection, some failover 
drivers will mark the path down immediately, which will propagate up the 
stack faster than an intermittent connection or something failing farther 
downstream.



If we have 2x1TB drives, in a simple zfs mirror if one side goes temporarily off 
line, will zfs attempt to resync **1 TB** when it comes back? Or does it have enough 
intelligence to say, oh hey I know this disk..and I know [these bits] are still 
good, so I just need to resync [that bit] ?


My understanding is yes though I can't find the reference for this. (I'm 
sure someone else will find it in short order.)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Erik Trimble
On Tue, 2011-01-18 at 14:51 -0500, Torrey McMahon wrote:
 
 On 1/18/2011 2:46 PM, Philip Brown wrote:
  My specific question is, how easily does ZFS handle *temporary* SAN 
  disconnects, to one side of the mirror?
  What if the outage is only 60 seconds?
  3 minutes?
  10 minutes?
  an hour?
 
 Depends on the multipath drivers and the failure mode. For example, if 
 the link drops completely at the host hba connection some failover 
 drivers will mark the path down immediately which will propagate up the 
 stack faster than an intermittent connection or something father down 
 stream failing.
 
  If we have 2x1TB drives, in a simple zfs mirror if one side goes 
  temporarily off line, will zfs attempt to resync **1 TB** when it comes 
  back? Or does it have enough intelligence to say, oh hey I know this 
  disk..and I know [these bits] are still good, so I just need to resync 
  [that bit] ?
 
 My understanding is yes though I can't find the reference for this. (I'm 
 sure someone else will find it in short order.)


ZFS's ability to handle short-term interruptions depends heavily on the
underlying device driver.

If the device driver reports the device as dead/missing/etc at any
point, then ZFS is going to require a zpool replace action before it
re-accepts the device.  If the underlying driver simply stalls, then
it's more graceful (and no user interaction is required).

As far as what the resync does:  ZFS does smart resilvering, in that
it compares what the good side of the mirror has against what the
bad side has, and only copies the differences over to sync them up.
This is one of ZFS's great strengths, in that most other RAID systems
can't do this.
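
In practice the recovery looks something like this (pool and device names are 
placeholders; if the same physical disk comes back on the same path, you can name 
it as both the old and the new device):

  zpool status tank             # find the FAULTED/UNAVAIL device
  zpool replace tank c2t0d0     # re-insert it; the resilver copies only the deltas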



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Chris Banal

Erik Trimble wrote:

On Tue, 2011-01-18 at 14:51 -0500, Torrey McMahon wrote:

On 1/18/2011 2:46 PM, Philip Brown wrote:

My specific question is, how easily does ZFS handle *temporary* SAN 
disconnects, to one side of the mirror?
What if the outage is only 60 seconds?
3 minutes?
10 minutes?
an hour?


No idea how well it will reconnect the device, but we had an X4500 that 
would randomly boot up with one or two disks missing. Reboot 
again and one or two other disks would be missing. While we were 
troubleshooting this problem it happened dozens and dozens of times, and zfs 
had no trouble with it as far as I could tell. It would only resilver the 
data that was changed while that drive was offline. We had no data loss.



Thank you,
Chris Banal

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Philip Brown
 On Tue, 2011-01-18 at 14:51 -0500, Torrey McMahon
 wrote:

 ZFS's ability to handle short-term interruptions
 depend heavily on the
 underlying device driver.
 
 If the device driver reports the device as
 dead/missing/etc at any
 point, then ZFS is going to require a zpool replace
 action before it
 re-accepts the device.  If the underlying driver
 simply stalls, then
 it's more graceful (and no user interaction is
 required).
 
 As far as what the resync does:  ZFS does smart
 resilvering, in that
 it compares what the good side of the mirror has
 against what the
 bad side has, and only copies the differences over
 to sync them up.


Hmm. Well, we're talking fibre, so we're very concerned with the recovery mode 
when the fibre drivers have marked it as failed (except it hasn't really 
failed; we've just had a switch drop out).

I THINK what you are saying, is that we could, in this situation, do:

zpool replace (old drive) (new drive)

and then your smart recovery, should do the limited resilvering only. Even 
for potentially long outages.

Is that what you are saying?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HP ProLiant N36L

2011-01-18 Thread Trusty Twelve
I've installed NexentaStor on an 8GB USB stick without any problems, so try 
NexentaStor instead of NexentaCore...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Erik Trimble
On Tue, 2011-01-18 at 13:34 -0800, Philip Brown wrote:
  On Tue, 2011-01-18 at 14:51 -0500, Torrey McMahon
  wrote:
 
  ZFS's ability to handle short-term interruptions
  depend heavily on the
  underlying device driver.
  
  If the device driver reports the device as
  dead/missing/etc at any
  point, then ZFS is going to require a zpool replace
  action before it
  re-accepts the device.  If the underlying driver
  simply stalls, then
  it's more graceful (and no user interaction is
  required).
  
  As far as what the resync does:  ZFS does smart
  resilvering, in that
  it compares what the good side of the mirror has
  against what the
  bad side has, and only copies the differences over
  to sync them up.
 
 
 Hmm. Well, we're talking fibre, so we're very concerned with the recovery  
 mode when the fibre drivers have marked it as failed. (except it hasnt 
 really failed, we've just had a switch drop out)
 
 I THINK what you are saying, is that we could, in this situation, do:
 
 zpool replace (old drive) (new drive)
 
 and then your smart recovery, should do the limited resilvering only. Even 
 for potentially long outages.
 
 Is that what you are saying?


Yes. It will always look at the replaced drive to see if it was a
prior member of the mirror, and do smart resilvering if possible.

If the device path stays the same (which, hopefully, it should), you can
even do:

zpool replace (old device) (old device)
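
i.e., with made-up names, either of:

  zpool replace tank c3t0d0 c3t0d0
  zpool replace tank c3t0d0        # new_device defaults to the old one

followed by 'zpool status' to watch the (hopefully short) resilver.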




-- 
Erik Trimble
Java System Support
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] configuration

2011-01-18 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Trusty Twelve
 
 Hello, I'm going to build home server. System is deployed on 8 GB USB
flash
 drive. I have two identical 2 TB HDD and 250 GB one. Could you please
 recommend me ZFS configuration for the set of my hard drives?
 1)
 pool1: mirror 2tb x 2
 pool2: 250 gb (or maybe add this drive to pool1???)
 2)
 pool1: mirror 2tb x 2 + cache/log 250 gb

I recommend option 3:
mirror 2tb x 2
disconnect 250G
disconnect 8G flash drive
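
In other words, roughly (device names are placeholders):

  zpool create tank mirror c0t0d0 c0t1d0    # the two 2 TB drives

with the 250G drive and the 8G flash drive left out of the data pool, as above.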

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Request for comments: L2ARC, ZIL, RAM, and slow storage

2011-01-18 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Karl Wagner
 
 Consider the situation where someone has a large amount of off-site data
 storage (of the order of 100s of TB or more). They have a slow network
link
 to this storage.
 
 My idea is that this could be used to build the main vdevs for a ZFS pool.
 On top of this, an array of disks (of the order of TBs to 10s of TB) is
 available locally, which can be used as L2ARC. There are also smaller,
 faster arrays (of the order of 100s of GB) which, in my mind, could be
used
 as a ZIL.
 
 Now, in this theoretical situation, in-play read data is kept on the
L2ARC,
 and can be accessed about as fast as if this array was just used as the
main
 pool vdevs. Written data goes to the ZIL, as is then sent down the slow
link
 to the offsite storage. Rarely used data is still available as if on site
 (shows up in the same file structure), but is effectively archived to
the
 offsite storage.
 
 Now, here comes the problem. According to what I have read, the maximum
 size
 for the ZIL is approx 50% of the physical memory in the system, which
would

Here's the bigger problem:
You seem to be thinking of the ZIL as a write buffer.  This is not the case.  The
ZIL only allows sync writes to become async writes, which are buffered in RAM.
Depending on your system, it will refuse to buffer more than 5 sec or 30 sec
of async writes, and your async writes are still going to be slow.

Also, L2ARC is not persistent, and there is a maximum fill rate (which I
don't know much about.)  So populating the L2ARC might not happen as fast as
you want, and every time you reboot it will have to be repopulated.

If at all possible, instead of using the remote storage as the primary
storage, you can use the remote storage to receive incremental periodic
snapshots, and that would perform optimally, because the remote storage is
then isolated from rapid volatile changes.  The zfs send | zfs receive
datastreams will be full of large sequential blocks and not small random IO.
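
A sketch of that periodic replication (names are placeholders; -i sends only the 
blocks that changed between the two snapshots):

  zfs snapshot tank/data@today
  zfs send -i tank/data@yesterday tank/data@today | \
        ssh remotehost zfs receive tank-backup/data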

Most likely you will gain performance by enabling both compression and
dedup.  But of course, that depends on the nature of your data.


 And
 finally, if the network link was to die, I am assuming the entire ZPool
 would become unavailable.

The behavior in this situation is configurable via the "failmode" pool property.
The default is "wait", which essentially pauses the filesystem until the disks
become available again.  Unfortunately, until the disks become available again,
the system can become ... pretty undesirable to use, and possibly require a
power cycle.

You can also use "panic" or "continue", which you can read about in the
zpool manpage if you want.
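
For example (the pool name is a placeholder):

  zpool get failmode tank
  zpool set failmode=continue tank

Just be aware that "continue" returns EIO to new write requests while the devices 
are gone, which your applications have to be prepared for.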

 vdevs as an archive store (i.e. it goes
 [ARC]-[L2ARC/ZIL]-[main]-[archive]). Infrequently used files/blocks
could

You're pretty much describing precisely what I'm suggesting... using zfs
send | zfs receive.

I suppose the difference between what you're suggesting and what I'm
suggesting, is the separation of two pools versus misrepresenting the
remote storage as part of the local pool, etc.  That's a pretty major
architectural change.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?

2011-01-18 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Erik Trimble
 
 As far as what the resync does:  ZFS does smart resilvering, in that
 it compares what the good side of the mirror has against what the
 bad side has, and only copies the differences over to sync them up.
 This is one of ZFS's great strengths, in that most other RAID systems
 can't do this.

It's also one of ZFS's great weaknesses.  It's a strength as long as not
much data has changed, or it was highly sequential in nature, or the drives
in the pool have extremely high IOPS (SSD's etc) because then resilvering
just the changed parts can be done very quickly.  Much quicker than
resilvering the whole drive sequentially as a typical hardware raid would
do.  However, as is often the case, a large percentage of the drive may have
changed, in essentially random order.  There are many situations where
something like 3% of the drive has changed, yet the resilver takes 100% as
long as rewriting the entire drive sequentially would have taken.  10% of
the drive changed  ZFS resilver might be 4x slower than sequentially
overwriting the entire disk as a hardware raid would have done.

Ultimately, your performance depends entirely on your usage patterns, your
pool configuration, and type of hardware.

To the OP:  If you've got one device on one SAN, mirrored to another device
on another SAN, you're probably only expecting very brief outages on either
SAN.  As such, you probably won't see any large percentage of the online SAN
change, and when the temporarily failed SAN comes back online, you can
probably expect a very fast resilver.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss