[zfs-discuss] Maximum zfs send/receive throughput

2010-06-25 Thread Mika Borner


It seems we are hitting a ceiling with zfs send/receive over a 10Gb/s 
network link. We see peaks of up to 150MB/s, but on average only 
40-50MB/s are replicated. That is far from the bandwidth a 10Gb link 
can offer.


Is it possible that ZFS gives replication too low a priority, or 
throttles it too much?
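
One thing worth trying is to decouple the send stream from the network 
with a buffering pipe such as mbuffer. A minimal sketch, assuming mbuffer 
is installed on both hosts; the host, pool and snapshot names below are 
made up:

# on the receiving host: listen on TCP port 9090 with a 1 GB buffer
mbuffer -s 128k -m 1G -I 9090 | zfs receive tank/replica

# on the sending host: stream the snapshot into the buffer
zfs send tank/data@today | mbuffer -s 128k -m 1G -O recvhost:9090

This evens out the bursty producer/consumer behaviour of send and receive, 
but it does not answer the scheduling/priority question.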



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS/ZFS slow on parallel writes

2009-09-29 Thread Mika Borner

Bob Friesenhahn wrote:
Striping across two large raidz2s is not ideal for multi-user use. You 
are getting the equivalent of two disks worth of IOPS, which does not 
go very far. More smaller raidz vdevs or mirror vdevs would be 
better.  Also, make sure that you have plenty of RAM installed.



For small files I would definitely go mirrored.
What disk configuration (number of disks, and RAID topology) is the 
NetApp using?



On NetApp you only can choose between RAID-DP and RAID-DP :-)

With mirroring you will certainly lose space-wise against NetApp, but 
if your data compresses well, you will still end up with more usable 
space. Our 7410 currently spends around 3% CPU on compression, using 
gzip-2 and achieving a compression ratio of 1.96.
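
For reference, the equivalent settings on a plain OpenSolaris box would 
look something like this (a sketch; the pool/filesystem name is just an 
example):

zfs set compression=gzip-2 tank/data    # gzip-1..gzip-9 trade CPU for ratio
zfs get compression,compressratio tank/data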


So far, I'm very happy with the system.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on SAN?

2009-02-14 Thread Mika Borner

Andras Spitzer wrote:

Is it worth moving the redundancy from the SAN array layer to the ZFS layer?
(Configuring redundancy on both layers sounds like a waste to me.) There are
certain advantages to having redundancy configured on the array (beyond the
protection against simple disk failure). Can we compare the advantages of
having (for example) RAID5 configured on a high-end SAN with no redundancy at
the ZFS layer versus no redundant RAID configuration on the high-end SAN but
raidz or raidz2 at the ZFS layer?

Any tests, experience or best practices regarding this topic?


  

Would also like to hear about experiences with ZFS on EMC's Symmetrix.

Currently we are using VxFS with Powerpath for multipathing, and 
synchronous SRDF for replication to our other datacenter.


At some point we will move to ZFS, but there are so many options for how 
to implement it.


From a sysadmin point of view (simplicity), I would like to use mpxio 
and host-based mirroring. ZFS self-healing would be available in this 
configuration.
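
Roughly what I have in mind, as a sketch (the mpxio device names are 
invented placeholders, one LUN from each array):

# mirror two SAN LUNs so ZFS has redundancy to self-heal from
zpool create tank mirror c4t60060E8005AAAA0000d0 c4t60060E8005BBBB0000d0
zpool status tank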


Asking EMC guys for their opinion is not an option. They will push you 
to buy SRDF and Powerpath licenses... :-)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Crazy Problem with

2009-01-27 Thread Mika Borner
Mika Borner wrote:
>
> You're lucky. Ben just wrote about it :-)
>
> http://www.cuddletech.com/blog/pivot/entry.php?id=1013
>
>
>   
Oops, should have read your message completely :-) Anyway, you can 
"lernen" something from it...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Crazy Problem with

2009-01-27 Thread Mika Borner
Henri Meddox wrote:
> Hi Folks,
> call me a lernen ;-)
>
> I got a crazy problem with "zpool list" and the size of my pool:
>
> I created it with "zpool create raidz2 hdd1 hdd2 hdd3" - each hdd is about 1GB.
>
> zpool list shows me a size of 2.95GB - shouldn't this be only 1GB?
>
> After creating a file of about 500MB -> capacity is shown as 50% -> is that
> the right value?
>
> Is this a known bug / feature?
>
>   

You're lucky. Ben just wrote about it :-)

http://www.cuddletech.com/blog/pivot/entry.php?id=1013
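
The short version, as I understand it: zpool list reports raw pool capacity 
with parity counted, while zfs list reports usable space after parity. A 
rough illustration with your 3 x 1GB layout (assuming the pool is called 
tank; the numbers are approximate):

zpool list tank    # SIZE ~2.95G -> raw capacity across all three disks
zfs list tank      # AVAIL ~1G   -> usable space left after raidz2 parity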



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs send / zfs receive hanging

2009-01-12 Thread Mika Borner
Hi

Updated from snv 101 to 105 today. I wanted to zfs send/receive to a new 
zpool, forgetting that the new pool had a newer pool version.

zfs send timed out after a while, but it was impossible to kill the receive 
process.

Shouldn't the zfs receive command just fail with a "wrong version" error?

In the end I had to reboot...

It would also be nice to be able to specify the zpool version during pool 
creation. E.g. if I have a newer machine and want to move data to an older 
one, I should be able to specify the pool version; otherwise it's a one-way 
street.
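
As far as I can tell, more recent builds do allow pinning the version at 
creation time via the pool's version property (a sketch; check which 
versions your target release understands first):

zpool upgrade -v                            # list versions this build knows
zpool create -o version=10 oldpool c1t1d0   # create at an older on-disk version
zpool get version oldpool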
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Mika Borner
Ulrich Graef wrote:
> You don't need to wade through your paper...
> ECC theory tells us that you need a minimum distance of 3
> to correct one error in a codeword, so neither RAID-5 nor RAID-6
> is enough: you need RAID-2 (which nobody uses today).
>
> RAID controllers today take advantage of the fact that they know
> which disk is returning the bad block, because that disk returns
> a read error.
>
> ZFS is even able to correct when an error exists in the data
> but no disk reports a read error, because ZFS ensures integrity
> from the root block down to the data blocks with a long checksum
> accompanying the block pointers.
>
>   

The Netapp paper mentioned by JZ 
(http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt) 
talks about write verify.

Would this feature make sense in a ZFS environment? I'm not sure there is 
any advantage. When data is written redundantly to two different disks, it 
seems quite unlikely that both disks lose or misdirect the same writes.

Maybe ZFS could have an option to enable instant readback of written 
blocks, for those who want to be absolutely sure the data reached the disk 
correctly.
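
Until something like that exists, the closest approximation is probably a 
periodic scrub, which re-reads every block and verifies it against its 
checksum - not an instant readback, but it does catch lost or misdirected 
writes after the fact (the pool name is an example):

zpool scrub tank
zpool status -v tank    # scrub progress plus any checksum errors found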
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Storage 7000

2008-11-17 Thread Mika Borner
Adam Leventhal wrote:
> Yes. The Sun Storage 7000 Series uses the same ZFS that's in OpenSolaris
> today. A pool created on the appliance could potentially be imported on an
> OpenSolaris system; that is, of course, not explicitly supported in the
> service contract.
>   
It would be interesting to hear more about how Fishworks differs from 
OpenSolaris: what build it is based on, what package mechanism you are 
using (IPS already?), and other differences...

A little off topic: do you know when the SSDs used in the Storage 7000 
will be available for the rest of us?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?

2008-10-22 Thread Mika Borner
> Leave the default recordsize. With 128K recordsize,
> files smaller than  

If I turn zfs compression on, does the recordsize influence the compressratio 
in any way?
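
My guess is that compression is applied per record, so a larger recordsize 
gives the compressor more data to work with and can improve the ratio 
somewhat. Or should I just measure it on a sample of our own mail files, 
something like this (a sketch, dataset name invented)?

zfs create -o recordsize=128k -o compression=on tank/maildir
# copy a representative sample of Maildir files in, then:
zfs get recordsize,compressratio tank/maildir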
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] EMC - top of the table for efficiency, how well would ZFS do?

2008-09-01 Thread Mika Borner
I've read the same blog entry, and was also thinking about ZFS...

Pillar Data Systems is also answering the call: 
http://blog.pillardata.com/pillar_data_blog/2008/08/blog-i-love-a-p.html

BTW: would transparent compression be considered cheating? :-)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Current status of a ZFS root

2006-10-27 Thread Mika Borner
>Unfortunately, the T1000 only has a
> single drive bay (!) which makes it impossible to
> follow our normal practice of mirroring the root file

You can replace the existing 3.5" disk with two 2.5" disks (quite cheap)

//Mika
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Oracle 11g Performace

2006-10-24 Thread Mika Borner
Here's an interesting read about forthcoming Oracle 11g file system 
performance. Sadly, there is no information about how this works.

It will be interesting to compare it with ZFS performance, as soon as ZFS 
is tuned for databases.


"Speed and performance will be the hallmark of the 11g, said Chuck Rozwat, 
executive vice president for server technologies. 

The new database will run fast enough so that for the first time it will beat 
specialized file systems for transferring large blocks of data. 

Rozwat displayed test results that showed that the 11g beta is capable of 
transferring 1GB in just under 9 seconds compared to 12 seconds for a file 
system. 

This level of performance is important to customers who are demanding instant 
access to data, Rozwat said. 

"If systems can't perform fast enough and deliver information in real time, we 
are in real trouble," Rozwat said."

http://www.eweek.com/article2/0,1895,2036136,00.asp
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Physical Clone of zpool

2006-09-18 Thread Mika Borner
Hi

We have the following scenario/problem:

Our zpool resides on a single LUN on a Hitachi Storage Array. We are
thinking about making a physical clone of the zpool with the ShadowImage
functionality.

ShadowImage takes a snapshot of the LUN, and copies all the blocks to a
new LUN (physical copy). In our case the new LUN is then made available
on the same host as the original LUN.

After the ShadowImage copy is taken, we can see the new LUN as an 
additional disk using the format(1M) command. But when running "zpool 
import", it only says: "no pools available to import".

I think this is a bug. At the very least it should say something like 
"pool with the same name already imported". I have only tested this on 
Solaris 10 06/06, but I haven't found anything similar in the bug 
database, so it is probably present in OpenSolaris as well.
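
For what it's worth, these are the commands I tried (the rename-on-import 
form is a sketch). I suspect the clone is skipped because it carries the 
same pool GUID as the pool that is already imported, not merely the same 
name:

zpool import                  # scans /dev/dsk for importable pools
zpool import -d /dev/dsk      # explicit search directory, same result
zpool import tank tank_clone  # rename on import -- only helps when the
                              # original pool is exported or has a
                              # different GUID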

Mika

# mv Disclaimer.txt /dev/null








 



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Archiving on ZFS

2006-09-15 Thread Mika Borner
Hi

We are thinking about moving away from our magneto-optical based archive 
system (WORM technology). At the moment we use a volume manager which 
virtualizes the WORMs in the jukebox and presents them as UFS filesystems. 
The volume manager automatically does asynchronous replication to an 
identical system in another datacenter. To speed up slow WORM access, the 
volume manager has a read cache.

Because of this cache, we did not find out until we checked the WORMs 
directly that we have silent data corruption on some of them (surprise, 
surprise!).

Mainly because of this, I am thinking about replacing the whole setup with 
something more robust and modern... (guess what :-)

Anyway, a few points came to mind:

-The asynchronous replication to another host could be approximated with 
zfs send/receive. Still, I would prefer replication that is triggered 
automatically, the way Sun's StorEdge Network Data Replicator does it for 
UFS. This could probably be implemented in ZFS without much trouble.
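
A rough sketch of what I mean, driven from cron for now (host and dataset 
names are just examples):

zfs snapshot archive/docs@2006-09-15
zfs send -i archive/docs@2006-09-14 archive/docs@2006-09-15 | \
    ssh backuphost zfs receive archive/docs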

-We have a lot of small files (about 7 million, roughly 4-32k each). Like 
everyone, we want to be SOX compliant, so I tried to run BART over those 
files to get a fingerprint. I remember it took a couple of hours to 
complete; at least it was much faster than on UFS. How can this be sped 
up? Maybe we have to split those files across separate filesystems, which 
leads to my next point:

-I want to be sure that nobody (maybe not even root) changes my 
filesystems for the next couple of years. I know there is a read-only 
property, but it might not be enough. On our Hitachi array we have a WORM 
function which blocks write access to a LUN until a specified date. While 
this works, it is not as flexible as we want, because the LUNs are too big 
for our use. Every day we archive documents, and at the end of the day we 
want to freeze the filesystem. Would it be possible to add a time-lock 
property to ZFS? Could this be extended so that new files can still be 
added to the locked filesystem, while existing files cannot be modified or 
deleted (ZFS ACLs could handle this)? Would something like this make sense?
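
A stop-gap that works today, for lack of a real time-lock (a sketch; it 
only keeps honest people honest, since root can simply flip the property 
back):

# at the end of the archiving day: freeze that day's filesystem
zfs set readonly=on archive/docs/2006-09-15
zfs snapshot archive/docs/2006-09-15@frozen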

Thanks for your thoughts...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Mika Borner
>given that zfs always does copy-on-write for any updates, it's not clear
>why this would necessarily degrade performance..

Writing should be no problem, as it is serialized... but when both
database instances are reading a lot of different blocks at the same
time, the spindles might "heat up".

>If you want a full copy you can use zfs send/zfs receive -- either
>within the same pool or between two different pools.

OK. But then again, it might be necessary to throttle zfs send/receive 
replication between pools; otherwise the replication process might hurt 
production performance too much. Or is there already some kind of 
prioritization that I have overlooked?
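
Lacking built-in prioritization, the stream can at least be rate-limited 
in the pipe, e.g. with pv if it is installed (a sketch; the 30 MB/s cap is 
arbitrary):

zfs send tank/db@copy | pv -L 30m | zfs receive backup/db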

//Mika

# mv Disclaimer.txt /dev/null







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS and Storage

2006-06-27 Thread Mika Borner
>RAID5 is not a "nice" feature when it breaks.

Let me correct myself...  RAID5 is a "nice" feature for systems without
ZFS...

>Are huge write caches really an advantage? Or are you talking about huge
>write caches with non-volatile storage?

Yes, you are right. The huge cache is needed mostly because of the poor
write performance of RAID5 (battery-backed, of course)...


// Mika

# mv Disclaimer.txt /dev/null



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Mika Borner
>but there may not be filesystem space for double the data.
>Sounds like there is a need for a zfs-defragment-file utility perhaps?
>Or if you want to be politically cagey about the naming choice, perhaps,
>zfs-seq-read-optimize-file ?  :-)

For data warehouse and streaming applications a "seq-read optimization" 
could bring additional performance. For "normal" databases this should be 
benchmarked...

This brings me back to another question. We have a production database 
that is cloned at the end of every month for end-of-month processing 
(currently using a feature of our storage array).

I'm thinking about a ZFS version of this task. Requirement: the production 
database should not suffer performance degradation while the clone runs in 
parallel. As ZFS does not copy all the blocks, I wonder how much the 
production database will suffer from sharing most of its data with the 
clone (concurrent access vs. caching).

Maybe we need a feature in ZFS to do a full clone (that is, copy all 
blocks) inside the pool if performance is an issue, just like the "Quick 
Copy" vs. "Shadow Image" features on HDS arrays...







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS and Storage

2006-06-27 Thread Mika Borner
>I'm a little confused by the first poster's message as well, but you lose
>some benefits of ZFS if you don't create your pools with either RAID1 or
>RAIDZ, such as data corruption detection. The array isn't going to detect
>that because all it knows about are blocks.

That's the dilemma: the array provides nice features like RAID1 and 
RAID5, but those are of no real use when using ZFS.

The advantages of using ZFS on such an array are, for example, the 
sometimes huge write cache, the use of consolidated storage and, in SAN 
configurations, cloning and sharing storage between hosts.

The price comes, of course, as additional administrative overhead (lots of 
microcode updates, more components that can fail in between, etc.).

Also, in bigger companies there is usually a team of storage specialists 
who mostly do not know about the applications running on top, or do not 
care... (as in: "here is your bunch of gigabytes...")

//Mika

# mv Disclaimer.txt /dev/null



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Mika Borner
>The vdev can handle dynamic lun growth, but the underlying VTOC or
>EFI label may need to be zero'd and reapplied if you setup the initial
>vdev on a slice.  If you introduced the entire disk to the pool you
>should be fine, but I believe you'll still need to offline/online the pool.

Fine, at least the vdev can handle this...

I asked about this feature in October and hoped that it would be
implemented when integrating ZFS into Sol10U2 ...

http://www.opensolaris.org/jive/thread.jspa?messageID=11646

Does anybody know when this feature is finally coming? It would keep the 
number of LUNs on the host low, especially as device names can be really 
ugly (long!).
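
For the archives: on builds where the pool autoexpand property and online 
expansion exist, growing into a resized LUN should look roughly like this 
(a sketch; the device name is an invented placeholder):

zpool set autoexpand=on tank
zpool online -e tank c4t60060E8005AAAA0001d0   # expand into the new LUN size
zpool list tank                                # SIZE should reflect the growth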

//Mika

# mv Disclaimer.txt /dev/null



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and Storage

2006-06-26 Thread Mika Borner
Hi

Now that Solaris 10 06/06 is finally downloadable I have some questions
about ZFS.

-We have a big storage system supporting RAID5 and RAID1. At the moment
we only use RAID5 (for non-Solaris systems as well). We are thinking
about using ZFS on those LUNs instead of UFS. As ZFS on hardware RAID5
seems like overkill, an option would be to use RAID1 with RAID-Z. Then
again, this is a waste of space, as the mirroring needs more disks.
Later on we might add asynchronous replication to another storage system
over the SAN - even more wasted space. It somehow looks like storage
virtualization, as it exists today, just doesn't play nicely with ZFS.
What we need would be the ability to use JBODs.

-Does ZFS in the current version support LUN extension? With UFS we have
to zero the VTOC and then adjust the new disk geometry. How does this look
with ZFS?

-I've read the threads about zfs and databases. Still, I'm not 100%
convinced about read performance. Doesn't the fragmentation of large
database files (a consequence of copy-on-write) hurt read performance?

-Does anybody have any experience with database cloning using the ZFS
mechanisms? What factors influence performance when running the cloned
database in parallel?

-I really like the idea of keeping all needed database files together, to
allow fast and consistent cloning.
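
That idea maps naturally onto one dataset per database with child 
filesystems, so a recursive snapshot captures everything consistently in 
one go (a sketch; the recordsize tuning is optional, the names are 
invented, and it assumes a release that supports zfs snapshot -r):

zfs create tank/ora
zfs create -o recordsize=8k tank/ora/data   # match the database block size
zfs create tank/ora/logs
zfs snapshot -r tank/ora@clone_src          # one consistent recursive snapshot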

Thanks

Mika


# mv Disclaimer.txt /dev/null






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss