Re: [zfs-discuss] [zfs] Petabyte pool?
On Sat, Mar 16, 2013 at 2:27 PM, Jim Klimov <jimkli...@cos.ru> wrote:

> On 2013-03-16 15:20, Bob Friesenhahn wrote:
>> On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote:
>>> Well, off the top of my head: 2 x storage heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPUs; 8 x 60-bay JBODs with 60 x 4TB SAS drives; RAIDZ2 striped over the 8 JBODs. That should fit within one rack comfortably and provide 1 PB of storage.
>>
>> What does one do for power? What are the power requirements when the system is first powered on? Can drive spin-up be staggered between JBOD chassis? Does the server need to be powered up last so that it does not time out on the zfs import?
>
> I guess you can use managed PDUs like those from APC (many models for varied socket types and counts); they can be scripted at an advanced level, and at a basic level per-socket delays can be configured to stagger startup after power comes from the wall (UPS), regardless of what the boxes' individual power supplies can do. Conveniently, they also allow a remote hard-reset of hung boxes without walking to the server room ;)
> My 2c, //Jim Klimov

Any modern JBOD should have the intelligence built in to stagger drive spin-up. I wouldn't spend money on one that didn't. There's really no need to stagger the JBOD power-up at the PDU. As for the head, yes, it should have a delayed power-on, which you can typically set in the BIOS.

--Tim
Re: [zfs-discuss] ZFS Distro Advice
On Wed, Feb 27, 2013 at 2:57 AM, Dan Swartzendruber <dswa...@druber.com> wrote:

> I've been using it since rc13. It's been stable for me as long as you don't get into things like zvols and such...

Then it definitely isn't at the level of FreeBSD, and personally I would not consider that production ready.

--Tim
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 10:33 PM, Tiernan OToole <lsmart...@gmail.com> wrote:

> Thanks all! I will check out FreeNAS and see what it can do... I will also check my RAID card and see if it can work with JBOD... fingers crossed... The machine has a couple of internal SATA ports (I think there are 2, could be 4), so I was thinking of using those for boot disks and SSDs later...
>
> As a follow-up question on data deduplication: the machine, to start, will have about 5GB RAM. I read somewhere that 20TB of storage would require about 8GB RAM, depending on block size... Since I don't know block sizes yet (I store a mix of VMs, TV shows, movies and backups on the NAS), I am not sure how much memory I will need (my estimate is 10TB raw (8TB usable?) in a RAIDZ1 pool, and then 3TB raw in a striped pool). If I don't have enough memory now, can I enable dedup at a later stage when I add memory?
>
> Also, if I pick FreeBSD now and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed), could they just get imported? Thanks.

Yes, you can move between FreeBSD and illumos-based distros as long as you are at a compatible zpool version (which they currently are). I'd avoid deduplication unless you absolutely need it... it's still a bit of a kludge. Stick to compression and your world will be a much happier place.

--Tim
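[Editor's note: for anyone sizing this, a hedged sketch of estimating dedup's memory cost before enabling it, and of the pool move itself. It assumes a pool named "tank"; zdb -S only simulates deduplication against existing data and changes nothing on disk.]

# zdb -S tank          # prints a simulated dedup table histogram and estimated dedup ratio
# zpool export tank    # on the FreeBSD box, before moving the disks
# zpool import tank    # on the Nexenta/illumos box; works while both sides speak the same zpool version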
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 4:57 AM, Tiernan OToole <lsmart...@gmail.com> wrote:

> Good morning all. My home NAS died over the weekend, and it leaves me with a lot of spare drives (5 x 2TB and 3 x 1TB disks). I have a Dell PowerEdge 2900 server sitting in the house, which has not been doing much lately (I bought it a few years back with the intent of using it as a storage box, since it has 8 hot-swap drive bays), and I am now looking at building the NAS using ZFS... But now I am confused as to which OS to use... OpenIndiana? Nexenta? FreeNAS/FreeBSD? I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, I would like something I can manage easily and something that works with the Dell... Any recommendations? Any comparisons between them? Thanks.

All of them should provide the basic functionality you're looking for. None of them will provide SMB3 (at all) or AFP (without a third-party package).

--Tim
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 7:57 AM, Volker A. Brandt <v...@bb-c.de> wrote:

> Tim Cook writes:
>>> I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, i would like something i can manage easily and something that works with the Dell...
>> All of them should provide the basic functionality you're looking for. None of them will provide SMB3 (at all) or AFP (without a third party package).
>
> FreeNAS has AFP built in, including a Time Machine discovery method. The latest FreeNAS is still based on Samba 3.x, but they are aware of 4.x and will probably integrate it at some point in the future. Then you should have SMB3. I don't know how far along they are...
> Best regards -- Volker

FreeNAS comes with a package pre-installed to add AFP support. There is no native AFP support in FreeBSD, and by association FreeNAS.

--Tim
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Thu, Feb 21, 2013 at 8:34 AM, Jan Owoc <jso...@gmail.com> wrote:

> Hi Markus,
>
> On Thu, Feb 21, 2013 at 6:44 AM, Markus Grundmann <mar...@freebsduser.eu> wrote:
>> I think the zfs allow|deny feature is only for filesystems. I wish for a feature to protect the complete pool; the property would be respected by zpool commands too. On my notebook I have created a pool with simulated drives (gpt/drive1..n) and, without any warnings or "are you sure (y/n)", I can destroy them after one second. [SNIP] For my personal reasons I will try to rewrite some pieces of the current source code in FreeBSD to get the functionality I want. Please wish me good luck *g*
>
> I think Mike's solution is exactly what you are looking for. You can make a snapshot, hold it, and then zfs destroy (and even zfs destroy -r) will fail. The only thing you can do is run the command(s) to un-hold the snapshot.
>
> On Wed, Feb 20, 2013 at 4:08 PM, Mike Gerdts <mger...@gmail.com> wrote:
>> # zfs create a/1
>> # zfs create a/1/hold
>> # zfs snapshot a/1/hold@hold
>> # zfs hold 'saveme!' a/1/hold@hold
>> # zfs holds a/1/hold@hold
>> NAME           TAG      TIMESTAMP
>> a/1/hold@hold  saveme!  Wed Feb 20 15:06:29 2013
>> # zfs destroy -r a/1
>> cannot destroy 'a/1/hold@hold': snapshot is busy
>
> Does this do what you want? (zpool destroy is already undo-able)
> Jan

That suggestion makes the very bold assumption that you want a long-standing snapshot of the dataset. If it's a rapidly changing dataset, the snapshot will become an issue very quickly.

--Tim
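[Editor's note: to complete the picture, a minimal sketch of the un-hold step Jan mentions, continuing Mike's transcript. zfs release drops the named hold, after which the destroy goes through:]

# zfs release 'saveme!' a/1/hold@hold
# zfs destroy -r a/1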
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Wed, Feb 20, 2013 at 4:49 PM, Markus Grundmann <mar...@freebsduser.eu> wrote:

> Hi! My name is Markus and I live in Germany. I'm new to this list and I have a simple question related to ZFS. My favorite operating system is FreeBSD and I'm very happy to use ZFS on it. Is it possible to enhance the properties in the current source tree with an entry like "protected"? It seems not to be difficult, but I'm not a professional C programmer. For more information please take a little bit of time and read my short post at http://forums.freebsd.org/showthread.php?t=37895
>
> I have reviewed some pieces of the source code in FreeBSD 9.1 to find out how difficult it would be to add a pool/filesystem property as an additional security layer for administrators. Whenever I modify zfs pools or filesystems it's possible to destroy [on a bad day :-)] my data. A new property protected=on|off on the pool and/or filesystem could help the administrator prevent data loss (e.g. a "zpool destroy tank" or "zfs destroy tank/filesystem" command would be rejected while the protected=on property is set). Is this list the right place to discuss/forward this feature request? I hope you have understood my post ;-)
> Thanks and best regards, Markus

I think you're underestimating your English, it's quite good :) In any case, I think the proposal is a good one. With the default behavior being off, it won't break anything for existing datasets, and it can absolutely help prevent a fat finger or a lack of caffeine ruining someone's day. If the feature is already there somewhere, I'm sure someone will chime in.

--Tim
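[Editor's note: for illustration only, here is how Markus's proposed property might look in use. The protected= property does not exist in any shipping ZFS; this is a sketch of the requested behavior, not current syntax.]

# zfs set protected=on tank/important       # hypothetical property from this thread
# zfs destroy tank/important
cannot destroy 'tank/important': dataset is protected   # hypothetical error message
# zfs set protected=off tank/important      # deliberate two-step removal
# zfs destroy tank/important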
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Wed, Feb 20, 2013 at 5:09 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> On Feb 20, 2013, at 2:49 PM, Markus Grundmann <mar...@freebsduser.eu> wrote:
>> [SNIP - Markus's protected=on|off proposal, quoted in full above]
>
> Look at the delegable properties (zfs allow). For example, you can delegate a user to have specific privileges and then not allow them to destroy. Note: I'm only 99% sure this is implemented in FreeBSD, hopefully someone can verify.
> -- richard

With the version of allow I'm looking at, unless I'm missing a setting, it looks like it'd be a complete nightmare. I see no concept of deny, so you either have to give *everyone* all permissions besides destroy, or you have to go through every user/group on the box and grant specific permissions that stop short of destroy. And then if you change your mind later, you have to go back through and give everyone you want to have that feature access to it. That seems like a complete PITA to me.

--Tim
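[Editor's note: for reference, a hedged sketch of the delegation Richard describes, assuming a user "alice" and a dataset tank/data; the permission names come from zfs allow's documented set. The point of Tim's objection is that the grant is additive - there is no single "everything except destroy" rule.]

# zfs allow alice create,mount,snapshot,send,receive tank/data
# zfs allow tank/data      # prints the delegations now in effect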
Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool
On Wed, Feb 20, 2013 at 5:46 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Thu, 21 Feb 2013, Sašo Kiselkov wrote:
>> On 02/21/2013 12:27 AM, Peter Wood wrote:
>>> Will adding another vdev hurt the performance?
>>
>> In general, the answer is: no. ZFS will try to balance writes to top-level vdevs in a fashion that assures even data distribution. If your data is equally likely to be hit in all places, then you will not incur any performance penalties. If, OTOH, newer data is more likely to be hit than old data, then yes, newer data will be served from fewer spindles. In that case it is possible to do a send/receive of the affected datasets into new locations and then rename them.
>
> You have this reversed. The older data is served from fewer spindles than data written after the new vdev is added. Performance with the newer data should be improved.
> Bob

That depends entirely on how full the pool is when the new vdev is added, and how frequently the older data changes, snapshots, etc.

--Tim
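[Editor's note: for anyone wanting to act on Sašo's rebalance suggestion, a hedged sketch. It assumes a pool "tank", a new raidz2 vdev on disks c2t0d0-c2t5d0, and a dataset tank/data; the send/receive rewrites the data so it spreads across all vdevs, at the cost of temporarily holding two copies.]

# zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
# zfs snapshot -r tank/data@rebalance
# zfs send -R tank/data@rebalance | zfs receive tank/data.new
# zfs rename tank/data tank/data.old
# zfs rename tank/data.new tank/data
# zfs destroy -r tank/data.old      # once the copy is verified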
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Wed, Feb 20, 2013 at 6:47 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> On Feb 20, 2013, at 3:27 PM, Tim Cook <t...@cook.ms> wrote:
>> [SNIP - the zfs allow discussion, quoted in full above]
>
> :-) they don't call it idiot-proofing for nothing! :-)
> But seriously, one of the first great zfs-discuss wars was over the request for a -f flag for destroy. The research showed that if you typed "destroy" then you meant it, and adding a -f flag just teaches you to type "destroy -f" instead. See also "kill -9".
> -- richard

I hear you, but in his scenario of using scripts for management, there isn't necessarily human interaction to confirm the operation (appropriately or not). Having a pool property seems like an easy way to prevent a mis-parsed or outright incorrect script from causing havoc on the system.

--Tim
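[Editor's note: as a concrete illustration of Tim's point, a minimal shell guard of the sort a management script might use. A pure sketch; the whitelist stands in for the protected= property that doesn't exist yet.]

#!/bin/sh
# Only pools explicitly listed as expendable may ever be destroyed by this script.
pool="$1"
case "$pool" in
  scratch|testpool)
    zpool destroy "$pool"
    ;;
  *)
    echo "refusing to destroy pool '$pool'" >&2
    exit 1
    ;;
esac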
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sun, Feb 17, 2013 at 8:58 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: Tim Cook [mailto:t...@cook.ms]
>> Why would I spend all that time and energy participating in ANOTHER list controlled by Oracle, when they have shown they have no qualms about eliminating it with basically 0 warning, at their whim?
>
> From an open source, community perspective, I understand and agree with this sentiment. If OSS projects behave this way, they die. The purpose of an Oracle-hosted mailing list is not for the sake of being open in any way. It's for the sake of allowing public discussions about their product. While a certain amount of knowledge will exist with or without the list (people can still download Solaris 11 for evaluation purposes and test it out on the honor system), there will be less Oracle-specific knowledge in existence without the list. For anyone who's 100% dedicated to OSS and/or illumos and doesn't care about Oracle-specific stuff, there's no reason to use that list. But for those of us who are sysadmins, developers using eval-licensed Solaris, or in any way not completely closed to the possibility of using Oracle ZFS / Solaris... for those of us, it makes sense.
>
> Guess what, I formerly subscribed to netapp-toasters as well. Until ZFS came along and I was able to happily put NetApp in my past. Perhaps someday I'll leave ZFS behind in favor of btrfs. But not yet. Guess what also, there is a very active, thriving Microsoft forum out there too. And they don't even let you download MS Office or Windows for evaluation purposes - they're even more closed than Oracle in this regard. They learned their lesson about piracy and the honor system. ;-)

We can agree to disagree. I think you're still operating under the assumption that Oracle wants to have an open discussion. This is patently false. There's a reason why, any time someone has an issue, the response from the Oracle team that posts here is almost always "open a support ticket and give me the number" - and then we never hear about it again, or get the fix, unless the end user happens to come back and update us. If you think that Oracle is going to change that stance with a list hosted on java.net, you're sadly mistaken. Their (collectively - I'm not speaking of any individual) only goal is to help paying customers. Period. The way they've decided to go about that is by hoarding knowledge. I've dealt with the company for over a decade; there will be no open discussions.

NetApp has historically been open with their user community (although at times in recent history they have made the mistake of turtling up), which is why the toasters mailing list did as well as it did. Hell, Dave Hitz used to be a regular poster. MS forums are active and thriving because they've got a massive user base full of extremely experienced admins. If there were an open and free version of the MS products, I'm willing to bet you'd find the closed-source version a ghost town. For all the bashing MS has taken throughout history, they're a very open company, relatively speaking. I can both browse their knowledge base and download hotfixes without any support contract. If you're going to have to open a support ticket to get help with issues anyway, why bother with a mailing list/forum? Just go straight to support. The reason THESE lists have done so well is that the guys who wrote the code actively participate and give detailed help in the open.

If the only responses that ever came here were the Oracle responses to open a ticket for anything beyond basic problems, this place would've died a long time ago. I think the saddest part of the whole situation is that Oracle is so backwards and broken they don't even allow their employees to tell us what they aren't allowed to talk to us about. THAT is f-ed.

--Tim
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sat, Feb 16, 2013 at 10:47 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> In the absence of any official response, I guess we just have to assume this list will be shut down, right? So I guess we just have to move to the illumos mailing list, as Deirdre suggests?
>
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
> Sent: Friday, February 15, 2013 11:00 AM
> To: zfs-discuss@opensolaris.org
> Subject: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
>
>> So, I hear, in a couple weeks' time, opensolaris.org is shutting down. What does that mean for this mailing list? Should we all be moving over to something at illumos or something? I'm going to encourage somebody in an official capacity at opensolaris to respond... I'm going to discourage unofficial responses, like, illumos enthusiasts etc. simply trying to get people to jump this list. Thanks for any info...

That would be the logical decision, yes. Not to poke fun, but did you really expect an official response after YEARS of nothing from Oracle? This is the same company that refused to release any Java patches until the DHS issued a national warning suggesting that everyone uninstall Java.

--Tim
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sat, Feb 16, 2013 at 11:21 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: Tim Cook [mailto:t...@cook.ms]
>> That would be the logical decision, yes. Not to poke fun, but did you really expect an official response after YEARS of nothing from Oracle? This is the same company that refused to release any Java patches until the DHS issued a national warning suggesting that everyone uninstall Java.
>
> Well, yes. We do have Oracle employees who contribute to this mailing list. It is not accurate or fair to stereotype the whole company. Oracle by itself is as large as some cities or countries. I can understand a company policy of secrecy about development direction and stuff like that. I would think somebody would be able to officially confirm or deny that this mailing list is going to stop. At least one of their system administrators lurks here...

We've got Oracle employees on the mailing list who, while helpful, in no way have the authority to speak for company policy. They've made that clear on numerous occasions. And that doesn't change the fact that we have literally heard NOTHING from Oracle since the closing of OpenSolaris: zero official statements. So I once again ask: what did you think you were going to get in response to your questions? The reason you hear nothing official from them is that speaking officially is a good way to lose your job.

--Tim
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sat, Feb 16, 2013 at 3:42 PM, cindy swearingen <cindy.swearin...@gmail.com> wrote:

> Hey Ned and Everyone,
>
> This was news to us too - we were just talking over some options yesterday afternoon, so please give us a chance to regroup and provide some alternatives. This list will be shut down, but we can start a new one on java.net. There is a huge ecosystem around Solaris and ZFS, particularly within Oracle. Many of us are still here because we are passionate about ZFS, Solaris 11 and even Solaris 10. I think we have a great product and a lot of info to share. If you're interested in a rejuvenated ZFS discuss list on java.net, then drop me a note: cindy.swearin...@oracle.com
>
> We are also considering a new ZFS page in that community as well. Oracle is very committed to Solaris and ZFS, but they want to consolidate their community efforts on java.net, retire some old hardware, and so on. If you are an Oracle customer with a support contract and you are using Solaris and ZFS, and you want to discuss support issues, you should consider that list as well: https://communities.oracle.com/portal/server.pt/community/oracle_solaris_zfs_file_system/526
>
> Thanks, Cindy

While I'm sure many appreciate the offer as I do, I can tell you for me personally: never going to happen. Why would I spend all that time and energy participating in ANOTHER list controlled by Oracle, when they have shown they have no qualms about eliminating it with basically 0 warning, at their whim? I'll be sticking to the illumos lists, which I'm confident will be turned over to someone else should the current maintainers decide they no longer wish to contribute to the project. On the flip side, I think we welcome all Oracle employees to participate in that list should corporate policy allow you to.

--Tim
Re: [zfs-discuss] maczfs / ZEVO
On Fri, Feb 15, 2013 at 10:08 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> Anybody using maczfs / ZEVO? Have good or bad things to say, in terms of reliability, performance, features?
>
> My main reason for asking is this: I have a Mac, I use Time Machine, and I have VMs inside. Time Machine, while great in general, has the limitation of being unable to intelligently identify changed bits inside a VM file. So you have to exclude the VM from Time Machine, and you have to run backup software inside the VM. I would greatly prefer, if it's reliable, to let the VMs reside on ZFS and use zfs send to back up my guest VMs.
>
> I am not looking to replace HFS+ as the primary filesystem of the Mac; although that would be cool, there's often a reliability benefit to staying on the supported, beaten-path, standard configuration. But if ZFS can be used to hold the guest VM storage reliably, I would benefit from that. Thanks...

I have a few coworkers using it. No horror stories, and it's been in use about 6 months now. If there were any showstoppers I'm sure I'd have heard loud complaints by now :)

--Tim
Re: [zfs-discuss] ZFS monitoring
On Mon, Feb 11, 2013 at 9:53 AM, Borja Marcos <bor...@sarenet.es> wrote:

> Hello,
>
> I'm updating Devilator, the performance data collector for Orca and FreeBSD, to include ZFS monitoring. So far I am graphing the ARC and L2ARC size, L2ARC writes and reads, and several hit/miss data pairs. Any suggestions to improve it? What other variables can be interesting? An example of the current state of the program is here: http://devilator.frobula.com
>
> Thanks, Borja.

The zpool iostat output has all sorts of statistics I think would be useful/interesting to record over time.

--Tim
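[Editor's note: for instance, a hedged sketch of sampling those counters, assuming a pool named "tank". -v breaks the numbers out per vdev, and the trailing 5 re-samples every five seconds, which suits a collector like Devilator; the alloc/free, read/write ops, and read/write bandwidth columns map naturally onto time-series graphs.]

# zpool iostat -v tank 5
               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
...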
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On Sun, Jan 20, 2013 at 6:19 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> On Jan 20, 2013, at 8:16 AM, Edward Harvey <imaginat...@nedharvey.com> wrote:
>> But, by talking about it, we're just smoking pipe dreams. Cuz we all know zfs is developmentally challenged now. But one can dream...
>
> I disagree that ZFS is developmentally challenged. There is more development now than ever in every way: # of developers, companies, OSes, KLOCs, features. Perhaps the level of maturity makes progress appear to be moving slower than it did early in its life?
> -- richard

Well, perhaps a part of it is marketing. Maturity isn't really an excuse for not having a long-term feature roadmap. It seems as though maturity in this case equals stagnation. What are the features being worked on that we aren't aware of? The big ones that come to mind - the ones everyone else is talking about, for not just ZFS but OpenIndiana as a whole and other storage platforms - would be:

1. SMB3 - Hyper-V WILL be gaining market share over the next couple of years; not supporting it means giving up a sizeable portion of the market. Not to mention finally being able to run SQL (again) and Exchange on a file share.
2. VAAI support.
3. The long-sought bp_rewrite.
4. Full-drive encryption support.
5. Tiering (although I'd argue caching is superior, it's still a checkbox).

There's obviously more, but those are just the ones off the top of my head that others are supporting/working on. Again, it just feels like all the work is going into fixing bugs and refining what is there, not adding new features. Obviously Sašo personally added features, but overall there don't seem to be a ton of announcements to the list about features that have been added or are being actively worked on. It feels like all these companies are just adding niche functionality they need, which may or may not be getting pushed back to mainline.

/debbie-downer
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 08.01.2013 20:43, Krunal Desai wrote:

> On Mon, Jan 7, 2013 at 4:16 PM, Sašo Kiselkov <skiselkov...@gmail.com> wrote:
>> PERC H200 are well-behaved cards that are easy to reflash and work well (even in JBOD mode) on illumos - they are essentially an LSI SAS 9211. If you can get them, they're one heck of a reliable beast, and cheap too!
>
> That method that was linked seemed very specific to Dell servers; from my experience with reflashing various LSI cards, can't I just USB-boot to a FreeDOS environment on any system, and then run sasflash/sas2flsh with the appropriate IT-mode firmware?

It is indeed very specific to Dell cards. I've actually tried to use the generic instructions for the M1015, and it failed because the card didn't match the firmware version. The normal method with an M1015 is to wipe the RAID firmware with megarec (the MegaRAID recovery tool), reboot, and flash on the IT firmware. The Dell method is more involved, but it's the only way I've managed to get a Dell H200 cross-flashed. It seems the M1015 has spiked in price again on eBay (US), whilst the H200 is still under $100.

-- Tim Fletcher t...@night-shade.org.uk
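[Editor's note: for orientation, a heavily hedged outline of the generic M1015 procedure Tim describes, run from a FreeDOS boot environment. The firmware file names are examples, the exact flags vary by board and tool version, and a wrong flash can brick the controller - follow a complete guide for your exact card before trying this.]

megarec -writesbr 0 sbrempty.bin      REM wipe the SBR (sbrempty.bin is an all-zero example file)
megarec -cleanflash 0                 REM erase the existing RAID firmware, then reboot
sas2flsh -o -f 2118it.bin             REM flash the IT-mode firmware image
sas2flsh -o -sasadd 500605bXXXXXXXXX  REM restore the SAS address from the card's sticker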
Re: [zfs-discuss] HP Proliant DL360 G7
On 08.01.2013 18:30, Edmund White wrote:

> The D2600 and D2700 enclosures are fully supported as Nexenta JBODs. I run them in multiple production environments. I *could* use an HP-branded LSI controller (SC08Ge), but I prefer the higher performance of the LSI 9211 and 9205e HBAs.

The HP H221 is the newer SAS2008-based HBA that replaces the SC08Ge. It's definitely a pure HBA, as I have one, but I don't have any external disk shelves to test with currently. http://h18004.www1.hp.com/products/quickspecs/14222_div/14222_div.html

-- Tim Fletcher t...@night-shade.org.uk
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 07/01/13 14:01, Andrzej Sochon wrote:

> Hello Sašo! I found you here: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-May/051546.html
>
>> "How about reflashing LSI firmware to the card? I read on Dell's spec sheets that the card runs an LSISAS2008 chip, so chances are that standard LSI firmware will work on it. I can send you all the required bits to do the reflash, if you like."
>
> I got a Dell PERC H310 controller for do-it-yourself experiments, and I tried to run it on non-Dell PC platforms like the Asus P5Q and Foxconn G31MX, without success. I would very much appreciate any hint on how to get the LSI firmware and reflash the Dell H310.

I've successfully cross-flashed Dell H200 cards with this method: http://forums.servethehome.com/showthread.php?467-DELL-H200-Flash-to-IT-firmware-Procedure-for-DELL-servers

-- Tim Fletcher t...@night-shade.org.uk
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 07/01/13 21:16, Sašo Kiselkov wrote:

> On 01/07/2013 09:32 PM, Tim Fletcher wrote:
>> [SNIP - Andrzej's H310 reflash question and the servethehome H200 link, quoted in full above]
>
> PERC H200 are well-behaved cards that are easy to reflash and work well (even in JBOD mode) on illumos - they are essentially an LSI SAS 9211. If you can get them, they're one heck of a reliable beast, and cheap too!

The modular version of the card is often cheaper and takes 2 minutes with a Torx driver to take off the black plastic L. I've bought one of these before and it worked well: http://www.ebay.co.uk/itm/170888398081

-- Tim Fletcher t...@night-shade.org.uk
Re: [zfs-discuss] dm-crypt + ZFS on Linux
On Fri, Nov 23, 2012 at 9:49 AM, John Baxter <johnleebax...@gmail.com> wrote:

> We have the need to encrypt our data, approximately 30TB on three ZFS volumes under Solaris 10. The volumes currently reside on iSCSI SANs connected via 10Gb/s Ethernet. We have tested Solaris 11 with ZFS encrypted volumes and found the performance to be very poor, and we have an open bug report with Oracle. We are a Linux shop, and since performance is so poor and there is still no resolution, we are considering ZFS on Linux with dm-crypt. I have read once or twice that if we implemented ZFS + dm-crypt we would lose features, but which features is never specified. We currently mirror the volumes across identical iSCSI SANs with ZFS, and we use hourly ZFS snapshots to update our DR site. Which features of ZFS are lost if we use dm-crypt? My guess would be that they are related to raidz, but I'm unsure.

Why don't you just use a SAN that supports full-drive encryption? There should be basically zero performance overhead.

--Tim
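[Editor's note: for context, a hedged sketch of the layering John is considering, on Linux, assuming two iSCSI-backed disks /dev/sdb and /dev/sdc. Each device gets a LUKS container and the pool is built on the plaintext mappings. dm-crypt encrypts whole block devices below ZFS, so unlike Solaris 11's native ZFS crypto there is no per-dataset key management; raidz and mirroring themselves work fine on dm-crypt devices.]

# cryptsetup luksFormat /dev/sdb
# cryptsetup luksOpen /dev/sdb crypt0
# cryptsetup luksFormat /dev/sdc
# cryptsetup luksOpen /dev/sdc crypt1
# zpool create tank mirror /dev/mapper/crypt0 /dev/mapper/crypt1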
Re: [zfs-discuss] ZFS Appliance as a general-purpose server question
On Thu, Nov 22, 2012 at 10:50 AM, Jim Klimov <jimkli...@cos.ru> wrote:

> On 2012-11-22 17:31, Darren J Moffat wrote:
>>> Is it possible to use the ZFS Storage Appliances in a similar way, and fire up a Solaris zone (or a few) directly on the box for general-purpose software; or to shell-script administrative tasks such as the backup archive management in the global zone (if that concept still applies) as is done on their current Solaris-based box?
>>
>> No, it is a true appliance. It might look like it has Solaris underneath, but it is just based on Solaris. You can script administrative tasks, but not using bash/ksh-style scripting - you use the ZFSSA's own scripting language.
>
> So the only supported (or even possible) way is indeed to use it as a NAS for file or block IO from another head running the database or application servers?.. In the datasheet I read that Cloning and Remote Replication are separately licensed features; does this mean that the capability for zfs send | zfs recv backups from remote Solaris systems must be purchased separately? :( I wonder if it would make weird sense to get the boxes, forfeit the cool-looking Fishworks, and install Solaris/OI/Nexenta/whatever to get the most flexibility and bang for a buck from the owned hardware... Or, rather, shop for the equivalent non-appliance servers...
> //Jim

You'd be paying a massive premium to buy them and then install some other OS on them. You'd be far better off buying equivalent servers.

--Tim
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
On Mon, Nov 12, 2012 at 10:39 AM, Trond Michelsen <tron...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 5:00 PM, Tim Cook <t...@cook.ms> wrote:
>> [SNIP - the "devices have different sector alignment" exchange, quoted in full below]
>>
>> Not happening with anything that exists today. The only way this would be possible is with bp_rewrite, which would allow you to evacuate a vdev (whether it be for a situation like this, or just to shrink a pool). What you're trying to do is write a block-for-block copy to a disk that's made up of a different block structure. Not happening.
>
> That is disappointing. I'll probably manage to find a used 2TB drive with 512b blocksize, so I'm sure I'll be able to keep the pool alive, but I had planned to swap all 2TB drives for 4TB drives within a year or so. This is apparently not an option anymore. I'm also a bit annoyed, because I cannot remember seeing any warnings (other than performance-wise) about mixing 512b and 4kB blocksize discs in a pool, or any warnings that you'll be severely restricted if you use 512b blocksize discs at all.
>
>> *insert everyone saying they want bp_rewrite and the guys who have the skills to do so saying their enterprise customers have other needs*
>
> bp_rewrite is what's needed to remove vdevs, right? If so, yes, being able to remove (or replace) a vdev would've solved my problem. However, I don't see how this could not be desirable for enterprise customers. 512b blocksize discs are rapidly disappearing from the market. Enterprise discs fail occasionally too, and if 512b blocksize discs can't be replaced by 4kB blocksize discs, then that effectively means you can't replace failed drives on ZFS. I would think that this is a desirable feature of an enterprise storage solution.

Enterprise customers are guaranteed equivalent replacement drives for the life of the system, generally 3-5 years. At the end of that cycle, they buy all new hardware and simply migrate the data. It's generally a non-issue due to the way gear is written off.

--Tim
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
On Sat, Nov 10, 2012 at 9:48 AM, Jan Owoc <jso...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 8:14 AM, Trond Michelsen <tron...@gmail.com> wrote:
>> When I try to replace the old drive, I get this error:
>> # zpool replace tank c4t5000C5002AA2F8D6d0 c4t5000C5004DE863F2d0
>> cannot replace c4t5000C5002AA2F8D6d0 with c4t5000C5004DE863F2d0: devices have different sector alignment
>> How can I replace the drive without migrating all the data to a different pool? It is possible, I hope?
>
> I had the same problem. I tried copying the partition layout and some other stuff, but without success. I ended up having to recreate the pool and now have a non-mirrored root fs. If anyone has figured out how to mirror drives after getting the message about sector alignment, please let the list know :-).
> Jan

Not happening with anything that exists today. The only way this would be possible is with bp_rewrite, which would allow you to evacuate a vdev (whether it be for a situation like this, or just to shrink a pool). What you're trying to do is write a block-for-block copy to a disk that's made up of a different block structure. Not happening.

*insert everyone saying they want bp_rewrite and the guys who have the skills to do so saying their enterprise customers have other needs*

--Tim
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
On Sat, Nov 10, 2012 at 9:59 AM, Jan Owoc <jso...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 8:48 AM, Jan Owoc <jso...@gmail.com> wrote:
>> [SNIP - the "devices have different sector alignment" exchange, quoted in full above]
>
> Sorry... my question was partly answered by Jim Klimov on this list: http://openindiana.org/pipermail/openindiana-discuss/2012-June/008546.html
>
> Apparently the currently-suggested way (at least in OpenIndiana) is to:
> 1) create a zpool on the 4k-native drive
> 2) zfs send | zfs receive the data
> 3) mirror back onto the non-4k drive
>
> I can't test it at the moment on my setup - has anyone tested this to work?
> Jan

That would absolutely work, but it's not really a fix for this situation. For the OP to do this, he'd need 42 new drives (or at least enough drives to provide the same capacity as what he's using) to mirror to and then mirror back. The only way this is happening for most people is if they only have a very small pool and have the ability to add an equal amount of storage to dump to. Probably not a big deal if you've only got a handful of drives, or if the drives you have are small and you can take downtime. Likely impossible for the OP with 42 large drives.

--Tim
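[Editor's note: for small pools where Jim's workaround is viable, a hedged sketch. It assumes an old 512b-sector pool "tank" and a new drive c5t0d0 that reports 4k sectors, so the new pool is created with 4k alignment (ashift=12). A vdev's ashift is fixed at creation, which is why the new pool must be created on the 4k drive first; a 512b drive can then be attached into the 4k vdev, but not the other way around.]

# zpool create newtank c5t0d0
# zfs snapshot -r tank@migrate
# zfs send -R tank@migrate | zfs receive -F newtank
# zpool destroy tank                                  # once the copy is verified
# zpool attach newtank c5t0d0 c4t5000C5002AA2F8D6d0   # mirror back onto the old drive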
Re: [zfs-discuss] FC HBA for openindiana
On Sun, Oct 21, 2012 at 1:41 PM, Erik Trimble <tr...@netdemons.com> wrote:

> Do make sure you're getting one that has the proper firmware. Those with a BIOS don't work in SPARC boxes, and those with OpenBoot don't work in x64 stuff. A quick "Sun FC HBA" search on eBay turns up a whole list of stuff that's official Sun HBAs, which will give you an idea of the (max) pricing you'll be paying. There's currently a *huge* price difference between the 4Gb and 2Gb adapters. Also, keep in mind that PCI-X adapters are far more common at the 1/2Gb range, while PCI-E starts to be the most common choice at 4Gb+.
>
> Here's a list of all the old Sun FC HBAs (which can help you sort out which are for x64 systems, and which were for SPARC systems): http://www.oracle.com/technetwork/documentation/oracle-storage-networking-190061.html
>
> As Tim said, these should all have built-in drivers in the illumos codebase.
> -Erik

The only ones that have that limitation are the Sun OEM cards. If you buy a QLogic retail card you can use it in either system.

--Tim
Re: [zfs-discuss] [zfs] portable zfs send streams (preview webrev)
On Sat, Oct 20, 2012 at 2:54 AM, Arne Jansen <sensi...@gmx.net> wrote:

> On 10/20/2012 01:10 AM, Tim Cook wrote:
>> [SNIP - the FITS naming discussion, quoted in full below]
>> I'm sure we can come up with something. Are you planning on this being solely for ZFS, or a larger architecture for replication in both directions in the future?
>
> We have senders for zfs and btrfs. The planned receiver will be mostly filesystem-agnostic and can work on a much broader range. It basically only needs to know how to create snapshots and where to store a bit of metadata. It would be great if more filesystems would join on the sending side, but I have no involvement there. I see no basic problem in choosing a name that's already in use. Especially with file extensions, most will be already taken. How about something with 'portable' and 'backup', like pib or pibs? 'i' for incremental.
> -Arne

Re-using names generally isn't a big deal, but in this case the existing name is a technology that's extremely similar to what you're doing - which WILL cause a ton of confusion in the userbase, and make troubleshooting far more difficult when searching Google etc. for links to documents that are applicable. Maybe something like "far" - filesystem-agnostic replication?
Re: [zfs-discuss] FC HBA for openindiana
The built-in drivers support MPxIO, so you're good to go.

On Friday, October 19, 2012, Christof Haemmerle wrote:

> Yep, I need 4Gb with multipathing if possible.
>
> On Oct 19, 2012, at 10:34 PM, Tim Cook <t...@cook.ms> wrote:
>> How old? If it's 1Gbit you'll need a 4Gb or slower HBA. QLogic would be my preference. You should be able to find a 2340 for cheap on eBay. Or a 2460 if you want 4Gb.
Re: [zfs-discuss] [zfs] portable zfs send streams (preview webrev)
On Fri, Oct 19, 2012 at 3:46 PM, Arne Jansen <sensi...@gmx.net> wrote:

> On 10/19/2012 09:58 PM, Matthew Ahrens wrote:
>> On Wed, Oct 17, 2012 at 5:29 AM, Arne Jansen <sensi...@gmx.net> wrote:
>>> We have finished a beta version of the feature. A webrev for it can be found here: http://cr.illumos.org/~webrev/sensille/fits-send/ It adds a command 'zfs fits-send'. The resulting streams can currently only be received on btrfs, but more receivers will follow. It would be great if anyone interested could give it some testing and/or review. If there are no objections, I'll send a formal webrev soon.
>>
>> Please don't bother changing libzfs (and proliferating the copypasta there) -- do it like lzc_send().
>
> ok. It would be easier, though, if zfs_send would also already use the new style. Is it in the pipeline already?
>
>> Likewise, zfs_ioc_fits_send should use the new-style API. See the comment at the beginning of zfs_ioctl.c. I'm not a fan of the name FITS, but I suppose somebody else already named the format. If we are going to follow someone else's format though, it at least needs to be well-documented. Where can we find the documentation? FYI, #1 Google hit for FITS: http://en.wikipedia.org/wiki/FITS #3 hit: http://code.google.com/p/fits/ Both have to do with file formats. The entire first page of Google results for "FITS format" and "FITS file format" is related to these two formats. "FITS btrfs" didn't return anything specific to the file format, either.
>> --matt
>
> It's not too late to change it, but I have a hard time coming up with a better name. Also, the format is still very new and I'm sure it'll need some adjustments.
> -arne

I'm sure we can come up with something. Are you planning on this being solely for ZFS, or a larger architecture for replication in both directions in the future?

--Tim
Re: [zfs-discuss] FC HBA for openindiana
On Friday, October 19, 2012, Christof Haemmerle wrote:

> hi there, i need to connect some old raid subsystems to a opensolaris box via fibre channel. can you recommend any FC HBA? thanx

How old? If it's 1Gbit you'll need a 4Gb or slower HBA. QLogic would be my preference. You should be able to find a 2340 for cheap on eBay. Or a 2460 if you want 4Gb.
Re: [zfs-discuss] Best way to measure performance of ZIL
On 10/01/2012 09:09 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

> Just perform a bunch of writes and time it. Then set sync=disabled, perform the same set of writes, and time it. Then re-enable sync, add a ZIL device, and time it. The third option will be somewhere in between the first two.

To perform a bunch of writes, vdbench is a very useful tool.
https://blogs.oracle.com/henk/entry/vdbench_a_disk_and_tape
http://sourceforge.net/projects/vdbench/files/vdbench503beta/
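[Editor's note: a hedged sketch of that three-way comparison, assuming a test dataset tank/test and a spare SSD c3t0d0 for the log device. The workload must issue synchronous writes (fsync/O_SYNC), or all three runs will measure the same async path.]

# zfs set sync=standard tank/test     # run the vdbench profile, record the result (baseline)
# zfs set sync=disabled tank/test     # same run; this is the upper bound, bought by dropping sync semantics
# zfs set sync=standard tank/test
# zpool add tank log c3t0d0           # same run again; expect a result between the two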
Re: [zfs-discuss] vm server storage mirror
On Thu, Sep 27, 2012 at 12:48 PM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: Tim Cook [mailto:t...@cook.ms]
>> Sent: Wednesday, September 26, 2012 3:45 PM
>> I would suggest if you're doing a crossover between systems, you use InfiniBand rather than Ethernet. You can eBay a 40Gb IB card for under $300. Quite frankly the performance issues should become almost a non-factor at that point.
>
> I like that idea too - but I thought IB couldn't do crossover. I thought a switch is required?

Crossover should be fine as long as you have a subnet manager on one of the hosts. Now you're going to ask me where you can get a subnet manager for illumos/Solaris/whatever, and I'm going to have to plead the fifth because I haven't looked into it.

--Tim
Re: [zfs-discuss] vm server storage mirror
On Wed, Sep 26, 2012 at 12:54 PM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> Here's another one.
>
> Two identical servers are sitting side by side. They could be connected to each other via anything (presently using a crossover Ethernet cable), and obviously they both connect to the regular LAN. You want to serve VMs from at least one of them, and even if the VMs aren't fault-tolerant, you want at least the storage to be live-synced.
>
> The first obvious thing to do is simply cron a zfs send | zfs receive at a very frequent interval. But there are a lot of downsides to that - besides the fact that you have to settle for some granularity, you also have a script on one system that will clobber the other system. So in the event of a failure, you might promote the backup into production, and you have to be careful not to let it get clobbered when the main server comes up again.
>
> I like much better the idea of using a ZFS mirror between the two systems, even if it comes with a performance penalty as a result of bottlenecking the storage onto Ethernet. But there are several ways to possibly do that, and I'm wondering which will be best.
>
> Option 1: Each system creates a big zpool of the local storage. Then create a zvol within the zpool and export it over iSCSI to the other system. Now both systems can see a local zvol and a remote zvol, which they can use to create a zpool mirror. The reason I don't like this idea is that it's a zpool within a zpool, including the double checksumming and everything. But the double checksumming isn't such a concern to me - I'm mostly afraid some horrible performance or reliability problem might result. Naturally, you would only zpool import the nested zpool on one system. The other system would basically just ignore it. But in the event of a primary failure, you could force-import the nested zpool on the secondary system.
>
> Option 2: At present, both systems are using local mirroring - 3 mirror pairs of 6 disks. I could break these mirrors and export one side over to the other system... and vice versa. So neither server will be doing local mirroring; they will both be mirroring across iSCSI to targets on the other host. Once again, each zpool will only be imported on one host, but in the event of a failure you could force-import it on the other host.
>
> Can anybody think of a reason why Option 2 would be stupid, or can you think of a better solution?

I would suggest if you're doing a crossover between systems, you use InfiniBand rather than Ethernet. You can eBay a 40Gb IB card for under $300. Quite frankly the performance issues should become almost a non-factor at that point.

--Tim
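[Editor's note: for reference, a hedged sketch of the iSCSI export half of Option 2 on an illumos box with COMSTAR. It assumes the raw disk c1t3d0 is the mirror half being donated and that the stmf and iSCSI target services are enabled; device names and GUIDs are illustrative.]

# stmfadm create-lu /dev/rdsk/c1t3d0s2     # prints a LU GUID, e.g. 600144f0...
# stmfadm add-view 600144f0...             # expose the LU to all hosts (restrict this in production)
# itadm create-target                      # create an iSCSI target to serve it

On the peer, once the initiator sees the LUN as a local disk (say c0t600144F0...d0):

# zpool create vmpool mirror c2t0d0 c0t600144F0...d0   # local half + remote iSCSI half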
Re: [zfs-discuss] Question about ZFS snapshots
On Fri, Sep 21, 2012 at 12:05 AM, Stefan Ring <stefan...@gmail.com> wrote:

> On Fri, Sep 21, 2012 at 6:31 AM, andy thomas <a...@time-domain.co.uk> wrote:
>> I have a ZFS filesystem and create weekly snapshots over a period of 5 weeks, called week01, week02, week03, week04 and week05 respectively. My question is: how do the snapshots relate to each other - does week03 contain the changes made since week02, or does it contain all the changes made since the first snapshot, week01, and therefore include those in week02?
>
> Every snapshot is based on the previous one and stores only what is needed to capture the differences.
>
>> To roll back to week03, it's necessary to delete snapshots week04 and week05 first - but what if week01 and week02 have also been deleted? Will the rollback still work, or is it necessary to keep earlier snapshots?
>
> No, it's not necessary. You can roll back to any snapshot. I almost never use rollback, though, in normal use. If I've accidentally deleted or overwritten something, I just rsync it over from the corresponding /.zfs/snapshot directory. Only if what I want to restore is huge might rollback be a better option.

I wasn't going to jump into this quagmire, but I will. To the second question: if you've got snaps 1-5 and you roll back to snap 3, you will lose snaps 4 and 5. As part of the rollback, they will be discarded, as will any other changes made since snap 3. If you delete snap 1 or snap 2, any blocks they have in common with snap 3 will be retained; you will simply see snap 3 grow, because those blocks will now be accounted for under snap 3 instead of snap 1 or snap 2. Any blocks that were not shared with snap 3 will be discarded.

Another point, since you seem to be new to snapshots, that I'll illustrate with an example. Say you've got snap 1, and in snap 1 you've got file 1, made up of 20 blocks. If you overwrite blocks 1-10 of file 1 fifty times before you take snapshot 2, snapshot 2 will only capture the final state of the file. You will not get 50 revisions of the file. This is not continuous data protection; it's a point-in-time copy.

--Tim
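[Editor's note: a hedged transcript of the rollback behavior Tim describes. The dataset name is illustrative and the error text is paraphrased from memory, but the -r requirement is real: zfs rollback refuses to discard week04 and week05 unless told to.]

# zfs rollback tank/fs@week03
cannot rollback to 'tank/fs@week03': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
tank/fs@week04
tank/fs@week05
# zfs rollback -r tank/fs@week03     # succeeds; week04 and week05 are destroyed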
Re: [zfs-discuss] all in one server
On Tue, Sep 18, 2012 at 2:02 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Tue, 18 Sep 2012, Erik Ableson wrote:
>> The bigger issue you'll run into will be data sizing, as a year's worth of snapshots basically means that you're keeping a journal of every single write that's occurred over the year. If you are running [...]
>
> The above is not a correct statement. The snapshot only preserves the file-level differences between the points in time. A snapshot does not preserve every single write. ZFS does not even send every single write to the underlying disk. In some usage models, the same file may be re-written 100 times between snapshots, or might not ever appear in any snapshot.

Depending on how frequently you're taking snapshots, your change rate, and how long you keep the snapshots around, it may very well be true. It's not universally true, but it's also not universally false.

--Tim
Re: [zfs-discuss] ZIL devices and fragmentation
On Mon, Jul 30, 2012 at 12:44 PM, Richard Elling richard.ell...@gmail.com wrote: On Jul 30, 2012, at 10:20 AM, Roy Sigurd Karlsbakk wrote: - Original message - On Mon, Jul 30, 2012 at 9:38 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote: Also keep in mind that if you have an SLOG (ZIL on a separate device), and then lose this SLOG (disk crash etc), you will probably lose the pool. So if you want/need SLOG, you probably want two of them in a mirror… That's only true on older versions of ZFS. ZFSv19 (or 20?) includes the ability to import a pool with a failed/missing log device. You lose any data that is in the log and not in the pool, but the pool is importable. Are you sure? I booted this v28 pool a couple of months back, and found it didn't recognize its pool, apparently because of a missing SLOG. It turned out the cache shelf was disconnected; after re-connecting it, things worked as planned. I didn't try to force a new import, though, but it didn't boot up normally, and told me it couldn't import its pool due to lack of SLOG devices. Positive. :) I tested it with ZFSv28 on FreeBSD 9-STABLE a month or two ago. See the updated man page for zpool, especially the bit about import -m. :) On 151a2, the man page just says 'use this or that mountpoint' with import -m, but the fact was zpool refused to import the pool at boot when 2 SLOG devices (mirrored) and 10 L2ARC devices were offline. Should OI/Illumos be able to boot cleanly without manual action with the SLOG devices gone? No. Missing slogs is a potential data-loss condition. Importing the pool without slogs requires acceptance of the data loss -- human interaction. -- richard -- ZFS Performance and Training richard.ell...@richardelling.com +1-760-896-4422 I would think a flag to allow you to automatically continue with a disclaimer might be warranted (default behavior obviously requiring human input). --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
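For pools at version 19 or later, the manual-acceptance import Richard describes is spelled like this (pool name hypothetical):

  zpool import -m tank            # import despite a missing log device
  zpool status tank               # the slog shows as UNAVAIL
  zpool remove tank <slog-device> # optionally drop it from the config

The -m flag is the explicit human acknowledgement that whatever was in the log and not yet in the pool is gone.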
Re: [zfs-discuss] New fast hash algorithm - is it needed?
On Thu, Jul 12, 2012 at 9:14 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: Jim Klimov [mailto:jimkli...@cos.ru] Sent: Thursday, July 12, 2012 8:42 AM To: Edward Ned Harvey Subject: Re: [zfs-discuss] New fast hash algorithm - is it needed? 2012-07-11 18:03, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Sašo Kiselkov As your dedup ratio grows, so does the performance hit from dedup=verify. At, say, dedupratio=10.0x, on average, every write results in 10 reads. Why? If you intend to write a block, and you discover it has a matching checksum, you only need to verify it matches one block. You don't need to verify it matches all the other blocks that have previously been verified to match each other. As Saso explained, if you wrote the same block 10 times and detected it was already deduped, then by verifying this detection 10 times you get about 10 extra reads. (Jim, you wrote me off-list and I replied on-list; in this case I thought it would be ok because this message doesn't look private or exclusive to me. I apologize if I was wrong.) I get the miscommunication now - when you write the duplicate block for the 10th time, we all understand you're not going to go back and verify 10 blocks. (It seemed, at least to me, that's what Saso was saying. Which is why I told him, No you don't.) You're saying that when you wrote the duplicate block the 2nd time, you verified... And then when you wrote it the 3rd time, you verified... And the 4th time, and the 5th time... By the time you write the 10th time, you have already verified 9 previous times, but you're still going to verify again. Normally you would expect writing dedup'd duplicate blocks to be faster than writing the non-dedup'd data, because you get to skip the actual write. (This idealistic performance gain might be pure vapor due to the need to update metadata, but ignoring that technicality, continue hypothetically...) When you verify, supposedly you're giving up that hypothetical performance gain, because you have to do a read instead of a write. So at first blush, it seems like no net gain for performance. But if the duplicate block gets written frequently (for example a block of all zeros) then there's a high probability the duplicate block is already in ARC, so you can actually skip reading from disk and just read from RAM. So, the 10 extra reads will sometimes be true - if the duplicate block doesn't already exist in ARC. And the 10 extra reads will sometimes be false - if the duplicate block is already in ARC. Saso: yes, it's absolutely worth implementing a higher performing hashing algorithm. I'd suggest simply ignoring the people that aren't willing to acknowledge basic mathematics rather than lashing out. No point in feeding the trolls. The PETABYTES of data Quantum and DataDomain have out there are proof positive that complex hashes get the job done without verify, even if you don't want to acknowledge the math behind it. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
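For anyone following the thread, verification is a per-dataset property; a sketch with a hypothetical pool name, where sha256 stands in for whichever strong hash is in use:

  zfs set dedup=on tank             # trust the checksum alone
  zfs set dedup=verify tank         # byte-compare each match before dedup'ing
  zfs set dedup=sha256,verify tank  # explicit hash plus verification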
Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed
On Tue, Apr 24, 2012 at 12:16 AM, Matt Breitbach matth...@flash.shanje.com wrote: So this is a point of debate that probably deserves being brought to the floor (probably for the umpteenth time, but indulge me). I've heard from several people that I'd consider experts that once-per-year scrubbing is sufficient, once per quarter is _possibly_ excessive, and once a week is downright overkill. Since scrub thrashes your disks, I'd like to avoid it if at all possible. My opinion is that it depends on the data. If it's all data at rest, ZFS can't correct bit-rot if it's not read out on a regular interval. My biggest question on this? How often does bit-rot occur on media that isn't read or written to excessively, but just spins most of the day and only has 10-20GB physically read from the spindles daily? We all know that as data ages, it gets accessed less and less frequently. At what point should you be scrubbing that old data every few weeks to make sure a bit or two hasn't flipped? FYI - I personally scrub once per month. Probably overkill for my data, but I'm paranoid like that. -Matt -Original Message- How often do you normally run a scrub, before this happened? It's possible they were accumulating for a while but went undetected for lack of read attempts to the disk. Scrub more often! -- Dan. Personally, unless the dataset is huge and you're using z3, I'd be scrubbing once a week. Even if it's z3, just do a window on Sundays or something so that you at least make it through the whole dataset once a month. There's no reason NOT to scrub that I can think of other than the overhead - which shouldn't matter if you're doing it during off hours. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
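A weekly scrub window like the one described is a one-line cron job; a minimal sketch, assuming a pool named tank:

  # root crontab: start a scrub every Sunday at 02:00
  0 2 * * 0 /usr/sbin/zpool scrub tank

  # check on it later
  zpool status tank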
Re: [zfs-discuss] Aaron Toponce: Install ZFS on Debian GNU/Linux
Oracle never promised anything. A leaked internal memo does not signify an official company policy or statement. On Apr 18, 2012 11:13 AM, Freddie Cash fjwc...@gmail.com wrote: On Wed, Apr 18, 2012 at 7:54 AM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hmmm, how come they have encryption and we don't? As in Solaris releases, or some other we? I would guess he means Illumos, since it's mentioned in the very next sentence. :) Hmmm, how come they have encryption and we don't? Can it be backported to illumos ... It's too bad Oracle hasn't followed through (yet?) with their promise to open-source the ZFS (and other CDDL-licensed?) code in Solaris 11. :( -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Drive upgrades
On Fri, Apr 13, 2012 at 9:35 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Michael Armstrong Is there a way to quickly ascertain if my seagate/hitachi drives are as large as the 2.0tb samsungs? I'd like to avoid the situation of replacing all drives and then not being able to grow the pool... It doesn't matter. If you have a bunch of drives that are all approximately the same size but vary slightly, and you make (for example) a raidz out of them, then the raidz will only be limited by the size of the smallest one. So you will only be wasting 1% of the drives that are slightly larger. Also, given that you have a pool currently made up of 13x2T and 5x1T ... I presume these are separate vdevs. You don't have one huge 18-disk raidz3, do you? That would be bad. And it would also mean that you're currently wasting 13x1T. I assume the 5x1T are a single raidzN. You can increase the size of these disks without any cares about the size of the other 13. Just make sure you have the autoexpand property set. But most of all, make sure you do a scrub first, and make sure you complete the resilver in between each disk swap. Do not pull out more than one disk (or whatever your redundancy level is) while it's still resilvering from the previously replaced disk. If you're very thorough, you would also do a scrub in between each disk swap, but if it's just a bunch of home movies that are replaceable, you will probably skip that step. You will however have an issue replacing them if one should fail. You need to have the same block count to replace a device, which is why I asked for right-sizing years ago. Deaf ears :/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
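The procedure Edward describes, sketched with a hypothetical pool and device name; the essentials are setting autoexpand and letting each resilver complete before the next swap:

  zpool set autoexpand=on tank
  zpool scrub tank            # confirm the pool is clean before starting
  zpool replace tank c2t0d0   # after physically swapping in the larger drive
  zpool status tank           # wait for 'resilver completed', then do the next disk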
Re: [zfs-discuss] Drive upgrades
On Fri, Apr 13, 2012 at 11:46 AM, Freddie Cash fjwc...@gmail.com wrote: On Fri, Apr 13, 2012 at 9:30 AM, Tim Cook t...@cook.ms wrote: You will however have an issue replacing them if one should fail. You need to have the same block count to replace a device, which is why I asked for right-sizing years ago. Deaf ears :/ I thought ZFSv20-something added an "if the blockcount is within 10%, then allow the replace to succeed" feature, to work around this issue? -- Freddie Cash fjwc...@gmail.com That would be news to me. I'd love to hear it's true, though. When I made the request there was excuse after excuse about how it would be difficult, and how Sun always provided replacement drives of identical size, etc. (although there were people who responded who had in fact received different-sized drives from Sun in RMA). I was hoping that now that the braintrust had moved on from Sun they'd embrace what I consider a common-sense decision, but I don't think it's happened. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?
of spread spares and/or declustered RAID would go into just making another write-block allocator in the same league raidz or mirror are nowadays... BTW, are such allocators pluggable (as software modules)? What do you think - can and should such ideas find their way into ZFS? Or why not? Perhaps from theoretical or real-life experience with such storage approaches? //Jim Klimov ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ZFS and performance consulting http://www.RichardElling.com illumos meetup, Jan 10, 2012, Menlo Park, CA http://www.meetup.com/illumos-User-Group/events/41665962/ As always, feel free to tell me why my rant is completely off base ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] S11 vs illumos zfs compatiblity
On Thu, Jan 5, 2012 at 9:32 AM, Richard Elling richard.ell...@gmail.com wrote: On Jan 5, 2012, at 6:53 AM, sol wrote: if a bug fixed in Illumos is never reported to Oracle by a customer, it would likely never get fixed in Solaris either :-( I would have liked to think that there was some good-will between the ex- and current-members of the zfs team, in the sense that the people who created zfs but then left Oracle still care about it enough to want the Oracle version to be as bug-free as possible. There is good-will between the developers. And the ZFS working group has representatives currently employed by Oracle. However, Oracle is a lawnmower. http://www.youtube.com/watch?v=-zRN7XLCRhc (Obviously I don't expect this to be the case for developers of all software but I think filesystem developers are a special breed!) They are! And there are a lot of really cool things happening in the wild as well as behind Oracle's closed doors. -- richard -- ZFS and performance consulting http://www.RichardElling.com illumos meetup, Jan 10, 2012, Menlo Park, CA http://www.meetup.com/illumos-User-Group/events/41665962/ Speaking of illumos, what exactly is the deal with the zfs discuss mailing list? There are all of 3 posts that show up for all of 2011. Am I missing something, or is there just that little traction currently? http://www.listbox.com/member/archive/182191/sort/time_rev/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Recovery: What do I try next?
On Thu, Dec 22, 2011 at 10:00 AM, Myers Carpenter my...@maski.org wrote: On Sat, Nov 5, 2011 at 2:35 PM, Myers Carpenter my...@maski.org wrote: I would like to pick the brains of the ZFS experts on this list: What would you do next to try and recover this zfs pool? I hate running across threads that ask a question where the person that asked never comes back to say what they eventually did, so... To summarize: In late October I had two drives fail in a raidz1 pool. I was able to recover all the data from one drive, but the other could not be seen by the controller. Trying to zpool import was not working. I had 3 of the 4 drives; why couldn't I import this? I read about every option in zdb and tried the ones that might tell me something more about what was on this recovered drive. I eventually hit on

zdb -p devs -e -lu /bank4/hd/devs/loop0

where /bank4/hd/devs/loop0 was a symlink back to /dev/loop0, where I had set up the disk image of the recovered drive. This showed the uberblocks, which looked like this:

Uberblock[1]
    magic = 00bab10c
    version = 26
    txg = 23128193
    guid_sum = 13396147021153418877
    timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
    rootbp = DVA[0]=0:2981f336c00:400 DVA[1]=0:1e8dcc01400:400 DVA[2]=0:3b16a3dd400:400 [L0 DMU objset] fletcher4 lzjb LE contiguous unique triple size=800L/200P birth=23128193L/23128193P fill=255 cksum=136175e0a4:79b27ae49c7:1857d594ca833:34ec76b965ae40

Then it all became clear: this drive had encountered errors one month before the other drive failed, and zfs had stopped writing to it. So the lesson here: don't be a dumbass like me. Set up Nagios or some other system to alert you when a pool has become degraded. ZFS works very well with one drive out of the array; you probably aren't going to notice problems unless you are proactively looking for them. myers Or, if you aren't scrubbing on a regular basis, just change your zpool failmode property. Had you set it to wait or panic, it would've been very clear, very quickly, that something was wrong. http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
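The failmode knob Tim mentions is a single property (pool name hypothetical):

  zpool get failmode tank
  zpool set failmode=panic tank   # fail loudly instead of limping along; 'wait' blocks I/O until the device returns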
Re: [zfs-discuss] Can I create a mirror for a root rpool?
Do you still need to do the grub install? On Dec 15, 2011 5:40 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hi Anon, The disk that you attach to the root pool will need an SMI label and a slice 0. The syntax to attach a disk to create a mirrored root pool is like this, for example: # zpool attach rpool c1t0d0s0 c1t1d0s0 Thanks, Cindy On 12/15/11 16:20, Anonymous Remailer (austria) wrote: On Solaris 10, if I install using ZFS root on only one drive, is there a way to add another drive as a mirror later? Sorry if this was discussed already. I searched the archives and couldn't find the answer. Thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
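On Solaris 10, I believe the answer is yes - zpool attach does not lay down boot blocks on the new half of the mirror, so once the resilver finishes you still need something like the following (disk name hypothetical):

  # x86
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

  # SPARC
  installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0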
Re: [zfs-discuss] slow zfs send/recv speed
On Tue, Nov 15, 2011 at 5:17 PM, Andrew Gabriel andrew.gabr...@oracle.com wrote: On 11/15/11 23:05, Anatoly wrote: Good day, The speed of send/recv is around 30-60 MBytes/s for initial send and 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk to 100+ disks in the pool, but the speed doesn't vary to any degree. As I understand it, 'zfs send' is the limiting factor. I did tests by sending to /dev/null. It worked out too slow and absolutely not scalable. None of cpu/memory/disk activity were at peak load, so there is room for improvement. Is there any bug report or article that addresses this problem? Any workaround or solution? I found these guys have the same result - around 7 MBytes/s for 'send' and 70 MBytes/s for 'recv': http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk mirror, the send runs at almost 100Mbytes/sec, so it's pretty much limited by the ethernet. Since you have provided none of the diagnostic data you collected, it's difficult to guess what the limiting factor is for you. -- Andrew Gabriel So all the bugs have been fixed? I seem to recall people on this mailing list using mbuffer to speed it up because it was so bursty and slow at one point. IE: http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
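The mbuffer trick from that link amounts to putting a large buffer on both ends so the bursty producer and consumer stop stalling each other; a sketch with hypothetical host and dataset names:

  zfs send tank/fs@snap | mbuffer -q -s 128k -m 1G | \
    ssh recvhost 'mbuffer -q -s 128k -m 1G | zfs receive -F tank/fs'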
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 11:46 AM, Mark Sandrock mark.sandr...@oracle.com wrote: On Oct 18, 2011, at 11:09 AM, Nico Williams wrote: On Tue, Oct 18, 2011 at 9:35 AM, Brian Wilson wrote: I just wanted to add something on fsck on ZFS - because for me that used to make ZFS 'not ready for prime-time' in 24x7, 5+ 9s uptime environments. Where ZFS doesn't have an fsck command - and that really used to bug me - it does now have a -F option on zpool import. To me it's the same functionality for my environment - the ability to try to roll back to a 'hopefully' good state and get the filesystem mounted up, leaving the corrupted data objects corrupted. [...] Yes, that's exactly what it is. There's no point calling it fsck because fsck fixes individual filesystems, while ZFS fixups need to happen at the volume level (at volume import time). It's true that this should have been in ZFS from the word go. But it's there now, and that's what matters, IMO. Doesn't a scrub do more than what 'fsck' does? Not really. fsck will work on an offline filesystem to correct errors and bring it back online. Scrub won't even work until the filesystem is already imported and online. If it's corrupted you can't even import it, hence the -F flag addition. Plus, IIRC, scrub won't actually correct any errors, it will only flag them. Manually fixing what scrub finds can be a giant pain. It's also true that this was never necessary with hardware that doesn't lie, but it's good to have it anyway, and it is critical for personal systems such as laptops. IIRC, fsck was seldom needed at my former site once UFS journalling became available. Sweet update. Mark We all hope to never have to run fsck, but not having it at all is a bit of a non-starter in most environments. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
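For completeness, the recovery import being discussed looks like this (pool name hypothetical); -n first does a dry run, and -F rewinds to the last importable transaction group:

  zpool import -nF tank   # report whether a rewind would work, changing nothing
  zpool import -F tank    # rewind and import, discarding the last few txgs
  zpool scrub tank        # then verify what survived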
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 2:41 PM, Kees Nuyt k.n...@zonnet.nl wrote: On Tue, 18 Oct 2011 12:05:29 -0500, Tim Cook t...@cook.ms wrote: Doesn't a scrub do more than what 'fsck' does? Not really. fsck will work on an offline filesystem to correct errors and bring it back online. Scrub won't even work until the filesystem is already imported and online. If it's corrupted you can't even import it, hence the -F flag addition. Plus, IIRC, scrub won't actually correct any errors, it will only flag them. Manually fixing what scrub finds can be a giant pain. IIRC Scrub will correct errors if the pool has sufficient redundancy. So will any read of a corrupted block. http://hub.opensolaris.org/bin/view/Community+Group+zfs/selfheal -- ( Kees Nuyt ) c[_] Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble peter.trib...@gmail.com wrote: On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook t...@cook.ms wrote: Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in. You have, of course, reported each such failure, because if that was indeed the case then it's a clear and obvious bug? For what it's worth, I've had ZFS repair data corruption on several occasions - both during normal operation and as a result of a scrub - and I've never had to intervene manually. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ Given that there are guides on how to manually fix the corruption, I don't see any need to report it. It's considered acceptable and expected behavior by everyone I've talked to at Sun... http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 3:27 PM, Peter Tribble peter.trib...@gmail.com wrote: On Tue, Oct 18, 2011 at 9:12 PM, Tim Cook t...@cook.ms wrote: On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble peter.trib...@gmail.com wrote: On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook t...@cook.ms wrote: Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in. You have, of course, reported each such failure, because if that was indeed the case then it's a clear and obvious bug? For what it's worth, I've had ZFS repair data corruption on several occasions - both during normal operation and as a result of a scrub - and I've never had to intervene manually. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ Given that there are guides on how to manually fix the corruption, I don't see any need to report it. It's considered acceptable and expected behavior by everyone I've talked to at Sun... http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html If you have adequate redundancy, ZFS will - and does - repair errors. The document you quote is for the case where you don't actually have adequate redundancy: ZFS will refuse to make up data for you, and report back where the problem was. Exactly as designed. (And yes, I've come across systems without redundant storage, or with multiple simultaneous failures. The original statement was that if you have redundant copies of the data or, in the case of raidz, enough information to reconstruct it, then ZFS will repair it for you. Which has been exactly in accord with my experience.) -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ I had and have redundant storage; it has *NEVER* automatically fixed it. You're the first person I've heard of who has had it automatically fix things. Per the page, "or an unlikely series of events conspired to corrupt multiple copies of a piece of data." Their "unlikely series of events," which goes unnamed, is not that unlikely in my experience. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Wanted: sanity check for a clustered ZFS idea
into understanding my idea better. And yes, I do also think that channeling disk over ethernet via one of the servers is a bad thing bound to degrade performance as opposed to what can be had anyway with direct disk access. Ethernet has *always* been faster than a HDD. Even back when we had 3/180s 10Mbps Ethernet it was faster than the 30ms average access time for the disks of the day. I tested a simple server the other day and round-trip for 4KB of data on a busy 1GbE switch was 0.2ms. Can you show a HDD as fast? Indeed many SSDs have trouble reaching that rate under load. As noted by other posters, access times are not bandwidth. So these are two different faster's ;) Besides, (1Gbps) Ethernet is faster than a single HDD stream. But it is not quite faster than an array of 14HDDs... And if Ethernet is utilized by its direct tasks - whatever they be, say video streaming off this server to 5000 viewers or whatever is needed to saturate the network, disk access over the same ethernet link would have to compete. And whatever the QoS settings, viewers would lose - either the real-time multimedia signal would lag, or the disk data to feed it. Moreover, usage of an external NAS (a dedicated server with Ethernet connection to the blade chassis) would make an external box dedicated and perhaps optimized to storage tasks (i.e. with ZIL/L2ARC), and would free up a blade for VM farming needs, but it would consume much of the LAN bandwidth of the blades using its storage services. Today, HDDs aren't fast, and are not getting faster. -- richard Well, typical consumer disks did get about 2-3 times faster for linear RW speeds over the past decade; but for random access they do still lag a lot. So, agreed ;) //Jim Quite frankly your choice in blade chassis was a horrible design decision. From your description of its limitations it should never be the building block for a vmware cluster in the first place. I would start by rethinking that decision instead of trying to pound a round ZFS peg into a square hole. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Wanted: sanity check for a clustered ZFS idea
every drive that isn't being used to boot an existing server to this solaris host as individual disks, and let that server take care of RAID and presenting out the storage to the rests of the vmware hosts. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Fri, Aug 19, 2011 at 4:43 AM, Stu Whitefish swhitef...@yahoo.com wrote: It seems that obtaining an Oracle support contract or a contract renewal is equally frustrating. I don't have any axe to grind with Oracle. I'm new to the Solaris thing and wanted to see if it was for me. If I was using this box to make money then sure I wouldn't have any problem paying for support. I don't expect handouts and I don't mind paying. I trusted ZFS because I heard it's for enterprise use and now I have 200G of data offline and not a peep from Oracle. Looking on the net I found another guy who had the same exact failure. To my way of thinking somebody needs to standup and get this fixed for us and make sure it doesn't happen to anybody else. If that happens I have no grudge against Oracle or Solaris. If it doesn't that's a pretty sour experience for someone to go through and it will definitely make me look at this whole thing in another light. I still believe somebody over there will do the right thing. I don't believe Oracle needs to hold people's data hostage to make money. I am sure they have enough good products and services to make money honestly. Jim You digitally signed a license agreement stating the following: *No Technical Support* Our technical support organization will not provide technical support, phone support, or updates to you for the Programs licensed under this agreement. To turn around and keep repeating that they're holding your data hostage is disingenuous at best. Nobody is holding your data hostage. You voluntarily put it on an operating system that explicitly states doesn't offer support from the parent company. Nobody from Oracle is going to show up with a patch for you on this mailing list because none of the Oracle employees want to lose their job and subsequently be subjected to a lawsuit. If that's what you're planning on waiting for, I'd suggest you take a new approach. Sorry to be a downer, but that's reality. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance question over NFS
What are the specs on the client? On Aug 18, 2011 10:28 AM, Thomas Nau thomas@uni-ulm.de wrote: Dear all, We finally got all the parts for our new fileserver, following several recommendations we got over this list. We use:

Dell R715, 96GB RAM, dual 8-core Opterons
1 x 10GE Intel dual-port NIC
2 x LSI 9205-8e SAS controllers
2 x DataON DNS-1600 JBOD chassis
46 x Seagate Constellation SAS drives
2 x STEC ZeusRAM

The base zpool config utilizes 42 drives plus the STECs as mirrored log devices. The Seagates are set up as a stripe of seven 6-drive RAIDZ2 vdevs, plus, as said, a dedicated ZIL made of the mirrored STECs. As a quick'n'dirty check we ran filebench with the fileserver workload. Running locally we get:

statfile1 5476ops/s 0.0mb/s 0.6ms/op 179us/op-cpu
deletefile1 5476ops/s 0.0mb/s 1.0ms/op 454us/op-cpu
closefile3 5476ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
readfile1 5476ops/s 729.5mb/s 0.2ms/op 128us/op-cpu
openfile2 5477ops/s 0.0mb/s 0.8ms/op 204us/op-cpu
closefile2 5477ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
appendfilerand1 5477ops/s 42.8mb/s 0.3ms/op 184us/op-cpu
openfile1 5477ops/s 0.0mb/s 0.9ms/op 209us/op-cpu
closefile1 5477ops/s 0.0mb/s 0.0ms/op 6us/op-cpu
wrtfile1 5477ops/s 688.4mb/s 0.4ms/op 220us/op-cpu
createfile1 5477ops/s 0.0mb/s 2.7ms/op 1068us/op-cpu

With a single remote client (a similar Dell system) using NFS:

statfile1 90ops/s 0.0mb/s 27.6ms/op 145us/op-cpu
deletefile1 90ops/s 0.0mb/s 64.5ms/op 401us/op-cpu
closefile3 90ops/s 0.0mb/s 25.8ms/op 40us/op-cpu
readfile1 90ops/s 11.4mb/s 3.1ms/op 363us/op-cpu
openfile2 90ops/s 0.0mb/s 66.0ms/op 263us/op-cpu
closefile2 90ops/s 0.0mb/s 22.6ms/op 124us/op-cpu
appendfilerand1 90ops/s 0.7mb/s 0.5ms/op 101us/op-cpu
openfile1 90ops/s 0.0mb/s 72.6ms/op 269us/op-cpu
closefile1 90ops/s 0.0mb/s 43.6ms/op 189us/op-cpu
wrtfile1 90ops/s 11.2mb/s 0.2ms/op 211us/op-cpu
createfile1 90ops/s 0.0mb/s 226.5ms/op 709us/op-cpu

The same remote client with sync disabled on the server pool:

statfile1 479ops/s 0.0mb/s 6.2ms/op 130us/op-cpu
deletefile1 479ops/s 0.0mb/s 13.0ms/op 351us/op-cpu
closefile3 480ops/s 0.0mb/s 3.0ms/op 37us/op-cpu
readfile1 480ops/s 62.7mb/s 0.8ms/op 174us/op-cpu
openfile2 480ops/s 0.0mb/s 14.1ms/op 235us/op-cpu
closefile2 480ops/s 0.0mb/s 6.0ms/op 123us/op-cpu
appendfilerand1 480ops/s 3.7mb/s 0.2ms/op 53us/op-cpu
openfile1 480ops/s 0.0mb/s 13.7ms/op 235us/op-cpu
closefile1 480ops/s 0.0mb/s 11.1ms/op 190us/op-cpu
wrtfile1 480ops/s 60.3mb/s 0.2ms/op 233us/op-cpu
createfile1 480ops/s 0.0mb/s 35.6ms/op 683us/op-cpu

Disabling the ZIL is not an option, but I expected much better performance; the ZeusRAM only gets us a speed-up of about 1.8x. Is this test realistic for a typical fileserver scenario, or does it require many more clients to push the limits? Thanks Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
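For anyone wanting to reproduce the numbers, the workload is one of filebench's bundled personalities; a minimal sketch, assuming a scratch dataset mounted at /tank/bench:

  # filebench
  filebench> load fileserver
  filebench> set $dir=/tank/bench
  filebench> run 60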
Re: [zfs-discuss] zpool import hangs any zfs-related programs, eats all RAM and dies in swapping hell
On Tue, Jun 14, 2011 at 3:16 PM, Frank Van Damme frank.vanda...@gmail.com wrote: 2011/6/10 Tim Cook t...@cook.ms: While your memory may be sufficient, that cpu is sorely lacking. Is it even 64-bit? There's a reason intel couldn't give those things away in the early 2000s and amd was eating their lunch. A Pentium 4 is 32-bit. http://mail.opensolaris.org/mailman/listinfo/zfs-discuss EM64T was added to the Pentium 4 architecture with the D nomenclature, which is what he has. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
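Easy enough to settle on the box itself; on Solaris:

  isainfo -kv   # prints '64-bit amd64 kernel modules' when running a 64-bit kernel
  psrinfo -pv   # shows the CPU model string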
Re: [zfs-discuss] Q: pool didn't expand. why? can I force it?
On Sun, Jun 12, 2011 at 3:54 AM, Johan Eliasson johan.eliasson.j...@gmail.com wrote: I replaced a smaller disk in my tank2, so now they're all 2TB. But look, zfs still thinks it's a pool of 1.5 TB disks:

nebol@filez:~# zpool list tank2
NAME    SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
tank2   5.44T  4.20T  1.24T  77%  1.00x  ONLINE  -

nebol@filez:~# zpool status tank2
  pool: tank2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE   READ WRITE CKSUM
        tank2       ONLINE     0     0     0
          raidz1-0  ONLINE     0     0     0
            c8t0d0  ONLINE     0     0     0
            c8t1d0  ONLINE     0     0     0
            c8t2d0  ONLINE     0     0     0
            c8t3d0  ONLINE     0     0     0

errors: No known data errors

and:

6. c8t0d0 ATA-ST2000DL003-9VT1-CC32-1.82TB /pci@0,0/pci8086,29f1@1/pci8086,32c@0/pci11ab,11ab@1/disk@0,0
7. c8t1d0 ATA-ST2000DL003-9VT1-CC32-1.82TB /pci@0,0/pci8086,29f1@1/pci8086,32c@0/pci11ab,11ab@1/disk@1,0
8. c8t2d0 ATA-ST2000DL003-9VT1-CC32-1.82TB /pci@0,0/pci8086,29f1@1/pci8086,32c@0/pci11ab,11ab@1/disk@2,0
9. c8t3d0 ATA-ST2000DL003-9VT1-CC32-1.82TB

So the question is, why didn't it expand? And can I fix it? Autoexpand is likely turned off. http://download.oracle.com/docs/cd/E19253-01/819-5461/githb/index.html --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
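A sketch of the fix; note that autoexpand only applies to disks replaced after it is set, so drives already swapped in may need an explicit online -e:

  zpool set autoexpand=on tank2
  zpool online -e tank2 c8t0d0 c8t1d0 c8t2d0 c8t3d0   # expand each disk in place
  zpool list tank2                                    # SIZE should now reflect the 2TB drives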
Re: [zfs-discuss] ZFS Hard link space savings
On Sun, Jun 12, 2011 at 5:28 PM, Nico Williams n...@cryptonector.com wrote: On Sun, Jun 12, 2011 at 4:14 PM, Scott Lawson scott.law...@manukau.ac.nz wrote: I have an interesting question that may or may not be answerable from some internal ZFS semantics. This is really standard Unix filesystem semantics. [...] So total storage used is around ~7.5MB due to the hard linking taking place on each store. If hard linking capability had been turned off, this same message would have used 1500 x 2MB = 3GB worth of storage. My question is: is there any simple way of determining the space savings on each of the stores from the usage of hard links? [...] But... you just did! :) It's: number of hard links * (file size + sum(size of link names and/or directory slot size)). For sufficiently large files (say, larger than one disk block) you could approximate that as: number of hard links * file size. The key is the number of hard links, which will typically vary, but for e-mails that go to all users, well, you know the number of links then is the number of users. You could write a script to do this -- just look at the size and hard-link count of every file in the store, apply the above formula, add up the inflated sizes, and you're done. Nico PS: Is it really the case that Exchange still doesn't deduplicate e-mails? Really? It's much simpler to implement dedup in a mail store than in a filesystem... MS has had SIS since Exchange 4.0. They dumped it in 2010 because it was a huge source of their small random I/Os. In an effort to allow Exchange to be more storage-friendly (IE: more of a large sequential I/O profile), they've done away with SIS. The defense for it is that you can buy more cheap storage for less money than you'd save with SIS and 15k rpm disks. Whether that's factual I suppose is for the reader to decide. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
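A rough version of that script, assuming a hypothetical store path; it counts each inode once and ignores directory-entry overhead, so it is the large-file approximation from above:

  find /mailstore -type f -links +1 -exec ls -li {} + | awk '
    !seen[$1]++ { saved += ($3 - 1) * $6 }   # fields: $1 inode, $3 link count, $6 size
    END { printf "approx. space saved by hard links: %.1f MB\n", saved / 1048576 }'

The (links - 1) * size term is the storage those extra directory entries would have consumed as full copies.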
Re: [zfs-discuss] zpool import hangs any zfs-related programs, eats all RAM and dies in swapping hell
it nearly occurs, I have only a few seconds of uptime left, and since each run boot-to-crash takes roughly 2-3 hours now, I am unlikely to be active at the console at these critical few seconds. And the sync would likely never return in this case, too. Delete's, and dataset / snapshot deletes are not managed correctly in a deduped environment in ZFS. This is a known problem although it should not be anywhere nearly as bad as what you are describing in the current tip. Well, it is, on a not lowest-end hardware (at least in terms of what OpenSolaris developers can expect from a general enthusiast community which is supposed to help by testing, deploying and co-developing the best OS). The part where such deletes are slow are understandable and explainable - I don't have any big performance expectations for the box, 10Mbyte/sec is quite fine with me here. The part where it leads to crashes and hangs system programs (zfs, zpool, etc) is unacceptable. The startup delay you are seeing is another feature of ZFS, if you reboot in the middle of a large file delete or dataset destroy, ZFS ( and the OS) will not come up until it finishes the delete or dataset destroy first. Why can't it be an intensive, but background, operation? Import the pool, let it be used, and go on deleting... like it was supposed to be in that lifetime when the box began deleting these blocks ;) Well, it took me a worrysome while to figure this out the first time, a couple of months ago. Now I am just rather annoyed about absence of access to my box and data, but I hope that it will come around after several retries. Apparently, this unpredictability (and slowness and crashes) is a show-stopper for any enterprise use. I have made workarounds for the OS to come up okay, though. Since the root pool is separate, I removed pool and dcpool from zpool.cache file, and now the OS milestones do not depend on them to be available. Instead, importing the pool (with cachefile=none), starting the iscsi target and initiator, creating and removing the LUN with sbdadm, and importing the dcpool are all wrapped in several SMF services so I can relatively easily control the presence of these pools (I can disable them from autostart by touching a file in /etc directory). Steve - Jim Klimov jimkli...@cos.ru wrote: I've captured an illustration for this today, with my watchdog as well as vmstat, top and other tools. Half a gigabyte in under one second - the watchdog never saw it coming :( While your memory may be sufficient, that cpu is sorely lacking. Is it even 64bit? There's a reason intel couldn't give those things away in the early 2000s and amd was eating their lunch. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Metadata (DDT) Cache Bias
On Sun, Jun 5, 2011 at 9:56 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: Richard Elling [mailto:richard.ell...@gmail.com] Sent: Saturday, June 04, 2011 9:10 PM Instant Poll : Yes/No ? No. Methinks the MRU/MFU balance algorithm adjustment is more fruitful. Operating under the assumption that cache hits can be predicted, I agree with RE. However, that's not always the case, and if you have a random work load with enough ram to hold the whole DDT, but you don't have enough ram to hold your whole storage pool, then dedup hurts your performance dramatically. Your only option is to set primarycache=metadata, and simply give up hope that you could *ever* have a userdata cache hit. The purpose for starting this thread is to suggest it might be worthwhile (particularly with dedup enabled) to at least have the *option* of always keeping the metadata in cache, but still allow userdata to be cached too, up to the size of c_max. Just in case you might ever see a userdata cache hit. ;-) And as long as we're taking a moment to think outside the box, it might as well be suggested that this doesn't have to be a binary decision, all-or-nothing. One way to implement such an idea would be to assign a relative weight to metadata versus userdata. Dan and Roch suggested a value of 128x seems appropriate. I'm sure some people would suggest infinite metadata weight (which is synonymous to the aforementioned primarycache=metadata, plus the ability to cache userdata in the remaining unused ARC space.) I'd go with the option of allowing both a weighted and a forced option. I agree though, if you do primarycache=metadata, the system should still attempt to cache userdata if there is additional space remaining. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
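For reference, the setting as it exists today is binary (the weighted metadata bias discussed above would be new); dataset name hypothetical:

  zfs get primarycache tank            # all | metadata | none
  zfs set primarycache=metadata tank   # cache only metadata, including the DDT, in ARC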
Re: [zfs-discuss] JBOD recommendation for ZFS usage
On Mon, May 30, 2011 at 1:35 PM, Jim Klimov j...@cos.ru wrote: Thanks, now I have someone to interrogate, who seems to have seen these boxes live - if you don't mind ;) - Original Message - From: Richard Elling richard.ell...@gmail.com Date: Monday, May 30, 2011 22:04 We also commonly see the dual-expander backplanes. According to the docs, each chip addresses all disks on its backplane, and it seems implied (but not expressly stated) that either one chip and path works, or the other. For SAS targets, both paths work simultaneously. Does this mean that if the J0 uplinks of the backplanes are connected to HBAs in two different servers, both of these servers can address individual disks (and the unit of failover is not a backplane but a disk after all)? And if both HBAs are in a single server, does this double the SAS link throughput by having two paths - and can ZFS somehow balance among them? So if your application can live with the unit of failover being a bunch of 21 or 24 disks - that might be a way to go. However each head would only have one connection to each backplane, and I'm not sure if you can STONITH the non-leading head to enforce failovers (and enable the specific PRI/SEC chip of the backplane). The NexentaStor HA-Cluster plugin manages STONITH and reservations. I do not believe programming expanders or switches for clustering is the best approach. It is better to let the higher layers manage this. Makes sense. Since I originally thought that only one path works at a given time, it may be needed to somehow shut down the competitor HBA/link ;) I am not sure if this requirement also implies dual SAS data connectors - pictures of HCL HDDs all have one connector... These are dual ported. Does this mean mechanically two 7-pin SATA data ports and a wide power port, for a total of 3 connectors on the back of the HDD, as well as on the backplane sockets? Or does it mean something else? Because I've looked up half a dozen of the SuperMicro-supported drives (bold SAS in the list for E2-series chassis), and in the online shops' images they all have the standard 2 connectors (wide and 7-pin): http://www.supermicro.com.tr/SAS-1-CompList.pdf The HCL is rather small, and other components may work but are not supported by SuperMicro. And to be more specific, do you know if the Hitachi 7K3000 series SAS models HUS723020ALS640 (2Tb) or HUS723030ALS640 (3Tb) are suitable for these boxes? Does it make sense to keep the OS/swap on faster, smaller drives like a mirror of HUS156030VLS600 (300Gb SAS 15kRPM) - or is it a waste of money? (And are they known to work in these boxes?) Hint: Nexenta people seem to be good OEM friends with Supermicro, so they might know ;) Yes :-) -- richard Thanks! //Jim Klimov SAS drives are SAS drives; they aren't like SCSI. There aren't 20 different versions with different pinouts. Multipathing is handled by mpxio. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
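On Solaris/illumos the multipathing side is a switch and a reboot; a minimal sketch:

  stmsboot -e       # enable MPxIO on supported HBAs (prompts for a reboot)
  mpathadm list lu  # afterwards, each dual-ported SAS disk should show two paths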
Re: [zfs-discuss] ZFS, Oracle and Nexenta
On Wed, May 25, 2011 at 8:53 AM, Frank Van Damme frank.vanda...@gmail.com wrote: On 25-05-11 14:27, joerg.moellenk...@sun.com wrote: Well, at first ZFS development is no standard body and at the end everything has to be measured in compatibility to the Oracle ZFS implementation Why? Given that ZFS is Solaris ZFS just as well as Nexenta ZFS just as well as illumos ZFS, by what reason is Oracle ZFS being declared the standard or reference? Because they write the first so-many lines, or because they make the biggest sales on it (kinda hard to sell licenses to an open source product)? Because they OWN the code, and the patents that protect the code. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, Oracle and Nexenta
On Wed, May 25, 2011 at 10:01 AM, Paul Kraus p...@kraus-haus.org wrote: On Wed, May 25, 2011 at 10:27 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: The method the IETF uses seems to be particularly immune to vendor interference. Vendors who want to participate in defining an interoperable standard can achieve substantial success. Vendors who only want their own way encounter deafening silence and isolation. There have been a number of RFC's effectively written by one vendor in order to be able to claim open standards compliance, the biggest corporate offender in this regard, but clearly not the only one, is Microsoft. The next time I run across one of these RFC's I'll make sure to forward you a copy. The only one that comes to mind immediately was the change to the specification of what characters were permissible in DNS records to include underscore _. This was specifically to support Microsoft's existing naming convention. I am NOT saying that was a bad change, but that it was a change driven by ONE vendor. Except it wasn't just Microsoft at all. There were three vendors on the original RFC, and one of the authors was Paul Vixie... the author of BIND. http://www.ietf.org/rfc/rfc2782.txt You should probably do a bit of research before throwing out claims like that to try to shoot someone down. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris vs FreeBSD question
On Wed, May 18, 2011 at 7:47 AM, Paul Kraus p...@kraus-haus.org wrote: Over the past few months I have seen mention of FreeBSD a couple of times in regards to ZFS. My question is how stable (reliable) is ZFS on this platform? This is for a home server and the reason I am asking is that about a year ago I bought some hardware based on its inclusion on the Solaris 10 HCL, as follows:

SuperMicro 7045A-WTB (although I would have preferred the server version, but it wasn't on the HCL)
Two quad-core 2.0 GHz Xeon CPUs
8 GB RAM (I am NOT planning on using DeDupe)
2 x Seagate ES-2 250 GB SATA drives for the OS
4 x Seagate ES-2 1 TB SATA drives for data
Nvidia Geforce 8400 (cheapest video card I could get locally)

I could not get the current production Solaris or OpenSolaris to load. The miniroot would GPF while loading the kernel. I could not get the problem resolved and needed to get the server up and running, as my old server was dying (dual 550 MHz P3 with 1 GB RAM) and I needed to get my data (about 600 GB) off of it before I lost anything. That old server was running Solaris 10 and the data was in a zpool with mirrored vdevs of different sized drives. I had lost one drive in each vdev and zfs saved my data. So I loaded OpenSuSE and moved the data to a mirrored pair of 1 TB drives. I still want to move my data to ZFS, and push has come to shove, as I am about to overflow the 1 TB mirror and I really, really hate the Linux options for multiple disk device management (I'm spoiled by SVM and ZFS). So now I really need to get that hardware loaded with an OS that supports ZFS. I have tried every variation of Solaris that I can get my hands on, including Solaris 11 Express and Nexenta 3, and they all GPF loading the kernel to run the installer. My last hope is that I have a very plain vanilla (ancient S540) video card to swap in for the Nvidia on the very long shot chance that is the problem. But I need a backup plan if that does not work. I have tested the hardware with FreeBSD 8 and it boots to the installer. So my question is whether the FreeBSD ZFS port is up to production use? Is there anyone here using FreeBSD in production with good results (this list tends to only hear about serious problems and not success stories)? P.S. If anyone here has a suggestion as to how to get Solaris to load I would love to hear it. I even tried disabling multi-cores (which makes the CPUs look like dual core instead of quad) with no change. I have not been able to get serial console redirect to work, so I do not have a good log of the failures. -- {1-2-3-4-5-6-7-} Paul Kraus - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) - Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) - Technical Advisor, RPI Players I've heard nothing but good things about it. FreeNAS uses it: http://freenas.org/ and iXsystems sells a commercial product based on the FreeNAS/FreeBSD code. I don't think they have a full-blown implementation of CIFS (just Samba), but other than that, I don't think you'll have too many issues. I actually considered moving over to it, but I made the unfortunate mistake of upgrading to Solaris 11 Express, which means my zpool version is now too new to run anything else (AFAIK). --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Mon, May 9, 2011 at 2:11 AM, Evaldas Auryla evaldas.aur...@edqm.eu wrote: On 05/ 6/11 07:21 PM, Brandon High wrote: On Fri, May 6, 2011 at 9:15 AM, Ray Van Dolson rvandol...@esri.com wrote: We use dedupe on our VMware datastores and typically see 50% savings, often times more. We do of course keep like VMs on the same volume. I think NetApp uses 4k blocks by default, so the block size and alignment should match up for most filesystems and yield better savings. Assuming that the VMware datastores are on NFS? Otherwise the VMware filesystem VMFS uses its own block sizes from 1M to 8M, so the important point is to align the guest OS partition to 1M, and Windows guests starting from Vista/2008 do that by default now. Regards, The VMFS filesystem itself is aligned by NetApp at LUN creation time. You still align to a 4K block on a filer because there is no way to automatically align an encapsulated guest, especially when you could have different guest OS types on a LUN. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 6:36 PM, Erik Trimble erik.trim...@oracle.com wrote: On 5/4/2011 4:14 PM, Ray Van Dolson wrote: On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote: On Wed, May 4, 2011 at 12:29 PM, Erik Trimble erik.trim...@oracle.com wrote: I suspect that NetApp does the following to limit their resource usage: they presume the presence of some sort of cache that can be dedicated to the DDT (and, since they also control the hardware, they can make sure there is always one present). Thus, they can make their code AFAIK, NetApp has more restrictive requirements about how much data can be dedup'd on each type of hardware. See page 29 of http://media.netapp.com/documents/tr-3505.pdf - Smaller pieces of hardware can only dedup 1TB volumes, and even the big-daddy filers will only dedup up to 16TB per volume, even if the volume size is 32TB (the largest volume available for dedup). NetApp solves the problem by putting rigid constraints around the problem, whereas ZFS lets you enable dedup for any size dataset. Both approaches have limitations, and it sucks when you hit them. -B That is very true, although it's worth mentioning you can have quite a few of the dedupe/SIS-enabled FlexVols on even the lower-end filers (our FAS2050 has a bunch of 2TB SIS-enabled FlexVols). Stupid question - can you hit all the various SIS volumes at once, and not get horrid performance penalties? If so, I'm almost certain NetApp is doing post-write dedup. That way, the strictly controlled max FlexVol size helps with keeping the resource limits down, as it will be able to round-robin the post-write dedup to each FlexVol in turn. ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended) On a 2050? Probably not. It's got a single-core mobile Celeron CPU and 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup. Can you do it on a model that isn't 4 years old without tanking performance? Absolutely. Outside of those two 2000 series, the reason there are dedup limits isn't performance. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 6:51 PM, Erik Trimble erik.trim...@oracle.com wrote: On 5/4/2011 4:44 PM, Tim Cook wrote: On Wed, May 4, 2011 at 6:36 PM, Erik Trimble erik.trim...@oracle.com wrote: On 5/4/2011 4:14 PM, Ray Van Dolson wrote: On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote: On Wed, May 4, 2011 at 12:29 PM, Erik Trimble erik.trim...@oracle.com wrote: I suspect that NetApp does the following to limit their resource usage: they presume the presence of some sort of cache that can be dedicated to the DDT (and, since they also control the hardware, they can make sure there is always one present). Thus, they can make their code AFAIK, NetApp has more restrictive requirements about how much data can be dedup'd on each type of hardware. See page 29 of http://media.netapp.com/documents/tr-3505.pdf - Smaller pieces of hardware can only dedup 1TB volumes, and even the big-daddy filers will only dedup up to 16TB per volume, even if the volume size is 32TB (the largest volume available for dedup). NetApp solves the problem by putting rigid constraints around the problem, whereas ZFS lets you enable dedup for any size dataset. Both approaches have limitations, and it sucks when you hit them. -B That is very true, although it's worth mentioning you can have quite a few of the dedupe/SIS-enabled FlexVols on even the lower-end filers (our FAS2050 has a bunch of 2TB SIS-enabled FlexVols). Stupid question - can you hit all the various SIS volumes at once, and not get horrid performance penalties? If so, I'm almost certain NetApp is doing post-write dedup. That way, the strictly controlled max FlexVol size helps with keeping the resource limits down, as it will be able to round-robin the post-write dedup to each FlexVol in turn. ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended) On a 2050? Probably not. It's got a single-core mobile Celeron CPU and 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup. Can you do it on a model that isn't 4 years old without tanking performance? Absolutely. Outside of those two 2000 series, the reason there are dedup limits isn't performance. --Tim Indirectly, yes, it's performance, since NetApp has plainly chosen post-write dedup as a method to restrict the required hardware capabilities. The dedup limits on volsize are almost certainly driven by the local RAM requirements for post-write dedup. It also looks like NetApp isn't providing for a dedicated DDT cache, which means that when the NetApp is doing dedup, it's consuming the normal filesystem cache (i.e. chewing through RAM). Frankly, I'd be very surprised if you didn't see a noticeable performance hit during the period that the NetApp appliance is performing the dedup scans. Again, it depends on the model/load/etc. The smallest models will see performance hits for sure. If the vol size limits are strictly a matter of RAM, why exactly would they jump from 4TB to 16TB on a 3140 by simply upgrading ONTAP? If the limits haven't gone up on, at the very least, every one of the x2xx systems 12 months from now, feel free to dig up the thread and give an I-told-you-so. I'm quite confident that won't be the case. The 16TB limit SCREAMS to me that it's a holdover from the same 32-bit limit that causes 32-bit volumes to have a 16TB limit. I'm quite confident they're just taking the cautious approach on moving to 64-bit dedup code.
--Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
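For anyone trying to size the DDT on the ZFS side of this comparison, zdb can report or simulate the dedup table for a pool. A minimal sketch, with tank standing in for your pool name; the per-entry RAM figures quoted around this list are rough (a few hundred bytes each), so treat any number derived this way as a ballpark:

  # zdb -S tank     (simulates dedup on existing data and prints a DDT histogram, without enabling it)
  # zdb -DD tank    (prints actual DDT statistics for a pool already running dedup)

Multiplying the histogram's total entry count by a few hundred bytes gives a rough in-core DDT size.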
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 10:15 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Erik Trimble ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended) That's true, but on the flipside, if you don't have adequate resources dedicated all the time, it means performance is unsustainable. Anything which is going to do post-write dedup will necessarily have degraded performance on a periodic basis. This is in *addition* to all your scrubs and backups and so on. AGAIN, you're assuming that all system resources are used all the time and can't possibly go anywhere else. This is absolutely false. If someone is running a system at 99% capacity 24/7, perhaps that might be a factual statement. I'd argue if someone is running the system at 99% all of the time, the system is grossly undersized for the workload. How can you EVER expect a highly available system to run at 99% on both nodes (all nodes in a VMAX/VSP scenario) and ever be able to fail over? Either a home-brew OpenSolaris cluster, Oracle 7000 cluster, or NetApp? I'm gathering that this list in general has a lack of understanding of how NetApp does things. If you don't know for a fact how it works, stop jumping to conclusions on how you think it works. I know for a fact that short of the guys currently/previously writing the code at NetApp, there's a handful of people in the entire world who know (factually) how they're allocating resources from soup to nuts. As far as this discussion is concerned, there are only two points that matter: They've got dedup on primary storage, and it works in the field. The rest is just static that doesn't matter. Let's focus on how to make ZFS better instead of trying to guess how others are making it work, especially when they've got a completely different implementation. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 10:23 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Ray Van Dolson Are any of you out there using dedupe ZFS file systems to store VMware VMDK (or any VM tech. really)? Curious what recordsize you use and what your hardware specs / experiences have been. Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS or NetApp or anything else.) Because the VM images are all going to have their own filesystems internally with whatever blocksize is relevant to the guest OS. If the virtual blocks in the VM don't align with the ZFS (or whatever FS) host blocks... Then even when you write duplicated data inside the guest, the host won't see it as a duplicated block. There are some situations where dedup may help on VM images... For example if you're not using sparse files and you have a zero-filled disk... But in that case, you should probably just use a sparse file instead... Or ... If you have a golden image that you're copying all over the place ... but in that case, you should probably just use clones instead... Or if you're intimately familiar with both the guest and host filesystems, and you choose blocksizes carefully to make them align. But that seems complicated and likely to fail. That's patently false. VM images are the absolute best use-case for dedup outside of backup workloads. I'm not sure who told you/where you got the idea that VM images are not ripe for dedup, but it's wrong. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
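The alignment experiment is cheap to run, for what it's worth. A sketch with a hypothetical dataset tank/vmstore; recordsize=4k here is an assumption matched to a 4KiB guest filesystem block size, not a general recommendation, and note that a small recordsize inflates the DDT accordingly:

  # zfs create -o recordsize=4k -o dedup=on tank/vmstore
  # zpool get dedupratio tank     (check the realized ratio after loading a few guests)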
Re: [zfs-discuss] ZFS Going forward after Oracle - Let's get organized, let's get started.
On Sat, Apr 9, 2011 at 4:25 PM, Garrett D'Amore garr...@nexenta.com wrote: On Sun, 2011-04-10 at 08:56 +1200, Ian Collins wrote: On 04/10/11 05:41 AM, Chris Forgeron wrote: I see your point, but you also have to understand that sometimes too many helpers/opinions are a bad thing. There is a set core of ZFS developers who make a lot of this move forward, and they are the key right now. The rest of us will just muddy the waters with conflicting/divergent opinions on direction and goals. In the real world we would be called customers, you know, the people who actually use the product. Right. And in the real world, customers are generally not involved with architectural discussions of products. Their input is collected and fed into the process, but they don't get to sit at the whiteboard with developers as they work on the designs. What real world? The real world of enterprise storage development, or the real world of an open-source project? It sounds to me like you want to have your cake and eat it too. Developers, no matter how good, shouldn't work in a vacuum. Agreed, and we don't. Except for the secret mailing list, and the fact you've stated repeatedly the code will be behind a wall until you feel it's ready for the public to see, right? How exactly are the developers not working in a vacuum? If you want to see a good example of how things should be done in the open, follow the caiman-discuss list. Caiman-discuss may be an excellent example of a model that can work, but it might not be the best model for ZFS. There are many more contentious issues, and more contentious personalities, and other considerations that I don't want to get into. Ultimately, our model is like an IEEE working group. The members have decided to run this list in this fashion, without any significant dissension. Of course, if you don't like this, and want to start your own group, I encourage you to do so. I'll also point at zfs-discuss@opensolaris.org, which is monitored by a number of the members of this cabal. That's a great way to give feedback. - Garrett That's mature. If you don't like it, fork it yourself. With responses like that, I can only imagine how quickly you're going to build up steam behind your project outside of the four or so entities that have a vested interest. I've always said, the best way to build a community is by telling anyone who suggests perhaps they might be able to give feedback that they should be happy you're giving them any scraps at all (or in your case, not even that). --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dual protocol on one file system?
On Sat, Mar 12, 2011 at 7:42 PM, Fred Liu fred_...@issi.com wrote: Hi, Is it possible to run both CIFS and NFS on one file system over ZFS? Thanks. Fred Yes, but managing permissions in that scenario is generally a nightmare. If you're using NFSv4 with AD integration, it's a bit more manageable, but it's still definitely a work in progress. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
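Mechanically, dual-protocol sharing is just two properties on the same dataset. A minimal sketch, assuming the in-kernel SMB server (not Samba) and a hypothetical dataset tank/shared:

  # svcadm enable -r smb/server
  # zfs set sharenfs=on tank/shared
  # zfs set sharesmb=name=shared tank/shared

The sharing itself is the easy part; the nightmare referred to above is keeping NFSv4/ZFS ACLs and POSIX modes coherent once both sets of clients start writing.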
Re: [zfs-discuss] dual protocol on one file system?
2011/3/12 Fred Liu fred_...@issi.com Tim, Thanks. Is there a mapping mechanism like what Data ONTAP does to map the permission/ACL between NIS/LDAP and AD? Thanks. Fred *From:* Tim Cook [mailto:t...@cook.ms] *Sent:* Sunday, March 13, 2011 9:53 *To:* Fred Liu *Cc:* zfs-discuss@opensolaris.org *Subject:* Re: [zfs-discuss] dual protocol on one file system? On Sat, Mar 12, 2011 at 7:42 PM, Fred Liu fred_...@issi.com wrote: Hi, Is it possible to run both CIFS and NFS on one file system over ZFS? Thanks. Fred Yes, but managing permissions in that scenario is generally a nightmare. If you're using NFSv4 with AD integration, it's a bit more manageable, but it's still definitely a work in progress. --Tim Yes. http://www.unix.com/man-page/OpenSolaris/1m/idmap/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
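A hypothetical rule to make the idmap pointer concrete (user name and domain invented for illustration):

  # idmap add winname:fred@corp.example.com unixuser:fred
  # idmap list     (shows the name-based mapping rules currently in effect)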
Re: [zfs-discuss] cannot replace c10t0d0 with c10t0d0: device is too small
On Fri, Mar 4, 2011 at 10:22 AM, Robert Hartzell b...@rwhartzell.net wrote: In 2007 I bought 6 WD1600JS 160GB SATA disks and used 4 to create a raidz storage pool, and then shelved the other two for spares. One of the disks failed last night, so I shut down the server and replaced it with a spare. When I tried to zpool replace the disk I get:

zpool replace tank c10t0d0
cannot replace c10t0d0 with c10t0d0: device is too small

The 4 original disk partition tables look like this:

Current partition table (original):
Total disk sectors available: 312560317 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34      149.04GB        312560350
  1 unassigned    wm                 0           0                  0
  2 unassigned    wm                 0           0                  0
  3 unassigned    wm                 0           0                  0
  4 unassigned    wm                 0           0                  0
  5 unassigned    wm                 0           0                  0
  6 unassigned    wm                 0           0                  0
  8   reserved    wm         312560351        8.00MB        312576734

The spare disk partition table looks like this:

Current partition table (original):
Total disk sectors available: 312483549 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34      149.00GB        312483582
  1 unassigned    wm                 0           0                  0
  2 unassigned    wm                 0           0                  0
  3 unassigned    wm                 0           0                  0
  4 unassigned    wm                 0           0                  0
  5 unassigned    wm                 0           0                  0
  6 unassigned    wm                 0           0                  0
  8   reserved    wm         312483583        8.00MB        312499966

So it seems that two of the disks are slightly different models and are about 40MB smaller than the original disks. I know I can just add a larger disk, but I would rather use the hardware I have if possible. 1) Is there any way to replace the failed disk with one of the spares? 2) Can I recreate the zpool using 3 of the original disks and one of the slightly smaller spares? Will zpool/zfs adjust its size to the smaller disk? 3) If #2 is possible, would I still be able to use the last still-shelved disk as a spare? If #2 is possible I would probably recreate the zpool as raidz2 instead of the current raidz1. Any info/comments would be greatly appreciated. Robert

You cannot. That's why I suggested two years ago that they chop off 1% from the end of the disk at install time to equalize drive sizes. That way you wouldn't run into this problem trying to replace disks from a different vendor or different batch. The response was that Sun makes sure all drives are exactly the same size (although I do recall someone on this forum having this issue with Sun OEM disks as well). It's ridiculous they don't take into account the slight differences in drive sizes from vendor to vendor. Forcing you to single-source your disks is a bad habit to get into IMO. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
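One defensive variant of option 2, sketched with the thread's device names: label an s0 slice on every disk somewhat smaller than the smallest drive you ever expect to buy, and build the pool on slices rather than whole disks. The trade-off is that ZFS only manages the disk write cache automatically when given whole disks, so this swaps a little performance for painless replacement:

  # prtvtoc /dev/rdsk/c10t0d0s2     (compare usable sector counts across the drives)
  # zpool create tank raidz2 c10t0d0s0 c10t1d0s0 c10t2d0s0 c10t3d0s0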
Re: [zfs-discuss] A few questions
On Mon, Jan 3, 2011 at 5:56 AM, Garrett D'Amore garr...@nexenta.com wrote: On 01/ 3/11 05:08 AM, Robert Milkowski wrote: On 12/26/10 05:40 AM, Tim Cook wrote: On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling richard.ell...@gmail.com wrote: There are more people outside of Oracle developing for ZFS than inside Oracle. This has been true for some time now. Pardon my skepticism, but where is the proof of this claim (I'm quite certain you know I mean no disrespect)? Solaris 11 Express was a massive leap in functionality and bugfixes to ZFS. I've seen exactly nothing from outside of Oracle in the time since it went closed. We used to see updates bi-weekly out of Sun. Nexenta spending hundreds of man-hours on a GUI and userland apps isn't work on ZFS. Exactly my observation as well. I haven't seen any ZFS-related development happening at Illumos or Nexenta, at least not yet. Just because you've not seen it yet doesn't imply it isn't happening. Please be patient. - Garrett Or, conversely, don't make claims of all this code contribution prior to having anything to show for your claimed efforts. Duke Nukem Forever was going to be the greatest video game ever created... we were told to be patient... we're still waiting for that too. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A few questions
On Tue, Jan 4, 2011 at 8:21 PM, Garrett D'Amore garr...@nexenta.com wrote: On 01/ 4/11 09:15 PM, Tim Cook wrote: On Mon, Jan 3, 2011 at 5:56 AM, Garrett D'Amore garr...@nexenta.com wrote: On 01/ 3/11 05:08 AM, Robert Milkowski wrote: On 12/26/10 05:40 AM, Tim Cook wrote: On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling richard.ell...@gmail.com wrote: There are more people outside of Oracle developing for ZFS than inside Oracle. This has been true for some time now. Pardon my skepticism, but where is the proof of this claim (I'm quite certain you know I mean no disrespect)? Solaris 11 Express was a massive leap in functionality and bugfixes to ZFS. I've seen exactly nothing from outside of Oracle in the time since it went closed. We used to see updates bi-weekly out of Sun. Nexenta spending hundreds of man-hours on a GUI and userland apps isn't work on ZFS. Exactly my observation as well. I haven't seen any ZFS-related development happening at Illumos or Nexenta, at least not yet. Just because you've not seen it yet doesn't imply it isn't happening. Please be patient. - Garrett Or, conversely, don't make claims of all this code contribution prior to having anything to show for your claimed efforts. Duke Nukem Forever was going to be the greatest video game ever created... we were told to be patient... we're still waiting for that too. Um, have you not been paying attention? I've delivered quite a lot of contribution to illumos already, just not in ZFS. Take a close look -- there almost certainly wouldn't *be* an open source version of OS/Net had I not done the work to enable this in libc, kernel crypto, and other bits. This work is still higher priority than ZFS innovation for a variety of reasons -- mostly because we need a viable and supportable illumos upon which to build those ZFS innovations. That said, much of the ZFS work I hope to contribute to illumos needs more baking, but some of it is already open source in NexentaStor. (You can for a start look at zfs-monitor, the WORM support, and support for hardware GZIP acceleration all as things that Nexenta has innovated in ZFS, and which are open source today if not part of illumos. Check out http://www.nexenta.org for source code access.) So there, money placed where mouth is. You? - Garrett The claim was that there are more people contributing code from outside of Oracle than inside to ZFS. Your contributions to illumos do absolutely nothing to back up that claim. zfs-monitor is not ZFS code (it's an FMA module), WORM also isn't ZFS code, it's an OS-level operation, and GZIP hardware acceleration is produced by Indra Networks, and has absolutely nothing to do with ZFS. Does it help ZFS? Sure, but that's hardly a code contribution to ZFS when it's simply a hardware acceleration card that accelerates ALL gzip code. So, great job picking three projects that are not proof of developers working on ZFS. And great job not providing any proof of the claim that there are more developers working on ZFS outside of Oracle than within. You're going to need a hell of a lot bigger bank account to cash the check than what you've got. As for me, I don't recall making any claims on this list that I can't back up, so I'm not really sure what you're getting at. I can only assume the defensive tone of your email is because you've been called out and can't back up the claims either. So again: if you've got code in the works, great. Talk about it when it's ready. 
Stop throwing out baseless claims that you have no proof of and then fall back on just be patient, it's coming. We've heard that enough from Oracle and Sun already. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 25, 2010 at 8:25 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Joerg Schilling And people should note that Netapp filed their patents starting from 1993. This is 5 years after I started to develop WOFS, which is copy on write. This still In any case, this is 20 year old technology. Aren't patents something to protect new ideas? Boy, those guys must be really dumb to waste their time filing billion dollar lawsuits, protecting 20-year old technology, when it's so obvious that you and other people clearly invented it before them, and all the money they waste on lawyers can never achieve anything. They should all fire themselves. And anybody who defends against it can safely hire a law student for $20/hr to represent them, and just pull out your documents as defense, because that's so easy. Plus, as you said, the technology is so old, it should be worthless by now. Why are we all wasting our time in this list talking about irrelevant old technology, anyway? Indeed. Isn't the Oracle database itself at least 20 years old? And Windows? And Solaris itself? All the employees of those companies should probably just start donating their time for free instead of collecting a paycheck since it's quite obvious they should no longer be able to charge for their product. What I find most entertaining is all the armchair lawyers on this mailing list that think they've got prior art when THEY'VE NEVER EVEN SEEN THE CODE IN QUESTION! --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 25, 2010 at 1:10 PM, Erik Trimble erik.trim...@oracle.comwrote: On 12/25/2010 6:25 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Joerg Schilling And people should note that Netapp filed their patents starting from 1993. This is 5 years after I started to develop WOFS, which is copy on write. This still In any case, this is 20 year old technology. Aren't patents something to protect new ideas? Boy, those guys must be really dumb to waste their time filing billion dollar lawsuits, protecting 20-year old technology, when it's so obvious that you and other people clearly invented it before them, and all the money they waste on lawyers can never achieve anything. They should all fire themselves. And anybody who defends against it can safely hire a law student for $20/hr to represent them, and just pull out your documents as defense, because that's so easy. Plus, as you said, the technology is so old, it should be worthless by now. Why are we all wasting our time in this list talking about irrelevant old technology, anyway? While that's a bit sarcastic there Ned, it *should* be the literal truth. But, as the SCO/Linux suit showed, having no realistic basis for a lawsuit doesn't prevent one from being dragged through the (U.S.) courts for the better part of a decade. sigh Why can't we have a loser-pays civil system like every other civilized country? -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) If you've got enough money, we do. You just have to make it to the end of the trial, and have a judge who feels similar. They often award monetary settlements for the cost of legal defense to the victor. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A few questions
On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling richard.ell...@gmail.com wrote: On Dec 21, 2010, at 5:05 AM, Deano wrote: The question therefore is, is there room in the software implementation to achieve performance and reliability numbers similar to expensive drives whilst using relatively cheap drives? For some definition of similar, yes. But using relatively cheap drives does not mean the overall system cost will be cheap. For example, $250 will buy 8.6K random IOPS @ 4KB in an SSD[1], but to do that with cheap disks might require eighty 7,200 rpm SATA disks. ZFS is good but IMHO it is easy to see how it can be improved to better meet this situation. I can’t currently say when this line of thinking and code will move from research to production-level use (tho I have a pretty good idea ;) ) but I wouldn’t bet on the status quo lasting much longer. In some ways the removal of OpenSolaris may actually be a good thing, as it's catalyzed a number of developers from the view that zfs is Oracle led, to thinking “what can we do with zfs code as a base”? There are more people outside of Oracle developing for ZFS than inside Oracle. This has been true for some time now. Pardon my skepticism, but where is the proof of this claim (I'm quite certain you know I mean no disrespect)? Solaris 11 Express was a massive leap in functionality and bugfixes to ZFS. I've seen exactly nothing from outside of Oracle in the time since it went closed. We used to see updates bi-weekly out of Sun. Nexenta spending hundreds of man-hours on a GUI and userland apps isn't work on ZFS. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Disk failed, System not booting
Just boot off a live CD, import the pool, and swap it that way. I'm guessing you haven't changed your failmode to continue? On Dec 20, 2010 10:48 AM, Albert Frenz y...@zockbar.de wrote: Hi there, I got FreeNAS installed with a raidz1 pool of 3 disks. One of them has now failed and it gives me errors like Unrecovered read errors: auto reallocate failed or MEDIUM ERROR asc:11,4, and the system won't even boot up. So I bought a replacement drive, but I am a bit concerned, since normally you should detach the drive via the terminal. I can't do it, since it won't boot up. So am I safe if I just shut down the machine, replace the drive with the new one, and resilver? Thanks in advance, Adrian -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
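In command form, the live-CD route looks roughly like this; the pool and device names are placeholders, and zpool status will show the actual failed disk:

  # zpool import -f tank
  # zpool status -x
  # zpool replace tank c0t1d0     (one-argument form: resilvers onto the new disk in the same bay)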
Re: [zfs-discuss] OT: anyone aware how to obtain 1.8.0 for X2100M2?
You have to have a support contract to download BIOS and firmware now. On Dec 19, 2010 12:29 PM, Eugen Leitl eu...@leitl.org wrote: I realize this is off-topic, but Oracle has completely screwed up the support site from Sun. I figured someone here would know how to obtain Sun Fire X2100 M2 Server Software 1.8.0 Image contents: * BIOS is version 3A21 * SP is updated to version 3.24 (ELOM) * Chipset driver is updated to 9.27 from http://www.sun.com/servers/entry/x2100/downloads.jsp I've been trying for an hour, and I'm at the end of my rope. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mixing different disk sizes in a pool?
On Sat, Dec 18, 2010 at 7:26 AM, Ian D rewar...@hotmail.com wrote: Another question: all those disks are on Dell MD1000 JBODs (11 of them) and we have 12 SAS ports on three LSI 9200-16e HBAs. Is there any point connecting each JBOD on a separate port or is it ok cascading them in groups of three? Is there a bandwidth limit we'll be hitting doing that? Thanks It's fine to cascade them. SAS is all point-to-point. I strongly doubt you'll hit a bandwidth constraint on the backend, especially if you have the shelves multipathed, but if that's a concern you will get more peak bandwidth putting them on separate ports. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
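If the shelves are dual-pathed, it's worth confirming that MPxIO is actually coalescing the paths rather than presenting each LUN twice. A quick check, assuming the stock Solaris multipathing stack:

  # stmsboot -e        (enables MPxIO; requires a reboot)
  # mpathadm list lu   (each LUN should then show two operational paths)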
Re: [zfs-discuss] Mixing different disk sizes in a pool?
On Sat, Dec 18, 2010 at 4:24 PM, Ian D rewar...@hotmail.com wrote: The answer really depends on what you want to do with the pool(s). You'll have to provide more information. Get the maximum number of random IOPS I can get out of those drives for database usage. -- Random IOPS won't max out the SAS link. You'll be fine stacking them. But again, if you have the ports available, and already have the cables, it won't hurt anything to use them. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Thu, Dec 16, 2010 at 8:11 AM, Linder, Doug doug.lin...@merchantlink.comwrote: Joerg Schilling wrote: The reason for not being able to use ZFS under Linux is not the license used by ZFS but the missing will for integration. Several lawyers explained already why adding ZFS to the Linux would just create a collective work that is permitted by the GPL. Folks, I very much did not intend to start, nor do I want to participate in or perpetuate, any religious flame wars. This list is for ZFS discussion. There are plenty of other places for License Wars and IP discussion. The only thing I'll add is that I, as I said, I really don't care at all about licenses. When it comes to licenses, to me (and, I suspect, the vast majority of other OSS users), GPL is synonymous with open source. Is that correct? No. Am I aware that plenty of other licenses exist? Yes. Is the issue important? Sure. Do I have time or interest to worry about niggly little details? No. All I want is to be able to use the best technology in the ways that are most useful to me without artificial restrictions. Anything that advances that, I'm for. This is one of those geek things where the topic you're personally very geeky about seems *hugely* important and you can't understand why others don't see that. Maybe it bugs you when people use GPL to mean open source, but the fact is that lots and lots of people do. It bugs me when Stallman tries to get everyone to use the ridiculous GNU/Linux, as if anyone would ever say that. It bugs me when people say I *could* care less. But I live with these things. People talk the way they talk. If you're into IP issues and OSS licensing, that's great. But don't be surprised if other people aren't as fascinated with the dirty details of IP law as you are. Most people find the law unutterably boring. So, feel free to discuss this as much as you want, but leave me out of it. I regret and apologize for my callous disregard in casually tossing around a clearly incendiary term like GPL. Everyone have a great day! :) The problem is, what you're saying amounts to: I want Oracle to port ZFS to linux because I don't want to pay for it. I don't want to pay Oracle for it, and I want to be able to use it any way I see fit. What is in it for Oracle? Goodwill doesn't pay the bills. Claiming you'd start paying for Solaris if they gave you ZFS for free in Linux is absolutely ridiculous. If the best response you can come up with is goodwill, I suggest wishing in one hand and shitting in the other because there's no way Oracle is going to give away such a valuable piece of code for no monetary compensation. *AT BEST* I could see them releasing a binary for OEL only that they won't be sharing with anyone else. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Guide to COMSTAR iSCSI?
On Mon, Dec 13, 2010 at 5:30 PM, Chris Mosetick cmoset...@gmail.com wrote: I have found this post from Mike La Spina to be very detailed covering this topic, yet I could not seem to get it to work right on my first hasty attempt a while back. Let me know if you have success, or adjustments that get this to work. http://blog.laspina.ca/ubiquitous/securing-comstar-and-vmware-iscsi-connections -Chris On Sun, Dec 12, 2010 at 12:47 AM, Martin Mundschenk m.mundsch...@mundschenk.de wrote: Hi! I have configured two LUs following this guide: http://thegreyblog.blogspot.com/2010/02/setting-up-solaris-comstar-and.html Now I want each LU to be available to only one distinct client in the network. I found no easy guide on how to accomplish this anywhere on the internet. Any hint? Martin Looking at that, the one comment I'd make is that I'd strongly suggest avoiding CHAP. It really provides nothing in the way of security, and simply adds more complexity. If you're doing iSCSI across a WAN (I really hope you aren't), you'd be better served using a VPN. If you're doing it on a LAN and you're concerned about security, use VLANs. It's generally a good idea to dedicate a VLAN to VMware storage traffic anyway (whether it be iSCSI or NFS) if your infrastructure can handle VLANs. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
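Back to Martin's original question: restricting each LU to a single client is done in COMSTAR with host groups and views. A sketch with invented names and a placeholder GUID (the real one comes from stmfadm list-lu):

  # stmfadm create-hg host1-hg
  # stmfadm add-hg-member -g host1-hg iqn.1998-01.com.vmware:host1
  # stmfadm add-view -h host1-hg 600144f0aabbccdd...

Repeat with a second host group for the second LU, and each initiator only ever sees its own LU.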
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 11, 2010 at 3:08 PM, Joerg Schilling joerg.schill...@fokus.fraunhofer.de wrote: Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: Problem is... Oracle is now the only company in the world who's immune to netapp lawsuit over ZFS. Even if IBM and Dell and HP wanted to band together and fund the open-source development of ZFS and openindiana... It's a real risk. I don't believe that there is a significant risk as the NetApp patents are invalid because of prior art. You are not a court of law, and that statement has not been tested. It is your opinion and nothing more. I'd appreciate if every time you repeated that statement, you'd preface it with in my opinion so you don't have people running around believing what they're doing is safe. I'd hope they'd be smart enough to consult with a lawyer, but it's probably better to just not spread unsubstantiated rumor in the first place. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 11, 2010 at 5:17 PM, Joerg Schilling joerg.schill...@fokus.fraunhofer.de wrote: Tim Cook t...@cook.ms wrote: I don't believe that there is a significant risk as the NetApp patents are invalid because of prior art. You are not a court of law, and that statement has not been tested. It is your opinion and nothing more. I'd appreciate it if every time you repeated that statement, you'd preface it with in my opinion so you don't have people running around believing what they're doing is safe. I'd hope they'd be smart enough to consult with a lawyer, but it's probably better to just not spread unsubstantiated rumor in the first place. If you have substantial information on why NetApp may rightfully own a patent that is essential for ZFS, I would be interested to get this information. Jörg The initial filing was public record. It has been posted on this mailing list already, and you responded to those posts. I'm not sure why you're acting like you're oblivious to the case. Regardless, I'll answer your rhetorical question: http://www.groklaw.net/articlebasic.php?story=20080529163415471 You BELIEVING they are wrong doesn't make it so, sorry. Until it is settled in a court of law, or the patent office invalidates their patents, you are making unsubstantiated claims. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Fri, Dec 10, 2010 at 8:54 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 10 Dec 2010, Edward Ned Harvey wrote: It's been a while since I last heard anybody say anything about this. What's the latest version of publicly released ZFS? Has oracle made it closed-source moving forward? Nice troll. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ I'm not sure how it's trolling. There have been 0 public statements I've seen from Oracle on their future plans for what was opensolaris. A leaked internal memo is NOT official company policy. Until I see source or an official statement, I'm not holding my breath. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 3TB HDD in ZFS
It's based on a jumper on most new drives. On Dec 6, 2010 8:41 PM, taemun tae...@gmail.com wrote: On 7 December 2010 13:25, Brandon High bh...@freaks.com wrote: There shouldn't be any problems using a 3TB drive with Solaris, so long as you're using a 64-bit kernel. Recent versions of zfs should properly recognize the 4k sector size as well. I think you'll find that these 3TB, 4KiB-physical-sector drives are still exporting logical sectors of 512B (this is what Anandtech has indicated, anyway). ZFS assumes that the drive's logical sectors are directly mapped to physical sectors, and will create an ashift=9 vdev for the drives. Hence why enthusiasts are making their own zpool binaries with a hardcoded ashift=12 so they can create pools that actually function beyond 20 random writes per second with these drives: http://digitaldj.net/2010/11/03/zfs-zpool-v28-openindiana-b147-4k-drives-and-you/ Cheers, ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
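To see which sector-size assumption a pool was actually built with, the ashift is visible through zdb; a quick check against the default cache file:

  # zdb | grep ashift     (9 means 512-byte sectors assumed; 12 means 4KiB)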
Re: [zfs-discuss] Zfs ignoring spares?
          raidz2-8   ONLINE       0     0     0
            c4t22d0  ONLINE       0     0     0
            c4t23d0  ONLINE       0     0     0
            c4t24d0  ONLINE       0     0     0
            c4t25d0  ONLINE       0     0     0
            c4t26d0  ONLINE       0     0     0
            c4t27d0  ONLINE       0     0     0
            c4t28d0  ONLINE       0     0     0
          raidz2-9   ONLINE       0     0     0
            c4t29d0  ONLINE       0     0     0
            c4t30d0  ONLINE       0     0     0
            c4t31d0  ONLINE       0     0     0
            c4t32d0  ONLINE       0     0     0
            c4t33d0  ONLINE       0     0     0
            c4t34d0  ONLINE       0     0     0
            c4t35d0  ONLINE       0     0     0
          raidz2-10  ONLINE       0     0     0
            c4t36d0  ONLINE       0     0     0
            c4t37d0  ONLINE       0     0     0
            c4t38d0  ONLINE       0     0     0
            c4t39d0  ONLINE       0     0     0
            c4t40d0  ONLINE       0     0     0
            c4t41d0  ONLINE       0     0     0
            c4t42d0  ONLINE       0     0     0
        cache
          c8t0d0     ONLINE       0     0     0
          c8t1d0     ONLINE       0     0     0
        spares
          c4t43d0    INUSE     currently in use
          c4t44d0    INUSE     currently in use

errors: No known data errors
r...@prv-backup:~#

Hot spares are dedicated spares in the ZFS world. Until you replace the actual bad drives, you will be running in a degraded state. The idea is that spares are only used in an emergency. You are degraded until your spares are no longer in use. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
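The way out of the INUSE state, roughly, with placeholder names for the pool and the failed disks (the posted output doesn't show them):

  # zpool replace tank c4t12d0     (new disk in the failed bay; the spare detaches itself once the resilver completes)
  # zpool detach tank c4t12d0      (alternatively, detach the failed disk and the spare is promoted permanently)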
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sun, Nov 28, 2010 at 5:18 PM, Krunal Desai mov...@gmail.com wrote: There are problems with Sandforce controllers, according to forum posts. Buggy firmware. And in practice, Sandforce is far below its theoretical values. I expect Intel to have fewer problems. I believe it's more the firmware (and pace of firmware updates) from companies making Sandforce-based drives than it is the controller. Enthusiasts can tolerate OCZ and others releasing alphas/betas in forum posts. While the G2 Intel drives may not be the performance kings anymore (or the most price-effective), I'd argue they're certainly the most stable when it comes to firmware. Have my eye on a G3 Intel drive for my laptop, where I can't really afford beta firmware updates biting me on the road. --khd Again, this is news to me. Do you have examples? There were plenty of revisions when they first dropped 6-8 months ago, but I haven't heard of anything similar in quite some time. As for Intel, they've had their share of issues as well. I assume you remember the data-loss inducing BIOS password bug? --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sun, Nov 28, 2010 at 1:41 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: There are problems with Sandforce controllers, according to forum posts. Buggy firmware. And in practice, Sandforce is far below its theoretical values. I expect Intel to have fewer problems. According to what forum posts? There were issues when Crucial and a few others released alpha firmware into production... Anandtech has put those drives through the wringer without issue. Several people on this list are running them as well. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sun, Nov 28, 2010 at 10:42 AM, David Magda dma...@ee.ryerson.ca wrote: On Nov 27, 2010, at 16:14, Tim Cook wrote: You don't need drivers for any SATA based SSD. It shows up as a standard hard drive and plugs into a standard SATA port. By the time the G3 Intel drive is out, the next gen Sandforce should be out as well. Unless Intel does something revolutionary, they will still be behind the Sandforce drives. Are you referring to the SF-2000 chips? http://www.sandforce.com/index.php?id=133 http://www.legitreviews.com/article/1429/1/ http://www.google.com/search?q=sandforce+sf-2000 Yup. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
On Sat, Nov 27, 2010 at 9:34 AM, Christopher George cgeo...@ddrdrive.com wrote: I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and I'd be interested if anyone else has. I recently presented at the OpenStorage Summit 2010 and compared exactly the three devices you mention in your post (Vertex 2 EX, Vertex 2 Pro, and the DDRdrive X1) as ZIL Accelerators. Jump to slide 37 for the write IOPS benchmarks: http://www.ddrdrive.com/zil_accelerator.pdf and you *really* want to make sure you get the 4k alignment right Excellent point; starting on slide 66, the performance impact of partition misalignment is illustrated. Considering the results, longevity might be an even greater concern than decreased IOPS performance, as ZIL acceleration is a worst-case scenario for a Flash-based SSD. The DDRdrive is still the way to go for the ultimate ZIL acceleration, but it's pricey as hell. In addition to product cost, I believe IOPS/$ is a relevant point of comparison. Google products gives the price range for the OCZ 50GB SSDs:

  Vertex 2 EX (OCZSSD2-2VTXEX50G: $870 - $1,011 USD)
  Vertex 2 Pro (OCZSSD2-2VTXP50G: $399 - $525 USD)

4KB Sustained and Aligned Mixed Write IOPS results (see pdf above):

  Vertex 2 EX (6325 IOPS)
  Vertex 2 Pro (3252 IOPS)
  DDRdrive X1 (38701 IOPS)

Using the lowest online price for both the Vertex 2 EX and Vertex 2 Pro, and the full list price (SRP) of the DDRdrive X1. IOPS/Dollar($):

  Vertex 2 EX (6325 IOPS / $870) = 7.27
  Vertex 2 Pro (3252 IOPS / $399) = 8.15
  DDRdrive X1 (38701 IOPS / $1,995) = 19.40

Best regards, Why would you disable TRIM on an SSD benchmark? I can't imagine anyone intentionally crippling their drive in the real world. Furthermore, I don't think 1 hour sustained is a very accurate benchmark. Most workloads are bursty in nature. If you're doing sustained high-IOPS workloads like that, the back-end is going to fall over and die long before the hour time-limit. Your 38k IOPS would need nearly 500 drives to sustain that workload with any kind of decent latency. If you've got 500 drives, you're going to want a hell of a lot more ZIL space than the DDRdrive currently provides. I'm all for benchmarks, but try doing something a bit more realistic. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
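For context, putting any of these devices to work as a ZIL accelerator is the same one-liner regardless of vendor; a sketch with hypothetical device names, mirrored since losing an unmirrored slog on older pool versions was painful:

  # zpool add tank log mirror c9t0d0 c9t1d0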
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sat, Nov 27, 2010 at 8:10 AM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: A noob question: These drives that people talk about, can you use them as a system disc too? Install Solaris 11 Express on them? Or can you only use them as an L2ARC or ZIL? -- They're a standard SATA hard drive. You can use them for whatever you'd like. For the price though, they aren't really worth the money to buy just to put your OS on. Your system drive on a Solaris system generally doesn't see enough I/O activity to require the kind of IOPS you can get out of most modern SSDs. If you were using the system as a workstation, it'd definitely help, as applications tend to feel more responsive with an SSD. That's all I run in my laptops now. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sat, Nov 27, 2010 at 2:16 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: Your system drive on a Solaris system generally doesn't see enough I/O activity to require the kind of IOPS you can get out of most modern SSDs. My system drive sees a lot of activity, to the degree everything is going slow. I have a SunRay that my girlfriend uses, and I have 5-10 torrents going on, and surf the web - often my system crawls. Very often my girlfriend gets irritated because everything lags and she frequently asks me if she can do some task, or if she should wait until I have finished copying my files. Unbearable. I have a quad-core Intel 9450 at 2.66GHz, and 8GB RAM. I am planning to use an SSD and really hope it will be faster.

$ iostat -xcnXCTdz 1
     cpu
 us sy wt id
 25  7  0 68
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    0,0    0,0     0,0  0,0  0,0    0,0    0,0   0   0 c8
    0,0    0,0    0,0     0,0  0,0  0,0    0,0    0,0   0   0 c8t0d0
   37,0  442,1 4489,6 51326,1  7,5  2,0   15,7    4,1  98 100 c7d0

Desktop usage is a different beast as I alluded to. A dedicated server typically doesn't have any issues. I'd strongly suggest getting one of the Sandforce-controller-based SSDs. They're the best on the market right now by far. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
On Sat, Nov 27, 2010 at 2:24 PM, Christopher George cgeo...@ddrdrive.com wrote: Why would you disable TRIM on an SSD benchmark? Because ZFS does *not* support TRIM, so the benchmarks are configured to replicate actual ZIL Accelerator workloads. If you're doing sustained high-IOPS workloads like that, the back-end is going to fall over and die long before the hour time-limit. The reason the graphs are done in a timeline fashion is so you can look at any point in the 1 hour series to see how each device performs. Best regards, TRIM was putback in July... You're telling me it didn't make it into S11 Express? http://mail.opensolaris.org/pipermail/onnv-notify/2010-July/012674.html --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sat, Nov 27, 2010 at 3:12 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: I am waiting for the next gen Intel SSD drives, G3. They are arriving very soon. And from what I can infer by reading here, I can use it without issues. Solaris will recognize the Intel SDD drive without any drivers needed, or whatever? Intel new SSD should work with Solaris 11 Express, yes? You don't need drivers for any SATA based SSD. It shows up as a standard hard drive and plugs into a standard SATA port. By the time the G3 Intel drive is out, the next gen Sandforce should be out as well. Unless Intel does something revolutionary, they will still be behind the Sandforce drives. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
On Sat, Nov 27, 2010 at 9:29 PM, Erik Trimble erik.trim...@oracle.com wrote: On 11/27/2010 6:50 PM, Christopher George wrote: Furthermore, I don't think 1 hour sustained is a very accurate benchmark. Most workloads are bursty in nature. The IOPS degradation is additive; the length of the first and second one-hour sustained periods is completely arbitrary. The takeaway from slides 1 and 2 is that drive inactivity has no effect on the eventual outcome. So with either a bursty or sustained workload the end result is always the same: dramatic write IOPS degradation after unpackaging or secure erase of the tested Flash-based SSDs. Best regards, Christopher George Founder/CTO www.ddrdrive.com Without commenting on other threads, I often see sustained IO in my setups for extended periods of time - particularly, small IO which eats up my IOPS. At this moment, I run with ZIL turned off for that pool, as it's a scratch pool and I don't care if it gets corrupted. I suspect that a DDRdrive or one of the STEC Zeus drives might help me, but I can overwhelm any other SSD quickly. I'm doing compiles of the JDK, with a single ZFS-backed system handling the files for 20-30 clients, each trying to compile a 15 million-line JDK at the same time. Lots and lots of small I/O. :-) Sounds like you need lots and lots of 15krpm drives instead of 7200rpm SATA ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
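On builds new enough to have the per-dataset sync property (it replaced the old global zil_disable tunable), the scratch-pool arrangement Erik describes is one setting, shown here with a hypothetical dataset:

  # zfs set sync=disabled tank/scratch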
Re: [zfs-discuss] zpool import is this safe to use -f option in this case ?
On Wed, Nov 17, 2010 at 2:56 PM, Jim Dunham james.dun...@oracle.com wrote: Tim, On Wed, Nov 17, 2010 at 10:12 AM, Jim Dunham james.dun...@oracle.com wrote: sridhar, I have done the following (which is required for my case) Created a zpool (smpool) on a device/LUN from an array (IBM 6K) on host1 created an array-level snapshot of the device using dscli to another device which is successful. Now I make the snapshot device visible to another host (host2) Even though the array is capable of taking device/LUN snapshots, this is a non-standard mode of operation regarding the use of ZFS. It raises concerns that if one had a problem using ZFS in this manner, there would be few Oracle or community users of ZFS that could assist. Even if the alleged problem was not related to using ZFS with array-based snapshots, usage would always create a level of uncertainty. I would suggest using ZFS send / recv instead. That's what we call FUD. It might be a problem if you use someone else's feature that we duplicate. If Oracle isn't going to support array-based snapshots, come right out and say it. You might as well pack up the cart now though, there isn't an enterprise array on the market that doesn't have snapshots, and you will be the ONLY OS I've ever heard of even suggesting that array-based snapshots aren't allowed. That's not what I said... Non-standard mode of operation is *not* the same thing as not supported. Using ZFS's standard mode of operation based on its built-in support for snapshots is well-proven, well-documented technology. How is using an array-based snapshot to create a copy of a filesystem non-standard? Non-standard to who? Array-based snapshots were around long before ZFS was created. It was proven and documented long before ZFS was around as well. Given your history in the industry, I know you aren't so new to this game you didn't already know that, so I'm not really sure what the purpose of proven and documented was, other than to try to insinuate that other technologies are not. would there be any issues? Prior to taking the next snapshot, one must be assured that the device/LUN on host2 is returned to the zpool exported state. Failure to do this could cause zpool corruption, ZFS I/O failures, or even the possibility of a system panic on host2. Really? And how did you come to that conclusion? As a prior developer and project lead of host-based snapshot and replication software on Solaris, I have first-hand experience using ZFS with snapshots. If, while ZFS on node2 is accessing an instance of snapshot data, the array updates the snapshot data, ZFS will see newly created CRCs created by node1. These CRCs will be considered metadata corruption, and depending on exactly what ZFS was doing at the time the corruption was detected, the software will attempt some form of error recovery. The array doesn't update the snapshot data. That's the whole point of the snapshot. It's point-in-time. Either the snapshot exists as it was taken, or it's deleted. What array on the market changes blocks in a snapshot that are being presented out as a live filesystem to a host? I've never heard of any such behavior, and that sort of behavior would be absolutely brain-dead. OP: Yes, you do need to use a -f. The zpool has a signature that is there when the pool is imported (this is to keep an admin from accidentally importing the pool to two different systems at the same time). The only way to clear it is to do a zpool export before taking the initial snapshot, or doing the -f on import. 
Jim here is doing a great job of spreading FUD, and none of it is true. What you're doing should absolutely work, just make sure there is no I/O in flight when you take the original snapshot. Either export the pool first (I would recommend this approach), shut the system down, or just make sure you aren't doing any writes when taking the array-based snapshot. These last two statements need clarification. ZFS is always on-disk consistent, even in the context of using snapshots. Therefore as far as ZFS is concerned, there is no need to assure that there are no I/Os in flight, or that the storage pool is exported, or that the system is shut down, or that one is doing any writes. Except when it isn't. Which is why zpool import -F was added to ZFS. In theory ZFS doesn't need a checkdisk and it didn't need an import -F because it's always consistent on disk. In reality, that's utterly false as well. Although ZFS is always on-disk consistent, many applications are not filesystem consistent. To be filesystem consistent, an application by design must issue careful writes and/or synchronized filesystem operations. Not knowing this fact, or lacking this functionality, a system admin will need to deploy some of the work-arounds suggested above. The most important one not listed is to stop or pause those applications which are known
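As a command sequence, the clean-export approach looks like this, using the thread's pool name; the dscli step stands in for whatever your array's snapshot mechanism is:

  host1# zpool export smpool
         (take the array-level snapshot of the LUN here)
  host1# zpool import smpool
  host2# zpool import smpool     (the snapshot copy; cleanly exported, so -f shouldn't be needed)

If the snapshot is instead taken while the pool stays imported on host1, host2 will need zpool import -f, and the imported state will be whatever the last committed txg happened to be.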
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On Wed, Nov 17, 2010 at 7:34 PM, Richard Elling richard.ell...@gmail.comwrote: On Nov 16, 2010, at 2:03 PM, Rthoreau r7h0...@att.net wrote: I just think that some people might need that little extra nudge that a few graphs and test would provide. If it happens to also come with a few good practices you could save a lot of people some time and heart ache as I am sure people are desirous to see the results. I think people are putting encryption in their apps directly (eg Oracle's Transparent Data Encryption feature) -- richard I know there are far more apps without support for encryption than with it. And given the ever more stringent government regulations in the US, there are plenty of customers chomping at the bit for encryption at the storage array. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
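For anyone who wants to poke at the feature this thread is named after: Solaris 11 Express exposes encryption per-dataset, and it can only be set at creation time. A minimal sketch, dataset name invented; by default it prompts for a passphrase:

  # zfs create -o encryption=on tank/secure
  # zfs get encryption tank/secure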