Re: [zfs-discuss] Version to upgrade to?

2010-02-02 Thread David Dyer-Bennet

On 2/2/2010 8:16 PM, Erik Trimble wrote:
As the contents of /etc/release indicates, you're using the 2009.06 
release of OpenSolaris, which is the current latest "stable" release.  
It was based on Build 111b of the source base (as shown by the 
'snv_111b' moniker in both 'uname -a' and /etc/release )


There will be some updates to this release, as the updatemanager GUI 
(or pkg CLI) will show you when you run it.  However, these updates 
aren't much more than a small set of critical bug fixes.


Ah, but we are now getting critical fixes, at least?



If you'd like to live on the bleeding edge, and have access to the 
latest Development builds, go to http://pkg.opensolaris.org and click 
on the 'dev' link for instructions as to how to change your update 
repository from the 'stable' branch to the 'development' repository.


As that link may be a bit hard to see, you can use this direct link:

http://pkg.opensolaris.org/dev/en/index.shtml


Okay, so those are really the alternatives?  I can't choose to install 
build 124 now, since it's nowhere near the latest?  I figured I must be 
missing something that would let me do that.




Under the heading of "if it ain't broke, don't fix it", I wouldn't 
bother to change to the dev branch at this point.  The next stable 
"named" release is due out in a month or so (tentatively for late 
March, though April is likely).  This will be automatically available 
via your current pkg repository.  This release should be based on b133 
or thereabouts.




Always an interesting question. I'm VERY interested in a fix in 122 
(it's been preventing my backups from working decently for a YEAR; have 
to do a full, can't do incrementals).  I suspect I'll go to bleeding 
edge, find one that works (which I hope will be the first week I try!), 
and then stay there until the next stable is out, and flip back to 
stable.  Shouldn't need new features for a while :-).


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Version to upgrade to?

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 8:16 PM, Erik Trimble  wrote:

> As the contents of /etc/release indicates, you're using the 2009.06 release
> of OpenSolaris, which is the current latest "stable" release.  It was based
> on Build 111b of the source base (as shown by the 'snv_111b' moniker in both
> 'uname -a' and /etc/release )
>
> There will be some updates to this release, as the updatemanager GUI (or
> pkg CLI) will show you when you run it.  However, these updates aren't much
> more than a small set of critical bug fixes.
>
> If you'd like to live on the bleeding edge, and have access to the latest
> Development builds, go to http://pkg.opensolaris.org and click on the
> 'dev' link for instructions as to how to change your update repository from
> the 'stable' branch to the 'development' repository.
>
> As that link may be a bit hard to see, you can use this direct link:
>
> http://pkg.opensolaris.org/dev/en/index.shtml
>
>
>
> Under the heading of "if it ain't broke, don't fix it", I wouldn't bother
> to change to the dev branch at this point.  The next stable "named" release
> is due out in a month or so (tentatively for late March, though April is
> likely).  This will be automatically available via your current pkg
> repository.  This release should be based on b133 or thereabouts.
>
> -Erik
>
>
>

As an aside, is the stable branch being regularly patched now with security
and bug fixes?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Brent Jones
On Tue, Feb 2, 2010 at 7:41 PM, Brent Jones  wrote:
> On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand  wrote:
>> Hi folks,
>>
>> I'm having (as the title suggests) a problem with zfs send/receive.
>> Command line is like this :
>> pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F
>> -d tank
>>
>> This works like a charm as long as the snapshot is small enough.
>>
>> When it gets too big (meaning somewhere between 17G and 900G), I get ssh
>> errors (can't read from remote host).
>>
>> I tried various encryption options (the fastest being in my case arcfour)
>> with no better results.
>> I tried to setup a script to insert dd on the sending and receiving side to
>> buffer the flow, still read errors.
>> I tried with mbuffer (which gives better performance), it didn't get better.
>> Today I tried with netcat (and mbuffer) and I got better throughput, but it
>> failed at 269GB transferred.
>>
>> The two machines are connected to the switch with 2x1GbE (Intel) joined
>> together with LACP.
>> The switch logs show no errors on the ports.
>> kstat -p | grep e1000g shows one recv error on the sending side.
>>
>> I can't find anything in the logs which could give me a clue about what's
>> happening.
>>
>> I'm running build 131.
>>
>> If anyone has the slightest clue of where I could look or what I could do to
>> pinpoint/solve the problem, I'd be very grateful if (s)he could share it
>> with me.
>>
>> Thanks and have a nice evening.
>>
>> Arnaud
>>
>>
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>>
>
> This issue seems to have started after snv_129 for me. I get "connection
> reset by peer", or transfers (of any kind) simply time out.
>
> Smaller transfers succeed most of the time, while larger ones usually
> fail. Rolling back to snv_127 (my last one) does not exhibit this
> issue. I have not had time to narrow down any causes, but I did find
> one bug report that found some TCP test scenarios failed during one of
> the builds, but I was unable to find that CR at this time.
>
> --
> Brent Jones
> br...@servuhome.net
>

Ah, I found the CR that seemed to describe the situation (broken
pipe/connection reset by peer)

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Brent Jones
On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand  wrote:
> Hi folks,
>
> I'm having (as the title suggests) a problem with zfs send/receive.
> Command line is like this :
> pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F
> -d tank
>
> This works like a charm as long as the snapshot is small enough.
>
> When it gets too big (meaning somewhere between 17G and 900G), I get ssh
> errors (can't read from remote host).
>
> I tried various encryption options (the fastest being in my case arcfour)
> with no better results.
> I tried to setup a script to insert dd on the sending and receiving side to
> buffer the flow, still read errors.
> I tried with mbuffer (which gives better performance), it didn't get better.
> Today I tried with netcat (and mbuffer) and I got better throughput, but it
> failed at 269GB transferred.
>
> The two machines are connected to the switch with 2x1GbE (Intel) joined
> together with LACP.
> The switch logs show no errors on the ports.
> kstat -p | grep e1000g shows one recv error on the sending side.
>
> I can't find anything in the logs which could give me a clue about what's
> happening.
>
> I'm running build 131.
>
> If anyone has the slightest clue of where I could look or what I could do to
> pinpoint/solve the problem, I'd be very grateful if (s)he could share it
> with me.
>
> Thanks and have a nice evening.
>
> Arnaud
>
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>

This issue seems to have started after snv_129 for me. I get "connection
reset by peer", or transfers (of any kind) simply time out.

Smaller transfers succeed most of the time, while larger ones usually
fail. Rolling back to snv_127 (my last one) does not exhibit this
issue. I have not had time to narrow down any causes, but I did find
one bug report that found some TCP test scenarios failed during one of
the builds, but I was unable to find that CR at this time.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Marc Nicholas
On Tue, Feb 2, 2010 at 9:52 PM, Toby Thain  wrote:

>
> On 2-Feb-10, at 1:54 PM, Orvar Korvar wrote:
>
>  100% uptime for 20 years?
>>
>> So what makes OpenVMS so much more stable than Unix? What is the
>> difference?
>>
>
>
> The short answer is that uptimes like that are VMS *cluster* uptimes.
> Individual hosts don't necessarily have that uptime, but the cluster
> availability is maintained for extremely long periods.
>
> You can probably find more discussion of this in comp.os.vms.


And the 15MB/sec of I/O throughput on that state-of-the-art cluster is
something to write home about? ;)

Seriously, as someone alluded to earlier, we're not comparing apples to
apples. And a 9000-series VAX cluster was one of the earlier multi-user
systems I worked on, for reference ;)

Making that kind of stuff work with modern expectations and tolerances is a
whole new kettle of fish...


-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Toby Thain


On 2-Feb-10, at 1:54 PM, Orvar Korvar wrote:


100% uptime for 20 years?

So what makes OpenVMS so much more stable than Unix? What is the  
difference?



The short answer is that uptimes like that are VMS *cluster* uptimes.  
Individual hosts don't necessarily have that uptime, but the cluster  
availability is maintained for extremely long periods.


You can probably find more discussion of this in comp.os.vms.

--Toby


--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




Re: [zfs-discuss] PCI-E CF adapter?

2010-02-02 Thread Frank Cusack
On January 14, 2010 1:08:56 PM -0500 Frank Cusack  
wrote:

I know this is slightly OT but folks discuss zfs compatible hardware
here all the time. :)

Has anyone used something like this combination?




It'd be nice to have externally accessible CF slots for my NAS.


I couldn't use the PCIe adapter as above, it is a full profile card.
I did find a low profile card for roughly twice as much, but no luck
in my server (x2270).  The BIOS does not recognize it as a device
I can boot from.  I didn't go any further than that.

In the X2270 it's not really useful anyway.  The latch mechanism
for the PCIe slot interferes with the extension of the ExpressCard
adapter.  It does just fit when you get the case all buttoned up,
but the EC retention mechanism is push-to-release and when you
insert the EC/CF card into the adapter, this pushes the EC card
adapter into the PCIe card, triggering its release.  Because
the PCIe latch on the server case interferes with the EC release,
you can't actually eject the EC/CF adapter enough to subsequently
push it back in to lock it.  I know that's all hard to understand
and it's probably not worth anyone's time to re-read it closely.
Sorry about that.

I don't have another machine with PCIe slots to try.  I'll just
stick with USB sticks I guess.  Since I'd avoid writing to the
CF anyway, it's not much different.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Version to upgrade to?

2010-02-02 Thread Erik Trimble
As the contents of /etc/release indicates, you're using the 2009.06 
release of OpenSolaris, which is the current latest "stable" release.  
It was based on Build 111b of the source base (as shown by the 
'snv_111b' moniker in both 'uname -a' and /etc/release )


There will be some updates to this release, as the updatemanager GUI (or 
pkg CLI) will show you when you run it.  However, these updates aren't 
much more than a small set of critical bug fixes.


If you'd like to live on the bleeding edge, and have access to the 
latest Development builds, go to http://pkg.opensolaris.org and click 
on the 'dev' link for instructions as to how to change your update 
repository from the 'stable' branch to the 'development' repository.


As that link may be a bit hard to see, you can use this direct link:

http://pkg.opensolaris.org/dev/en/index.shtml
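
If you do decide to try the dev branch, the mechanics on 2009.06 come down
to roughly the following (the publisher name is assumed to be the stock
'opensolaris.org' one; check the instructions on the dev page before
running anything):

  # repoint the preferred publisher at the dev repository
  pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org

  # pull the image up to the latest dev build; a new boot environment is
  # created, so the current one remains available as a fallback
  pfexec pkg image-update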



Under the heading of "if it ain't broke, don't fix it", I wouldn't 
bother to change to the dev branch at this point.  The next stable 
"named" release is due out in a month or so (tentatively for late March, 
though April is likely).  This will be automatically available via your 
current pkg repository.  This release should be based on b133 or 
thereabouts.


-Erik




David Dyer-Bennet wrote:

I'm currently running:

 OpenSolaris 2009.06 snv_111b X86
   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
  Assembled 07 May 2009

Or uname shows "SunOS fsfs 5.11 snv_111b i86pc i386 i86pc".

I'm totally confused about versions of this thing, by the way, and releases.

This is a home NAS server, running (currently) 4 data disks in two
mirrored pairs in the data pool.  Disks are SATA hot-swap on the on-board
controllers on an Asus M2N-SLI Deluxe motherboard, in case that matters
for what version is suitable.  I'm adding a Supermicro UIO MegaRAID
AOC-USAS-L8i controller, in case THAT matters for what version is
suitable.  I'm using CIFS.  The data pool is currently 800GB, about to
become 1.2TB when I add a third pair of disks.  That's about it.

So, what's the best easy-to-install opensolaris upgrade for me to go to? 
And how could I have figured this out for myself (like a list of what's in
the repository and what it's called maybe)?  And how do I go about
updating to it?  I currently have opensolaris-1 and opensolaris-2 boot
environments on my rpool, which confirms my memory that I've used pkg to
update in the past.  Since the hardware is either already working, or
something widely said to work well in Solaris, I think the primary concern
in picking a version is the state of the ZFS code, hence asking here.

To put it differently:

If I wanted to upgrade to build 124, say, or 130, how would I do that? 
What would I type?


And the other half of the question, what's the best stable build around
this week?

  



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Arnaud Brand




On 02/02/10 22:49, Tim Cook wrote:
> On Tue, Feb 2, 2010 at 3:25 PM, Richard Elling wrote:
>> On Feb 2, 2010, at 12:05 PM, Arnaud Brand wrote:
>>> Hi folks,
>>>
>>> I'm having (as the title suggests) a problem with zfs send/receive.
>>> Command line is like this :
>>> pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F -d tank
>>>
>>> This works like a charm as long as the snapshot is small enough.
>>>
>>> When it gets too big (meaning somewhere between 17G and 900G), I get ssh
>>> errors (can't read from remote host).
>>>
>>> I tried various encryption options (the fastest being in my case arcfour)
>>> with no better results.
>>> I tried to set up a script to insert dd on the sending and receiving side
>>> to buffer the flow, still read errors.
>>> I tried with mbuffer (which gives better performance), it didn't get better.
>>> Today I tried with netcat (and mbuffer) and I got better throughput, but
>>> it failed at 269GB transferred.
>>>
>>> The two machines are connected to the switch with 2x1GbE (Intel) joined
>>> together with LACP.
>>
>> LACP is spawned from the devil to plague mankind.  It won't
>> help your ssh transfer at all. It will cause your hair to turn grey and
>> get pulled out by the roots.  Try turning it off or using a separate
>> network for your transfer.
>>  -- richard
>
> That's a bit harsh :)
>
> To further what Richard said though, LACP isn't going to help with your
> issue.  LACP is NOT round-robin load balancing.  Think of it more like
> source-destination.  You need to have multiple connections going out to
> different source/destination mac/ip/whatever addresses.  Typically it
> works great for something like a fileserver that has 50 clients hitting
> it.  Then those clients will be balanced across the multiple links.
> When you've got one server talking to one other server, it isn't going
> to buy you much of anything 99% of the time.
>
> Also, depending on your switch, it can actually hamper you quite a
> bit.  If you've got a good cisco/hp/brocade/extreme
> networks/force10/etc switch, it's fine.  If you've got a $50 soho
> netgear, you typically are going to get what you paid for :)
>
> --Tim

I'll remove LACP when I get back to work tomorrow (that's in a few hours).
I already knew about its principles (doesn't hurt to repeat them though),
but as we have at least two machines connecting simultaneously to this
server, plus occasional clients, plus the replication stream, I thought I
could win some bandwidth.
I think I should've stayed by the rule: first make it work, then make it fast.

In the mean time, I've launched the same command with a dd to a local file
instead of a zfs recv (i.e. something along the lines of: pfexec zfs send
-Rp tank/t...@snapshot | ssh remotehost dd of=/tank/repl.zfs bs=128k).

I hope I'm not running into the issues related to e1000g problems under
load (zfs recv eats up all the CPU when it flushes and then the transfer
almost stalls for a second or two).
For the switch, it's an HP4208 with reasonably up-to-date firmware (less
than 6 months old, next update of our switches scheduled on Feb 20th).
Strange thing is that the connection is lost on the sending side, but the
receiving side shows it's still "established" (in netstat -an).
I could try changing the network cables too, maybe one of them has a problem.

Thanks,
Arnaud
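
For what it's worth, a minimal sketch of the mbuffer-over-TCP variant looks
like this (the port number and buffer sizes are arbitrary placeholders, and
the zfs arguments simply mirror the command above):

  # receiving host: listen on a TCP port, buffer, and feed zfs recv
  mbuffer -s 128k -m 1G -I 9090 | pfexec zfs recv -v -F -d tank

  # sending host: stream the replication send into mbuffer over TCP
  pfexec zfs send -Rp tank/t...@snapshot | mbuffer -s 128k -m 1G -O remotehost:9090

This takes ssh out of the path entirely, which at least helps separate
transport problems from NIC/driver or switch problems.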





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Bob Friesenhahn

On Tue, 2 Feb 2010, Miles Nordin wrote:


"fc" == Frank Cusack  writes:


   fc> by FCoE are you talking about iSCSI?

FCoE is an L2 design where ethernet ``pause'' frames can be sent
specific to one of the seven CoS levels instead of applying to the
entire port, which makes PAUSE abusable for other purposes than their


Please redirect this encyclopedia contribution to WikiPedia, where it 
belongs.


Thanks,

Bob


--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR 6811542: that's interesting

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 17:34, James C. McPherson wrote:
> On  3/02/10 09:31 AM, David Dyer-Bennet wrote:
>>
>> Can anybody who can see the CR online figure out what release build the
>> fix was / will be in?  Speaking of what build I should upgrade to :-).
>
> closed as duplicate of
>
>
> 6696858 zfs receive of incremental replication stream can dereference NULL
> pointer and crash
>
> which was fixed in snv_122.

Excellent!  Since all the builds I'm currently considering are greater
than that.  This has been mucking up my backup procedure (making me do
full backups instead of incrementals) for a year, meaning I haven't done
enough of them.
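
For context, the "incremental replication stream" that CR 6696858 refers to
is the kind produced by something along these lines (pool, snapshot and
host names here are purely illustrative):

  # initial full replication stream
  pfexec zfs send -R tank@weekly-1 | ssh backuphost pfexec zfs recv -F -d backup

  # later, an incremental replication stream between two snapshots
  pfexec zfs send -R -I tank@weekly-1 tank@weekly-2 | ssh backuphost pfexec zfs recv -F -d backup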

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR 6811542: that's interesting

2010-02-02 Thread James C. McPherson

On  3/02/10 09:31 AM, David Dyer-Bennet wrote:

Just got a bunch of change notification emails on my report of a zfs
send/receive segfault. I can't find this online anywhere though, so I
can't check a couple of things.

Strange thing though is that although it just changed ownership and then
state today in the system, the history records entered seem to show most
of the actual changes were a year ago.  And given the description of the
problem (pointer-handling error in string processing when a directory path
didn't end with a slash, if I read it correctly), there was an easy
workaround all this time, too.  Damn.

Can anybody who can see the CR online figure out what release build the
fix was / will be in?  Speaking of what build I should upgrade to :-).


closed as duplicate of


6696858 zfs receive of incremental replication stream can dereference NULL 
pointer and crash


which was fixed in snv_122.



James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] CR 6811542: that's interesting

2010-02-02 Thread David Dyer-Bennet
Just got a bunch of change notification emails on my report of a zfs
send/receive segfault. I can't find this online anywhere though, so I
can't check a couple of things.

Strange thing though is that although it just changed ownership and then
state today in the system, the history records entered seem to show most
of the actual changes were a year ago.  And given the description of the
problem (pointer-handling error in string processing when a directory path
didn't end with a slash, if I read it correctly), there was an easy
workaround all this time, too.  Damn.

Can anybody who can see the CR online figure out what release build the
fix was / will be in?  Speaking of what build I should upgrade to :-).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Version to upgrade to?

2010-02-02 Thread David Dyer-Bennet
I'm currently running:

 OpenSolaris 2009.06 snv_111b X86
   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
  Assembled 07 May 2009

Or uname shows "SunOS fsfs 5.11 snv_111b i86pc i386 i86pc".

I'm totally confused about versions of this thing, by the way, and releases.

This is a home NAS server, running (currently) 4 data disks in two
mirrored pairs in the data pool.  Disks are SATA hot-swap on the on-board
controllers on an Asus M2N-SLI Deluxe motherboard, in case that matters
for what version is suitable.  I'm adding a Supermicro UIO MegaRAID
AOC-USAS-L8i controller, in case THAT matters for what version is
suitable.  I'm using CIFS.  The data pool is currently 800GB, about to
become 1.2TB when I add a third pair of disks.  That's about it.

So, what's the best easy-to-install opensolaris upgrade for me to go to? 
And how could I have figured this out for myself (like a list of what's in
the repository and what it's called maybe)?  And how do I go about
updating to it?  I currently have opensolaris-1 and opensolaris-2 boot
environments on my rpool, which confirms my memory that I've used pkg to
update in the past.  Since the hardware is either already working, or
something widely said to work well in Solaris, I think the primary concern
in picking a version is the state of the ZFS code, hence asking here.

To put it differently:

If I wanted to upgrade to build 124, say, or 130, how would I do that? 
What would I type?
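
A sketch of what the answer looks like in practice (build numbers, boot
environment names and package versions below are placeholders; what you can
actually update to depends on what the repository serves):

  # see which publisher the image currently pulls packages from
  pkg publisher

  # see what the repository would update to; the 'entire' incorporation
  # is what pins the image to a particular build
  pkg info -r entire

  # update everything; pkg creates a new boot environment automatically,
  # so the running one stays around as a fallback
  pfexec pkg image-update

  # list boot environments, and roll back by activating the previous one
  beadm list
  pfexec beadm activate opensolaris-2

Pinning a specific build such as 124 is usually a matter of installing a
specific version of the 'entire' incorporation (something like
entire@0.5.11-0.124, though treat that FMRI as a guess and check
'pkg info -r entire' first).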

And the other half of the question, what's the best stable build around
this week?

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread James C. McPherson

On  3/02/10 01:31 AM, Tonmaus wrote:

Hi James,

am I right to understand that, in a nutshell, the problem is that if
page 80/83 information is present but corrupt/inaccurate/forged (name
it as you want), zfs will not get down to the GUID?

Hi Tonmaus,
If page83 information is present, ZFS will use it. The problem
that Moshe came across is that with the controller he used,
ARC-1680ix, the target/lun assignment algorithm in the firmware
made the disks move around from ZFS' point of view - it appeared
that the firmware was screwing around with the Page83 info and
rather than keeping the info associated with the specific device,
it was ... moving things around of its own accord.

The GUID is generated from the device id (aka devid) which is
generated from [(1) page83, (2) page80, (3) well-known method of
fabrication] information. You can read more about this in my
presentation about GUIDs and devids:

http://www.jmcp.homeunix.com/~jmcp/WhatIsAGuid.pdf
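
If you want to see the pieces on a live system, something along these lines
shows them (the device path is a placeholder, and exactly which fields
appear in the label varies by build):

  # the devid derived from the inquiry data shows up as a 'devid' property
  prtconf -v | grep -i devid

  # the vdev label records the pool/vdev GUIDs (and usually the devid and
  # phys_path the vdev was last bound to)
  pfexec zdb -l /dev/rdsk/c7t0d0s0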


cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Miles Nordin
> "fc" == Frank Cusack  writes:

fc> by FCoE are you talking about iSCSI?

FCoE is an L2 design where ethernet ``pause'' frames can be sent
specific to one of the seven CoS levels instead of applying to the
entire port, which makes PAUSE abusable for other purposes than their
former one.  CoS is an L2 priority/QoS tag inside the VLAN header.
Before this hack, pause frames are not useful for congestion
management because they cause head-of-line blocking, so serious
switches only send them in response to backplane congestion, and for
example serious hosts might send them for PCI contention, if clever
enough.  With the hack, the HOL-blocking effect of a PAUSE still
spreads further than you might ideally like but can be constrained to
one of the seven CoS planes in your fabric (probably, the Storage
plane).  This lets you have an HOL-blocking, lossless storage fabric
in parallel with a buffered TCP fabric that is not lossless (uses
packet drops for congestion control like normal Ethernet).  You will
find some squirrely language from FCoE proponents around these issues
because they are trying to convince you that you have every desirable
buzzword in every part of your network, while in fact what you're
doing is making the same wise trade-off that every other non-Ethernet
LAN fabric has always made.  My parallel point is that the
HOL-blocking lossless fabric is *CHEAPER* to create, not more
expensive.  It is less capable.  It has no buffers and therefore no
QoS.  It just happens to be what's best for storage.  so, they want
you to pay the prices of a multi-queued QoSed WRED big-buffered
non-blocking fabric suitable for transit traffic even though you
mostly just need to push storage bits: classic upsell, just like all
those ``XL'' PFC's they try to push off to customers who are not even
in the DFZ.

FCoE also includes a bunch of expensive hocus-pocus to bridge these
frames onto a traditional FC-switched network and do a bunch of other
things I don't understand like FC zoning and F-SPF.  Most of the pitch
dwells on this, trying to convince you they've made things ``simpler''
for you because it's one piece of wire.  This seems like an
anti-feature to me: wire's cheap while understanding things is hard,
and now everyone's forced to catch up and learn Fibre Channel before
it's safe to touch anything.  Good in the long run, absolutely.
Cheaper, fuck no.

but the legitimate pitch for FCoE over iSCSI, to my view right now,
comes from not from this management baloney but from the seven CoS
levels, and the possibility some can be blocking and others buffered.
Internet *transit* traffic (as opposed to end systems), and anything
high-rtt, *must* be buffered, while within the LAN my current thinking
is that you're better off with a 10Gbit/s HOL-blocking bufferless link
than a 1Gbit/s non-blocking buffered link.  The latter applies double
for storage traffic which, made up of UDP-like reads and writes where
you are stuck trying to perfect TCP to avoid blowing the buffers of
normal switches while still getting yourself out of slow-start before
the transaction's over and doing all this in an environment where you
cannot even convince thick-skulled netadmins they NEED to provide RED,
not this bullshit ``weighted tail drop'' of 3560 u.s.w., and which
besides really need backpressure from the fabric so they can be QoS'ed
in the initiator's stack ahead of the network so that for example
scrubs don't slow down pools (don't you find this happens more over
iSCSI than over SAS?).  I'm saaying, um...shit...saying, ``You need to
think, about what you are trying to accomplish,'' and that Sun might
have a suite of protocols based on ancient IB stuff that accomplishes
more than FCoE, and does it cheaper (to them) and more simply, so,
following their usual annoying plan, step 2 charge FCoE prices minus
, step 3 profit.

meanwhile mellanox, having foreseen all this and built open standards
to solve it, is out there desperately trying to push some baloney
called Etherband or something because all you bank admins are too daft
to buy anything that does not have Ether in the name. :(


pgpyvz3N8H8Ve.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Marc Nicholas
I believe magical unicorn controllers and drives are both bug-free and
100% spec compliant. The leprechauns sell them if you're trying to
find them ;)

-marc

On 2/2/10, David Magda  wrote:
> On Feb 2, 2010, at 15:21, Tim Cook wrote:
>
>> How exactly do you suggest the drive manufacturers make their drives
>> "just
>> work" with every SAS/SATA controller on the market, and all of the
>> quirks
>> they have?  You're essentially saying you want the drive
>> manufacturers to do
>> what the storage vendors are doing today (all of the integration
>> work), only
>> not charge you for it.
>
> One way is to have bug-free firmware in both the disk and the
> controllers that follows the specs perfectly. :)
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>

-- 
Sent from my mobile device
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Richard Elling
On Feb 2, 2010, at 2:56 PM, David Magda wrote:
> 
> On Feb 2, 2010, at 15:17, Tim Cook wrote:
> 
>> On Tue, Feb 2, 2010 at 12:54 PM, Orvar Korvar wrote:
>> 
>>> 100% uptime for 20 years?
>>> 
>>> So what makes OpenVMS so much more stable than Unix? What is the
>>> difference?
>>> 
>> 
>> They had/have clustering software that was/is bulletproof.  I don't think
>> anyone in the Unix community has duplicated it to date.  As for differences,
>> google is your friend?
>> 
>> http://www3.sympatico.ca/n.rieck/docs/vms_vs_unix.html
> 
> And by "clustering" we're not talking about something like Sun Cluster where 
> it restarts an application after a node fails. It's more along the lines of 
> multiple machines acting as a single server (though each runs its own copy of 
> the OS--not a single image system):

Did you ever wonder why Solaris Cluster seemed to be overkill for a 
simple failover service?  The original design goals for Solaris Cluster
looked a lot more like VMScluster than what you see today in Solaris
Cluster. You can still see the remnants in the code and in features like
pxfs (no, ZFS won't work with pxfs). The barriers to bringing such 
technology from a simple process model (like VMS) to a modern OS
like Solaris are daunting.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Magda

On Feb 2, 2010, at 15:21, Tim Cook wrote:

How exactly do you suggest the drive manufacturers make their drives "just
work" with every SAS/SATA controller on the market, and all of the quirks
they have?  You're essentially saying you want the drive manufacturers to do
what the storage vendors are doing today (all of the integration work), only
not charge you for it.


One way is to have bug-free firmware in both the disk and the
controllers that follows the specs perfectly. :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Magda


On Feb 2, 2010, at 15:17, Tim Cook wrote:


On Tue, Feb 2, 2010 at 12:54 PM, Orvar Korvar wrote:


100% uptime for 20 years?

So what makes OpenVMS so much more stable than Unix? What is the
difference?



They had/have clustering software that was/is bulletproof.  I don't think
anyone in the Unix community has duplicated it to date.  As for differences,
google is your friend?

http://www3.sympatico.ca/n.rieck/docs/vms_vs_unix.html


And by "clustering" we're not talking about something like Sun Cluster  
where it restarts an application after a node fails. It's more along  
the lines of multiple machines acting as a single server (though each  
runs its own copy of the OS--not a single image system):


http://en.wikipedia.org/wiki/OpenVMS#Clustering
http://en.wikipedia.org/wiki/VMScluster

It was originally released in VMS version 4 back in 1984.

VMS originally ran on VAX, was ported to DEC's Alpha (now dead), and  
is now on Intel's Itanium (not that popular AFAIK).


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Richard Elling
On Feb 2, 2010, at 1:56 PM, Frank Cusack wrote:

> On February 2, 2010 4:31:47 PM -0500 Miles Nordin  wrote:
>> and FCoE is just dumb if you have IB, honestly.
> 
> by FCoE are you talking about iSCSI?

FCoE is to iSCSI as Netware (IPX/SPX) is to NFS :-)
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Bob Friesenhahn

On Tue, 2 Feb 2010, Frank Cusack wrote:


On February 2, 2010 4:31:47 PM -0500 Miles Nordin  wrote:

 and FCoE is just dumb if you have IB, honestly.


by FCoE are you talking about iSCSI?


No.  They are different.  FCoE uses "raw" ethernet packets and 
ethernet switches can/should be specially designed to support it 
whereas iSCSI is a TCP-based protocol.  FCoE is basically the Fibre
Channel "SAN" protocol over ethernet.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Frank Cusack

On February 2, 2010 4:31:47 PM -0500 Miles Nordin  wrote:

 and FCoE is just dumb if you have IB, honestly.


by FCoE are you talking about iSCSI?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Richard Elling
On Feb 2, 2010, at 10:54 AM, Orvar Korvar wrote:

> 100% uptime for 20 years? 
> 
> So what makes OpenVMS so much more stable than Unix? What is the difference?

Software reliability studies show that the most reliable software is
old software that hasn't changed :-)

On Feb 2, 2010, at 12:42 PM, David Dyer-Bennet wrote:
> I'm suggesting that the standard for the interface ought to be
> sufficiently standardized and well-enough documented that things meeting
> it just work, in the way that desktop motherboards and disk drives "just
> work", i.e. well enough for nearly everybody.  I understand why people
> pushing the limits would need custom-tuned hardware, but I don't think the
> middle of the market should need it.

Every mobo and disk I own has quirks. The only time things settle down
to a widely accepted norm is when innovation stops.  For example, recently
the SATA TRIM command has received a lot of press.  Next quarter, it will
be some other feature on the buzzword list.

> The controllers shouldn't be full of quirks; companies that routinely make
> them that way need to clean up their act or be driven out of the market. 
> Same for the drives.

I think there are only about 5 HDD companies (Hitachi, Seagate, Western
Digital, Samsung, Toshiba) and 3 controller companies today (LSI, Marvell, 
Intel). The remainder are in the process of getting out of the business or being
bought.  Interesting history here:
http://en.wikipedia.org/wiki/List_of_defunct_hard_disk_manufacturers
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 3:25 PM, Richard Elling wrote:

> On Feb 2, 2010, at 12:05 PM, Arnaud Brand wrote:
> > Hi folks,
> >
> > I'm having (as the title suggests) a problem with zfs send/receive.
> > Command line is like this :
> > pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv
> -v -F -d tank
> >
> > This works like a charm as long as the snapshot is small enough.
> >
> > When it gets too big (meaning somewhere between 17G and 900G), I get ssh
> errors (can't read from remote host).
> >
> > I tried various encryption options (the fastest being in my case arcfour)
> with no better results.
> > I tried to setup a script to insert dd on the sending and receiving side
> to buffer the flow, still read errors.
> > I tried with mbuffer (which gives better performance), it didn't get
> better.
> > Today I tried with netcat (and mbuffer) and I got better throughput, but
> it failed at 269GB transferred.
> >
> > The two machines are connected to the switch with 2x1GbE (Intel) joined
> together with LACP.
>
> LACP is spawned from the devil to plague mankind.  It won't
> help your ssh transfer at all. It will cause your hair to turn grey and
> get pulled out by the roots.  Try turning it off or using a separate
> network for your transfer.
>  -- richard
>
>

That's a bit harsh :)

To further what Richard said though, LACP isn't going to help with your
issue.  LACP is NOT round-robin load balancing.  Think of it more like
source-destination.  You need to have multiple connections going out to
different source/destination mac/ip/whatever addresses.  Typically it works
great for something like a fileserver that has 50 clients hitting it.  Then
those clients will be balanced across the multiple links.  When you've got
one server talking to one other server, it isn't going to buy you much of
anything 99% of the time.

Also, depending on your switch, it can actually hamper you quite a bit.  If
you've got a good cisco/hp/brocade/extreme networks/force10/etc switch, it's
fine.  If you've got a $50 soho netgear, you typically are going to get what
you paid for :)
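
On the OpenSolaris side, checking and tearing down the aggregation is a
dladm affair; a minimal sketch, with the aggregation name assumed to be
aggr1:

  # show the aggregation and its LACP mode/state
  dladm show-aggr
  dladm show-aggr -L

  # remove it entirely and fall back to plain interfaces
  pfexec dladm delete-aggr aggr1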

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Miles Nordin
> "bh" == Brandon High  writes:
> "ok" == Orvar Korvar  writes:
> "mp" == matthew patton  writes:

bh> This one holds "only" 24 drives:
bh> http://www.supermicro.com/products/chassis/4U/846/SC846TQ-R900.cfm
bh> ($950)

This one holds only 20 drives.  includes fan, not power supply.
the airflow management seems shit but basically works okay:

 
http://www.servercase.com/Merchant2/merchant.mvc?Screen=PROD&Product_Code=CK4020&Category_Code=4UBKBLN
 ($375.38 shipping included)
 (pornographer's-grade sleds)

I have one of these and am happy with it.  It's 20 plain SATA ports,
passively.  Competitors' cases are including SAS switches on the
backplane, and since I don't know how to buy switches apart from cases
you may need to get a competing case if you want a SAS switch.

I found it cheaper to use ATI 790fx board that has four 8-lane PCIe
slots (16x for power, but 8 data lanes active when all four slots are
in use) and then use 1068e cards with breakout cables, so basically
the 790fx chip does the multiplexing.  Obviously this does not scale
as far as a SAS fabric: the 4 PCIe will handle 16 drives, an 8x NIC of
some kind, and an nVidia card.

ok> Is it possible to have large chassi with lots of
ok> drives, and the opensolaris in another chassi, how do you
ok> connect them both?

SAS.  The cabling and the switching chips are eerily reminiscent of
infiniband, but while I think IB has open source drivers and stacks
and relatively generic proprietary firmware, more of the brains of the
SAS fabric seem to be controlled by proprietary software running as
``firmware'' on all the LSI chips involved, so I think this landscape
might make harder smartctl, using a cd burner, or offering a SAS
target into the fabric through COMSTAR (AFAIK none of these things
work now---I guess we'll see what evolves).  but I guess no one built
cheap single chips to tunnel SATA inside IB, so here we are with
broken promises and compromises and overpriced silliness like FCoE.

In the model numbers of the old 3Gbit/s LSI cards, the second digit
was the number of external ports, and the third digit the number of
internal ports.  For example LSI SAS3801E-R is a mega_sas-drivered
(raid-on-a-card) with 8 external ports, and LSI SAS3081E-R has 8
internal ports.  but if you want a cheaper card with IT firmware for
mpt driver, without RAID-on-a-card, you may have to hunt some more.
The external ports are offered on one or two single connectors with
four ``ports'' per connector---each of the four can be broken out and
connected to an individual disk using a passive four-legged-octopus
cable, or bonded together in sets of four to form a single faster
logical link to a SAS switch chip.  beyond that I don't really know
how it all works.  I'm probably telling you stuff you already know but
at least hopefully now everyone's caught up.

mp> I buy a Ferrari for the engine and bodywork and chassis
mp> engineering.

Didn't they go through bankruptcy repeatedly and then get bought by
Fiat?  Whatever this folded-sheetmetal crap thing from so-called
``servercase.com'' is, it's probably backed secretly by the chinese
government, and I bet it outlasts your fancy J4500.  This seems to me
like a bad situation, but I'm not sure what to do about it.

There are many ways to slice the market vertically.  For example you
could also get your integration done by renting whitebox crap through
a server-rental company that rents you dedicated storage or compute
nodes at 10 or 100 at a time pre-connected by network equipment they
won't discuss with you (probably cheaper than the cisco stuff you'd
definitely buy if it were your own ass on the line).  Part of Sun's
function is prequalifying but another part is to reach inside their
customer's organizations, extract experience, and then share it among
all customers discretely without any one customer feeling violated.  A
hardware rental company can do the same thing, and I bet they can do
it at similar scale, with a lot less political bullshit.  I think
there's a big messy underground market of these shady rental companies
in parallel to the above-ground overblogged overpriced flakey EC2-like
stuff.  My hope is that the IB stack, in which Sun's also apparently
deeply invested with both Solaris and IB-included blades and switches
and backplanes and onboard MAC's, will start taking a chunk out of
Cisco's pie.  Meanwhile the box-renting company extracts money from
you by performing an ass-covering function: they can buy cheap risky
things you can't, and then you say to your minders, ``other people buy
from them too.  It was not reckless to become their customer.  I've
both saved you money and manoeuvred you the agility to evade the
problems we've had without writing off a lot of purchases,'' when
really what you are outsourcing here is your own expensive CYA
tendencies.

But back to the potential pie-slice for Sun to steal: the function of
the IB switching chips themselves are far simpler 

Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Richard Elling
On Feb 2, 2010, at 12:05 PM, Arnaud Brand wrote:
> Hi folks,
> 
> I'm having (as the title suggests) a problem with zfs send/receive.
> Command line is like this :
> pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F 
> -d tank
> 
> This works like a charm as long as the snapshot is small enough.
> 
> When it gets too big (meaning somewhere between 17G and 900G), I get ssh 
> errors (can't read from remote host).
> 
> I tried various encryption options (the fastest being in my case arcfour) 
> with no better results.
> I tried to setup a script to insert dd on the sending and receiving side to 
> buffer the flow, still read errors.
> I tried with mbuffer (which gives better performance), it didn't get better.
> Today I tried with netcat (and mbuffer) and I got better throughput, but it 
> failed at 269GB transferred.
> 
> The two machines are connected to the switch with 2x1GbE (Intel) joined 
> together with LACP.

LACP is spawned from the devil to plague mankind.  It won't
help your ssh transfer at all. It will cause your hair to turn grey and
get pulled out by the roots.  Try turning it off or using a separate 
network for your transfer.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Frank Cusack

On February 2, 2010 2:17:30 PM -0600 Tim Cook  wrote:

http://www3.sympatico.ca/n.rieck/docs/vms_vs_unix.html


interesting page, if somewhat dated.  e.g. maybe it wasn't true at the
time but don't we now know from the SCO lawsuit that SCO does indeed
own "UNIX"?

as long as we're OT. :)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Marc Nicholas
On Tue, Feb 2, 2010 at 3:11 PM, Frank Cusack 
wrote:

>
> That said, I doubt 2TB drives represent good value for a home user.
> They WILL fail more frequently and as a home user you aren't likely
> to be keeping multiple spares on hand to avoid warranty replacement
> time.


 I'm having a hard time convincing myself to go beyond 500GB...both for
performance (I'm trying to build something with reasonable IOPS) and
reliability reasons.

-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Marc Nicholas
On Tue, Feb 2, 2010 at 3:45 PM, Peter Jeremy <
peter.jer...@alcatel-lucent.com> wrote:

>
> OTOH, if I'm paying 10x the street drive price upfront, plus roughly
> the street price annually in "support", I can save a fair amount of
> money by just buying a pile of spare drives - when one fails, just
> swap it for a spare and it doesn't matter if it takes weeks for the
> vendor to swap it.
>
>
I can tell you with the Enterprise storage vendor I deal with, the
maintenance over five years (expected lifetime of the system) is equal to or
more than the initial, discounted purchase price!

-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Peter Jeremy
On 2010-Feb-03 00:12:43 +0800, Bob Friesenhahn  
wrote:
>On Tue, 2 Feb 2010, David Dyer-Bennet wrote:
>>
>> Now, I'm sure not ALL drives offered at Newegg could qualify; but the
>> question is, how much do I give up by buying an enterprise-grade drive
>> from a major manufacturer, compared to the Sun-certified drive?
>
>If you have a Sun service contract, you give up quite a lot.  If a Sun 
>drive fails every other day, then Sun will replace that Sun drive 
>every other day, even if the system warranty has expired.  But if it 
>is a non-Sun drive, then you have to deal with a disinterested drive 
>manufacturer, which could take weeks or months.

OTOH, if I'm paying 10x the street drive price upfront, plus roughly
the street price annually in "support", I can save a fair amount of
money by just buying a pile of spare drives - when one fails, just
swap it for a spare and it doesn't matter if it takes weeks for the
vendor to swap it.

>Hopefully Oracle will do better than Sun at explaining the benefits 
>and services provided by a service contract.

I know that trying to get Sun to renew a service contract is like
pulling teeth but Oracle is far worse - as far as I can tell, Oracle
contracts are deliberately designed so you can't be certain whether
you are compliant or not.

-- 
Peter Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 14:21, Tim Cook wrote:
> On Tue, Feb 2, 2010 at 2:14 PM, David Dyer-Bennet  wrote:
>
>>
>> On Tue, February 2, 2010 11:26, Richard Elling wrote:
>> > On Feb 2, 2010, at 8:49 AM, David Dyer-Bennet wrote:
>> >> On Tue, February 2, 2010 10:21, Marc Nicholas wrote:
>> >>> I agree wholeheartedly...you're paying to make the problem "go
>> away"
>> >>> in
>> >>> an
>> >>> expedient manner. That said, I see how much we spend on NetApp
>> storage
>> >>> at
>> >>> work and it makes me shudder ;)
>> >>
>> >> Yes, exactly.  Pricing must be about right, people wince but pay it
>> :-).
>> >> If they don't wince it's too low.
>> >
>> > Business 101.
>> > The price will be what the market will bear. If the price seems out of
>> > line with your market, then perhaps you aren't in the same market.
>>
>> Yes, perhaps.  Quite clearly, in this case; I'm not buying enterprise
>> storage myself.  If the market bears it for long, then there's
>> definitely
>> an actual market there; otherwise it might have been a mistake by the
>> company.
>>
>> I want the disk companies to come up with a set of specs for an
>> enterprise-grade drive that can be used in stock form in relatively
>> simple
>> hardware to give good results.  This concept that their enterprise-grade
>> drives need tweaking in the firmware and price to be useful is annoying.
>> Fair enough for people pushing the edges of the envelope, but most
>> people
>> don't, there should be a good solid mainstream solution available.
>>
>> A Solaris-based ZFS box using SAS controllers with 5, 8, 24, and 48
>> drive
>> bay options might just about do it, if it could take a range of stock
>> drives officially.  Kill off a big chunk of people who get forced into
>> enterprise storage against their will.
>>
>
> How exactly do you suggest the drive manufacturers make their drives "just
> work" with every SAS/SATA controller on the market, and all of the quirks
> they have?  You're essentially saying you want the drive manufacturers to
> do
> what the storage vendors are doing today (all of the integration work),
> only not charge you for it.
>
> You can't have your cake and eat it too.

I'm suggesting that the interface ought to be sufficiently standardized and
well enough documented that anything meeting the standard just works, in the
way that desktop motherboards and disk drives "just work", i.e. well enough
for nearly everybody.  I understand why people pushing the limits would need
custom-tuned hardware, but I don't think the middle of the market should need
it.

The controllers shouldn't be full of quirks; companies that routinely make
them that way need to clean up their act or be driven out of the market. 
Same for the drives.

>> Yes, my Camry is good for commuting and running across town through
>> unfortunately frequent stop-and-go traffic, and running down to see my
>> mother (about an hour) a lot more than I used to need to.  It can carry
>> 4
>> very comfortably, which we only use every month or so, and it can carry
>> the 3-head studio lighting kit in the trunk very comfortably.
>>
>> Probably still somewhat marginal on your ranch, though better than a
>> Ferrari.  The ground clearance is medium, and it's not mainly a
>> cargo-hauler.
>>
>>
> And how well does your Camry run when you try to replace the Toyota
> transmission with one manufactured by Ford?  A mechanic who knows what
> he's
> doing and has fabrication skills could probably get it to work, and pretty
> darn well, but it isn't ever going to be the same as buying an integrated
> product directly from Toyota...

When I was first in the industry, in 1969, it was fairly normal to only be
able to connect DEC disks to a PDP-11; but even then there were
third-party manufacturers making products and customers buying them.  Now,
forty years down the road, computers are constructed from mostly generic
components.  The disk drive is one of the ones that went generic first. 
It's absurd that we can't handle small enterprise storage on
standards-compliant drives at this point.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 2:14 PM, David Dyer-Bennet  wrote:

>
> On Tue, February 2, 2010 11:26, Richard Elling wrote:
> > On Feb 2, 2010, at 8:49 AM, David Dyer-Bennet wrote:
> >> On Tue, February 2, 2010 10:21, Marc Nicholas wrote:
> >>> I agree wholeheartedlyyou're paying to make the problem "go away"
> >>> in
> >>> an
> >>> expedient manner. That said, I see how much we spend on NetApp storage
> >>> at
> >>> work and it makes me shudder ;)
> >>
> >> Yes, exactly.  Pricing must be about right, people wince but pay it :-).
> >> If they don't wince it's too low.
> >
> > Business 101.
> > The price will be what the market will bear. If the price seems out of
> > line with your market, then perhaps you aren't in the same market.
>
> Yes, perhaps.  Quite clearly, in this case; I'm not buying enterprise
> storage myself.  If the market bears it for long, then there's definitely
> an actual market there; otherwise it might have been a mistake by the
> company.
>
> I want the disk companies to come up with a set of specs for an
> enterprise-grade drive that can be used in stock form in relatively simple
> hardware to give good results.  This concept that their enterprise-grade
> drives need tweaking in the firmware and price to be useful is annoying.
> Fair enough for people pushing the edges of the envelope, but most people
> don't, there should be a good solid mainstream solution available.
>
> A Solaris-based ZFS box using SAS controllers with 5, 8, 24, and 48 drive
> bay options might just about do it, if it could take a range of stock
> drives officially.  Kill off a big chunk of people who get forced into
> enterprise storage against their will.
>

How exactly do you suggest the drive manufacturers make their drives "just
work" with every SAS/SATA controller on the market, and all of the quirks
they have?  You're essentially saying you want the drive manufacturers to do
what the storage vendors are doing today (all of the integration work), only
not charge you for it.

You can't have your cake and eat it too.




> Yes, my Camry is good for commuting and running across town through
> unfortunately frequent stop-and-go traffic, and running down to see my
> mother (about an hour) a lot more than I used to need to.  It can carry 4
> very comfortably, which we only use every month or so, and it can carry
> the 3-head studio lighting kit in the trunk very comfortably.
>
> Probably still somewhat marginal on your ranch, though better than a
> Ferrari.  The ground clearance is medium, and it's not mainly a
> cargo-hauler.
>
>
And how well does your Camry run when you try to replace the Toyota
transmission with one manufactured by Ford?  A mechanic who knows what he's
doing and has fabrication skills could probably get it to work, and pretty
darn well, but it isn't ever going to be the same as buying an integrated
product directly from Toyota...

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 12:54 PM, Orvar Korvar <
knatte_fnatte_tja...@yahoo.com> wrote:

> 100% uptime for 20 years?
>
> So what makes OpenVMS so much more stable than Unix? What is the
> difference?
>
>
>
They had/have clustering software that was/is bulletproof.  I don't think
anyone in the Unix community has duplicated it to date.  As for differences,
google is your friend?
http://www3.sympatico.ca/n.rieck/docs/vms_vs_unix.html

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 11:26, Richard Elling wrote:
> On Feb 2, 2010, at 8:49 AM, David Dyer-Bennet wrote:
>> On Tue, February 2, 2010 10:21, Marc Nicholas wrote:
>>> I agree wholeheartedlyyou're paying to make the problem "go away"
>>> in
>>> an
>>> expedient manner. That said, I see how much we spend on NetApp storage
>>> at
>>> work and it makes me shudder ;)
>>
>> Yes, exactly.  Pricing must be about right, people wince but pay it :-).
>> If they don't wince it's too low.
>
> Business 101.
> The price will be what the market will bear. If the price seems out of
> line with your market, then perhaps you aren't in the same market.

Yes, perhaps.  Quite clearly, in this case; I'm not buying enterprise
storage myself.  If the market bears it for long, then there's definitely
an actual market there; otherwise it might have been a mistake by the
company.

I want the disk companies to come up with a set of specs for an
enterprise-grade drive that can be used in stock form in relatively simple
hardware to give good results.  This concept that their enterprise-grade
drives need tweaking in the firmware and price to be useful is annoying. 
Fair enough for people pushing the edges of the envelope, but most people
don't, there should be a good solid mainstream solution available.

A Solaris-based ZFS box using SAS controllers with 5, 8, 24, and 48 drive
bay options might just about do it, if it could take a range of stock
drives officially.  Kill off a big chunk of people who get forced into
enterprise storage against their will.

> Personally, I think Ferraris are neat. But here on the ranch, I might be
> able to squeeze a bale of hay into the passenger seat, but the low
> ground clearance means I'll have to keep the tractor nearby to pull it
> out  when it gets stuck. So a Ferrari has $0 market value here at the
> ranch.

Yes, my Camry is good for commuting and running across town through
unfortunately frequent stop-and-go traffic, and running down to see my
mother (about an hour) a lot more than I used to need to.  It can carry 4
very comfortably, which we only use every month or so, and it can carry
the 3-head studio lighting kit in the trunk very comfortably.

Probably still somewhat marginal on your ranch, though better than a
Ferrari.  The ground clearance is medium, and it's not mainly a
cargo-hauler.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Frank Cusack
On February 2, 2010 11:58:12 AM -0800 Simon Breden  
wrote:

> IIRC the Black range are meant to be the 'performance' models and so are
> a bit noisy. What's your opinion? And the 2TB models are not cheap either
> for a home user. The 1TB seem a good price. And from what little I read,


It depends what you mean by cheap.  As we've recently learned, cheaper
per drive is not necessarily cheaper overall. :)

What I mean is, it depends how much data you have.  If 2TB drives allow
you to use only 1 chassis, you save on power consumption.  Fewer spindles
also will save on power consumption.  However, w/ 2TB drives you may
need to add more parity (raidz2 vs raidz1, e.g.) to meet your reliability
requirements -- the time to resilver 2TB may not meet your MTTDL reqs.
So you have to include your reliability needs when you figure cost.
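
To make the tradeoff concrete, a minimal sketch of the two layouts (device
names are made up, adjust for your controller):

# 5 disks, single parity: more usable space, but a long 2TB resilver
# leaves you exposed to a second failure
zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

# 6 disks, double parity: one extra drive keeps you protected while
# a 2TB member resilvers
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0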

That said, I doubt 2TB drives represent good value for a home user.
They WILL fail more frequently and as a home user you aren't likely
to be keeping multiple spares on hand to avoid warranty replacement
time.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Arnaud Brand




Hi folks,

I'm having (as the title suggests) a problem with zfs send/receive.
Command line is like this :
pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv
-v -F -d tank

This works like a charm as long as the snapshot is small enough.

When it gets too big (meaning somewhere between 17G and 900G), I get
ssh errors (can't read from remote host).

I tried various encryption options (the fastest being in my case
arcfour) with no better results.
I tried to set up a script to insert dd on the sending and receiving
sides to buffer the flow, but I still got read errors.
I tried with mbuffer (which gives better performance), but it didn't get
any better.
Today I tried with netcat (and mbuffer) and I got better throughput,
but it failed at 269GB transferred.
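
For reference, the mbuffer-over-TCP pipeline I mean looks roughly like this
(the port, buffer sizes and dataset/snapshot names below are just
placeholders, not the exact values I used):

# receiving side: listen on a TCP port with a large buffer in front of recv
mbuffer -s 128k -m 1G -I 9090 | pfexec zfs recv -v -F -d tank

# sending side: stream the replication package into mbuffer
pfexec zfs send -Rp tank/fs@snap | mbuffer -s 128k -m 1G -O remotehost:9090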

The two machines are connected to the switch with 2x1GbE (Intel) joined
together with LACP.
The switch logs show no errors on the ports.
kstat -p | grep e1000g shows one recv error on the sending side.

I can't find anything in the logs which could give me a clue about
what's happening.

I'm running build 131.

If anyone has the slightest clue of where I could look or what I could
do to pinpoint/solve the problem, I'd be very grateful if (s)he could
share it with me.

Thanks and have a nice evening.

Arnaud





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Marc Nicholas
I'm running the 500GB models myself, but I wouldn't say they're overly
noisyand I've been doing ZFS/iSCSI/IOMeter/Bonnie++ stress testing with
them.

They "whine" rather than "click" FYI.

-marc

On Tue, Feb 2, 2010 at 2:58 PM, Simon Breden  wrote:

> IIRC the Black range are meant to be the 'performance' models and so are a
> bit noisy. What's your opinion? And the 2TB models are not cheap either for
> a home user. The 1TB seem a good price. And from what little I read, it
> seems you can control the error reporting time with the WDTLER.EXE utility
> :)
>
> Cheers,
> Simon
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Simon Breden
IIRC the Black range are meant to be the 'performance' models and so are a bit 
noisy. What's your opinion? And the 2TB models are not cheap either for a home 
user. The 1TB seem a good price. And from what little I read, it seems you can 
control the error reporting time with the WDTLER.EXE utility :)

Cheers,
Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Obtaining zpool volume size from a C coded application.

2010-02-02 Thread Eric C. Taylor
You can use the DKIOCGMEDIAINFO ioctl to get this information.

-  Eric
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Simon Breden
The thing that puts me off the 7K2000 is that it is a 5 platter model. The 
latest 2TB drives use 4 x 500GB platters. A bit less noise, vibration and heat, 
in theory :)

And the latest 1.5TB drives use only 3 x 500GB platters.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs over iscsi bad status

2010-02-02 Thread Arnaud Brand
Just for the record: when using 127.0.0.1 as the target address instead of
the host's external IP, the problem no longer showed up.
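
For anyone hitting the same thing, one way to point the initiator at the
loopback address, assuming static discovery is used (the IQN below is only a
placeholder for the real target name):

# register the target against 127.0.0.1 instead of the NIC's address
pfexec iscsiadm add static-config iqn.1986-03.com.sun:02:example-target,127.0.0.1:3260
pfexec iscsiadm modify discovery --static enable
pfexec devfsadm -i iscsi    # re-enumerate the iSCSI devices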



Le 16/01/10 15:55, Arnaud Brand a écrit :

OK, the third question (localhost transmission failure) should have been posted 
to storage-discuss.
I'll subscribe to this list and ask there.


Regarding the first question, after having removed the lun from the target, 
devfsadm -C removes the device and then the pool shows as unavailable. I guess 
that's the proper behaviour.
Still the processes are hung and I can't destroy the pool.

This leads to being unable to open a new session with a user that has a home 
dir.

I copy-pasted some mdb results I found while looking for a way to get rid of 
the pool.
Please note I had failmode=wait for the failing pool.

But since you can't change the failmode once you're stuck, you're bound to
reboot in case of an iSCSI failure.
Or am I misunderstanding something?
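
If I understand the property correctly, the only way around that is to change
it while the pool is still healthy; a sketch, using my pool name from above:

# check the current setting
zpool get failmode tsmvol

# 'continue' returns EIO to new I/O instead of blocking forever;
# it has to be set before the iSCSI target goes away
zpool set failmode=continue tsmvol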


d...@nc-tanktsm:/tsmvol2# ps -ef | grep zpool
 root 5 0   0 01:47:33 ?   0:06 zpool-rpool
 root   327 0   0 01:47:50 ?  86:36 zpool-tank
 root  4721  4042   0 15:13:27 pts/3   0:00 zpool online tsmvol 
c9t600144F05DF34C004B51BF950003d0
 root  4617 0   0 14:36:35 ?   0:00 zpool-tsmvol
 root  4752 0   0 15:14:40 ?   0:39 zpool-tsmvol2
 root  4664  4042   0 15:08:34 pts/3   0:00 zpool destroy -f tsmvol
 root  4861  4042   0 15:27:33 pts/3   0:00 grep zpool

d...@nc-tanktsm:/tsmvol2# echo "0t4721::pid2proc|::walk thread|::findstack -v" 
| mdb -k
stack pointer for thread ff040c813c20: ff00196a3aa0
[ ff00196a3aa0 _resume_from_idle+0xf1() ]
   ff00196a3ad0 swtch+0x145()
   ff00196a3b00 cv_wait+0x61(ff03f7ea4e52, ff03f7ea4e18)
   ff00196a3b50 txg_wait_synced+0x7c(ff03f7ea4c40, 0)
   ff00196a3b90 spa_vdev_state_exit+0x78(ff0402d9da80, ff040c832700,
   0)
   ff00196a3c00 vdev_online+0x20a(ff0402d9da80, abe9a540ed085f5c, 0,
   ff00196a3c14)
   ff00196a3c40 zfs_ioc_vdev_set_state+0x83(ff046c08f000)
   ff00196a3cc0 zfsdev_ioctl+0x175(0, 5a0d, 8042310, 13, 
ff04054f4528
   , ff00196a3de4)
   ff00196a3d00 cdev_ioctl+0x45(0, 5a0d, 8042310, 13, ff04054f4528,
   ff00196a3de4)
   ff00196a3d40 spec_ioctl+0x5a(ff03e3218180, 5a0d, 8042310, 13,
   ff04054f4528, ff00196a3de4, 0)
   ff00196a3dc0 fop_ioctl+0x7b(ff03e3218180, 5a0d, 8042310, 13,
   ff04054f4528, ff00196a3de4, 0)
   ff00196a3ec0 ioctl+0x18e(3, 5a0d, 8042310)
   ff00196a3f10 _sys_sysenter_post_swapgs+0x149()
d...@nc-tanktsm:/tsmvol2# echo "0t4664::pid2proc|::walk thread|::findstack -v" 
| mdb -k
stack pointer for thread ff03ec9898a0: ff00195ccc20
[ ff00195ccc20 _resume_from_idle+0xf1() ]
   ff00195ccc50 swtch+0x145()
   ff00195ccc80 cv_wait+0x61(ff0403008658, ff0403008650)
   ff00195cccb0 rrw_enter_write+0x49(ff0403008650)
   ff00195ccce0 rrw_enter+0x22(ff0403008650, 0, f79da8a0)
   ff00195ccd40 zfsvfs_teardown+0x3b(ff0403008580, 1)
   ff00195ccd90 zfs_umount+0xe1(ff0403101b80, 400, ff04054f4528)
   ff00195ccdc0 fsop_unmount+0x22(ff0403101b80, 400, ff04054f4528)
   ff00195cce10 dounmount+0x5f(ff0403101b80, 400, ff04054f4528)
   ff00195cce60 umount2_engine+0x5c(ff0403101b80, 400, ff04054f4528,
   1)
   ff00195ccec0 umount2+0x142(80c1fd8, 400)
   ff00195ccf10 _sys_sysenter_post_swapgs+0x149()
d...@nc-tanktsm:/tsmvol2# ps -ef | grep iozone
 root  4631  3809   0 14:37:16 pts/2   0:00 
/usr/benchmarks/iozone/iozone -a -b results2.xls
 root  4879  4042   0 15:28:06 pts/3   0:00 grep iozone
d...@nc-tanktsm:/tsmvol2# echo "0t4631::pid2proc|::walk thread|::findstack -v" 
| mdb -k
stack pointer for thread ff040c7683e0: ff001791e050
[ ff001791e050 _resume_from_idle+0xf1() ]
   ff001791e080 swtch+0x145()
   ff001791e0b0 cv_wait+0x61(ff04ec895328, ff04ec895320)
   ff001791e0f0 zio_wait+0x5d(ff04ec895020)
   ff001791e150 dbuf_read+0x1e8(ff0453f1ea48, 0, 2)
   ff001791e1c0 dmu_buf_hold+0x93(ff03f60bdcc0, 3, 0, 0, 
ff001791e1f8
   )
   ff001791e260 zap_lockdir+0x67(ff03f60bdcc0, 3, 0, 1, 1, 0,
   ff001791e288)
   ff001791e2f0 zap_lookup_norm+0x55(ff03f60bdcc0, 3, ff001791e720, 
8
   , 1, ff001791e438, 0, 0, 0, 0)
   ff001791e350 zap_lookup+0x2d(ff03f60bdcc0, 3, ff001791e720, 8, 1,
   ff001791e438)
   ff001791e3d0 zfs_match_find+0xfd(ff0403008580, ff040aeb64b0,
   ff001791e720, 0, 1, 0, 0, ff001791e438)
   ff001791e4a0 zfs_dirent_lock+0x3d1(ff001791e4d8, ff040aeb64b0,
   ff001791e720, ff001791e4d0, 6, 0, 0)
   ff001791e540 zfs_dirlook+0xd9(ff040aeb64b0, ff001791e720,
   ff001791e6f0, 1, 0, 0)
   ff001791e5c0 zfs_lookup+0x25f(ff040b230300, ff0

Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Marc Nicholas
On Tue, Feb 2, 2010 at 1:38 PM, Brandon High  wrote:

> On Sat, Jan 16, 2010 at 9:47 AM, Simon Breden  wrote:
> > Which consumer-priced 1.5TB drives do people currently recommend?
>
> I happened to be looking at the Hitachi product information, and
> noticed that the Deskstar 7K2000 appears to be supported in RAID
> configurations. One of the applications listed is "Video editing
> arrays".
>
> http://www.hitachigst.com/portal/site/en/products/deskstar/7K2000/
>
>
I've been having good success with the Western Digital Caviar Black
drives...which are cousins of their Enterprise RE3 platform. AFAIK, you're
stuck at 1TB or 2TB capacities but I've managed to get some good deals on
them...

-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Simon Breden
If I'm not mistaken then the WD2002FYPS is an enterprise model: WD RE4-GP (RAID 
Edition, Green Power), so you almost certainly have the firmware that allows 
(1) the idle time before spindown to be modified with WDIDLE3.EXE and (2) the 
error reporting time to be modified with WDTLER.EXE. 

So I expect your drives are spinning down to save power as they are Green 
series drives. But if this spindown is causing odd things to happen you could 
see if it's possible to increase the spindown time with WDIDLE3.EXE.

Let us know if you get any news back from WD.

Cheers,
Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Orvar Korvar
1) A SAS HBA seems to be an I/O card with a SAS cable connection. It sits in
the OSol server. It is basically just a simple I/O card, right? I hope these
cards are cheap?

2) So I can buy a disk chassis with 24 disks, connect all disks to one SAS
cable and connect that SAS cable to my OSol server, which has a SAS HBA I/O
card? Is this correct? Or do I need one SAS cable for each disk, hence 24 SAS
cables?

3) Does the SAS HBA card need Solaris drivers? 

4) Will there be a performance hit if I connect 24 disks to one SAS cable/HBA?
Maybe the bandwidth of one SAS cable will not suffice for 24 disks?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Ed Fang
Also, both of those chassis come in SAS expander version and JBOD.  the SAS 
expander version is the E1 version of the case.  With the SAS Expander, and a 
motherboard using the LSI2008 or LSI1068 chipset, you can attach one cable from 
the SAS port (SFF8087) to the SAS expander and have all the drives online 
rather than connecting 24/36 drives individually. . .
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Orvar Korvar
100% uptime for 20 years? 

So what makes OpenVMS so much more stable than Unix? What is the difference?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-02-02 Thread Brandon High
On Sat, Jan 16, 2010 at 9:47 AM, Simon Breden  wrote:
> Which consumer-priced 1.5TB drives do people currently recommend?

I happened to be looking at the Hitachi product information, and
noticed that the Deskstar 7K2000 appears to be supported in RAID
configurations. One of the applications listed is "Video editing
arrays".

http://www.hitachigst.com/portal/site/en/products/deskstar/7K2000/

-B

-- 
Brandon High : bh...@freaks.com
If violence doesn't solve your problem, you're not using enough of it.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Frank Cusack

On February 2, 2010 12:08:13 PM -0600 Tim Cook  wrote:

> Not exactly unix, but there's still VMS clusters running around out there
> with 100% uptime for over 20 years.  I wouldn't mind seeing it opened up.


Agreed, I'd love to see that opened up.  Might even give it new life.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Brandon High
On Tue, Feb 2, 2010 at 5:41 AM, Orvar Korvar
 wrote:
> I see 24 drives in an external chassi. I presume that chassis does only hold 
> drives, it does not hold a motherboard.
>
> How do you connect all drives to your OpenSolaris server? Do you place them 
> next to each other, and then you have three 8 SATA ports in your OpenSolaris 
> server, and have 24 SATA cables in the air? And the chassis are wide open?
>
> Or, am I wrong, does the chassi also hold a motherboard?

Both of the cases I posted can hold a motherboard. There is also a kit
available for about $100 that lets you use the chassis with just disk
in it.

-B

-- 
Brandon High : bh...@freaks.com
For sale: One moral compass, never used.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is LSI SAS3081E-R suitable for a ZFS NAS ?

2010-02-02 Thread Gregory Youngblood
The problems you had with the x8dtn: did they affect the x8dtn+-o too, as far
as you know, or just the -6?

I'm thinking of building a system around this due to the PCI-X so I can use the 
AOC-SAT2-MV8.

Any thoughts?

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 12:00 PM, Frank Cusack
wrote:

> On February 2, 2010 11:58:17 AM -0600 Tim Cook  wrote:
>
>> On Tue, Feb 2, 2010 at 11:53 AM, Frank Cusack
>> wrote:
>>
>>> On February 2, 2010 8:57:32 AM -0800 Orvar Korvar <
>>> knatte_fnatte_tja...@yahoo.com> wrote:
>>>
>>>> I love that Sun shares their products for free. Which other big Unix
>>>> vendor does that?
>>>
>>> Who's left?
>>>
>> Pretty sure HP and IBM are still alive and well.
>>
> Yeah but who would want it, even for free. :P
>


Not exactly unix, but there's still VMS clusters running around out there
with 100% uptime for over 20 years.  I wouldn't mind seeing it opened up.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread David Champion
* On 02 Feb 2010, Richard Elling wrote: 
> 
> This behaviour has changed twice.  Long ago, the pools would autoexpand.
> This is a bad thing, by default, so it was changed such that the expansion
> would only occur on pool import (around 3-4 years ago). The autoexpand 
> property allows you to expand without an export/import (and arrived around
> 18 months ago). It is not surprising that various Solaris 10 releases/patches
> would have one of the three behaviours.

Well well, I guess it's been a while since I actually tested this.
:)  Thanks, Richard.  I'll watch for autoexpand in next releases of
s10/osol.

-- 
 -D.d...@uchicago.eduNSITUniversity of Chicago
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Frank Cusack

On February 2, 2010 11:58:17 AM -0600 Tim Cook  wrote:

> On Tue, Feb 2, 2010 at 11:53 AM, Frank Cusack
> wrote:
>
>> On February 2, 2010 8:57:32 AM -0800 Orvar Korvar <
>> knatte_fnatte_tja...@yahoo.com> wrote:
>>
>>> I love that Sun shares their products for free. Which other big Unix
>>> vendor does that?
>>
>> Who's left?
>
> Pretty sure HP and IBM are still alive and well.

Yeah but who would want it, even for free. :P
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Champion
* On 02 Feb 2010, Orvar Korvar wrote:
> Ok, I see that the chassi contains a mother board. So never mind that
> question.
>
> Another q:  Is it possible to have large chassi with lots of drives,
> and the opensolaris in another chassi, how do you connect them both?

The J4500 and most other storage products being discussed are not
servers: they are SATA concentrators with SAS uplinks.  You plug in
a bunch of cheap SATA disk, and you connect the chassis to a server
with SAS.  The logic board on the storage tray just converts the SAS
signalling to SATA.  It is not a computer in the usual sense.

In many cases such products also have SAS expander ports, so that you
can link multiple storage trays to a single SAS host bus adapter on your
server by daisy-chaining them.

So you need at least one SAS HBA on your OpenSolaris box, and SAS cables
to hook up the trays containing the SATA drives.


To the original question: you can purchase a J4x00 with a limited
number of drives (empty is generally not an option), but there is no
officially-sanctioned way to obtain the drive adapters except to buy Sun
disks.  You need either a SAS or a SATA drive bracket to adapt the drive
to the J4x00 backplane, but they are not sold separately: one ships with
each drive.

As mentioned there are companies that sell remanufactured or discarded
components, or machine their own substitutes.  (Re)marketing Sun or
compatible drive brackets has always been a lively business for a
few small outfits.  But Sun has no involvement with this, and may be
unwilling to support a frankenstein server.

Sun state that their OEM drives are of higher quality than OTS drives
from manufacturers or retailers, and that they have custom firmware that
improves their performance and reliability in Sun storage trays/arrays.
I see no reason to disbelieve that, but it is quite a steep price to pay
for that premium edge.  When cost is a bigger concern than performance
or reliability, I have generally bought the StorEdge product with the
smallest drives I can (250 GB or 500 GB) and upgraded them myself to the
size I really want.  It's cheaper to buy 20 drives from CDW than 10 from
Sun even when you account for the tiny throwaway drive, and you can keep
the 10 extra as cold spares.  At low enough scale the financial savings
are worth the time to replace them as they fail.

(I wish I could say the same of the StorEdge arrays themselves.  Fully
half of my 2540 controllers have failed, costing me huge amounts of
time in both direct and contractual service, and I've given up on them
completely as a product line.  I'll be thrilled to switch to JBOD.)

For larger and less fault-tolerant systems, when money is available,
I'm happy to pay Sun's premium.

However, as others say, the other brands sometimes offer decent enough
products to use instead of Sun's enterprise line.  As always, it depends
on your site's requirements and budget.  I assume that a home NAS is
comparatively low on both: therefore I wouldn't even shop with Sun
unless you have a line on cheap castoffs from an enterprise shop.

-- 
 -D.d...@uchicago.eduNSITUniversity of Chicago
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 11:53 AM, Frank Cusack
wrote:

> On February 2, 2010 8:57:32 AM -0800 Orvar Korvar <
> knatte_fnatte_tja...@yahoo.com> wrote:
>
>> I love that Sun shares their products for free. Which other big Unix
>> vendor does that?
>>
>
> Who's left?
>
>
Pretty sure HP and IBM are still alive and well.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Frank Cusack
On February 2, 2010 8:57:32 AM -0800 Orvar Korvar 
 wrote:

> I love that Sun shares their products for free. Which other big Unix
> vendor does that?


Who's left?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Cindy Swearingen

Hi David,

This feature integrated into build 117, which would be beyond
your OpenSolaris 2009.06. We anticipate this feature will be
available in an upcoming Solaris 10 release.

You can read about it here:

http://docs.sun.com/app/docs/doc/817-2271/githb?a=view

ZFS Device Replacement Enhancements

Thanks,

cindy

On 02/02/10 10:29, David Champion wrote:
> * On 02 Feb 2010, Darren J Moffat wrote:
>>
>> zpool get autoexpand test
>
> This seems to be a new property -- it's not in my Solaris 10 or
> OpenSolaris 2009.06 systems, and they have always expanded immediately
> upon replacement.  In what build number or official release does
> autoexpand appear, and does it always default to off?  This will be
> important to know for upgrades.
>
> Thanks.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Richard Elling
On Feb 2, 2010, at 9:29 AM, David Champion wrote:

> * On 02 Feb 2010, Darren J Moffat wrote: 
>> 
>> zpool get autoexpand test
> 
> This seems to be a new property -- it's not in my Solaris 10 or
> OpenSolaris 2009.06 systems, and they have always expanded immediately
> upon replacement.  In what build number or official release does
> autoexpand appear, and does it always default to off?  This will be
> important to know for upgrades.

[without digging through the release notes, relying instead on grey memory :-)]

This behaviour has changed twice.  Long ago, the pools would autoexpand.
This is a bad thing, by default, so it was changed such that the expansion
would only occur on pool import (around 3-4 years ago). The autoexpand 
property allows you to expand without an export/import (and arrived around
18 months ago). It is not surprising that various Solaris 10 releases/patches
would have one of the three behaviours.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Darren J Moffat

On 02/02/2010 17:29, David Champion wrote:

> * On 02 Feb 2010, Darren J Moffat wrote:
>>
>> zpool get autoexpand test
>
> This seems to be a new property -- it's not in my Solaris 10 or
> OpenSolaris 2009.06 systems, and they have always expanded immediately
> upon replacement.  In what build number or official release does
> autoexpand appear, and does it always default to off?  This will be
> important to know for upgrades.


changeset:   10155:847676ec1c5b
date:Mon Jun 08 10:35:50 2009 -0700
description:
PSARC 2008/353 zpool autoexpand property
6475340 when lun expands, zfs should expand too
6563887 in-place replacement allows for smaller devices
6606879 should be able to grow pool without a reboot or 
export/import

6844090 zfs should be able to mirror to a smaller disk

Which would be build 94 which given 2009.06 was build 111b it should be 
there.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread David Champion
* On 02 Feb 2010, Darren J Moffat wrote: 
> 
> zpool get autoexpand test

This seems to be a new property -- it's not in my Solaris 10 or
OpenSolaris 2009.06 systems, and they have always expanded immediately
upon replacement.  In what build number or official release does
autoexpand appear, and does it always default to off?  This will be
important to know for upgrades.

Thanks.

-- 
 -D.d...@uchicago.eduNSITUniversity of Chicago
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Richard Elling
On Feb 2, 2010, at 8:49 AM, David Dyer-Bennet wrote:
> On Tue, February 2, 2010 10:21, Marc Nicholas wrote:
>> I agree wholeheartedlyyou're paying to make the problem "go away" in
>> an
>> expedient manner. That said, I see how much we spend on NetApp storage at
>> work and it makes me shudder ;)
> 
> Yes, exactly.  Pricing must be about right, people wince but pay it :-). 
> If they don't wince it's too low.

Business 101.
The price will be what the market will bear. If the price seems out of
line with your market, then perhaps you aren't in the same market.

Personally, I think Ferraris are neat. But here on the ranch, I might be
able to squeeze a bale of hay into the passenger seat, but the low
ground clearance means I'll have to keep the tractor nearby to pull it
out  when it gets stuck. So a Ferrari has $0 market value here at the ranch.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Mark Nipper
> That's good to hear. Which revision are they: 00R6B0
> or 00P8B0? It's marked on the drive top.

Interesting.  I wonder if this is the issue too with the 01U1B0 2.0TB drives.
I have 24 WD2002FYPS-01U1B0 drives under OpenSolaris with an LSI 1068E
controller that have weird timeout issues, and I have 4 more on a 3ware 9650SE
(a 9650SE-4LPML to be exact) under Linux which often show up with a status of
DEVICE-ERROR even though the drives otherwise appear to be fine.  All of this
could be explained by the drives not responding to the controller in a timely
manner, I think.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Orvar Korvar
This reminds me of this attorney that charged very much for a contract template 
he copied and gave to a client. To that, he responded:
-You dont pay for me finding this template and copying to you, which took me 5 
minutes. You pay me because I sat 5 years in the university, and have 15 years 
of experience. That is why you pay me.

I agree that Sun hardware is way too pricey for a home user, but this is 
Enterprise stuff. And you should look at IBM prices, they are 5-10x higher than 
Sun's prices.

I think the Enterprise customers, do not pay for Sun bringing you a disk from 
the cellar, which takes 5 minutes. But they pay for all the research, 
development, and Sun making sure everything works as intended.

I love that Sun shares their products for free. Which other big Unix vendor 
does that?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Darren J Moffat

On 02/02/2010 16:48, Cindy Swearingen wrote:

> Hi Joerg,
>
> Enabling the autoexpand property after the disk replacement is complete
> should expand the pool. This looks like a bug. I can reproduce this
> issue with files. It seems to be working as expected for disks.
> See the output below.


If you use lofi on top of files rather than files directly it works too.
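
Something like this, for example (paths are arbitrary):

# back each vdev with a lofi device instead of a plain file
mkfile 100m /var/tmp/f1 /var/tmp/f2
lofiadm -a /var/tmp/f1      # -> /dev/lofi/1
lofiadm -a /var/tmp/f2      # -> /dev/lofi/2
zpool create test /dev/lofi/1 /dev/lofi/2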

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 10:21, Marc Nicholas wrote:
> I agree wholeheartedlyyou're paying to make the problem "go away" in
> an
> expedient manner. That said, I see how much we spend on NetApp storage at
> work and it makes me shudder ;)

Yes, exactly.  Pricing must be about right, people wince but pay it :-). 
If they don't wince it's too low.

There are certainly places (NASDAQ, say) who have to have absolute
reliability (on both their main and disaster-recovery sites).  That level
of reliability costs money, and they pay it, probably even fairly
cheerfully.

Since spare parts have to be stocked and field service people trained, it
makes sense that service contracts cover limited sets of equipment.  And
that's a strong argument for staying within that set of equipment.

I do think that a lot of companies buy higher up the reliability curve
than they need.  But that's their choice.

> I think someone was wondering if the large storage vendors have their own
> microcode on drives? I can tell you that NetApp do...and that's one way
> they
> "lock you in" (if the drive doesn't report NetApp firmware, the filer will
> "reject" the drive) and also how they do tricks like
> soft-failure/re-validation, 520-byte sectors, etc.

Sigh.  It makes perfect sense that getting some special tricks in the
drives actually pays off.  And yet it's inevitable that they ALSO use it
as a lockin.

I've seen people down extra days while locked-in parts are shipped to
them; the parts were essentially identical to what you could buy that day
at retail locally, but the locally-available version wouldn't work.


-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Cindy Swearingen

Hi Joerg,

Enabling the autoexpand property after the disk replacement is complete
should expand the pool. This looks like a bug. I can reproduce this
issue with files. It seems to be working as expected for disks.
See the output below.

Thanks, Cindy

Create pool test with 2 68 GB drives:

# zpool create test c2t2d0 c2t3d0
# zpool list test
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
test   136G   126K   136G 0%  1.00x  ONLINE  -
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test  73.5K   134G21K  /test

Replace 2 68 GB drive with 136 GB drives and set autoreplace:

# zpool replace test c2t2d0 c0t8d0
# zpool replace test c2t3d0 c0t9d0
# zpool list test
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
test   136G   166K   136G 0%  1.00x  ONLINE  -
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test90K   134G21K  /test
# zpool set autoexpand=on test
# zpool list test
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
test   273G   150K   273G 0%  1.00x  ONLINE  -


On 02/02/10 09:22, Joerg Schilling wrote:

> Darren J Moffat  wrote:
>
>>> Did yours grow?
>>> If yes, what did I do wrong?
>>
>> What does this return:
>>
>> zpool get autoexpand test
>
> zpool get autoexpand test
> NAME  PROPERTYVALUE   SOURCE
> test  autoexpand  off default
>
> Thank you for this hint!
>
> BTW: setting autoexpand later did not help but setting it before
> I replaced the "media" resulted in a grown zpool and zfs.
>
> Jörg


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Joerg Schilling
Marc Nicholas  wrote:

> I think someone was wondering if the large storage vendors have their own
> microcode on drives? I can tell you that NetApp do...and that's one way they
> "lock you in" (if the drive doesn't report NetApp firmware, the filer will
> "reject" the drive) and also how they do tricks like
> soft-failure/re-validation, 520-byte sectors, etc.

Since IBM started to use SCSI drives more than 20 years ago for their 
mainframes, you can format most drives with any sector size. IBM used 800
or 8000 byte sectors (10 or 100 punch cards ;-), but you also may reformat
a drive with 520 bytes per sector.
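
For example, with sg3_utils installed something like the following should do
it; the device name is made up and FORMAT UNIT destroys all data on the drive,
so treat this only as a sketch:

# reformat a SCSI/SAS drive to a 520-byte logical block size
sg_format --format --size=520 /dev/sdb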

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Joerg Schilling
Darren J Moffat  wrote:

> > Did yours grow?
> > If yes, what did I do wrong?
>
> What does this return:
>
> zpool get autoexpand test

zpool get autoexpand test
NAME  PROPERTYVALUE   SOURCE
test  autoexpand  off default

Thank you for this hint!

BTW: setting autoexpand later did not help but setting it before
I replaced the "media" resulted in a grown zpool and zfs.
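
Concretely, the order that worked looks roughly like this (paths as in my test
setup):

# enable autoexpand BEFORE swapping in the bigger backing devices
zpool set autoexpand=on test
zpool replace test /export/home/tmp/Z/f5 /export/home/tmp/Z/F5
# ... repeat for f4 down to f1 ...
zpool list test    # the pool grows once every member of the vdev is larger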

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Marc Nicholas
I agree wholeheartedlyyou're paying to make the problem "go away" in an
expedient manner. That said, I see how much we spend on NetApp storage at
work and it makes me shudder ;)

I think someone was wondering if the large storage vendors have their own
microcode on drives? I can tell you that NetApp do...and that's one way they
"lock you in" (if the drive doesn't report NetApp firmware, the filer will
"reject" the drive) and also how they do tricks like
soft-failure/re-validation, 520-byte sectors, etc.

-marc


On Tue, Feb 2, 2010 at 11:12 AM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Tue, 2 Feb 2010, David Dyer-Bennet wrote:
>
>>
>> Now, I'm sure not ALL drives offered at Newegg could qualify; but the
>> question is, how much do I give up by buying an enterprise-grade drive
>> from a major manufacturer, compared to the Sun-certified drive?
>>
>
> If you have a Sun service contract, you give up quite a lot.  If a Sun
> drive fails every other day, then Sun will replace that Sun drive every
> other day, even if the system warranty has expired.  But if it is a non-Sun
> drive, then you have to deal with a disinterested drive manufacturer, which
> could take weeks or months.
>
> My experiences thus far is that if you pay for a Sun service contract, then
> you should definitely pay extra for Sun branded parts.
>
> Hopefully Oracle will do better than Sun at explaining the benefits and
> services provided by a service contract.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Darren J Moffat

On 02/02/2010 16:09, Joerg Schilling wrote:

> Carsten Aulbert  wrote:
>
>> Hi Jörg,
>>
>> On Tuesday 02 February 2010 16:40:50 Joerg Schilling wrote:
>>
>>> After that, the zpool did notice that there is more space:
>>>
>>> zpool list
>>> NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
>>> test   476M  1,28M   475M 0%  ONLINE  -
>>
>> That's the size already after the initial creation, after exporting and
>> importing it again:
>>
>> # zpool list
>> NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
>> test976M   252K   976M 0%  ONLINE  -
>
> Mmm, it seems that I made a mistake while interpreting the results.
> My zpool did not grow either.
>
> Did yours grow?
> If yes, what did I do wrong?



What does this return:

zpool get autoexpand test


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Bob Friesenhahn

On Tue, 2 Feb 2010, David Dyer-Bennet wrote:


Now, I'm sure not ALL drives offered at Newegg could qualify; but the
question is, how much do I give up by buying an enterprise-grade drive
from a major manufacturer, compared to the Sun-certified drive?


If you have a Sun service contract, you give up quite a lot.  If a Sun 
drive fails every other day, then Sun will replace that Sun drive 
every other day, even if the system warranty has expired.  But if it 
is a non-Sun drive, then you have to deal with a disinterested drive 
manufacturer, which could take weeks or months.


My experiences thus far is that if you pay for a Sun service contract, 
then you should definitely pay extra for Sun branded parts.


Hopefully Oracle will do better than Sun at explaining the benefits 
and services provided by a service contract.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Joerg Schilling
Carsten Aulbert  wrote:

> Hi Jörg,
>
> On Tuesday 02 February 2010 16:40:50 Joerg Schilling wrote:
> > After that, the zpool did notice that there is more space:
> > 
> > zpool list
> > NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
> > test   476M  1,28M   475M 0%  ONLINE  -
> > 
>
> That's the size already after the initial creation, after exporting and 
> importing it again:
>
> # zpool list
> NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
> test976M   252K   976M 0%  ONLINE  -

Mmm, it seems that I made a mistake while interpreting the results.
My zpool did not grow either.

Did yours grow?
If yes, what did I do wrong?

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Simon Breden
Hi Tonmaus,

That's good to hear. Which revision are they: 00R6B0 or 00P8B0? It's marked on 
the drive top.

From what I've seen elsewhere, people seem to be complaining about the newer
00P8B0 revision, so I'd be interested to hear from you. These revision numbers
are listed in the first post of the thread below, and refer to the 1.5TB model
(WD15EADS), but might also be applicable to the WD20EADS model too.

http://opensolaris.org/jive/thread.jspa?threadID=121871
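
If it's easier than opening the case, the revision string should also show up
in the inquiry data, e.g. (device name is just an example):

iostat -En c7t2d0
# the Vendor / Product / Revision lines identify the exact drive model and
# firmware without pulling the drive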

Cheers,
Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 09:58, Tim Cook wrote:

> It's called spreading the costs around.  Would you really rather pay 10x
> the price on everything else besides the drives?

This seems to miss the point.  I presented an argument for why I think the
qualified drives are a huge profit-center, not just making a reasonable
profit on the work of qualification.

In general, I'd much rather pay reasonable costs for each piece, rather
than weird costs artificially shoved around to make things come out some
strange way somebody favors.

> This is essentially Sun's
> way
> of tiered pricing.  Rather than charge you a software fee based on how
> much
> storage you have, they increase the price of the drives.  Seems fairly
> reasonable to me... it gives a low point of entry for people that don't
> need
> that much storage without using ridiculous capacity based licensing on
> software.

It works great for me personally -- I'm using the software with other
people's hardware, for free.

But why should people who need a lot of storage pay proportionally more? 
I don't get that, that's grossly wrong.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Tim Cook
On Tue, Feb 2, 2010 at 9:45 AM, David Dyer-Bennet  wrote:

>
> On Tue, February 2, 2010 01:27, Tim Cook wrote:
>
> > Except you think the original engineering is just a couple grand, and
> > that's
> > where you're wrong.  I hate the prices just as much as the next guy, but
> > they do in fact need to feed their families.  In fact, they need to do a
> > hell of a lot more than that, including paying off what is likely bumping
> > up
> > against a 6-figure education for the engineering degrees they hold to
> > design
> > that hardware.  If it were easy, everyone would be doing it, and it
> > wouldn't
> > be expensive.
>
> I don't think the complaint is mostly about the part Sun engineered,
> though; it seems to me the complaint is about the price of what looks and
> smells to many of us like essentially the same disk drive we can buy for
> 1/10 the price at Newegg.
>
> Now, I'm sure not ALL drives offered at Newegg could qualify; but the
> question is, how much do I give up by buying an enterprise-grade drive
> from a major manufacturer, compared to the Sun-certified drive?
>
> I can easily believe that Sun and the vendors spend many man-months
> qualifying a drive, and that perhaps there are even custom firmware
> versions for the Sun-certified drives.  For a single drive, what's a
> not-crazy estimate?  12 man-months?  Fully-loaded man-months costing about
> $250k/year?  If they sell 10,000 of that drive, the per-drive cost of
> qualification would seem to be $25.  Of course, I made up all the numbers,
> and have no idea if the quantity in particular is sane or not.  Anyway,
> it's thinking like this that leads some of us to feel that a $900 premium
> is not reasonable.  I suppose anybody who really knows sales numbers can't
> talk about them, and the same for some of the other bits.  Still, the way
> to combat misperceptions (if these are in fact misperceptions) is with
> more accurate information.
>
> (I actually worked in the group Thumper came out of for a bit; I was
> working on the user interface software for the video streaming software
> that Thumper was developed to provide storage for.  I was with Sun
> 2005-2008.)
>
> > If you think they're overcharging, you're more than welcome to go into
> > business, undercut the shit out of them, and still make a ton of money,
> > since you think they 're charging 10x market value.
>
> I also think I'm not qualified to do that; I'm a software guy, neither
> management nor marketing nor hardware engineering.
>
> Also, if they really are charging 10x, then they can easily cut prices to
> compete with any upstarts, a fact that potential investors would take note
> of.
>
>
>
It's called spreading the costs around.  Would you really rather pay 10x the
price on everything else besides the drives?  This is essentially Sun's way
of tiered pricing.  Rather than charge you a software fee based on how much
storage you have, they increase the price of the drives.  Seems fairly
reasonable to me... it gives a low point of entry for people that don't need
that much storage without using ridiculous capacity based licensing on
software.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Carsten Aulbert
Hi Jörg,

On Tuesday 02 February 2010 16:40:50 Joerg Schilling wrote:
> After that, the zpool did notice that there is more space:
> 
> zpool list
> NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
> test   476M  1,28M   475M 0%  ONLINE  -
> 

That's the size already after the initial creation, after exporting and 
importing it again:

# zpool list
NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
test976M   252K   976M 0%  ONLINE  -

> the ZFS however did not grow:
> 
> zfs list
> NAME USED  AVAIL  REFER  MOUNTPOINT
> test 728K   251M   297K  /test
> 

# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   139K   549M  37.5K  /test


I think you fell into the trap that zpool just adds up all rows; this is
especially visible on a thumper under heavy load, where the read and write
operations per time slice for each vdev seem to be just the individual sums of
the devices underneath.

But this still does not explain why the pool is larger after exporting and
reimporting.

Cheers

Carsten

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 01:27, Tim Cook wrote:

> Except you think the original engineering is just a couple grand, and
> that's
> where you're wrong.  I hate the prices just as much as the next guy, but
> they do in fact need to feed their families.  In fact, they need to do a
> hell of a lot more than that, including paying off what is likely bumping
> up
> against a 6-figure education for the engineering degrees they hold to
> design
> that hardware.  If it were easy, everyone would be doing it, and it
> wouldn't
> be expensive.

I don't think the complaint is mostly about the part Sun engineered,
though; it seems to me the complaint is about the price of what looks and
smells to many of us like essentially the same disk drive we can buy for
1/10 the price at Newegg.

Now, I'm sure not ALL drives offered at Newegg could qualify; but the
question is, how much do I give up by buying an enterprise-grade drive
from a major manufacturer, compared to the Sun-certified drive?

I can easily believe that Sun and the vendors spend many man-months
qualifying a drive, and that perhaps there are even custom firmware
versions for the Sun-certified drives.  For a single drive, what's a
not-crazy estimate?  12 man-months?  Fully-loaded man-months costing about
$250k/year?  If they sell 10,000 of that drive, the per-drive cost of
qualification would seem to be $25.  Of course, I made up all the numbers,
and have no idea if the quantity in particular is sane or not.  Anyway,
it's thinking like this that leads some of us to feel that a $900 premium
is not reasonable.  I suppose anybody who really knows sales numbers can't
talk about them, and the same for some of the other bits.  Still, the way
to combat misperceptions (if these are in fact misperceptions) is with
more accurate information.

(I actually worked in the group Thumper came out of for a bit; I was
working on the user interface software for the video streaming software
that Thumper was developed to provide storage for.  I was with Sun
2005-2008.)

> If you think they're overcharging, you're more than welcome to go into
> business, undercut the shit out of them, and still make a ton of money,
> since you think they're charging 10x market value.

I also think I'm not qualified to do that; I'm a software guy, neither
management nor marketing nor hardware engineering.

Also, if they really are charging 10x, then they can easily cut prices to
compete with any upstarts, a fact that potential investors would take note
of.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3ware 9650 SE

2010-02-02 Thread Alexandre MOREL
Hi,

For a few days now I have been trying to get a 9650SE 3ware controller working 
on OpenSolaris, and I have run into the following problem:
the tw driver seems to work, and I can see my controller with 3ware's tw_cli. I 
can see that 2 drives are created on the controller, but when I run 
"pfexec format", the drives are not detected.

I am using: SunOS nas-ceat 5.11 snv_131 i86pc i386 i86pc Solaris, with the 
latest version of the driver from 3ware support.

Can someone help me solve this problem?

Thanks in advance,

Alex
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Joerg Schilling
Hi,

I am trying to find a way to grow the filesystems in a thumper.
The idea is to take single disks offline and to replace them by bigger
ones.

For this reason, I did run the following test:

mkfile 100m f1
mkfile 100m f2
mkfile 100m f3
mkfile 100m f4
mkfile 100m f5

mkfile 200m F1
mkfile 200m F2
mkfile 200m F3
mkfile 200m F4
mkfile 200m F5

zpool create test raidz2 /export/home/tmp/Z/f*

create some files ...


zpool offline test /export/home/tmp/Z/f5 
zpool replace test /export/home/tmp/Z/f5 /export/home/tmp/Z/F5 
zpool offline test /export/home/tmp/Z/f4 
zpool replace test /export/home/tmp/Z/f4 /export/home/tmp/Z/F4 
zpool offline test /export/home/tmp/Z/f3 
zpool replace test /export/home/tmp/Z/f3 /export/home/tmp/Z/F3 
zpool offline test /export/home/tmp/Z/f2 
zpool replace test /export/home/tmp/Z/f2 /export/home/tmp/Z/F2 
zpool offline test /export/home/tmp/Z/f1 
zpool replace test /export/home/tmp/Z/f1 /export/home/tmp/Z/F1 

After that, the zpool did notice that there is more space:

zpool list 
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   476M  1,28M   475M   0%  ONLINE  -

the ZFS however did not grow:

zfs list 
NAME USED  AVAIL  REFER  MOUNTPOINT 
test 728K   251M   297K  /test 

How can I tell ZFS that it should use the whole new space?
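
(One thing I have not tried yet, assuming a build recent enough to have it, 
which I have not verified: asking each replaced vdev explicitly to grow, e.g.

zpool online -e test /export/home/tmp/Z/F1
zpool online -e test /export/home/tmp/Z/F2

and so on for F3, F4 and F5. Newer builds are also said to have an autoexpand 
pool property that is supposed to make this automatic.)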

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread Tonmaus
Hi James,

am I right in understanding that, in a nutshell, the problem is that if page 
80/83 information is present but corrupt/inaccurate/forged (name it as you 
want), ZFS will never get down to the GUID?

regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread Tonmaus
Thanks. That fixed it.

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Dyer-Bennet

On Tue, February 2, 2010 01:26, James C. McPherson wrote:

> The engineering ratings are different to what you can buy from
> your local corner PC store, and the firmware is different. The
> qualification is done with the assumption that the disks will be
> spinning every single second for a number of years, and that
> they will have a much, *much* higher duty cycle than consumer
> grade hardware.
>
> Please stop assuming that all this only costs a few pennies.
> It doesn't.

I'm pretty doubtful that the hardware differs from what I can buy from
Newegg or wherever, *IF* I buy the same enterprise-grade drive model (a WD
S25 or RE-4, say, rather than a Caviar Blue; I don't know which WD drives,
if any, are currently qualified for use in any Sun products).  Just to be
clear, is that what you are asserting?  Or are you only asserting that the
drives that get qualified are not the cheap drive models most easily found
at your handy corner PC store?  (I have less trouble believing they might
have non-standard microcode.)

I'm simply not Sun's market (home NAS); $1000 disk drives simply do not
exist in my home; my budget doesn't stretch that far.  And I do find it
somewhat offensive that software, controller, and drives can't agree on
enough common standards to actually work together.  I'll pay the money I
need for the duty cycle I need; but in fact my home gear has run 24/7
since 1985, and I've had very little trouble with disk drives.  In fact,
drives most often die for me when the equipment is power-cycled.

(I've still got the corpse of at least one 300MB drive from long ago that
I paid $1500 for, come to think of it!)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread Cindy Swearingen


Even if the pool is created with whole disks, you'll need to
use the s* identifier as I provided in the earlier reply:

# zdb -l /dev/dsk/cvtxdysz
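
For example, borrowing c7t11d0 from elsewhere in this thread purely as a
stand-in: when ZFS is given a whole disk it writes an EFI label and keeps
the data in slice 0, so that slice is the one carrying the vdev labels:

# zdb -l /dev/dsk/c7t11d0s0

This should dump the four vdev labels (pool name, guid, devid, and so on)
instead of "failed to unpack label".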

Cindy

On 02/02/10 01:07, Tonmaus wrote:

If I run

 # zdb -l /dev/dsk/c#t#d#

the result is "failed to unpack label" for any disk attached to controllers using the ahci or arcmsr drivers.


Cheers,

Tonmaus

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Tonmaus
Hi Simon,

I am running 5 WD20EADS drives in a raidz1 + spare configuration on an ahci 
controller, without any problems I could relate to TLER or head parking.

Cheers,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Rob Logan

> true. but I buy a Ferrari for the engine and bodywork and chassis
> engineering. It is totally criminal what Sun/EMC/Dell/Netapp do charging

its interesting to read this with another thread containing:

> timeout issue is definitely the WD10EARS disks.
> replaced 24 of them with ST32000542AS (f/w CC34), and the problem departed
> with the WD disks.

Everyone needs to eat. If Ferrari spreads their NRE over the wheels, it
might be because the wheels are light and have been tested not to melt
from the heat. Sun/EMC/Dell/NetApp test each of their components and
sell the total "car".

I'm thankful Sun shares their research and we can build on it.
(BTW, NetApp ONTAP 8 is FreeBSD, and runs on standard hardware after a
little BIOS work :-)

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread David Magda
On Tue, February 2, 2010 02:24, matthew patton wrote:

> true. but I buy a Ferrari for the engine and bodywork and chassis
> engineering. It is totally criminal what Sun/EMC/Dell/Netapp do charging
> customers 10x the open-market rate for standard drives. A RE3/4 or NS
> drive is the same damn thing no matter if I buy it from ebay or my local
> distributor. Dell/Sun/Netapp buy drives by the container load. Oh sure, I
> don't mind paying an extra couple pennies/GB for all the strenuous efforts
> the vendors spend on firmware verification (HA!).

Tell that to Intel and their SSD firmware team, or Seagate:

http://mswhs.com/2009/01/21/seagate-hard-drive-firmware-bug/

Heck, even things as "low-end" as Netgear's ReadyNAS product specify using
only certain versions of the firmware on many drives:

http://www.readynas.com/?page_id=82

You buy enterprise drives to make sure they work as advertised and don't
drop SYNC commands on the floor and then lie to ZFS about it.

Disk may be cheap, but redundancy, iops, backups, and testing are not.

As someone else suggested, you may want to skip the "enterprise" enclosure
and go with a "consumer" one instead if price is a concern. For home
workloads it may be sufficient to meet your needs.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Simon Breden
> My timeout issue is definitely the WD10EARS disks.
> WD has chosen to cripple their consumer grade disks
> when used in quantities greater than one.
> 
> I'll now need to evaluate alternative supplers of low
> cost disks for low end high volume storage.
> 
> Mark.
> 
> typo ST32000542AS not NS

This was the conclusion I came to. I'm also on the hunt for some decent 
consumer-priced drives for use in a ZFS RAID setup, and I created a thread to 
try to find which ones people recommend. See here:
http://opensolaris.org/jive/thread.jspa?threadID=121871

So far, I'm inclined to think that the Samsung HD154UI 1.5TB, and possibly the 
Samsung HD203WI 2TB drives might be the most reliable choices at the moment, 
based on the data in that thread and checking user reports.

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Orvar Korvar
OK, I see that the chassis contains a motherboard, so never mind that question.

Another question:
Is it possible to have a large chassis with lots of drives and the OpenSolaris 
server in another chassis? How do you connect the two?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Orvar Korvar
A dumb question:

I see 24 drives in an external chassis. I presume that chassis only holds 
drives; it does not hold a motherboard.

How do you connect all the drives to your OpenSolaris server? Do you place the 
chassis next to each other, with three 8-port SATA controllers in your 
OpenSolaris server and 24 SATA cables in the air? And the chassis left wide open?

Or am I wrong, and does the chassis also hold a motherboard?

Is it possible to have the server in one chassis and the drives in another 
chassis, and how do you connect everything?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread James C. McPherson

On  2/02/10 06:52 PM, Moshe Vainer wrote:

I believe I have seen the same issue. Mine was documented as:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6843555

Areca did issue fixed firmware, but I can't say whether that was indeed
the end of it, since we haven't done a controlled disk-mixing
experiment since then.

I did find it strange that this is needed, since ZFS is supposed, IMHO,
to identify the devices by ID. However, I got an explanation from James
McPherson about how the disk identification works, and it seems to
reasonably explain why it will work with some controllers and not with others.

I will ask James's permission to publish parts of his e-mail here. Hope
he has no issue with that.



Here's what I sent to Moshe a few months back, in the context
of http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6843555
(Invalid vdevs after adding drives in snv_111)

===

What ZFS looks for firstly is the device ID (devid), which is part
of the SCSI INQUIRY Page83 response.

I have just now requested that Areca ensure that this information
does _not_ wander around, but stays with the physical device.

If the Page83 information is not available, then the devid framework
falls back to using the Page80 information, and if *that* fails,
then it fakes a response based on the device's reported serial number.

If ZFS cannot open the device via devid, it falls back to looking
at the physical path.

If the devid does not wander, then there is no need to look at the
physical path to open the device, hence there is no problem for ZFS.

Assuming that Areca's fix does in fact resolve this wandering problem,
then there is no problem elsewhere.

===
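
(Not part of the original mail above; just a sketch of how one might eyeball
this on a running system, with the device name purely an example:

# iostat -En c7t11d0
	what the disk itself reports: vendor, product, serial number
# zdb -l /dev/dsk/c7t11d0s0 | grep devid
	the devid that was recorded in the ZFS vdev label

The serial number is usually embedded in the devid string, so if the two stop
lining up after a reshuffle, you are in the fall-back-to-physical-path
situation described above.)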



James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Brandon High
On Mon, Feb 1, 2010 at 8:58 PM, matthew patton  wrote:
> what with the home NAS conversations, what's the trick to buy a J4500 without 
> any drives? SUN like every other "enterprise" storage vendor thinks it's ok 
> to rape their customers and I for one, am not interested in paying 10x for a 
> silly SATA hard drive.

To get the topic back to the original question...

There are Supermicro chassis that you can use. This one holds 36
drives, 24 front and 12 rear:
http://www.supermicro.com/products/chassis/4U/847/SC847E16-R1400.cfm ($1800)

This one holds "only" 24 drives:
http://www.supermicro.com/products/chassis/4U/846/SC846TQ-R900.cfm ($950)

Either of those with the CSE-PTJBOD-CB1 ($30), CBL-0166L ($40), and
CBL-0167L ($40) parts will be a monster JBOD.

That being said, we used to get Dell JBODs at a previous job, and I
remember the 12-drive shelves being cheap, cheaper than buying bare
drives. This was 8 years ago, though, so I'm not sure whether they still
discount their drives as much.

-B

-- 
Brandon High : bh...@freaks.com
Mistakes are often the stepping stones to utter failure.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread Moshe Vainer
I believe I have seen the same issue. Mine was documented as:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6843555

Areca did issue fixed firmware, but I can't say whether that was indeed the 
end of it, since we haven't done a controlled disk-mixing experiment since then.

I did find it strange that this is needed, since ZFS is supposed, IMHO, to 
identify the devices by ID. However, I got an explanation from James McPherson 
about how the disk identification works, and it seems to reasonably explain why 
it will work with some controllers and not with others.

I will ask James's permission to publish parts of his e-mail here. Hope he has 
no issue with that.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread Tonmaus
Good morning Cindy,

> Hi,
> 
> Testing how ZFS reacts to a failed disk can be
> difficult to anticipate
> because some systems don't react well when you remove
> a disk.
I am in the process of finding that out for my systems. That's why I am doing 
these tests. 
> On an
> x4500, for example, you have to unconfigure a disk
> before you can remove
> it.
I have already had similar experiences with disks attached via ahci. Still, 
zpool status won't immediately recognize that they have been removed, or 
sometimes not at all. But that's stuff for another thread.
> 
> Before removing a disk, I would consult your h/w docs
> to see what the
> recommended process is for removing components.
Spec-wise, all the drives, backplanes, controllers and drivers I am using 
should support hotplug. Still, ZFS seems to have difficulties.
> 
> Swapping disks between the main pool and the spare
> pool isn't an
> accurate test of a disk failure and a spare kicking
> in.

That's correct. You may want to note that it wasn't the subject of my test 
procedure; I just intentionally mixed up some disks.

> 
> If you want to test a spare in a ZFS storage pool
> kicking in, then yank 
> a disk from the main pool (after reviewing your h/w
> docs) and observe 
> the spare behavior.
I am aware of that procedure. Thanks. 

> If a disk fails in real time, I
> doubt it will be
> when the pool is exported and the system is shutdown.

Agreed. Once again: the export, reboot, import sequence was specifically 
followed to eliminate any side effects of hotplug behaviour.

> 
> In general, ZFS pools don't need to be exported to
> replace failed disks.
> I've seen unpredictable behavior when
> devices/controllers change on live 
> pools. I would review the doc pointer I provided for
> recommended disk
> replacement practices.
> 
> I can't comment on the autoreplace behavior with a
> pool exported and
> a swap of disks. Maybe someone else can. The point of
> the autoreplace
> feature is to allow you to take a new replacement
> disk and automatically
> replace a failed disk without having to use the zpool
> replace command.
> Its not a way to swap existing disks in the same
> pool.

The interesting point about this is finding out whether one will be able to, 
for example, replace a controller with a different type in case of a hardware 
failure, or even just move the physical disks to a different enclosure for any 
imaginable reason. Once again, the naive assumption was that ZFS will 
automatically find the members of a previously exported pool by the information 
(metadata) present on each of the pool members (disks, vdevs, files, whatever).
The situation now, after the scrub has finished, is that the pool reports no 
known data errors, but still dubiously reports the same device, c7t11d0, both 
as an available spare and as an online pool member at the same time. This 
status persists through another export/import cycle (this time without an 
intermediate reboot).
The next steps for me will be to replace the controller with an mpt-driven type 
and rebuild the pool from scratch. Then I may repeat the test.
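
(For reference, the sequence I plan to test for the "move the disks to a 
different controller/enclosure" case; the pool name tank is just a stand-in, 
and as far as I understand it the import scan relies only on the labels it 
finds on the devices:

# zpool export tank
	... move the disks / swap the controller ...
# zpool import
	scans /dev/dsk and lists every pool it can reconstruct from the labels
# zpool import tank
	or "zpool import <numeric-pool-id>" if two pools share the same name)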
Thanks so far for your support. I have learned a lot.

Regards,

Sebastian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status output confusing

2010-02-02 Thread Tonmaus
If I run

 # zdb -l /dev/dsk/c#t#d#

the result is "failed to unpack label" for any disk attached to controllers 
using the ahci or arcmsr drivers.

Cheers,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss