Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz

2006-08-02 Thread Spencer Shepler
On Wed, Darren J Moffat wrote:
> I have 12 36G disks (in a single D2 enclosure) connected to a V880 that 
> I want to "share" to a v40z that is on the same gigabit network switch.
> I've already decided that NFS is not the answer - the performance of ON 
> consolidation builds over NFS just doesn't cut it for me.

?

With a locally attached 3510 array on a 4-way v40z, I have been 
able to do a full nightly build in 1 hour 7 minutes.
With NFSv3 access, from the same system, to a couple of 
different NFS servers, I have been able to achieve 1 hour 15 minutes 
in one case and 1 hour 22 minutes in the other.

Is that too slow?

Spencer

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Clones and "rm -rf"

2006-08-02 Thread Tom Simpson
Can anyone help? I have a cloned filesystem (/u05) from a snapshot of /u02.

The owner/group of the clone is (oracle:dba).

If I do

oracle% cd /u05/app
oracle% rm -rf R2DIR

.. All the files in the R2DIR tree are removed, but none of the 
(sub)directories.  If I run the same "rm -rf" as root, the directory tree 
itself is removed (i.e. what I would expect).

Any ideas?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Best Practices for StorEdge 3510 Array and ZFS

2006-08-02 Thread prasad
I have a StorEdge 3510 FC array which is currently configured in the following 
way:

* logical-drives 

LD    LD-ID      Size    Assigned  Type   Disks  Spare  Failed  Status

ld0   255ECBD0   2.45TB  Primary   RAID5  10     2      0       Good
      Write-Policy Default  StripeSize 128KB


What are the best practices of using ZFS on this array so that I can benefit 
from both ZFS and HW RAID?

Thanks in advance,
-- prasad
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Clones and "rm -rf"

2006-08-02 Thread Tom Simpson
Actually, just tried this on a non-cloned filesystem with the same results.  I 
can't believe there is a bug with "rm -rf", so is this something to do with 
ACLs ?

Help!

Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Web Administration Tool

2006-08-02 Thread Ron Halstead
Why does the Java Web Console service keep going into maintenance mode? This 
has happened for the past few builds (current is nv44). It works for a day or 
so after a new install then it breaks. Here are the symptoms:

sol11:$ svcs -x
svc:/system/webconsole:console (java web console)
 State: maintenance since Wed Aug 02 08:33:26 2006
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://sun.com/msg/SMF-8000-KS
   See: smcwebserver(1M)
   See: /var/svc/log/system-webconsole:console.log
Impact: This service is not running.

sol11:$ tail /var/svc/log/system-webconsole:console.log   (machine has just booted)
[ Aug  2 08:33:06 Executing start method ("/lib/svc/method/svc-webconsole 
start") ]
Sun Java(TM) Web Console status can not be determined.
Run "smcwebserver stop" to make sure the server has stopped.
[ Aug  2 08:33:26 Method "start" exited with status 95 ]

I've run smcwebserver stop and svcadm clear svc:/system/webconsole:console then 
svcadm enable webconsole and get the same results.

The Java Web Console works perfectly on Solaris 10 6/06.

Ron Halstead
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Clones and "rm -rf"

2006-08-02 Thread Anton B. Rang
Just a thought -- unmount /u05 and check what 'ls -l /u05' shows.  If the 
permissions on the directory that you mount onto are wrong (not 
world-executable; it should be 0755), rm -r (and many other commands) will 
fail in mysterious ways.
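
For reference, a minimal sketch of that check, using the dataset and
mountpoint names mentioned earlier in this thread (zfspool/u05 on /u05):

   zfs umount zfspool/u05     # expose the underlying mountpoint directory
   ls -ld /u05                # should be drwxr-xr-x (0755)
   chmod 755 /u05             # only if the permissions turn out to be wrong
   zfs mount zfspool/u05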
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Clones and "rm -rf"

2006-08-02 Thread Tom Simpson
I'm not at the machine to check at the moment, but I didn't create the /u05 
mountpoint manually.  ZFS created it automatically when I did :-

% zfs set mountpoint=/u05 zfspool/u05

You would hope that ZFS didn't get the underlying permissions wrong!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Clones and "rm -rf"

2006-08-02 Thread Mark Shellenbaum

Tom Simpson wrote:

Can anyone help? I have a cloned filesystem (/u05) from a snapshot of /u02.

The owner/group of the clone is (oracle:dba).

If I do

oracle% cd /u05/app
oracle% rm -rf R2DIR



Are you sure you have adequate permissions to descend into and remove 
the subdirectories?




.. All the files in the R2DIR tree are removed, but none of the (sub)directories.  If I 
run the same "rm -rf" as root, the directory tree itself is removed (ie. what I 
would expect)

Any ideas?
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID

2006-08-02 Thread Torrey McMahon

Luke Lonergan wrote:

Torrey,

On 8/1/06 10:30 AM, "Torrey McMahon" <[EMAIL PROTECTED]> wrote:

  

http://www.sun.com/storagetek/disk_systems/workgroup/3510/index.xml

Look at the specs page.



I did.

This is 8 trays, each with 14 disks and two active Fibre channel
attachments.

That means that 14 disks, each with a platter rate of 80MB/s will be driven
over a 400MB/s pair of Fibre Channel connections, a slowdown of almost 3 to
1.

This is probably the most expensive, least efficient way to get disk
bandwidth available to customers.
  



Luke - I think you have latched on to a comparison of Thumper to a 3510. 
With the exception of my note concerning blanket statements and 
assumptions, I've been referring to the original question and subject of 
comparing the performance of 3510 JBOD to 3510 RAID.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz

2006-08-02 Thread Richard Elling

Darren J Moffat wrote:

performance, availability, space, retention.


OK, something to work with.  I would recommend taking advantage of ZFS'
dynamic stripe over 2-disk mirrors.  This should give good performance,
with good data availability.  If you monitor the status of the disks
regularly, or do not have a 24x7x365 requirement, then you may want the
performance of two more disks over the availability and retention gained
by spares.

In general, the more devices you have, the better performance you can get
(iops * N), but also the worse reliability (MTBF / N).  High availability
is achieved by a combination of reducing risk (diversity), adding
redundancy, and decreasing recovery time (spares).  High retention is
gained by increasing redundancy and decreasing recovery time.

[for the archives]
If you do not have a large up-front performance or space requirement, then
you can take advantage ZFS' dynamic growth.  For example, if today you
only need 30 GBytes, then you could have a 2-disk mirror with a bunch of
spares.  Spin down (luxadm stop)[1] the spares or turn off the power to the
unused disks (luxadm power_off)[2] to improve their reliability and save power.
As your space needs grow, add disks in mirrored pairs.  This will optimize
your space usage and reliability -> better availability and retention.

[1] somebody will probably chime in and say that this isn't supported.
It does work well, though.  For spun-down disks, Solaris will start them
when an I/O operation is issued.

[2] may not work for many devices.
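
[for the archives, a minimal sketch of the grow-as-you-go layout above;
device names are hypothetical -- adjust to your own controller/targets]

   zpool create tank mirror c1t0d0 c1t1d0    # start with one 2-disk mirror
   # ...leave the remaining disks idle (or spun down) as spares...
   zpool add tank mirror c1t2d0 c1t3d0       # later: grow the dynamic stripe
   zpool add tank mirror c1t4d0 c1t5d0       # in mirrored pairs as space is needed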
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Clones and "rm -rf"

2006-08-02 Thread Mark Shellenbaum

Tom Simpson wrote:

After I created the filesystem and moved all the data in, I did :-

root% chown -R oracle:dba /u05


All that does is change the owner/group of the files/directories.  It 
doesn't change the permissions of the directories and files.


What are the permissions of the directories you are trying to delete?

Can you gather some truss output from the "rm"?
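
Something along these lines would show both (the paths follow the earlier
posts; the truss invocation is just one reasonable way to capture it):

   ls -dv /u05/app/R2DIR /u05/app/R2DIR/*     # permissions plus ZFS ACLs (-v)
   truss -f -o /tmp/rm.truss rm -rf R2DIR     # trace rm and its children
   grep Err /tmp/rm.truss                     # look for EACCES/EPERM failures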

  -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] raidz -> raidz2

2006-08-02 Thread Frank Cusack

Will it be possible to update an existing raidz to a raidz2?  I wouldn't
think so, but maybe I'll be pleasantly surprised.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz -> raidz2

2006-08-02 Thread Noel Dellofano
Your suspicions are correct,  it's not possible to upgrade an  
existing raidz pool to raidz2.  You'll actually have to create the  
raidz2 pool from scratch.
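
One possible migration path (pool, dataset, and device names here are
purely hypothetical, and it assumes you have spare disks for the new pool):

   zpool create tank2 raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
   zfs snapshot tank/data@migrate
   zfs send tank/data@migrate | zfs receive tank2/data
   zpool destroy tank          # only after the copy has been verified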


Noel
On Aug 2, 2006, at 10:02 AM, Frank Cusack wrote:

Will it be possible to update an existing raidz to a raidz2?  I wouldn't
think so, but maybe I'll be pleasantly surprised.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID

2006-08-02 Thread Jonathan Edwards


On Aug 1, 2006, at 22:23, Luke Lonergan wrote:


Torrey,

On 8/1/06 10:30 AM, "Torrey McMahon" <[EMAIL PROTECTED]> wrote:


http://www.sun.com/storagetek/disk_systems/workgroup/3510/index.xml

Look at the specs page.


I did.

This is 8 trays, each with 14 disks and two active Fibre channel
attachments.

That means that 14 disks, each with a platter rate of 80MB/s will  
be driven
over a 400MB/s pair of Fibre Channel connections, a slowdown of  
almost 3 to 1.

This is probably the most expensive, least efficient way to get disk
bandwidth available to customers.

WRT the discussion about "blow the doors", etc., how about we see some
bonnie++ numbers to back it up.



actually .. there's SPC-2 vdbench numbers out at:
http://www.storageperformance.org/results

see the full disclosure report here:
http://www.storageperformance.org/results/b5_Sun_SPC2_full-disclosure_r1.pdf


of course that's a 36GB 15K FC system with 2 expansion trays, 4HBAs  
and 3 yrs maintenance in the quote that was spec'd at $72K list (or  
$56/GB) .. (i'll use list numbers for comparison since they're the  
easiest )


if you've got a copy of the vdbench tool you might want to try the  
profiles in the appendix on a thumper - I believe the bonnie/bonnie++  
numbers tend to skew more on single threaded low blocksize memory  
transfer issues.


now to bring the thread full circle to the original question of 
price/performance and increasing the scope to include the X4500 .. for  
single attached low cost systems, thumper is *very* compelling  
particularly when you factor in the density .. for example using list  
prices from http://store.sun.com/


X4500 (thumper) w/ 48 x 250GB SATA drives = $32995 = $2.68/GB
X4500 (thumper) w/ 48 x 500GB SATA drives = $69995 = $2.84/GB
SE3511 (dual controller) w/ 12 x 500GB SATA drives = $36995 = $6.17/GB
SE3510 (dual controller) w/ 12 x 300GB FC drives = $48995 = $13.61/GB

So a 250GB SATA drive configured thumper (server attached with 16GB  
of cache .. err .. RAM) is 5x less in cost/GB than a 300GB FC drive  
configured 3510 (dual controllers w/ 2 x 1GB typically mirrored  
cache) and a 500GB SATA drive configured thumper (server attached) is  
2.3x less in cost/GB than a 500GB SATA drive configured 3511 (again  
dual controllers w/ 2 x 1GB typically mirrored cache)


For a single attached system - you're right - 400MB/s is your  
effective throttle (controller speeds actually) on the 3510 and your  
realistic throughput on the 3511 is probably going to be less than  
1/2 that number if we factor in the back pressure we'll get on the  
cache against the back loop .. your bonnie++ block transfer numbers  
on a 36 drive thumper were showing about 424MB/s on 100% write and  
about 1435MB/s on 100% read .. it'd be good to see the vdbench  
numbers as well (but i've have a hard time getting my hands on one  
since most appear to be out at customer sites)


Now with thumper - you are SPoF'd on the motherboard and operating  
system - so you're not really getting the availability aspect from  
dual controllers .. but given the value - you could easily buy 2 and  
still come out ahead .. you'd have to work out some sort of timely  
replication of transactions between the 2 units and deal with failure  
cases with something like a cluster framework.  Then for multi-initiator 
cross system access - we're back to either some sort of NFS  
or CIFS layer or we could always explore target mode drivers and  
virtualization .. so once again - there could be a compelling  
argument coming in that arena as well.  Now, if you already have a  
big shared FC infrastructure - throwing dense servers in the middle  
of it all may not make the most sense yet - but on the flip side, we  
could be seeing a shrinking market for single attach low cost arrays.


Lastly (for this discussion anyhow) there's the reliability and  
quality issues with SATA vs FC drives (bearings, platter materials,  
tolerances, head skew, etc) .. couple that with the fact that dense  
systems aren't so great when they fail .. so I guess we're right back  
to choosing the right systems for the right purposes (ZFS does some  
great things around failure detection and workaround) .. but i think  
we've beat that point to death ..


---
.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID

2006-08-02 Thread Richard Elling

Jonathan Edwards wrote:
Now with thumper - you are SPoF'd on the motherboard and operating 
system - so you're not really getting the availability aspect from dual 
controllers .. but given the value - you could easily buy 2 and still 
come out ahead .. you'd have to work out some sort of timely replication 
of transactions between the 2 units and deal with failure cases with 
something like a cluster framework.


No.  Shared data clusters require that both nodes have access to the
storage.  This is not the case for a thumper, where the disks are not
dual-ported and there is no direct access to the disks from an external
port.  Thumper is not a conventional highly-redundant RAID array.
Comparing thumper to a SE3510 on a feature-by-feature basis is truly
like comparing apples and oranges.

As far as SPOFs go, all systems which provide a single view of data
have at least one SPOF.  Claiming a RAID array does not have a SPOF is
denying truth.

Then for multi-initiator cross 
system access - we're back to either some sort of NFS or CIFS layer or 
we could always explore target mode drivers and virtualization .. so 
once again - there could be a compelling argument coming in that arena 
as well.  Now, if you already have a big shared FC infrastructure - 
throwing dense servers in the middle of it all may not make the most 
sense yet - but on the flip side, we could be seeing a shrinking market 
for single attach low cost arrays.


From a space perspective, I can put a TByte on my desktop today.  Death
of the low-end array is assured by bigger drives.

Lastly (for this discussion anyhow) there's the reliability and quality 
issues with SATA vs FC drives (bearings, platter materials, tolerances, 
head skew, etc) .. couple that with the fact that dense systems aren't 
so great when they fail .. so I guess we're right back to choosing the 
right systems for the right purposes (ZFS does some great things around 
failure detection and workaround) .. but i think we've beat that point 
to death ..


Agree, in principle.  However, the protocol used to connect to the host
is immaterial to the quality of the device.  The market segments determine
the quality of the device, and the drive vendors find it in their best
interest to keep consumer devices inexpensive at all costs, and achieve higher
margins on enterprise class devices.  What we've done for thumper is to use a
top-of-the-line quality SATA drive.  AFAIK today, the vendor is Hitachi,
though we like to have multiple sources, if they can meet the specifications.
Often the vendor and part information is available on the SunSolve Systems
Handbook, http://sunsolve.sun.com/handbook_pub/Systems under the Full
Components List selection for the specific system.  Today, the Sun Fire
X4500 is not listed as it has not reached general availability, yet.  Look
for it soon.

So, what is thumper good for?  Clearly, it can store a lot of data in a
redundant manner (eg. good for retention).  GreenPlum, http://www.greenplum.com
is building data warehouses with them.  Various people are interested in them
for streaming media.  We don't really know what else it will be used for,
there isn't much to compare against in the market.  What we do know is that
it won't be appropriate for replacing your SE9985 on your ERP system.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID

2006-08-02 Thread Luke Lonergan
Richard,


On 8/2/06 11:37 AM, "Richard Elling" <[EMAIL PROTECTED]> wrote:

>> Now with thumper - you are SPoF'd on the motherboard and operating
>> system - so you're not really getting the availability aspect from dual
>> controllers .. but given the value - you could easily buy 2 and still
>> come out ahead .. you'd have to work out some sort of timely replication
>> of transactions between the 2 units and deal with failure cases with
>> something like a cluster framework.
> 
> No.  Shared data clusters require that both nodes have access to the
> storage.  This is not the case for a thumper, where the disks are not
> dual-ported and there is no direct access to the disks from an external
> port.  Thumper is not a conventional highly-redundant RAID array.
> Comparing thumper to a SE3510 on a feature-by-feature basis is truly
> like comparing apples and oranges.

That's why Thumper DW is a shared nothing fully redundant data warehouse.
We replicate the data among systems so that we can lose up to half of the
total server count while processing.

Basket of Apples  >>> one big apple with a worm in it.

- Luke


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID

2006-08-02 Thread Torrey McMahon

Richard Elling wrote:

Jonathan Edwards wrote:
Now with thumper - you are SPoF'd on the motherboard and operating 
system - so you're not really getting the availability aspect from 
dual controllers .. but given the value - you could easily buy 2 and 
still come out ahead .. you'd have to work out some sort of timely 
replication of transactions between the 2 units and deal with failure 
cases with something like a cluster framework.


No.  Shared data clusters require that both nodes have access to the
storage.  This is not the case for a thumper, where the disks are not
dual-ported and there is no direct access to the disks from an external
port.  Thumper is not a conventional highly-redundant RAID array.
Comparing thumper to a SE3510 on a feature-by-feature basis is truly
like comparing apples and oranges.


Apples and pomegranates perhaps?

You could drop the iSCSI target on it and share the drives ala zvols. 
The "what is an array, what is a server, what is both" discussion gets 
interesting based on the qualities of the thing that holds the disks.
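
A rough sketch of that idea, assuming the Solaris iSCSI target (iscsitadm)
is available on the build in question; pool and target names are hypothetical:

   zfs create -V 100g tank/lun0                              # carve a zvol out of the pool
   iscsitadm create target -b /dev/zvol/rdsk/tank/lun0 lun0  # export it as an iSCSI target
   iscsitadm list target -v                                  # confirm the target and its IQN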




As far as SPOFs go, all systems which provide a single view of data
have at least one SPOF.  Claiming a RAID array does not have a SPOF is
denying truth.



It's the number of SPOFs and the overall reliability that I think 
Jonathan was referring to. Of course, we're all systems folks, so 
component failure is always in the back of our minds, right? ;)




From a space perspective, I can put a TByte on my desktop today.  Death
of the low-end array is assured by bigger drives.



It's a sliding window. What was midrange ten years ago is low-end or 
desktop today in terms of capacity and, in many cases, performance. 
Reliability and availability, not so much.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS Web Administration Tool

2006-08-02 Thread Stephen Talley
From talking with the web console (Lockhart) folks, this appears to
be a manifestation of:

6430996 The SMF services related to smcwebserver goes to maintainance
state after node reboot

This will be fixed in build 46 of Solaris Nevada.

Details, including workaround:

> I believe this is Lockhart 3.0.1.  You may be hitting a known
> problem (I think, based on the SVC log messages).  The problem
> arises when the system is stopped without explicitly stopping the
> Lockhart console that's running.  This leaves some bad config files
> leftover that prevent the next start from determining if the web
> server process gets started.  This then results in returning a fatal
> error to the SMF restarter; and you are in maintenance mode.
> Clearing does not help as long as the bad files are left around.
>
> Sometimes things are a bit more complicated; we sometimes fail to
> stop the server process and it hangs around.  This also prevents
> restart.  The documented workaround for this is:
>
> 1) Stop the console before rebooting the OS
>
> 2) If you cannot do (1), then after reboot (if things fail)
>
>    svcadm disable system/webconsole:console
>    ps -ef | grep noaccess | grep server
>    kill <pid>                                // If ps finds a process
>    rm -f /var/webconsole/tmp/console_*.tmp
>    smcwebserver start
>
>    smcwebserver enable                       // If need to start on reboot

Thanks,

Steve

Ron Halstead wrote:

> Why does the Java Web Console service keep going into maintenance mode? This 
> has happened for the past few builds (current is nv44). It works for a day or 
> so after a new install then it breaks. Here are the symptoms:
>
> sol11:$ svcs -x
> svc:/system/webconsole:console (java web console)
>  State: maintenance since Wed Aug 02 08:33:26 2006
> Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
>See: http://sun.com/msg/SMF-8000-KS
>See: smcwebserver(1M)
>See: /var/svc/log/system-webconsole:console.log
> Impact: This service is not running.
>
> sol11:$ tail /var/svc/log/system-webconsole:console.log
> (machine has just booted).
> [ Aug  2 08:33:06 Executing start method ("/lib/svc/method/svc-webconsole 
> start") ]
> Sun Java(TM) Web Console status can not be determined.
> Run "smcwebserver stop" to make sure the server has stopped.
> [ Aug  2 08:33:26 Method "start" exited with status 95 ]
>
> I've run smcwebserver stop and svcadm clear svc:/system/webconsole:console 
> then svcadm enable webconsole and get the same results.
>
> The Java Web Console works perfectly on Solaris 10 6/06.
>
> Ron Halstead
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best Practices for StorEdge 3510 Array and ZFS

2006-08-02 Thread Torrey McMahon

prasad wrote:

I have a StorEdge 3510 FC array which is currently configured in the following 
way:

* logical-drives 

LD    LD-ID      Size    Assigned  Type   Disks  Spare  Failed  Status

ld0   255ECBD0   2.45TB  Primary   RAID5  10     2      0       Good
      Write-Policy Default  StripeSize 128KB



What are the best practices of using ZFS on this array so that I can benefit 
from both ZFS and HW RAID?


Are any other hosts using the array? Do you plan on carving LUNs out of 
the RAID5 LD and assigning them to other hosts?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID

2006-08-02 Thread Rich Teer
On Wed, 2 Aug 2006, Richard Elling wrote:

> From a space perspective, I can put a TByte on my desktop today.  Death
> of the low-end array is assured by bigger drives.

I respectfully disagree.  I think there will always be a need for low-end
arrays, regardless of the size of the individual disks.  I like to keep my
OS and data/apps separate (on separate drives preferably)--and I doubt I'm
alone.  Many of today's smaller servers come with only two disks, which is
fine for mirroring root and swap, but the only place to put one's data is
on an external array.

There are many situations where low-end storage (in terms of numbers of
spindles) would be very useful, hence my blog entry a while ago wishing
that Sun would produce a 1U, 8-drive SAS array at an affordable price
(at least one company has such a product, but I want to buy only Sun HW).

-- 
Rich Teer, SCNA, SCSA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: [Fwd: [zones-discuss] Zone boot problems after installing patches]

2006-08-02 Thread George Wilson

Dave,

I'm copying the zfs-discuss alias on this as well...

It's possible that not all necessary patches have been installed, or they 
may be hitting CR 6428258. If you reboot the zone does it continue to 
end up in maintenance mode? Also do you know if the necessary ZFS/Zones 
patches have been updated?


Take a look at our webpage which includes the patch list required for 
Solaris 10:


http://rpe.sfbay/bin/view/Tech/ZFS

Thanks,
George

Mahesh Siddheshwar wrote:



 Original Message 
Subject: [zones-discuss] Zone boot problems after installing patches
Date: Wed, 02 Aug 2006 13:47:46 -0400
From: Dave Bevans <[EMAIL PROTECTED]>
To: zones-discuss@opensolaris.org, [EMAIL PROTECTED], 
[EMAIL PROTECTED]




Hi,

I  have a customer with the following problem.

He has a V440 running Solaris 10 1/06 with zones. In the case notes he 
says that he installed a couple Sol 10 patches and now he has problems 
booting his zones. After doing  some checking he found that it appears 
to be related to a couple of ZFS patches (122650 and 122640).  I found a 
bug (6271309 / lack of zvol breaks all ZFS commands), but not sure if it 
applies to this situation. Any ideas on this?


Here is the customers problem description...

Hardware Platform: Sun Fire V440
Component Affected: OS Base
OS and Kernel Version: SunOS snb-fton-bck2 5.10 Generic_118833-18 sun4u 
sparc SUNW,Sun-Fire-V440


Describe the problem: Patch 122650-02 combined with patch 122640-05 
seems to have broken non-global zones at boot time. I'm just guessing at 
the exact patches since they were both added recently, and involve the 
files /usr/sbin/zfs and /lib/svc/method/fs-local which combined, cause 
the issue.


This section of code in /lib/svc/method/fs-local:

if [ -x /usr/sbin/zfs ]; then
        /usr/sbin/zfs mount -a >/dev/msglog 2>&1
        rc=$?
        if [ $rc -ne 0 ]; then
                msg="WARNING: /usr/sbin/zfs mount -a failed: exit status $rc"
                echo $msg
                echo "$SMF_FMRI:" $msg >/dev/msglog
                result=$SMF_EXIT_ERR_FATAL
        fi
fi

causes the local file system service to exit with an error, and stop the 
boot process. The reason is that the non-global zone does not have 
access to /dev/zfs so the "/usr/sbin/zfs mount -a" command exits with an 
error code.
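
For illustration only (this is not the actual patched method script), a
guard of this form would sidestep the failure in a zone with no /dev/zfs:

if [ -x /usr/sbin/zfs ] && [ -c /dev/zfs ]; then
        /usr/sbin/zfs mount -a >/dev/msglog 2>&1
        rc=$?
        if [ $rc -ne 0 ]; then
                msg="WARNING: /usr/sbin/zfs mount -a failed: exit status $rc"
                echo $msg
                echo "$SMF_FMRI:" $msg >/dev/msglog
                result=$SMF_EXIT_ERR_FATAL
        fi
fi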

This system is SRS Net Connect enabled: No
I will be sending an Explorer file: No
List steps to reproduce the problem(if applicable): Global zone:

bash-3.00# /usr/sbin/zfs mount -a
bash-3.00# echo $?
0


CVS Zone:

bash-3.00# zlogin cvs
[Connected to zone 'cvs' pts/2]
Last login: Tue Aug  1 11:51:58 on pts/2
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
# /usr/sbin/zfs mount -a
internal error: unable to open ZFS device

# echo $?
1


=
It Looks like /dev/zfs is not created in the non-global zone, but is 
required for

the startup script change included in patch 122650-02:


Global Zone:

bash-3.00# truss -fald -t open /usr/sbin/zfs mount -a
Base time stamp:  115288.9594  [ Tue Aug  1 11:58:08 ADT 2006 ]
16159/1: 0. execve("/sbin/zfs", 0xFFBFFD8C, 0xFFBFFD9C)  
argc = 3

16159/1: argv: /usr/sbin/zfs mount -a
...
16159/1: 0.0192 open("/etc/mnttab", O_RDONLY)   = 3
16159/1: 0.0203 open("/dev/zfs", O_RDWR)= 4


CVS Zone:
# truss -fald -t open /usr/sbin/zfs mount -a
Base time stamp:  115344.9469  [ Tue Aug  1 11:59:04 ADT 2006 ]
16198/1: 0. execve("/sbin/zfs", 0xFFBFFECC, 0xFFBFFEDC)  
argc = 3

16198/1: argv: /usr/sbin/zfs mount -a
...
16198/1: 0.0181 open("/etc/mnttab", O_RDONLY)   = 3
16198/1: 0.0191 open("/dev/zfs", O_RDWR)
Err#2 ENOENT

internal error: unable to open ZFS device

# ls -l "/dev/zfs"
/dev/zfs: No such file or directory

==
bash-3.00# zonecfg -z cvs info
zonepath: /oracle/zones/cvs
autoboot: true
pool:
inherit-pkg-dir:
   dir: /lib
inherit-pkg-dir:
   dir: /platform
inherit-pkg-dir:
   dir: /sbin
inherit-pkg-dir:
   dir: /usr
fs:
   dir: /data
   special: /data
   raw not specified
   type: lofs
   options: []
net:
   address: 142.139.95.4
   physical: ce0

When was the problem first noticed: August 1.
The problem is: staying the same
Any changes recently?: New Patch Applied
What software is having the problem?:
bash-3.00# uname -a
SunOS snb-fton-bck2 5.10 Generic_118833-18 sun4u sparc SUNW,Sun-Fire-V440
bash-3.00# cat /etc/release
                        Solaris 10 1/06 s10s_u1wos_19a SPARC
           Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 07 December 2005





___
zones-discuss mailing list
zones-discuss@opensolaris.org


__

[zfs-discuss] Re: Best Practices for StorEdge 3510 Array and ZFS

2006-08-02 Thread prasad
Torrey McMahon <[EMAIL PROTECTED]> wrote:

> Are any other hosts using the array? Do you plan on carving LUNs out of
> the RAID5 LD and assigning them to other hosts?

There are no other hosts using the array. We need all the available space 
(2.45TB) on just one host. One option was to create 2 LUNs and use raidz.

-- prasad
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-02 Thread Joseph Mocker

I know this is going to sound a little vague but...

A coworker said he read somewhere that ZFS is more efficient if you 
configure pools from entire disks instead of just slices of disks. I'm 
curious if there is any merit to this?


The use case that we had been discussing was something to the effect of 
building a 2 disk system, install the OS on slice 0 of disk 0 and make 
the rest of the disk available for 1/2 of a zfs mirror. Then disk 1 
would probably be partitioned the same, but the only thing active would 
be the other 1/2 of a zfs mirror.


Now clearly there is a contention issue between the OS and the data 
partition, which would be there if SVM mirrors were used instead. But 
besides this, is zfs any less efficient with just using a portion of a 
disk versus the entire disk?
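
For concreteness, the layout being discussed would look something like this
(disk and slice names are hypothetical):

   # OS lives on c0t0d0s0; slice 7 of each disk holds the ZFS half
   zpool create datapool mirror c0t0d0s7 c0t1d0s7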



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-02 Thread Rich Teer
On Wed, 2 Aug 2006, Joseph Mocker wrote:

> The use case that we had been discussing was something to the effect of
> building a 2 disk system, install the OS on slice 0 of disk 0 and make the
> rest of the disk available for 1/2 of a zfs mirror. Then disk 1 would probably
> be partitioned the same, but the only thing active would be the other 1/2 of a
> zfs mirror.

Why wouldn't you mirror (using SVM) the OS slice on disk 1 too?

Sorry, can't answer the ZFS bit of the question...

-- 
Rich Teer, SCNA, SCSA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS Web Administration Tool

2006-08-02 Thread Ron Halstead
Thanks Steve. The workaround (rm -f /var/webconsole/tmp/console_*.tmp) and a 
restart fixed it.
I appreciate the quick response. You guys are good!

Ron
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-02 Thread Torrey McMahon

Joseph Mocker wrote:

I know this is going to sound a little vague but...

A coworker said he read somewhere that ZFS is more efficient if you 
configure pools from entire disks instead of just slices of disks. I'm 
curious if there is any merit to this?



If the entire disk is used in a zpool then the disk's write cache can be, 
and in most cases is, enabled. This speeds operations up quite a bit in 
some scenarios.
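
A hedged illustration of the difference (hypothetical device names; these
are alternatives, not two commands to run together):

   zpool create tank mirror c1t2d0 c1t3d0        # whole disks: ZFS can enable the write cache
   zpool create tank mirror c1t2d0s7 c1t3d0s7    # slices: the cache is left alone (other slices may hold UFS)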

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-02 Thread Robert Milkowski
Hello Joseph,

Thursday, August 3, 2006, 2:02:28 AM, you wrote:

JM> I know this is going to sound a little vague but...

JM> A coworker said he read somewhere that ZFS is more efficient if you 
JM> configure pools from entire disks instead of just slices of disks. I'm
JM> curious if there is any merit to this?

JM> The use case that we had been discussing was something to the effect of
JM> building a 2 disk system, install the OS on slice 0 of disk 0 and make
JM> the rest of the disk available for 1/2 of a zfs mirror. Then disk 1 
JM> would probably be partitioned the same, but the only thing active would
JM> be the other 1/2 of a zfs mirror.

JM> Now clearly there is a contention issue between the OS and the data 
JM> partition, which would be there if SVM mirrors were used instead. But 
JM> besides this, is zfs any less efficient with just using a portion of a
JM> disk versus the entire disk?

ZFS will try to enable the write cache if a whole disk is given.

Additionally, keep in mind that the outer region of a disk is much faster.
So if you want to put the OS on the disk and then designate the rest of it
for an application, then putting ZFS on a slice beginning at cylinder 0 is
probably best in most scenarios.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz

2006-08-02 Thread Darren J Moffat

Spencer Shepler wrote:

On Wed, Darren J Moffat wrote:
I have 12 36G disks (in a single D2 enclosure) connected to a V880 that 
I want to "share" to a v40z that is on the same gigabit network switch.
I've already decided that NFS is not the answer - the performance of ON 
consolidation builds over NFS just doesn't cut it for me.


?

With a locally attached 3510 array on a 4-way v40z, I have been 
able to do a full nightly build in 1 hour 7 minutes.  
With NFSv3 access, from the same system, to a couple of 
different NFS servers, I have been able to achieve 1 hour 15 minutes 
in one case and 1 hour 22 minutes in the other.


That would be perfectly acceptable.  I note you do say NFSv3 though and 
not NFSv4.  Is there a reason why you said NFSv3 and not v4 ?  I haven't 
changed the config on either machine so I'm defaulting to v4.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz

2006-08-02 Thread Spencer Shepler
On Thu, Darren J Moffat wrote:
> Spencer Shepler wrote:
> >On Wed, Darren J Moffat wrote:
> >>I have 12 36G disks (in a single D2 enclosure) connected to a V880 that 
> >>I want to "share" to a v40z that is on the same gigabit network switch.
> >>I've already decided that NFS is not the answer - the performance of ON 
> >>consolidation builds over NFS just doesn't cut it for me.
> >
> >?
> >
> >With a locally attached 3510 array on a 4-way v40z, I have been 
> >able to do a full nightly build in 1 hour 7 minutes.  
> >With NFSv3 access, from the same system, to a couple of 
> >different NFS servers, I have been able to achieve 1 hour 15 minutes 
> >in one case and 1 hour 22 minutes in the other.
> 
> That would be perfectly acceptable.  I note you do say NFSv3 though and 
> not NFSv4.  Is there a reason why you said NFSv3 and not v4 ?  I haven't 
> changed the config on either machine so I'm defaulting to v4.

Mainly because that was the data I had at hand.  I have been collecting
various pieces of data and have yet to pick up the NFSv4 data.

There is additional overhead with the NFSv4 client because of the
protocol's introduction of OPEN/CLOSE operations.  Therefore, for
some workloads and hardware platforms, NFSv4 will be slower.
Builds are one of those things that are sensitive to the hardware platform
at the client.  

Once I get the data, I will followup.

Spencer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Best Practices for StorEdge 3510 Array and ZFS

2006-08-02 Thread Jonathan Edwards


On Aug 2, 2006, at 17:03, prasad wrote:


Torrey McMahon <[EMAIL PROTECTED]> wrote:

Are any other hosts using the array? Do you plan on carving LUNs out of
the RAID5 LD and assigning them to other hosts?


There are no other hosts using the array. We need all the available  
space (2.45TB) on just one host. One option was to create 2 LUNs  
and use raidz.


raidz on RAID5 isn't very efficient, and you'll want at least 3 LUNs  
to do it .. you're calculating parity twice (once in ZFS, once in the  
array) and tying up too much of your drive bandwidth.


if you're going to use some variation of RAID5, the best throughput you'll  
see is to *either* pick the HW RAID characteristics *or* ZFS raidz ..  
but not both .. if you want a *lot* of redundancy you could create a  
bunch of RAID10 volumes and then do a raidz on the zpool - but you're  
really going to lose a lot of capacity that way.


What you really want to do is make efficient use of the array cache  
*and* the copy on write zfs "cache" so you're doing mostly memory to  
memory transfers.  So that leaves us with 2 options (each with slight  
variations)


option 1 - raidz:
I would use all the disks in the 3510 to make either 4 x 3 disk or 6  
x 2 disk R0 volumes and balance them across the controllers (assuming  
you have 2) .. then create your raidz zpool out of all the disks ..  
the disadvantage (or advantage depending on how you look at it) here  
is that you're not using the parity engine in the 3510 and you can't  
really hot spare  from the array.. the advantage though is the  
software based error correction you'll be able to do.


option 2 - RAID5
either use the volume you already have or make 2 R5 volumes if you  
have 2 controllers to balance the LUNs .. it won't matter if they're  
the same size or not, and you should only really need 1 global hot  
spare .. then create a standard zpool with these .. the disadvantage  
is that you won't get the lovely raidz features .. but the possible  
advantage is that you've offloaded the parity calculation and  
workload from the host
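
Hedged sketches of the two options -- pick one; the LUN names are
hypothetical and assume the 3510 volumes are already mapped to the host:

   # option 1: array exports R0 volumes, ZFS supplies the redundancy
   zpool create tank raidz c4t40d0 c4t40d1 c4t40d2 c4t40d3 c4t40d4 c4t40d5

   # option 2: array does the RAID5, ZFS just stripes across the LUNs
   zpool create tank c4t40d0 c4t40d1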


Keep in mind that zfs was originally designed with JBOD in mind ..  
there's still ongoing discussions on how hw RAID fits into the  
picture with the new and lovely sw raidz and whether or not socks  
will be worn when testing one vs the other ..


---
.je

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-02 Thread Jeff Bonwick
> ZFS will try to enable the write cache if a whole disk is given.
> 
> Additionally keep in mind that outer region of a disk is much faster.

And it's portable.  If you use whole disks, you can export the
pool from one machine and import it on another.  There's no way
to export just one slice and leave the others behind...
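
For example (pool name hypothetical):

   zpool export tank      # on the old host
   # ...physically move or re-cable the disks...
   zpool import tank      # on the new host ("zpool import" alone lists candidates)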

Jeff

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-02 Thread Jeff Bonwick
> is zfs any less efficient with just using a portion of a 
> disk versus the entire disk?

As others mentioned, if we're given a whole disk (i.e. no slice
is specified) then we can safely enable the write cache.

One other effect -- probably not huge -- is that the block placement
algorithm is most optimal for an outer-to-inner track diameter ratio
of about 2:1, which reflects typical platters.  To quote the source:

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/fs/zfs/metaslab.c#metaslab_weight

/*
 * Modern disks have uniform bit density and constant angular velocity.
 * Therefore, the outer recording zones are faster (higher bandwidth)
 * than the inner zones by the ratio of outer to inner track diameter,
 * which is typically around 2:1.  We account for this by assigning
 * higher weight to lower metaslabs (multiplier ranging from 2x to 1x).
 * In effect, this means that we'll select the metaslab with the most
 * free bandwidth rather than simply the one with the most free space.
 */

But like I said, the effect isn't huge -- the high-order bit is that we
have a preference for low LBAs.  It's a second-order optimization
to bias the allocation based on the maximum free bandwidth, which is
currently based on an assumption about physical disk construction.
In the future we'll do the smart thing and compute each metaslab's
allocation bias based on its actual observed bandwidth.

Jeff

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss