Re: [zfs-discuss] Running on Dell hardware?

2010-10-23 Thread Henrik Johansen

'Tim Cook' wrote:

[... snip ... ]


Dell requires Dell branded drives as of roughly 8 months ago.  I don't
think there was ever an H700 firmware released that didn't require
this.  I'd bet you're going to waste a lot of money to get a drive the
system refuses to recognize.


This should no longer be an issue as Dell has abandoned that practice
because of customer pressure.


--Tim








--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-14 Thread Henrik Johansen

'Edward Ned Harvey' wrote:

From: Henrik Johansen [mailto:hen...@scannet.dk]

The 10g models are stable - especially the R905's are real workhorses.


You would generally consider all your machines stable now?
Can you easily pdsh to all those machines?


Yes - the only problem child has been 1 R610 (the other 2 that we have
in production have not shown any signs of trouble)


kstat | grep current_cstate ; kstat | grep supported_max_cstates

I'd really love to see whether a current_cstate higher than
supported_max_cstates is an accurate indicator of system instability.


Here's a little sample from different machines : 


R610 #1

current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  0
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2

R610 #2

current_cstate  3
current_cstate  0
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2

PE2900

current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1

PER905 
current_cstate  1

current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
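
For what it's worth, a quick way to pull those two counters from a bunch of
boxes in one pass (a sketch - it assumes pdsh is installed and that the hosts
expose the cpu_info kstats as above; the hostnames are just placeholders):

# pdsh -w r610-01,r610-02,pe2900-01,per905-01 \
    'kstat -p cpu_info:::current_cstate cpu_info:::supported_max_cstates'

The -p flag gives parseable module:instance:name:statistic lines, so the
per-host, per-CPU values are easy to diff.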

Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Henrik Johansen

'Edward Ned Harvey' wrote:

I have a Dell R710 which has been flaky for some time.  It crashes
about once per week.  I have literally replaced every piece of hardware
in it, and reinstalled Sol 10u9 fresh and clean.

I am wondering if other people out there are using Dell hardware, with
what degree of success, and in what configuration?


We are running (Open)Solaris on lots of 10g servers (PE2900, PE1950, PE2950,
R905) and some 11g (R610 and soon some R815) with both PERC and non-PERC
controllers and lots of MD1000's.

The 10g models are stable - especially the R905's are real workhorses.

We have had only one 11g server (R610) which caused trouble. The box
froze at least once a week - after replacing almost the entire box I
switched from the old iscsitgt to COMSTAR and the box has been stable
since. Go figure ...
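
For the record, the COMSTAR side of that switch boils down to something like
this (a minimal sketch - the pool/zvol names are made up and the view/target
setup shown is the stock one, not our exact config):

# svcadm enable stmf
# svcadm enable -r svc:/network/iscsi/target:default
# zfs create -V 100g tank/iscsi01
# sbdadm create-lu /dev/zvol/rdsk/tank/iscsi01
# stmfadm add-view <LU GUID from sbdadm list-lu>
# itadm create-target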

I might add that none of these machines use the onboard Broadcom NICs.


The failure seems to be related to the perc 6i.  For some period around
the time of crash, the system still responds to ping, and anything
currently in memory or running from remote storage continues to
function fine.  But new processes that require the local storage
... Such as inbound ssh etc, or even physical login at the console
... those are all hosed.  And eventually the system stops responding to
ping.  As soon as the problem starts, the only recourse is power cycle.

I can't seem to reproduce the problem reliably, but it does happen
regularly.  Yesterday it happened several times in one day, but
sometimes it will go 2 weeks without a problem.

Again, just wondering what other people are using, and experiencing.
To see if any more clues can be found to identify the cause.






--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] future of OpenSolaris

2010-02-22 Thread Henrik Johansen

On 02/22/10 12:00 PM, Michael Ramchand wrote:

I think Oracle have been quite clear about their plans for OpenSolaris.
They have publicly said they plan to continue to support it and the
community.

They're just a little distracted right now because they are in the
process of on-boarding many thousand Sun employees, and trying to get
them feeling happy, comfortable and at home in their new surroundings so
that they can start making money again.

The silence means that you're in a queue and they forgot to turn the
hold music on. Have patience. :-)


Well - one thing that makes me feel a bit uncomfortable is the fact
that you can no longer buy OpenSolaris Support subscriptions.


Almost every trace of it has vanished from the Sun/Oracle website and a 
quick call to our local Sun office confirmed that they apparently no 
longer sell them.



On 02/22/10 09:22, Eugen Leitl wrote:

Oracle's silence is starting to become a bit ominous. What are
the future options for zfs, should OpenSolaris be left dead
in the water by Suracle? I have no insight into who core
zfs developers are (have any been fired by Sun even prior to
the merger?), and who's paying them. Assuming a worst case
scenario, what would be the best candidate for a fork? Nexenta?
Debian already included FreeBSD as a kernel flavor into its
fold, it seems Nexenta could be also a good candidate.

Maybe anyone in the know could provide a short blurb on what
the state is, and what the options are.








--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] future of OpenSolaris

2010-02-22 Thread Henrik Johansen

On 02/22/10 03:35 PM, Jacob Ritorto wrote:

On 02/22/10 09:19, Henrik Johansen wrote:

On 02/22/10 02:33 PM, Jacob Ritorto wrote:

On 02/22/10 06:12, Henrik Johansen wrote:

Well - one thing that makes me feel a bit uncomfortable is the fact
that you can no longer buy OpenSolaris Support subscriptions.

Almost every trace of it has vanished from the Sun/Oracle website and a
quick call to our local Sun office confirmed that they apparently no
longer sell them.


I was actually very startled to see that since we're using it in
production here. After digging through the web for hours, I found that
OpenSolaris support is now included in Solaris support. This is a win
for us because we never know if a particular box, especially a dev box,
is going to remain Solaris or OpenSolaris for the duration of a support
purchase and now we're free to mix and mingle. If you refer to the
Solaris support web page (png attached if the mailing list allows),
you'll see that OpenSolaris is now officially part of the deal and is no
longer being treated as a second class support offering.


That would be *very* nice indeed. I have checked the URL in your
screenshot but I am getting a different result (png attached).

Oh well - I'll just have to wait and see.


Confirmed your finding, Henrik.  This is a showstopper for us as the
higher-ups are already quite leery of Sun/Oracle and the future of
Solaris.  I'm calling Oracle to see if I can get some answers.  The SUSE
folks recently took a big chunk of our UNIX business here and
OpenSolaris was my main tool in battling that.  For us, the loss of
OpenSolaris and its support likely indicates the end of Solaris altogether.


Well - I too am reluctant to put more OpenSolaris boxes into production 
until this matter has been resolved.


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [indiana-discuss] future of OpenSolaris

2010-02-22 Thread Henrik Johansen

On 02/22/10 09:52 PM, Tim Cook wrote:



On Mon, Feb 22, 2010 at 2:21 PM, Jacob Ritorto jacob.rito...@gmail.com
mailto:jacob.rito...@gmail.com wrote:


Since it seems you have absolutely no grasp of what's happening here,


Coming from the guy proclaiming the sky is falling without actually
having ANY official statement whatsoever to back up that train of thought.

perhaps it would be best for you to continue to sit idly by and let
this happen.  Thanks helping out with the crude characterisations
though.


Idly let what happen?  The unconfirmed death of opensolaris that you've
certified for us all without any actual proof?


Well - the lack of support subscriptions *is* a death sentence for 
OpenSolaris in many companies and I believe that this is what the OP 
complained about.




Do you understand that the OpenSolaris page has a sunset in
it and the Solaris page doesn't?


I understand previous versions of every piece of software Oracle sells
have Sunset pages, yes.  If you read the page I sent you, it clearly
states that every release of Opensolaris gets 5 years of support from
GA.  That doesn't mean they aren't releasing another version.  That
doesn't mean they're ending the opensolaris project.  That doesn't mean
they are no longer selling support for it.  Had you actually read the
link I posted, you'd have figured that out.

Sun provides contractual support on the OpenSolaris OS for up to five
years from the product's first General Availability (GA) date as
described http://www.sun.com/service/eosl/eosl_opensolaris.html.
OpenSolaris Package Updates are released approximately every 6 months.
OpenSolaris Subscriptions entitle customers during the term of the
Customer's Subscription contract to receive support on their current
version of OpenSolaris, as well as receive individual Package Updates
and OpenSolaris Support Repository Package Updates when made
commercially available by Sun. Sun may require a Customer to download
and install Package Updates or OpenSolaris OS Updates that have been
released since Customer's previous installation of OpenSolaris,
particularly when fixes have already been

  Have you spent enough (any) time
trying to renew your contracts only to see that all mentions of
OpenSolaris have been deleted from the support pages over the last few
days?


Can you tell me which Oracle rep you've spoken to who confirmed the
cancellation of Opensolaris?  It's funny, nobody I've talked to seems to
have any idea what you're talking about.  So please, a name would be
wonderful so I can direct my inquiry to this as-of-yet unnamed source.


I spoke to our local Oracle sales office last week because I 
wanted to purchase a new OpenSolaris support contract - I was informed 
that this was no longer possible and that Oracle is unable to provide 
paid support for OpenSolaris at this time.




  This, specifically, is what has been yanked out from under me
and my company.  This represents years of my and my team's effort and
investment.


Again, without some sort of official word, nothing has changed...


I take the official Oracle website to be rather ... official ?

Let's recap, shall we ?

a) Almost every trace of OpenSolaris Support subscriptions vanished from 
the official website within the last 14 days.


b) An Oracle sales rep informed me personally last week that I could no 
longer purchase support subscriptions for OpenSolaris.


Please, do me a favor and call your local Oracle rep and ask for an 
Opensolaris Support subscription quote and let us know how it goes ...




It says right here those contracts are for both solaris AND opensolaris.

http://www.sun.com/service/subscriptions/index.jsp

Click Sun System Service Plans:
http://www.sun.com/service/serviceplans/sunspectrum/index.jsp


  Sun System Service Plans for Solaris

Sun System Service Plans for the Solaris Operating System provide
integrated hardware and *Solaris OS (or OpenSolaris OS)* support service
coverage to help keep your systems running smoothly. This single price,
complete system approach is ideal for companies running Solaris on Sun
hardware.



Sun System Service Plans != (Open)Solaris Support subscriptions


But thank you for the scare, Chicken Little.





--Tim



--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (200 disks)

2010-01-29 Thread Henrik Johansen

On 01/28/10 11:13 PM, Lutz Schumann wrote:

While thinking about ZFS as the next generation filesystem without
limits I am wondering if the real world is ready for this kind of
incredible technology ...

I'm actually speaking of hardware :)

ZFS can handle a lot of devices. Once the import bug
(http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6761786)
is fixed it should be able to handle a lot of disks.


That was fixed in build 125.
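
(Checking which build a box actually runs is as simple as the following - the
output format differs slightly between Solaris 10 and OpenSolaris:)

# uname -v
# cat /etc/release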


I want to ask the ZFS community and users what large scale deployments
are out there.  How many disks ? How much capacity ? Single pool or
many pools on a server ? How does resilver work in those
environments ? How do you back up ? What is the experience so far ?
Major headaches ?

It would be great if large scale users would share their setups and
experiences with ZFS.


The largest ZFS deployment that we have currently comprises 22 
Dell MD1000 enclosures (330 x 750 GB Nearline SAS disks). We have 3 head 
nodes and use one zpool per node, built from rather narrow (5+2) 
RAIDZ2 vdevs. This setup is exclusively used for storing backup data.
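
For the curious, each head node's pool is created along these lines - a
sketch only, the c#t#d# names below are placeholders and not our actual
device paths:

# zpool create backup01 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
    raidz2 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0 c2t13d0 \
    spare c2t14d0

That is two (5+2) RAIDZ2 vdevs plus one hot spare per 15-slot MD1000,
repeated for every enclosure behind the node.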


Resilver times could be better - I am sure that this will improve once 
we upgrade from S10u9 to 2010.03.


One of the things that I am missing in ZFS is the ability to prioritize 
background operations like scrub and resilver. All our disks are idle 
during daytime and I would love to be able to take advantage of this, 
especially during resilver operations.


This setup has been running for about a year with no major issues so 
far. The only hiccups we've had were all HW-related (firmware upgrading 
200+ disks is no fun).



Will you ? :) Thanks, Robert



--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (200 disks)

2010-01-29 Thread Henrik Johansen

On 01/29/10 07:36 PM, Richard Elling wrote:

On Jan 29, 2010, at 12:45 AM, Henrik Johansen wrote:

On 01/28/10 11:13 PM, Lutz Schumann wrote:

While thinking about ZFS as the next generation filesystem
without limits I am wondering if the real world is ready for this
kind of incredible technology ...

I'm actually speaking of hardware :)

ZFS can handle a lot of devices. Once the import bug
(http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6761786)
is fixed it should be able to handle a lot of disks.


That was fixed in build 125.


I want to ask the ZFS community and users what large scale
deployments are out there.  How many disks ? How much capacity ?
Single pool or many pools on a server ? How does resilver work in
those environments ? How do you back up ? What is the experience
so far ? Major headaches ?

It would be great if large scale users would share their setups
and experiences with ZFS.


The largest ZFS deployment that we have currently comprises
22 Dell MD1000 enclosures (330 x 750 GB Nearline SAS disks). We have
3 head nodes and use one zpool per node, built from rather narrow
(5+2) RAIDZ2 vdevs. This setup is exclusively used for storing
backup data.


This is an interesting design.  It looks like a good use of hardware
and redundancy for backup storage. Would you be able to share more of
the details? :-)


Each head node (Dell PE 2900's) has 3 PERC 6/E controllers (LSI 1078 
based) with 512 MB cache each.


The PERC 6/E supports both load-balancing and path failover so each 
controller has 2 SAS connections to a daisy chained group of 3 MD1000 
enclosures.


The RAIDZ2 vdev layout was chosen because it gives a reasonable 
performance-vs-space ratio and it maps nicely onto the 15-disk MD1000s 
( 2 x (5+2) + 1 ).


There is room for improvement in the design (fewer disks per controller, 
faster PCI Express slots, etc) but performance is good enough for our 
current needs.
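
Keeping an eye on how evenly the vdevs and controllers are loaded is
straightforward (the pool name is a placeholder):

# zpool iostat -v backup01 10
# iostat -xnC 10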




Resilver times could be better - I am sure that this will improve
once we upgrade from S10u9 to 2010.03.


Nit: Solaris 10 u9 is 10/03 or 10/04 or 10/05, depending on what you
read. Solaris 10 u8 is 11/09.


One of the things that I am missing in ZFS is the ability to
prioritize background operations like scrub and resilver. All our
disks are idle during daytime and I would love to be able to take
advantage of this, especially during resilver operations.


Scrub I/O is given the lowest priority and is throttled. However, I
am not sure that the throttle is in Solaris 10, because that source
is not publicly available. In general, you will not notice a resource
cap until the system utilization is high enough that the cap is
effective.  In other words, if the system is mostly idle, the scrub
consumes the bulk of the resources.


That's not what I am seeing - resilver operations crawl even when the 
pool is idle.
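
A sketch of what I mean - watching the resilver progress while the disks sit
nearly idle (the pool name is a placeholder):

# zpool status backup01
# zpool iostat backup01 10
# iostat -xn 10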



This setup has been running for about a year with no major issues
so far. The only hiccups we've had were all HW-related (firmware
upgrading 200+ disks is no fun).


ugh. -- richard




--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Henrik Johansen


Ross Walker wrote:

On Aug 27, 2009, at 4:30 AM, David Bond david.b...@tag.no wrote:


Hi,

I was directed here after posting in CIFS discuss (as I first
thought that it could be a CIFS problem).


I posted the following in CIFS:

When using Iometer from Windows against the file share on OpenSolaris
snv_101 and snv_111 I get pauses roughly every 5 seconds of around 5 seconds
(maybe a little less) where no data is transferred; when data is
transferred it is at a fair speed and gets around 1000-2000 IOPS with
1 thread (depending on the work type). The maximum read response
time is 200ms and the maximum write response time is 9824ms, which
is very bad - an almost 10 second delay in being able to send data
to the server.
This has been experienced on 2 test servers; the same servers have
also been tested with Windows Server 2008 and they haven't shown this
problem (the share performance was slightly lower than CIFS, but it
was consistent, and the average access time and maximums were very
close).



I just noticed that if the server hasn't hit its target ARC size, the
pauses are for maybe .5 seconds, but as soon as it hits its ARC
target, the IOPS drop to around 50% of what they were and then there
are the longer pauses of around 4-5 seconds, and after every pause
the performance slows even more. So it appears it is definitely
server side.


This is with 100% random I/O with a spread of 33% write / 66% read, 2KB
blocks, over a 50GB file, no compression, and a 5.5GB target ARC size.




Also, I have just run some tests with different IO patterns: 100%
sequential writes produce a consistent 2100 IOPS, except when
it pauses for maybe .5 seconds every 10-15 seconds.


100% random writes produce around 200 IOPS with a 4-6 second pause  
around every 10 seconds.


100% sequential reads produce around 3700 IOPS with no pauses, just
random peaks in response time (only 16ms) after about 1 minute of
running, so nothing to complain about.


100% random reads produce around 200 IOPS, with no pauses.

So it appears that writes are the problem - what is causing these
very long write delays?


A network capture shows that the server doesn't respond to the write
from the client when these pauses occur.


Also, when using Iometer, the initial file creation doesn't have any
pauses, so it might only happen when modifying files.


Any help in finding a solution to this would be really appreciated.


What version? And system configuration?

I think it might be the issue where ZFS/ARC write-caches more than the
underlying storage can handle writing in a reasonable time.


There is a parameter to control how much is write cached - I believe it
is zfs_write_limit_override.


You should be able to disable the write throttle mechanism altogether
with the undocumented zfs_no_write_throttle tunable.

I never got around to testing this though ...
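
For anyone who wants to experiment (untested here, as said, and the usual
caveats about undocumented tunables apply), the mdb idiom would be something
like:

# echo "zfs_no_write_throttle/D" | mdb -k
# echo "zfs_no_write_throttle/W0t1" | mdb -kw

or persistently in /etc/system:

set zfs:zfs_no_write_throttle = 1

The first command reads the current value, the second sets it to 1 on the
running kernel.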



-Ross
 


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Ross Walker wrote:

On Aug 4, 2009, at 8:36 PM, Carson Gaspar car...@taltos.org wrote:


Ross Walker wrote:

I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.

...
there, dedicated slog device with NVRAM speed. It would be even  
better to have a pair of SSDs behind the NVRAM, but it's hard to  
find compatible SSDs for these controllers, Dell currently doesn't  
even support SSDs in their RAID products :-(


Isn't the PERC 6/e just a re-branded LSI? LSI added SSD support  
recently.


Yes, but the LSI support of SSDs is on later controllers.


Sure that's not just a firmware issue ?

My PERC 6/E seems to support SSDs:


# ./MegaCli -AdpAllInfo -a2 | grep -i ssd
Enable Copyback to SSD on SMART Error   : No
Enable SSD Patrol Read  : No
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD  : No


Controller info : 
   Versions


Product Name: PERC 6/E Adapter
Serial No   : 
FW Package Build: 6.0.3-0002

Mfg. Data

Mfg. Date   : 06/08/07
Rework Date : 06/08/07
Revision No : 
Battery FRU : N/A


Image Versions in Flash:

FW Version : 1.11.82-0473
BIOS Version   : NT13-2
WebBIOS Version: 1.1-32-e_11-Rel
Ctrl-R Version : 1.01-010B
Boot Block Version : 1.00.00.01-0008


I currently have 2 x Intel X25-E (32 GB) as dedicated slogs and 1 x
Intel X25-M (80 GB) for the L2ARC behind a PERC 6/i on my Dell R905
testbox.

So far there have been no problems with them.
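
Hooking them up to the pool is just the usual (the pool name and the L2ARC
device name are placeholders; the slogs could also be added as "log mirror"
instead of two separate devices):

# zpool add tank log c7t2d0 c7t3d0
# zpool add tank cache c7t4d0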



-Ross



--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Ross Walker wrote:
On Aug 4, 2009, at 10:22 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:



On Tue, 4 Aug 2009, Ross Walker wrote:
Are you sure that it is faster than an SSD?  The data is indeed  
pushed closer to the disks, but there may be considerably more  
latency associated with getting that data into the controller  
NVRAM cache than there is into a dedicated slog SSD.


I don't see how - as the SSD is behind a controller, the data still must
make it to the controller.


If you take a look at 'iostat -x' output you will see that the  
system knows about a queue for each device.  If it was any other  
way, then a slow device would slow down access to all of the other  
devices.  If there is concern about lack of bandwidth (PCI-E?) to  
the controller, then you can use a separate controller for the SSDs.


It's not bandwidth. Though with a lot of mirrors that does become a  
concern.


Well the duplexing benefit you mention does hold true. That's a  
complex real-world scenario that would be hard to benchmark in  
production.


But easy to see the effects of.


I actually meant to say, hard to bench out of production.

Tests done by others show a considerable NFS write speed advantage  
when using a dedicated slog SSD rather than a controller's NVRAM  
cache.


I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.


I get 47.9 MB/s (60.7 MB/s peak) here too (also with 512MB NVRAM),  
but that is not very good when the network is good for 100 MB/s.   
With an SSD, some other folks here are getting essentially network  
speed.


In testing with RAM disks I was only able to get a max of around 60MB/s
with 4k block sizes, with 4 outstanding.


I can do 64k blocks now and get around 115MB/s.


I just ran some filebench microbenchmarks against my 10 Gbit testbox,
a Dell R905 with 4 x 2.5 GHz AMD quad-core CPUs and 64 GB RAM.

My current pool is built from 7 mirror vdevs (SATA disks), 2 Intel
X25-E as slogs and 1 Intel X25-M for the L2ARC.

The pool is an MD1000 array attached to a PERC 6/E using 2 SAS cables.

The NICs are ixgbe-based.

Here are the numbers : 

Randomwrite benchmark - via 10Gbit NFS : 
IO Summary: 4483228 ops, 73981.2 ops/s, (0/73981 r/w) 578.0mb/s, 44us cpu/op, 0.0ms latency


Randomread benchmark - via 10Gbit NFS :
IO Summary: 7663903 ops, 126467.4 ops/s, (126467/0 r/w) 988.0mb/s, 5us cpu/op, 0.0ms latency

The real question is whether these numbers can be trusted - I am currently
preparing new test runs with other software to be able to do a
comparison.
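
The workloads above are just the stock filebench micro-personalities, driven
roughly like this (a sketch - directory, sizes and thread counts are
placeholders, not the exact profile I used):

# filebench
filebench> load randomwrite
filebench> set $dir=/mnt/nfs-test
filebench> set $filesize=10g
filebench> set $iosize=8k
filebench> set $nthreads=16
filebench> run 60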

There is still bus and controller plus SSD latency. I suppose one
could use a pair of disks as a slog mirror, enable NVRAM just for
those and let the others do write-through with their disk caches.


But this encounters the problem that when the NVRAM becomes full  
then you hit the wall of synchronous disk write performance.  With  
the SSD slog, the write log can be quite large and disk writes are  
then done in a much more efficient ordered fashion similar to non- 
sync writes.


Yes, you have a point there.

So, what SSD disks do you use?

-Ross




--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Ross Walker wrote:

On Aug 4, 2009, at 10:17 PM, James Lever j...@jamver.id.au wrote:



On 05/08/2009, at 11:41 AM, Ross Walker wrote:


What is your recipe for these?


There wasn't one! ;)

The drive I'm using is a Dell badged Samsung MCCOE50G5MPQ-0VAD3.


So the key is the drive needs to have the Dell badging to work?

I called my rep about getting a Dell-badged SSD and he told me they
didn't support those in MD series enclosures and that they were
therefore unavailable.


If the Dell-branded SSDs are Samsungs then you might want to search
the archives - if I remember correctly there were mentions of
less-than-desired performance using them but I cannot recall the
details.



Maybe it's time for a new account rep.

-Ross



--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Joseph L. Casale wrote:

Quick snipped from zpool iostat :

  mirror 1.12G   695G  0  0  0  0
c8t12d0  -  -  0  0  0  0
c8t13d0  -  -  0  0  0  0
  c7t2d04K  29.0G  0  1.56K  0   200M
  c7t3d04K  29.0G  0  1.58K  0   202M

The disks on c7 are both Intel X25-E 


Henrik,
So the SATA disks are in the MD1000 behind the PERC 6/E, and how
have you configured/attached the 2 SSD slogs and L2ARC drive? If
I understand you, you have used 14 of the 15 slots in the MD, so
I assume you have the 3 SSDs in the R905; what controller are
they running on?


The internal PERC 6/i controller - but I've had them on the PERC 6/E
during other test runs since I have a couple of spare MD1000's at hand. 


Both controllers work well with the SSDs.


Thanks!
jlc


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-23 Thread Henrik Johansen

Erik Ableson wrote:
The problem I had was with the single RAID 0 volumes (I miswrote RAID 1
in the original message).


This is not a straight-to-disk connection and you'll have problems if
you ever need to move disks around or move them to another controller.


Would you mind explaining exactly what issues or problems you had ? I
have moved disks between several controllers without problems. You must
remember, however, to create the RAID 0 LUN through LSI's MegaRAID CLI
tool and/or to clear any foreign config before the controller will
expose the disk(s) to the OS.
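
A sketch of the MegaCli incantations involved (the enclosure:slot and adapter
numbers are placeholders, and the exact option spelling varies a bit between
MegaCli versions):

# ./MegaCli -CfgForeign -Clear -aALL
# ./MegaCli -CfgLdAdd -r0 [32:4] WB RA Cached -a0

The first clears any foreign config left on the disks, the second exposes
physical disk 32:4 as a single-drive RAID 0 logical drive on adapter 0.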

The only real problem that I can think of is that you cannot use the
autoreplace functionality of recent ZFS versions with these controllers.

I agree that the MD1000 with ZFS is a rocking, inexpensive setup (we  
have several!) but I'd recommend using a SAS card with a true JBOD  
mode for maximum flexibility and portability. If I remember correctly,  
I think we're using the Adaptec 3085. I've pulled 465MB/s write and  
1GB/s read off the MD1000 filled with SATA drives.


Cordialement,

Erik Ableson

+33.6.80.83.58.28
Envoyé depuis mon iPhone

On 23 juin 2009, at 21:18, Henrik Johansen hen...@scannet.dk wrote:


Kyle McDonald wrote:

Erik Ableson wrote:


Just a side note on the PERC labelled cards: they don't have a  
JBOD mode so you _have_ to use hardware RAID. This may or may not  
be an issue in your configuration but it does mean that moving  
disks between controllers is no longer possible. The only way to  
do a pseudo JBOD is to create broken RAID 1 volumes which is not  
ideal.



It won't even let you make single drive RAID 0 LUNs? That's a shame.


We currently have 90+ disks that are created as single-drive RAID 0 LUNs
on several PERC 6/E (LSI 1078E chipset) controllers and used by ZFS.

I can assure you that they work without any problems and perform very
well indeed.

In fact, the combination of PERC 6/E and MD1000 disk arrays has worked
so well for us that we are going to double the number of disks during
this fall.

The lack of portability is disappointing. The trade-off though is
battery-backed cache if the card supports it.


-Kyle



Cordialement,

Erik Ableson

+33.6.80.83.58.28
Envoyé depuis mon iPhone

On 23 juin 2009, at 04:33, Eric D. Mudama edmud...@bounceswoosh.org 
 wrote:


 On Mon, Jun 22 at 15:46, Miles Nordin wrote:
 edm == Eric D Mudama edmud...@bounceswoosh.org writes:

  edm We bought a Dell T610 as a fileserver, and it comes with an
  edm LSI 1068E based board (PERC6/i SAS).

 which driver attaches to it?

 pciids.sourceforge.net says this is a 1078 board, not a 1068 board.


 please, be careful.  There's too much confusion about these cards.


 Sorry, that may have been confusing.  We have the cheapest storage
 option on the T610, with no onboard cache.  I guess it's called the
 Dell SAS6i/R while they reserve the PERC name for the ones with
 cache.  I had understood that they were basically identical except
 for the cache, but maybe not.

 Anyway, this adapter has worked great for us so far.


 snippet of prtconf -D:


 i86pc (driver name: rootnex)
pci, instance #0 (driver name: npe)
pci8086,3411, instance #6 (driver name: pcie_pci)
pci1028,1f10, instance #0 (driver name: mpt)
sd, instance #1 (driver name: sd)
sd, instance #6 (driver name: sd)
sd, instance #7 (driver name: sd)
sd, instance #2 (driver name: sd)
sd, instance #4 (driver name: sd)
sd, instance #5 (driver name: sd)


 For this board the mpt driver is being used, and here's the prtconf -pv info:


  Node 0x1f
assigned-addresses:
81020010..fc00..0100.83020014..

 df2ec000..4000.8302001c.
 .df2f..0001
reg:
0002.....01020010....0100.03020014....4000.0302001c.

 ...0001
compatible: 'pciex1000,58.1028.1f10.8' +  
'pciex1000,58.1028.1f10'  + 'pciex1000,58.8' + 'pciex1000,58' +  
'pciexclass,01' +  'pciexclass,0100' +  
'pci1000,58.1028.1f10.8' +  'pci1000,58.1028.1f10' +  
'pci1028,1f10' + 'pci1000,58.8' +  'pci1000,58' + 'pciclass, 
01' + 'pciclass,0100'

model:  'SCSI bus controller'
power-consumption:  0001.0001
devsel-speed:  
interrupts:  0001
subsystem-vendor-id:  1028
subsystem-id:  1f10
unit-address:  '0'
class-code:  0001
revision-id:  0008
vendor-id:  1000
device-id:  0058
pcie-capid-pointer:  0068
pcie-capid-reg:  0001
name:  'pci1028,1f10'


 --eric


 --
 Eric D. Mudama
 edmud...@mail.bounceswoosh.org


Re: [zfs-discuss] Large zpool design considerations

2008-07-04 Thread Henrik Johansen
Chris Cosby wrote:
I'm going down a bit of a different path with my reply here. I know that all
shops and their need for data are different, but hear me out.

1) You're backing up 40TB+ of data, increasing at 20-25% per year. That's
insane. Perhaps it's time to look at your backup strategy not from a hardware
perspective, but from a data retention perspective. Do you really need that
much data backed up? There has to be some way to get the volume down. If
not, you're at 100TB in just slightly over 4 years (assuming the 25% growth
factor). If your data is critical, my recommendation is to go find another
job and let someone else have that headache.

Well, we are talking about backup for ~900 servers that are in
production. Our retention period is 14 days for stuff like web servers,
and 3 weeks for SQL and such. 

We could deploy deduplication but it makes me a wee bit uncomfortable to
blindly trust our backup software.

2) 40TB of backups is, at the best possible price, 50 x 1TB drives (for spares
and such) - $12,500 for raw drive hardware. Enclosures add some money, as do
cables and such. For mirroring, 90 x 1TB drives is $22,500 for the raw drives.
In my world, I know yours is different, but the difference in a $100,000
solution and a $75,000 solution is pretty negligible. The short description
here: you can afford to do mirrors. Really, you can. Any of the parity
solutions out there, I don't care what your strategy, is going to cause you
more trouble than you're ready to deal with.

Good point. I'll take that into consideration.

I know these aren't solutions for you, it's just the stuff that was in my
head. The best possible solution, if you really need this kind of volume, is
to create something that never has to resilver. Use some nifty combination
of hardware and ZFS, like a couple of somethings that has 20TB per container
exported as a single volume, mirror those with ZFS for its end-to-end
checksumming and ease of management.

That's my considerably more than $0.02

On Thu, Jul 3, 2008 at 11:56 AM, Bob Friesenhahn 
[EMAIL PROTECTED] wrote:

 On Thu, 3 Jul 2008, Don Enrique wrote:
 
   This means that I could potentially lose 40TB+ of data if three
   disks within the same RAIDZ-2 vdev should die before the resilvering
   of at least one disk is complete. Since most disks will be filled I
   do expect rather long resilvering times.

 Yes, this risk always exists.  The probability of three disks
 independently dying during the resilver is exceedingly low. The chance
 that your facility will be hit by an airplane during resilver is
 likely higher.  However, it is true that RAIDZ-2 does not offer the
 same ease of control over physical redundancy that mirroring does.
 If you were to use 10 independent chassis and split the RAIDZ-2
 uniformly across the chassis then the probability of a similar
 calamity impacting the same drives is driven by rack or facility-wide
 factors (e.g. building burning down) rather than shelf factors.
 However, if you had 10 RAID arrays mounted in the same rack and the
 rack falls over on its side during resilver then hope is still lost.

 I am not seeing any options for you here.  ZFS RAIDZ-2 is about as
 good as it gets and if you want everything in one huge pool, there
 will be more risk.  Perhaps there is a virtual filesystem layer which
 can be used on top of ZFS which emulates a larger filesystem but
 refuses to split files across pools.

 In the future it would be useful for ZFS to provide the option to not
 load-share across huge VDEVs and use VDEV-level space allocators.

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/





-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes

-- 
Med venlig hilsen / Best Regards

Henrik Johansen
[EMAIL PROTECTED]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large zpool design considerations

2008-07-03 Thread Henrik Johansen
[Richard Elling] wrote:
 Don Enrique wrote:
 Hi,

 I am looking for some best practice advice on a project that I am working on.

 We are looking at migrating ~40TB of backup data to ZFS, with an annual
 data growth of 20-25%.

 Now, my initial plan was to create one large pool comprised of X RAIDZ-2
 vdevs ( 7 + 2 ) with one hotspare per 10 drives and just continue to
 expand that pool as needed.

 Between calculating the MTTDL and performance models I was hit by a
 rather scary thought.

 A pool comprised of X vdevs is no more resilient to data loss than the
 weakest vdev since loss of a vdev would render the entire pool unusable.
   

 Yes, but a raidz2 vdev using enterprise class disks is very reliable.

That's nice to hear.

 This means that I could potentially lose 40TB+ of data if three disks
 within the same RAIDZ-2 vdev should die before the resilvering of at
 least one disk is complete. Since most disks will be filled I do expect
 rather long resilvering times.

 We are using 750 GB Seagate (enterprise grade) SATA disks for this
 project with as much hardware redundancy as we can get (multiple
 controllers, dual cabling, I/O multipathing, redundant PSUs, etc.)
   

 nit: SATA disks are single port, so you would need a SAS implementation
 to get multipathing to the disks.  This will not significantly impact the
 overall availability of the data, however.  I did an availability
 analysis of thumper to show this.
 http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_vs

Yeah, I read your blog. Very informative indeed. 

I am using SAS HBA cards and SAS enclosures with SATA disks so I should
be fine.

 I could use multiple pools but that would make data management - which in
 itself is already a lengthy process in our shop - harder.

 The MTTDL figures seem OK, so how much do I need to worry ? Anyone have
 experience with this kind of setup ?
   

 I think your design is reasonable.  We'd need to know the exact
 hardware details to be able to make more specific recommendations.
 -- richard

Well, my choice of hardware is kind of limited by 2 things :

1. We are a 100% Dell shop.
2. We already have lots of enclosures that I would like to reuse for my project.

The HBA cards are SAS 5/E (LSI SAS1068 chipset) cards, the enclosures are
Dell MD1000 disk arrays.



-- 
Med venlig hilsen / Best Regards

Henrik Johansen
[EMAIL PROTECTED]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss