Re: [zfs-discuss] Entire client hangs every few seconds

2011-07-26 Thread Ian D
Hi Garrett-
Is it something that could happen at any time on a system that has been working
fine for a while?  That system has 256G of RAM, so I think "adequate" is not a
concern here :)

We'll try 3.1 as soon as we can download it.
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Entire client hangs every few seconds

2011-07-26 Thread Ian D
No dedup.  

The hiccups started around 2am on Sunday while (obviously) nobody was
interacting with either the clients or the server.  It's been running for
months (as is) without any problem.

My guess is that it's a defective hard drive that, instead of failing outright,
just stutters.  Or maybe it's the cache.  We disabled the SLOG with no effect,
but we haven't tried with the L2ARC.
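If the L2ARC turns out to be the suspect, the cache device can be pulled and
re-added on a live pool; roughly (pool and device names below are placeholders):

  zpool remove <pool> <cache-device>       # drop the L2ARC device for the test
  zpool add <pool> cache <cache-device>    # put it back afterwards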
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Entire client hangs every few seconds

2011-07-26 Thread Ian D
To add to that... iostat on the client boxes shows the connection as always
being around 98% util, topping out at 100% whenever it hangs.  The same clients
are connected to another ZFS server with much lower specs and a smaller number
of slower disks; it performs much better and rarely gets past 5% util.  They
share the same network.
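For anyone who wants to reproduce the comparison, something along these lines
(the device name is a placeholder) shows %util and await per device on the
clients:

  iostat -x sdb 5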
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Entire client hangs every few seconds

2011-07-26 Thread Ian D
Hi all-
We've been experiencing a very strange problem for two days now.  

We have three clients (Linux boxes) connected to a ZFS box (Nexenta) via iSCSI.
Every few seconds (it seems random), iostat shows the clients go from a normal
80K+ IOPS to zero.  It lasts up to a few seconds and then things are fine again.
When that happens, I/O on the local disks stops too, even the totally unrelated
ones.  How can that be?  All three clients show the same pattern and everything
was fine prior to Sunday.  Nothing has changed on either the clients or the
server.  The ZFS box is not even close to being saturated, nor is the network.

We don't even know where to start... any advice?
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mixing different disk sizes in a pool?

2010-12-18 Thread Ian D
> The answer really depends on what you want to do with
> pool(s).  You'll 
> have to provide more information.

Get the maximum number of highly random IOPS I can get out of those drives for
database usage.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mixing different disk sizes in a pool?

2010-12-18 Thread Ian D
Another question:  all those disks are on Dell MD1000 JBODs (11 of them) and we
have 12 SAS ports on three LSI 9200-16e HBAs.  Is there any point in connecting
each JBOD to a separate port, or is it OK to cascade them in groups of three?
Is there a bandwidth limit we'll be hitting by doing that?
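As a back-of-the-envelope check (assuming the MD1000's 3Gb/s SAS links): a
4-lane wide port is 4x 3Gb/s = 12Gb/s, or very roughly 1.2GB/s of usable
bandwidth.  Three cascaded shelves of 15 drives could stream well past that
(45x ~100MB/s is ~4.5GB/s), so daisy-chaining mostly caps sequential throughput;
a random-IOPS workload is unlikely to get anywhere near the port limit.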

Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-12-17 Thread Ian D
Here's a long due update for you all...

After updating countless drivers, BIOSes and Nexenta itself, it seems that our
issue has disappeared.  We're slowly moving our production to our three
appliances and things are going well so far.  Sadly we don't know exactly which
update fixed our issue.  I wish I knew, but we tried so many different things
that we lost count.

We've also updated our DDRDrive X1s and they're giving us stellar performance.

Thanks to the people at Nexenta and DDRDrive!  It's been more challenging than
we expected, but we're now optimistic about the future of all this.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mixing different disk sizes in a pool?

2010-12-17 Thread Ian D
I have 159x 15K RPM SAS drives I want to build a ZFS appliance with.  

75x 145G
60x 300G
24x 600G

The box has 4 CPUs, 256G of RAM, 14x 100G SLC SSDs for the cache and a mirrored 
pair of 4G DDRDrive X1s for the SLOG.

My plan is to mirror all these drives and keep some hot spares.

My question is:  should I create three pools (one for each drive size) and
share the cache and SLOG among them, or should I create a single pool with them
all?

I'm thinking of creating a single pool to get maximum IOPS early on, even though
I understand that once the 145G drives fill up I'll end up with smaller stripes
and less performance.
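One detail I'm factoring in: as far as I understand, a given cache or log
device can belong to only one pool, so with three pools the SSDs and X1s would
have to be split up rather than shared.  The single-pool layout I have in mind
would look roughly like this (device names are made up and the "..." stands for
the rest of the mirror pairs):

  zpool create tank \
      mirror c1t0d0 c2t0d0 \
      mirror c1t1d0 c2t1d0 \
      ... \
      log mirror c5t0d0 c6t0d0 \
      cache c7t0d0 c7t1d0 \
      spare c1t14d0 c2t14d0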

Would you do the same?  Is there any reason not to do that?

Thanks
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-11-02 Thread Ian D
> Then set the zfs_write_limit_override to a reasonable
> value.
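(For anyone following along, that tunable is normally set either persistently in
/etc/system:

  set zfs:zfs_write_limit_override=0x40000000

or on the fly with:

  echo zfs_write_limit_override/Z 0x40000000 | mdb -kw

The 1GB value here is only an example of a "reasonable value".)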

Our first experiments are showing progress.  We'll play with it some more and 
let you know. Thanks!

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-11-01 Thread Ian D
> Maybe you are experiencing this:
> http://opensolaris.org/jive/thread.jspa?threadID=11942

It does look like this...  Is this really the expected behaviour?  That's just
unacceptable.  It is so bad it sometimes drops connections and fails copies and
SQL queries...

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-11-01 Thread Ian D
> You "doubt" AMD or Intel cpu's suffer from bad cache
> mgmt?

To rule that out, we've tried using an older server (about 4 years old) as the
head and we see the same pattern.  It's actually even more obvious there that it
consumes a whole lot of CPU cycles.  Using the same box as a Linux-based NFS
server and running the same tests barely has an impact on the CPUs.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-11-01 Thread Ian D
> If you do a dd to the storage from the heads do
> you still get the same issues? 

No, local reads/writes are great; they never choke.  It's only when NFS or
iSCSI are involved and the reads/writes are done from a remote box that we
experience the problem.  Local operations barely affect the CPU load.
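(For reference, the sort of local check meant here is something along these
lines- path and size are made up, and with compression off /dev/zero gives a
fair ceiling for sequential writes:

  dd if=/dev/zero of=/Pool_sas/ddtest bs=1048576 count=8192
  dd if=/Pool_sas/ddtest of=/dev/null bs=1048576

keeping in mind that the read-back mostly comes from ARC unless the file is
bigger than RAM.)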

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-31 Thread Ian D
I get that more cores don't necessarily mean better performance, but I doubt
that both the latest AMD CPUs (the Magny-Cours) and the latest Intel CPUs (the
Beckton) suffer from incredibly bad cache management.  Our two test systems
have 2 and 4 of them respectively.  The thing is that performance is not
consistently slow: it is great for a while, then *stops* for a few seconds
before becoming great again.  It's like they are choking or something...

Thanks
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-30 Thread Ian D
I owe you all an update...

We found a clear pattern we can now recreate at will.  Whenever we read/write
the pool, it gives the expected throughput and IOPS for a while, but at some
point it slows down to a crawl, nothing responds and everything pretty much
hangs for a few seconds, and then things go back to normal for a little while.
Sometimes the problem is barely noticeable and only happens once every few
minutes; at other times it is every few seconds.  We could be doing the exact
same operation and sometimes it is fast and sometimes it is slow.  The more
clients are connected the worse the problem typically gets- and no, it's not
happening every 30 seconds when things are committed to disk.

Now... every time that slowdown occurs, the load on the Nexenta box gets crazy
high- it can reach 35 and more and the console doesn't even respond anymore.
The rest of the time the load barely reaches 3.  The box has four 7500 series
Intel Xeon CPUs and 256G of RAM and uses 15K SAS HDDs in mirrored stripes on LSI
9200-8e HBAs- so we're certainly not underpowered.  We also have the same issue
when using a box with two of the latest AMD Opteron CPUs (the Magny-Cours) and
128G of RAM.

We are able to reach 800MB/sec and more over the network when things go well,
but the average gets destroyed by the slowdowns when there is zero throughput.

These tests are run without any L2ARC or SLOG, but past tests have shown the
same issue when using them.  We've tried with 12x 100G Samsung SLC SSDs and
DDRDrive X1s among other things- and while they make the whole thing much
faster, they don't prevent those intermittent slowdowns from happening.

Our next step is to isolate the process that takes all that CPU...
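(The plan, sketched with standard tools: "prstat -mL 5" to see which
processes/threads are busy during a stall, and if the time turns out to be spent
in the kernel, a DTrace profile such as

  dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { trunc(@, 20); exit(0); }'

to grab the top kernel stacks while the slowdown is happening.)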

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-23 Thread Ian D
> I don't think the switch model was ever identified...perhaps it is a 1 GbE 
> switch with a few 10 GbE ports?  (Drawing at straws.)


It is a Dell 8024F.  It has 24 SFP+ 10GbE ports and every NIC we connect to it
is an Intel X520.  One issue we do have with it is that when we turn jumbo
frames on, the Linux boxes crash.  So all our tests are done without jumbo
frames, but that alone cannot make that much of a difference.  We know the
hardware of the Nexenta box can do several times better- it does when we run
Linux on it.
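(For completeness, enabling jumbo frames here means the usual changes- e.g.
"ip link set dev ethX mtu 9000" on the Linux initiators and
"dladm set-linkprop -p mtu=9000 ixgbe0" on the Nexenta side, with the switch
ports set to match; the interface names are placeholders.  It's with that in
place that the Linux boxes crash.)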

> What happens when Windows is the iSCSI initiator
> connecting to an iSCSI
> target on ZFS?  If that is also slow, the issue
> is likely not in Windows or in Linux.
> 
> Do CIFS shares (connected to from Linux and from
> Windows) show the same
> performance problems as iSCSI and NFS?  If yes,
> this would suggest a
> common cause item on the ZFS side.


We need to try that.  We did try two versions of Linux (RedHat and SuSE) and 
ended up with the same problem, but we haven't tried with Windows/Mac yet.  

Thanks all!
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-23 Thread Ian D
> Likely you don't have enough ram or CPU in the box.

The Nexenta box has 256G of RAM and the latest X7500 series CPUs.  That said,
the load does get crazy high (like 35+) very quickly.  We can't figure out
what's taking so much CPU.  It happens even when checksums, compression and
dedup are off.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-23 Thread Ian D
> A network switch that is being maxed out?  Some
> switches cannot switch 
> at rated line speed on all their ports all at the
> same time.  Their 
> internal buses simply don't have the bandwidth needed
> for that.  Maybe 
> you are running into that limit?  (I know you
> mentioned bypassing the 
> switch completely in some other tests and not
> noticing any difference.)

We're using a 10GbE switch and NICs and they have their own separate network- 
we're not even close to the limit.  
Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Ian D
Some numbers...

zpool status

  pool: Pool_sas
 state: ONLINE
 scan: none requested
config:

NAME STATE READ WRITE CKSUM
Pool_sas ONLINE   0 0 0
  c4t5000C506A6D3d0  ONLINE   0 0 0
  c4t5000C506A777d0  ONLINE   0 0 0
  c4t5000C506AA43d0  ONLINE   0 0 0
  c4t5000C506AC4Fd0  ONLINE   0 0 0
  c4t5000C506AEF7d0  ONLINE   0 0 0
  c4t5000C506B27Fd0  ONLINE   0 0 0
  c4t5000C506B28Bd0  ONLINE   0 0 0
  c4t5000C506B46Bd0  ONLINE   0 0 0
  c4t5000C506B563d0  ONLINE   0 0 0
  c4t5000C506B643d0  ONLINE   0 0 0
  c4t5000C506B6D3d0  ONLINE   0 0 0
  c4t5000C506BBE7d0  ONLINE   0 0 0
  c4t5000C506C407d0  ONLINE   0 0 0
  c4t5000C506C657d0  ONLINE   0 0 0

errors: No known data errors

  pool: Pool_test
 state: ONLINE
 scan: none requested
config:

NAME STATE READ WRITE CKSUM
Pool_testONLINE   0 0 0
  c4t5000C5002103F093d0  ONLINE   0 0 0
  c4t5000C50021101683d0  ONLINE   0 0 0
  c4t5000C50021102AA7d0  ONLINE   0 0 0
  c4t5000C500211034D3d0  ONLINE   0 0 0
  c4t5000C500211035DFd0  ONLINE   0 0 0
  c4t5000C5002110480Fd0  ONLINE   0 0 0
  c4t5000C50021104F0Fd0  ONLINE   0 0 0
  c4t5000C50021119A43d0  ONLINE   0 0 0
  c4t5000C5002112392Fd0  ONLINE   0 0 0

errors: No known data errors

  pool: syspool
 state: ONLINE
 scan: none requested
config:

NAMESTATE READ WRITE CKSUM
syspool ONLINE   0 0 0
  c0t0d0s0  ONLINE   0 0 0

errors: No known data errors
=

Pool_sas is made of 14x 146G 15K SAS drives in a big stripe.  For this test
there is no log device or cache.  Connected to it is a RedHat box using iSCSI
through an Intel X520 10GbE NIC.  It runs several large MySQL queries at once,
each taking minutes to compute.

Pool_test is a stripe of 2TB SATA drives, and a terabyte of files is being
copied to it for another box during this test.

Here's the pastebin of "iostat -xdn 10" on the Linux box:
http://pastebin.com/431ESYaz

Here's the pastebin of "iostat -xdn 10" on the Nexenta box:
http://pastebin.com/9g7KD3Ku

Here's the pastebin of "zpool iostat -v 10" on the Nexenta box:
http://pastebin.com/05fJL5sw

From these numbers it looks like the Linux box is waiting for data all the time
while the Nexenta box isn't pulling nearly as much throughput and IOPS as it
could.  Where is the bottleneck?

One suspicious thing is that we notice a slowdown of one pool when the other is
under load.  How can that be?

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
> Has anyone suggested either removing L2ARC/SLOG
> entirely or relocating them so that all devices are
> coming off the same controller? You've swapped the
> external controller but the H700 with the internal
> drives could be the real culprit. Could there be
> issues with cross-controller IO in this case? Does
> the H700 use the same chipset/driver as the other
> controllers you've tried? 

We'll try that.  We have a couple of other devices we could use for the SLOG,
like a DDRDrive X1 and an OCZ Z-Drive, which are both PCIe cards and don't use
the local controller.

Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
A little setback...  We found out that we also have the issue with the Dell
H800 controllers, not just the LSI 9200-16e.  With the Dell it's initially
faster as we benefit from the cache, but after a little while it goes sour- from
350MB/sec down to less than 40MB/sec.  We've also tried an LSI 9200-8e with the
same results.

So to recap...  No matter what HBA we use, copying through the network to/from
the external drives is painfully slow when access is done through either NFS or
iSCSI.  HOWEVER, it is plenty fast when we do an scp where the data is written
to the external drives (or internal ones for that matter) as long as they are
seen by the Nexenta box as local drives- i.e. when neither NFS nor iSCSI is
involved.

What now?  :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
After contacting LSI, they say that the 9200-16e HBA is not supported on
OpenSolaris, just Solaris.  Aren't the Solaris drivers the same as the
OpenSolaris ones?

Is there anyone here using 9200-16e HBAs?  What about the 9200-8e?  We have a 
couple lying around and we'll test one shortly.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
> Does the Linux box have the same issue to any other
> server ?
> What if the client box isn't Linux but Solaris or
> Windows or MacOS X ?

That would be a good test.  We'll try that.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
> As I have mentioned already, it would be useful to
>  know more about the 
> config, how the tests are being done, and to see some
> basic system 
> performance stats.

I will shortly.  Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
> You mentioned a second Nexenta box earlier.  To rule
> out client-side issues, have you considered testing
> with Nexenta as the iSCSI/NFS client?

If you mean running the NFS client AND server on the same box, then yes, and it
doesn't show the same performance issues.  It's only when a Linux box sends or
receives data through the NFS/iSCSI shares that we have problems.  But if the
Linux box sends/receives files through scp to the external disks mounted by the
Nexenta box as a local filesystem, then there is no problem.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
> He already said he has SSD's for dedicated log.  This
> means the best
> solution is to disable WriteBack and just use
> WriteThrough.  Not only is it
> more reliable than WriteBack, it's faster.
> 
> And I know I've said this many times before, but I
> don't mind repeating:  If
> you have slog devices, then surprisingly, it actually
> hurts performance to
> enable the WriteBack on the HBA.

The HBA that gives us problems is an LSI 9200-16e, which has no cache
whatsoever.  We do get great performance with a Dell H800 that has cache.  We'll
use H800s if we have to, but I really would like to find a way to make the LSIs
work.

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
As I have mentioned already, we have the same performance issues whether we
READ from or WRITE to the array; shouldn't that rule out caching issues?

Also, we get great performance with the LSI HBA if we use the JBODs as a local
file system.  The issues only arise when access is done through iSCSI or NFS.

I'm opening tickets with LSI to see if they can help.

Thanks all!
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-14 Thread Ian D
> Earlier you said you had eliminated the ZIL as an
> issue, but one difference
> between the Dell H800 and the LSI HBA is that the
> H800 has an NV cache (if
> you have the battery backup present).
> 
> A very simple test would be when things are running
> slow, try disabling
> the ZIL temporarily, to see if that makes things go
> fast.

We'll try that, but keep in mind that we're having the issue even when we READ 
from the JBODs, not just during WRITES.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-14 Thread Ian D
> Our next test is to try with a different kind of HBA,
> we have a Dell H800 lying around.

OK... we're making progress.  After swapping the LSI HBA for a Dell H800, the
issue disappeared.  Now, I'd rather not use those controllers because they don't
have a JBOD mode.  We have no choice but to make an individual RAID0 volume for
each disk, which means we need to reboot the server every time we replace a
failed drive.  That's not good...

What can we do with the LSI HBA?  Would you call LSI's support?  Is there
anything we should try besides the obvious (using the latest firmware/drivers)?

To summarize the issue: when we copy files from/to the JBODs connected to that
HBA using NFS/iSCSI, we get slow transfer rates (<20MB/s) and a 1-2 second pause
between each file.  When we do the same experiment locally, using the external
drives as a local volume (no NFS/iSCSI involved), it goes upwards of 350MB/sec
with no delay between files.

Ian

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-14 Thread Ian D
I've had a few people sending emails directly suggesting it might have
something to do with the ZIL/SLOG.  I guess I should have said that the issue
happens both ways, whether we copy TO or FROM the Nexenta box.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-14 Thread Ian D
> Sounding more and more like a networking issue - are
> the network cards set up in an aggregate? I had some
> similar issues on GbE where there was a mismatch
> between the aggregate settings on the switches and
> the LACP settings on the server. Basically the
> network was wasting a ton of time trying to
> renegotiate the LACP settings and slowing everything
> down.
> 
> Ditto for the Linux networking - single port or
> aggregated dual port?

We're only using one port on both boxes (we've never been able to saturate it
yet), but maybe they are somehow set up wrong.  We'll investigate.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-13 Thread Ian D
More stuff...

We ran the same tests on another Nexenta box with fairly similar hardware and 
had the exact same issues.  The two boxes have the same models of HBAs, NICs 
and JBODs but different CPUs and motherboards.

Our next test is to try with a different kind of HBA, we have a Dell H800 lying 
around.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-13 Thread Ian D
> Would it be possible to install OpenSolaris to an USB
> disk and boot from it and try? That would take 1-2h
> and could maybe help you narrow things down further?

I'm a little afraid of losing my data.  It wouldn't be the end of the world,
but I'd rather avoid that.  I'll do it as a last resort.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-13 Thread Ian D
> The only thing that still stands out is that network
> operations (iSCSI and NFS) to external drives are
> slow, correct?

Yes, that pretty much sums it up.

> Just for completeness, what happens if you scp a file
> to the three different pools?  If the results are the
> same as NFS and iSCSI, then I think the network can
> be ruled out.

This is where it gets interesting...

When we use the external disks as a local file system on the Nexenta box, it is
fast.  We can scp files through the network from the Linux box to the external
drives without problem as long as we address the local file system (wherever the
disks are).  BUT whenever iSCSI or NFS are involved, it all goes sour.

> I would be leaning toward thinking there is some
> mismatch between the network protocols and the
> external controllers/cables/arrays.

Sounds like it.  The arrays are plenty fast on the same
machine/controllers/cables when we're not using a network protocol.

> Are the controllers the same hardware/firmware/driver
> for the internal vs. external drives?

No.  The internal drives (the syspool and 13 SSDs for the cache) are on a Dell 
H700 RAID controller.  The external drives are on 3x LSI 9200-8e SAS HBAs 
connected to 7x Dell MD1000 and MD1200 JBODs. 

Thanks a lot!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-13 Thread Ian D
Here are some more findings...

The Nexenta box has 3 pools:
syspool: made of 2 mirrored (hardware RAID) local SAS disks
pool_sas: made of 22 15K SAS disks in ZFS mirrors on 2 JBODs on 2 controllers
pool_sata: made of 42 SATA disks in 6 RAIDZ2 vdevs on a single controller

When we copy data from any Linux box to either pool_sas or pool_sata, it is
painfully slow.

When we copy data from any Linux box directly to the syspool, it is plenty fast.

When we copy data locally on the Nexenta box from the syspool to either
pool_sas or pool_sata, it is crazy fast.

We also see the same pattern whether we use iSCSI or NFS.  We've also tested
using different NICs (some at 1GbE, some at 10GbE) and even tried bypassing the
switch by directly connecting the two boxes with a cable- and it didn't make any
difference.  We've also tried not using the SSD for the ZIL.

So...
We've ruled out iSCSI, the networking, the ZIL device, and even the HBAs, as it
is all fast when done locally.

Where should we look next?

Thank you all for your help!
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-10 Thread Ian D
> From the Linux side, it appears the drive in question
> is either sdb or dm-3, and both appear to be the same
> drive.  Since switching to zfs, my Linux-disk-fu has
> become a bit rusty.  Is one an alias for the other?

Yes, dm-3 is the alias created by LVM while sdb is the "physical" (or raw) 
device

> The Linux disk appears to top out at around 50MB/s
> or so.  That looks suspiciously like it is running
>  on a gigabit connection with some problems.

That's what I believe too.  It's a 10GbE connection...

> From what I can see, you can narrow the search down
> to a few things:
> 1. Linux network stack
> 2. Linux iSCSI issues
> 3. Network cabling/switch between the devices
> 4. Nexenta CPU constraints (unlikely, I know, but
> let's be thorough)
> 5. Nexenta network stack
> 6. COMSTAR problems

We'll run your tests and share the results.  It is unlikely to be the CPUs on
either side; they are the latest from Intel (Nexenta box) and AMD (Linux box)
and are barely used.

Thanks for helping!
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-09 Thread Ian D
> We're aware of that.  The original plan was to use
> mirrored DDRDrive X1s but we're experiencing
> stability issues.  Chris George is being very
> responsive and will help us investigate that
> once we figure out our most pressing performance
> problems.

I feel I need to add to my comment that our experience with the X1s has been
stellar.  The thing is amazingly fast and I can't wait to put it back into
production.  We've been having issues making our system stable simply because it
mixes many, many devices that fight each other for resources- it is in no way
the X1's fault and I sure don't want to give anybody that impression.

Chris George and Richard Elling have been very responsive helping us with this 
and I'll be happy to share our success with the community once we figure it out.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-09 Thread Ian D
> If you have a single SSD for dedicated log, that will
> surely be a bottleneck
> for you.  

We're aware of that.  The original plan was to use mirrored DDRDrive X1s but
we're experiencing stability issues.  Chris George is being very responsive and
will help us investigate that once we figure out our most pressing performance
problems.

> Also, with so much cache & ram, it wouldn't surprise
> me a lot to see really
> low disk usage just because it's already cached.  But
> that doesn't explain
> the ridiculously slow performance...

We do see the load of those drives get much higher when we have several servers
(all Linux boxes) connected to it at once.  The problem really seems to be at
the individual Linux box level.  Something to do with iSCSI/networking, even
though the 10GbE network can certainly handle a lot more than that.

> I'll suggest trying something completely different,
> like, dd if=/dev/zero
> bs=1024k | pv | ssh othermachine 'cat > /dev/null'
> ...  Just to verify there
> isn't something horribly wrong with your hardware
> (network).
> 
> In linux, run "ifconfig" ... You should see
> "errors:0"

We'll do that.  Thanks!

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-09 Thread Ian D


> A couple of notes:  we know the "Pool_sata" is resilvering, but we're
> concerned about the performance of the other pool ("Pool_sas").  We also know
> that we're not using jumbo frames as for some reason it makes the Linux box
> crash.  Could that explain it all?

> What sort of drives are these?  It looks like iSCSI or FC device names, and
> not local drives.

The "Pool_sas" is made of 15K SAS drives on external JBOD arrays (Dell MD1000)
connected to mirrored LSI 9200-8e SAS HBAs.
The "Pool_sata" is made of SATA drives on other JBODs.
The shorter device names are from the onboard Dell H700 RAID adapter (the SSDs
and system pool are local) while the longer ones are from the LSI cards.
Does that make sense?


  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-09 Thread Ian D

> I'll suggest trying something completely different, like, dd if=/dev/zero
> bs=1024k | pv | ssh othermachine 'cat > /dev/null' ...  Just to verify there
> isn't something horribly wrong with your hardware (network).
> 
> In linux, run "ifconfig" ... You should see "errors:0"
> 
> Make sure each machine has an entry for the other in the hosts file.  I
> haven't seen that cause a problem for iscsi, but certainly for ssh.
> 


we'll try that, thanks!

Here's some numbers... 

This is a pastebin from iostat running on the linux box:
http://pastebin.com/8mN8mchH

This from the Nexenta box:
http://pastebin.com/J1E4V1b3


A couple of notes:  we know the "Pool_sata" is resilvering, but we're concerned
about the performance of the other pool ("Pool_sas").  We also know that we're
not using jumbo frames as for some reason it makes the Linux box crash.  Could
that explain it all?

Thanks for helping out!
Ian


  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Performance issues with iSCSI under Linux

2010-10-08 Thread Ian D

Hi!  We're trying to pinpoint our performance issues and we could use all the
help the community can provide.  We're running the latest version of Nexenta on
a pretty powerful machine (4x Xeon 7550, 256GB RAM, 12x 100GB Samsung SSDs for
the cache, 50GB Samsung SSD for the ZIL, 10GbE on a dedicated switch, 11x pairs
of 15K HDDs for the pool).  We're connecting a single Linux box to it using
iSCSI, and using "top" we see it spends all its time in iowait.  Using "zpool
iostat" on the Nexenta box I can see that the disks are barely working,
typically reading <500K/sec and doing about 50 IOPS each- they can obviously do
much better.  We see the same pattern whether we're doing SQL queries (MySQL)
or simply doing file copies.  What's particularly curious is that with file
copies, there is a long pause (it can be a few seconds) between each file that
is copied.  Even reading a folder (ls) is slow.

Where should we look?  What more information should I provide?
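In case it helps, here is the sort of data we can collect and post- assuming
the standard tools on both ends, captured while the stalls are happening:

  on the Nexenta box:  zpool iostat -v 10,  iostat -xdn 10,  prstat -mL
  on the Linux box:    iostat -x 10,  top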
Thanks a lot!
Ian
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expected throughput

2010-07-05 Thread Ian D

>Just a short question - wouldn't it be easier, and perhaps faster, to just 
>have the MySQL DB on an NFS share? iSCSI adds 
>complexity, both on the target and the initiator.


Yes, we tried both and we didn't notice any difference in terms of performance.
I've read conflicting opinions on which is best, and the majority seems to say
that iSCSI is better for databases, but I don't have any strong preference
myself...

>Also, are you using jumbo frames? That can usually help a bit with either 
>access protocol


Yes.  It was off early on and we did notice a significant difference once we
switched it on.  Turning Nagle off as suggested by Richard also seems to have
made a little difference.  Thanks
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expected throughput

2010-07-04 Thread Ian D

> Is that 38% of one CPU or 38% of all CPU's?  How many CPU's does the
> Linux box have?  I don't mean the number of sockets, I mean number of
> sockets * number of cores * number of threads per core.  My

The server has two Intel X5570s; they are quad-core and have hyperthreading.
It would say 800% if it were fully used.  I've never seen that, but I've seen
processes running at 350%+.
> One other area of investigation that I didn't mention before: Your stats
> imply that the Linux box is getting data 32 KB at a time.  How
> does 32 KB compare to the database block size?  How does 32 KB compare
> to the block size on the relevant zfs filesystem or zvol?  Are blocks
> aligned at the various layers?

Those are all good questions but they are going beyond my level of expertise. 
That's why I'll be wise and soon retain the services of our friend Richard 
Elling here for a few days of consulting :)
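(My rough understanding of the alignment point, before handing it over to
Richard: InnoDB reads in 16 KB pages by default, so the zvol backing the LUN
would ideally be created with a matching block size, e.g.

  zfs create -V 200G -o volblocksize=16k Pool_sas/mysql

with the partition on the Linux side aligned to the same boundary.  The size and
name above are made up.)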
Thanks!
Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expected throughput

2010-07-04 Thread Ian D

> In what way is CPU contention being monitored?  "prstat" without
> options is nearly useless for a multithreaded app on a multi-CPU (or
> multi-core/multi-thread) system.  mpstat is only useful if threads
> never migrate between CPU's.  "prstat -mL" gives a nice picture of how
> busy each LWP (thread) is.


Using "prstat -mL" on the Nexenta box shows no serious activity

> Oh, since the database runs on Linux I guess you need to dig up top's
> equivalent of "prstat -mL".  Unfortunately, I don't think that Linux
> has microstate accounting and as such you may not have visibility into
> time spent on traps, text faults, and data faults on a per-process
> basis.


If CPU is the bottleneck then it's probably on the Linux box.  Using "top" the 
following is typical of what I get:

top - 15:04:11 up 24 days,  4:13,  6 users,  load average: 5.87, 5.79, 5.85
Tasks: 307 total,   1 running, 306 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.6%us,  0.3%sy,  0.0%ni, 98.4%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 96.2%id,  3.8%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  2.2%us,  5.1%sy,  0.0%ni, 55.0%id, 36.4%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu3  :  3.3%us,  1.3%sy,  0.0%ni,  0.0%id, 95.0%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.7%sy,  0.0%ni, 98.7%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.3%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 : 16.6%us, 10.9%sy,  0.0%ni,  0.0%id, 70.6%wa,  0.3%hi,  1.6%si,  0.0%st
Cpu11 :  0.6%us,  1.0%sy,  0.0%ni, 66.9%id, 31.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.3%us,  0.0%sy,  0.0%ni, 95.7%id,  4.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  1.5%us,  0.0%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.7%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  74098512k total, 73910728k used,   187784k free,    96948k buffers
Swap:  2104488k total,      208k used,  2104280k free, 63210472k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17652 mysql     20   0 3553m 3.1g 5472 S   38  4.4 247:51.80 mysqld
16301 mysql     20   0 4275m 3.3g 5980 S    4  4.7   5468:33 mysqld
16006 mysql     20   0 4434m 3.3g 5888 S    3  4.6   5034:06 mysqld
12822 root      15  -5     0    0    0 S    2  0.0  22:00.50 scsi_wq_39

> Have you done any TCP tuning? 

Some, yes, but since I've seen much more throughput on other tests I've made, I
don't think it's the bottleneck here.
Thanks!
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expected throughput

2010-07-04 Thread Ian D


> Ok... so we've rebuilt the pool as 14 pairs of mirrors, each pair having one
> disk in each of the two JBODs.  Now we're getting about 500-1000 IOPS
> (according to zpool iostat) and 20-30MB/sec in random read on a big database.
> Does that sound right?
>
> Seems right, as Erik said.  Btw, do you use SSDs for L2ARC/SLOG here?  If not,
> that might help quite a bit.
 
I have 8x 100GB SLC SDDs (the Samsung ones that Dell sells) for the L2ARC and 
2x 4GB DDRDrive X1s in mirror for the SLOG.  The server also has 128GB of RAM, 
I can see 100GB+ are used for the ARC.  I'll also double the RAM to 256GB and 
add 4 more SSDs (total 1.2TB) for the L2ARC once I'm ready to go to production. 
 I will eventually connect total of 75 SATA drives and 84 SAS 15K drives to it, 
I just want to make sure that I get the most of what I have.  When I run a 
dozen large SQL queries at once (they can take >10 mins) I consistently get 
300-1000 IOPs and 10-30MB/sec from the pool (according to zpool iostat).
What I don't understand is why, when I run a single query I get <100 IOPS and 
<3MB/sec.  The setup can obviously do better, so where is the bottleneck?  I 
don't see any CPU core on any side being maxed out so it can't be it...
The database is MySQL, it runs on a Linux box that connects to the Nexenta 
server through 10GbE using iSCSI.
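(One back-of-the-envelope thought: a single query that issues one synchronous
random read at a time is bounded by per-request latency rather than bandwidth.
At roughly 10 ms per round trip- seek plus network- that is about 1/0.010 = 100
requests/sec, and 100x 32 KB is about 3 MB/sec, which is suspiciously close to
the numbers above.  This assumes something like 32 KB per request, as suggested
elsewhere in the thread.)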
You're very helpful btw, thanks a lot!
Ian
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expected throughput

2010-07-03 Thread Ian D


> To summarise, putting 28 disks in a single vdev is nothing you would do if
> you want performance.  You'll end up with as many IOPS as a single drive can
> do.  Split it up into smaller (<10 disk) vdevs and try again.  If you need
> high performance, put them in a striped mirror (aka RAID1+0)
>
> A little addition - for 28 drives, I guess I'd choose four vdevs with seven
> drives each in raidz2.  You'll lose space, but it'll be four times faster, and
> four times safer.  Better safe than sorry

Ok... so we've rebuilt the pool as 14 pairs of mirrors, each pair having one
disk in each of the two JBODs.  Now we're getting about 500-1000 IOPS (according
to zpool iostat) and 20-30MB/sec in random read on a big database.  Does that
sound right?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mix SAS and SATA drives?

2010-07-01 Thread Ian D

> As the 15k drives are faster seek-wise (and possibly faster for linear I/O),
> you may want to separate them into different VDEVs or even pools, but then,
> it's quite impossible to give a "correct" answer unless knowing what it's
> going to be used for.

Mostly database duty.

> Also, using 10+ drives in a single VDEV is not really recommended - use fewer
> drives, lose more space, but get a faster system with fewer drives in each
> VDEV.  15x 750GB SATA drives in a single VDEV will give you about 120 IOPS for
> the whole VDEV at most.  Using smaller VDEVs will help speed things up and
> will give you better protection against faulty drives (silent or noisy).  Once
> more - see
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

With that much space we can afford mirroring everything.  We'll put every disk
in a pair across separate JBODs and controllers.

Thanks a lot!
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mix SAS and SATA drives?

2010-07-01 Thread Ian D

Sorry for the formatting, that's
2x 15x 1000GB SATA
3x 15x  750GB SATA
2x 12x  600GB SAS 15K
4x 15x  300GB SAS 15K
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mix SAS and SATA drives?

2010-07-01 Thread Ian D

Another question...  We're building a ZFS NAS/SAN out of the following JBODs we
already own:

2x 15x 1000GB SATA
3x 15x  750GB SATA
2x 12x  600GB SAS 15K
4x 15x  300GB SAS 15K

That's a lot of spindles we'd like to benefit from, but our assumption is that
we should split these into two separate pools, one for SATA drives and one for
SAS 15K drives.  Are we right?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Expected throughput

2010-07-01 Thread Ian D

Hi!  We've put 28x 750GB SATA drives in a RAIDZ2 pool (a single vdev) and we 
get about 80MB/s in sequential read or write. We're running local tests on the 
server itself (no network involved).  Is that what we should be expecting?  It 
seems slow to me.
Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] First Setup

2010-05-02 Thread Ian D

Hi!  We're building our first dedicated ZFS-based NAS/SAN (probably using 
Nexenta) and I'd like to run the specs by you all to see if you have any 
recommendations.  All of it is already bought, but it's not too late to add to 
it.  
Dell PowerEdge R910
2x Intel X7550 2GHz, 8 cores each plus HyperThreading
256GB RAM
2x 4GB DDRDrive X1 (PCIe) in mirror for the ZIL
8x 100GB Samsung SS805 SSDs for the L2ARC (on an onboard PERC H700 controller)
2x PERC H800 and 2x PERC6 controllers for the JBODs below
11x Dell MD1000 JBODs with: 45x 750GB SATA HDDs, 30x 1000GB SATA HDDs,
60x 300GB SAS 15K HDDs, 30x 600GB SAS 15K HDDs
2x 10GbE ports

We plan to connect about 30 servers to it.  About 8 will be currently I/O-bound
MySQL databases with a 60/40 read bias; the rest will have much lighter usage.
Most connections will be through iSCSI.

Is there anything that seems out of proportion?  Where do you think the
bottleneck will be?  If I'm going to use the SAS 15K drives for databases and
the SATA drives for NFS/backups, how should I set up the pools?
Thank you for any advice!
  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss