Re: [zfs-discuss] JBOD performance

2007-12-19 Thread Peter Schuller
 I was just wondering whether there's a problem with just one
 disk...

No, this is something I have observed on at least four different systems, with 
vastly varying hardware. Probably just the effects of the known problem.

Thanks,

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] JBOD performance

2007-12-18 Thread Roch - PAE

Frank Penczek writes:
  Hi,
  
  On Dec 17, 2007 4:18 PM, Roch - PAE [EMAIL PROTECTED] wrote:

 The pool holds home directories so small sequential writes to one
 large file present one of a few interesting use cases.
  
   Can you be more specific here?
  
   Do you have a body of applications that would do small
   sequential writes, or one in particular? Another
   interesting piece of info is whether we expect those to be allocating
   writes or overwrites (beware that some apps move the old file
   out, then run allocating writes, then unlink the original
   file).
  
  Sorry, I'll try to be more specific.
  The zpool contains home directories that are exported to client machines.
  It is hard to predict what exactly users are doing, but one thing users do 
  for
  certain is checking out software projects from our subversion server. The
  projects typically contain many source code files (thousands) and a
  build process
  accesses all of them in the worst case. That is what I meant by many (small)
  files like compiling projects in my previous post. The performance
  for this case
  is ... hopefully improvable.
  

This we'll have to work on. But first, if this is going to
storage with NVRAM, I assume you checked that the storage
does not flush its caches:


http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
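For reference, the workaround described in that guide comes down to telling ZFS not to issue cache-flush requests at all, which is only safe when the array genuinely has NVRAM-protected caches. A rough Solaris-specific sketch (verify the tunable name against the guide for your release):

```shell
# Check whether cache flushes are currently disabled (0 = ZFS issues flushes)
echo "zfs_nocacheflush/D" | mdb -k

# Disable cache flushes on the running kernel - ONLY for NVRAM-backed storage
echo "zfs_nocacheflush/W0t1" | mdb -kw

# Persistent equivalent in /etc/system:
#   set zfs:zfs_nocacheflush = 1
```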


If that is not your problem and ZFS underperforms another
FS as the backend of NFS, then this needs investigation.

If ZFS/NFS underperforms a direct-attached FS, that might
just be an NFS issue not related to ZFS. Again, that needs
investigation.

Performance gains won't happen unless we find out what
doesn't work.

  Now for sequential writes:
  We don't have a specific application issuing sequential writes but I
  can think of
  at least a few cases where these writes may occur, e.g.
  dumps of substantial amounts of measurement data or growing log files
  of applications.
  In either case these would be mainly allocating writes.
  

Right, but I'd hope the application would issue substantially
larger writes, especially if it needs to dump data at a high rate.
If the data rate is more modest, then the CPU lost to this
effect will itself be modest.

  Does this provide the information you're interested in?
  

I get a sense that it's more important we find out what
your build issue is. But the small writes will have to be
improved one day as well.


-r

  
  Cheers,
Frank



Re: [zfs-discuss] JBOD performance

2007-12-18 Thread Peter Schuller
 Sequential writing problem with process throttling - there has been an open
 bug for it for quite a while. Try lowering txg_time to 1s - it should
 help a little bit.

Yeah, my post was mostly to emphasize that on commodity hardware raidz2 does 
not even come close to being a CPU bottleneck. It wasn't a poke at the 
streaming performance. Very interesting to hear there's a bug open for it 
though.

 Can you also post iostat -xnz 1 while you're doing dd?
 and zpool status

This was FreeBSD, but I can provide iostat -x if you still want it for some 
reason. 

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org





Re: [zfs-discuss] JBOD performance

2007-12-18 Thread Robert Milkowski
Hello Peter,

Tuesday, December 18, 2007, 5:12:48 PM, you wrote:

 Sequential writing problem with process throttling - there has been an open
 bug for it for quite a while. Try lowering txg_time to 1s - it should
 help a little bit.

PS Yeah, my post was mostly to emphasize that on commodity hardware raidz2 does
PS not even come close to being a CPU bottleneck. It wasn't a poke at the
PS streaming performance. Very interesting to hear there's a bug open for it
PS though.

 Can you also post iostat -xnz 1 while you're doing dd?
 and zpool status

PS This was FreeBSD, but I can provide iostat -x if you still want it for some
PS reason. 


I was just wondering whether there's a problem with just one
disk...



-- 
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] JBOD performance

2007-12-17 Thread Robert Milkowski
Hello James,

Sunday, December 16, 2007, 9:54:18 PM, you wrote:

JCM hi Frank,

JCM there is an interesting pattern here (at least, to my
JCM untrained eyes) - your %b starts off quite low:


JCM Frank Penczek wrote:
JCM 
 ---
 dd'ing to NFS mount:
 [EMAIL PROTECTED]://tmp dd if=./file.tmp of=/home/fpz/file.tmp
 200000+0 records in
 200000+0 records out
 102400000 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s
 
 # iostat -xnz 1
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 2.8   17.3  149.4  127.6  0.0  1.3    0.0   66.0   0  12 c2t8d0
 2.8   17.3  149.4  127.6  0.0  1.3    0.0   65.9   0  13 c2t9d0
 2.8   17.3  149.3  127.6  0.0  1.3    0.0   66.1   0  13 c2t10d0
 2.8   17.3  149.3  127.6  0.0  1.3    0.0   66.4   0  13 c2t11d0
 2.8   17.3  149.5  127.6  0.0  1.3    0.0   66.5   0  13 c2t12d0
 0.3    1.0    5.4  133.9  0.0  0.0    0.1   27.2   0   1 c1t1d0
 0.5    0.3   26.8   16.5  0.0  0.0    0.1   11.1   0   0 c1t0d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0    1.0    0.0    8.0  0.0  0.0    0.0    8.9   0   1 c1t1d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   10.0    0.0    7.0  0.0  0.0    0.0    0.5   0   0 c2t8d0
 0.0   10.0    0.0    7.5  0.0  0.0    0.0    0.5   0   1 c2t9d0
 0.0   10.0    0.0    6.0  0.0  0.0    0.0    0.7   0   1 c2t10d0
 0.0   10.0    0.0    7.0  0.0  0.0    0.0    0.3   0   0 c2t11d0
 0.0   10.0    0.0    7.5  0.0  0.0    0.0    0.3   0   0 c2t12d0


JCM then it jumps - roughly, quadrupling

 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   67.6    0.0 1298.6  0.0  9.8    0.2  145.2   1  71 c2t8d0
 0.0   64.8    0.0 1139.4  0.0  9.2    0.0  141.8   0  69 c2t9d0
 0.0   59.2    0.0  898.9  0.0  8.6    0.0  144.9   0  68 c2t10d0
 0.0   67.6    0.0 1379.4  0.0  9.5    0.0  140.0   0  68 c2t11d0
 0.0   70.4    0.0 1257.3  0.0 11.4    0.0  162.1   0  73 c2t12d0

JCM then it maxes out and stays that way

 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   43.8    0.0 3068.5  0.0 34.9    0.0  796.0   0 100 c2t8d0
 0.0   55.6    0.0 3891.9  0.0 34.7    0.0  624.9   0 100 c2t9d0
 0.0   58.8    0.0 4211.9  0.0 33.4    0.0  568.2   0 100 c2t10d0
 0.0   49.2    0.0 3388.6  0.0 34.5    0.0  702.3   0 100 c2t11d0
 0.0   57.7    0.0 3805.3  0.0 34.3    0.0  594.0   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   60.0    0.0 4279.6  0.0 35.0    0.0  583.2   0 100 c2t8d0
 0.0   48.0    0.0 3423.7  0.0 35.0    0.0  729.1   0 100 c2t9d0
 0.0   41.0    0.0 2910.3  0.0 35.0    0.0  853.6   0 100 c2t10d0
 0.0   50.0    0.0 3552.2  0.0 35.0    0.0  699.9   0 100 c2t11d0
 0.0   48.0    0.0 3423.7  0.0 35.0    0.0  729.1   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
 0.0   60.0    0.0 4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
 0.0   55.0    0.0 3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
 0.0   56.0    0.0 4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
 0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   52.0    0.0 3723.5  0.0 35.0    0.0  672.9   0 100 c2t8d0
 0.0   43.0    0.0 3081.5  0.0 35.0    0.0  813.8   0 100 c2t9d0
 0.0   46.0    0.0 3296.0  0.0 35.0    0.0  760.7   0 100 c2t10d0
 0.0   48.0    0.0 3424.0  0.0 35.0    0.0  729.0   0 100 c2t11d0
 0.0   62.0    0.0 4408.1  0.0 35.0    0.0  564.4   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   60.0    0.0 4279.8  0.0 35.0    0.0  583.2   0 100 c2t8d0
 0.0   57.0    0.0 4065.8  0.0 35.0    0.0  613.9   0 100 c2t9d0
 0.0   59.0    0.0 4194.3  0.0 35.0    0.0  593.1   0 100 c2t10d0
 0.0   56.0    0.0 4023.3  0.0 35.0    0.0  624.9   0 100 c2t11d0
 0.0   48.0    0.0 3424.3  0.0 35.0    0.0  729.1   0 100 c2t12d0


JCM drops back a fraction

 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   65.7    0.0 1385.0  0.0 14.5    0.0  220.8   0  90 c2t8d0
 0.9   68.4   39.8 1623.6  0.0 13.0    0.0  187.8   0  87 c2t9d0
 0.9   74.9   39.3 2054.6  0.0 16.7    0.0  219.6   0  94 c2t10d0
 0.9   70.3   39.3 1662.9  0.0 15.4    0.0  216.1   0  95 c2t11d0
 

Re: [zfs-discuss] JBOD performance

2007-12-17 Thread James C. McPherson
Robert Milkowski wrote:
 Hello James,
 
 Sunday, December 16, 2007, 9:54:18 PM, you wrote:
 
 JCM hi Frank,
 
 JCM there is an interesting pattern here (at least, to my
 JCM untrained eyes) - your %b starts off quite low:

 JCM All of which, to me, look like you're filling a buffer
 JCM or two.
 
 JCM I don't recall the config of your zpool, but if the
 JCM devices are disks that are direct or san-attached, I
 JCM would be wondering about their outstanding queue depths.
 
 JCM I think it's time to break out some D to find out where
 JCM in the stack the bottleneck(s) really are.
 Maybe he could try to limit the number of queued requests per disk in ZFS to
 something smaller than the default of 35 (maybe even down to 1?)

Hi Robert,
yup, that's on my list of things for Frank to try. I've
asked for a bit more config information though so we can
get a bit of clarity on that front first.
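For anyone wanting to try Robert's suggestion in the meantime, the knob in question on builds of that era was the zfs_vdev_max_pending tunable (default 35, matching the actv column in the iostat output); treat this as a sketch and verify the tunable name on your release:

```shell
# Show the current per-vdev queue depth (default 35 on builds of this era)
echo "zfs_vdev_max_pending/D" | mdb -k

# Experimentally lower it to 10 outstanding commands per vdev
# (0t prefix means the value is decimal)
echo "zfs_vdev_max_pending/W0t10" | mdb -kw
```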



James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] JBOD performance

2007-12-17 Thread Roch - PAE


dd uses a default block size of 512B. Does this map to your
expected usage? When I quickly tested the CPU cost of small
reads from cache, I did see that ZFS was more costly than UFS
up to a crossover between 8K and 16K. We might need a more
comprehensive study of that (data in/out of cache, different
recordsize and alignment constraints, ...). But for small
syscalls, I think we might need some work in ZFS to make it
CPU efficient.

So first: does a small sequential write to a large file
match an interesting use case?
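As a quick illustration of the syscall-size effect described above (file paths here are made up for the example), the same amount of data can be written in 512-byte or 128 KiB requests and the per-write CPU cost compared with time(1) or mpstat:

```shell
# Write 1 MiB in 2048 write(2) calls of 512 bytes (dd's default block size) ...
dd if=/dev/zero of=/tmp/small_bs.tmp bs=512 count=2048

# ... versus 8 write(2) calls of 128 KiB (ZFS's default recordsize)
dd if=/dev/zero of=/tmp/large_bs.tmp bs=128k count=8
```

Both produce identical 1 MiB files; only the number of syscalls, and hence the per-syscall CPU overhead, differs.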


-r



Re: [zfs-discuss] JBOD performance

2007-12-17 Thread Frank Penczek
Hi,

On Dec 17, 2007 10:37 AM, Roch - PAE [EMAIL PROTECTED] wrote:


 dd uses a default block size of 512B. Does this map to your
 expected usage? When I quickly tested the CPU cost of small
 reads from cache, I did see that ZFS was more costly than UFS
 up to a crossover between 8K and 16K. We might need a more
 comprehensive study of that (data in/out of cache, different
 recordsize and alignment constraints, ...). But for small
 syscalls, I think we might need some work in ZFS to make it
 CPU efficient.

 So first: does a small sequential write to a large file
 match an interesting use case?

The pool holds home directories so small sequential writes to one
large file present one of a few interesting use cases.
The performance is equally disappointing for many (small) files
like compiling projects in svn repositories.

Cheers,
  Frank


Re: [zfs-discuss] JBOD performance

2007-12-17 Thread Roch - PAE
Frank Penczek writes:
  Hi,
  
  On Dec 17, 2007 10:37 AM, Roch - PAE [EMAIL PROTECTED] wrote:
  
  
   dd uses a default block size of 512B. Does this map to your
   expected usage? When I quickly tested the CPU cost of small
   reads from cache, I did see that ZFS was more costly than UFS
   up to a crossover between 8K and 16K. We might need a more
   comprehensive study of that (data in/out of cache, different
   recordsize and alignment constraints, ...). But for small
   syscalls, I think we might need some work in ZFS to make it
   CPU efficient.
  
   So first: does a small sequential write to a large file
   match an interesting use case?
  
  The pool holds home directories so small sequential writes to one
  large file present one of a few interesting use cases.

Can you be more specific here?

Do you have a body of applications that would do small
sequential writes, or one in particular? Another
interesting piece of info is whether we expect those to be allocating
writes or overwrites (beware that some apps move the old file
out, then run allocating writes, then unlink the original
file).



  The performance is equally disappointing for many (small) files
  like compiling projects in svn repositories.
  

???

-r


  Cheers,
Frank



Re: [zfs-discuss] JBOD performance

2007-12-17 Thread Rob Logan
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0

  That service time is just terrible!

Yeah, that service time is unreasonable: almost a second for each
command, and 35 more commands queued? (reordering = faster)

I had a server with similar service times, so I prepped
a replacement blade, and when I went to slide it in, I noticed a
loud noise coming from the blade below it. I notified the Windows
person who owned it; it had been broken for some time, and they
turned it off... it was much better after that.

vibration... check vibration.

Rob


Re: [zfs-discuss] JBOD performance

2007-12-17 Thread Frank Penczek
Hi,

On Dec 17, 2007 4:18 PM, Roch - PAE [EMAIL PROTECTED] wrote:
  
   The pool holds home directories so small sequential writes to one
   large file present one of a few interesting use cases.

 Can you be more specific here?

 Do you have a body of applications that would do small
 sequential writes, or one in particular? Another
 interesting piece of info is whether we expect those to be allocating
 writes or overwrites (beware that some apps move the old file
 out, then run allocating writes, then unlink the original
 file).

Sorry, I'll try to be more specific.
The zpool contains home directories that are exported to client machines.
It is hard to predict what exactly users are doing, but one thing users do for
certain is checking out software projects from our subversion server. The
projects typically contain many source code files (thousands) and a
build process
accesses all of them in the worst case. That is what I meant by many (small)
files like compiling projects in my previous post. The performance
for this case
is ... hopefully improvable.

Now for sequential writes:
We don't have a specific application issuing sequential writes but I
can think of
at least a few cases where these writes may occur, e.g.
dumps of substantial amounts of measurement data or growing log files
of applications.
In either case these would be mainly allocating writes.

Does this provide the information you're interested in?


Cheers,
  Frank


Re: [zfs-discuss] JBOD performance

2007-12-16 Thread Frank Penczek
Hi,

On Dec 14, 2007 7:50 PM, Louwtjie Burger [EMAIL PROTECTED] wrote:
[...]
 I would have said ... to be expected, since the 280 came with a
 100Mbit interface. So a 9-12 MB/s peak would be acceptable. You did
 mention a gigabit switch... did you install a gigabit HBA ? If
 that's the case then yes, performance sucks.

Yes, sorry, I forgot to mention that we're using a GigaSwift NIC in that
machine.

 fsstat?

What 'fsstat' output are you interested in? For a start, here's 'fsstat -F':

# fsstat -F
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   setops   ops   ops bytes   ops bytes
 660K 69.8K 16.0K 11.3M  170K  60.2M  154K 25.6M 14.0G 6.05M 58.9G ufs
0 0 0 43.6K 0  78.8K 14.0K 27.8K 10.0M 0 0 proc
0 0 0 21 0  0 0 0 0 0 0 nfs
30.7K 7.67K 11.2K  348M 56.9K   122M 4.61M 1.65M 25.1G 1.05M 58.2G zfs
0 0 0  574K 0  0 0 0 0 0 0 lofs
 162K 17.3K  120K  273K 15.9K  1.48M 4.60K  418K 1.24G 1.10M 5.93G tmpfs
0 0 0 6.18K 0  0 0 51 9.31K 0 0 mntfs
0 0 0 0 0  0 0 0 0 0 0 nfs3
0 0 0 0 0  0 0 0 0 0 0 nfs4
0 0 0 43 0  0 0 0 0 0 0 autofs


Thanks,
  Frank


Re: [zfs-discuss] JBOD performance

2007-12-16 Thread Frank Penczek
Hi,

sorry for the lengthy post ...

On Dec 15, 2007 1:56 PM, Robert Milkowski [EMAIL PROTECTED] wrote:
[...]
 Sequential writing problem with process throttling - there has been an open
 bug for it for quite a while. Try lowering txg_time to 1s - it should
 help a little bit.

Since setting txg_time to 1, the periodic drop in bandwidth seems to have gone.
That's great. Unfortunately the performance is still not amazing - 10 MB/s over
the network and no more...

 Can you also post iostat -xnz 1 while you're doing dd?
 and zpool status

---
# zpool status
  pool: storage_array
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Dec 12 23:38:36 2007
config:

NAME STATE READ WRITE CKSUM
storage_array  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c2t8d0   ONLINE   0 0 0
c2t9d0   ONLINE   0 0 0
c2t10d0  ONLINE   0 0 0
c2t11d0  ONLINE   0 0 0
c2t12d0  ONLINE   0 0 0

errors: No known data errors


---
dd'ing to NFS mount:
[EMAIL PROTECTED]://tmp dd if=./file.tmp of=/home/fpz/file.tmp
200000+0 records in
200000+0 records out
102400000 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s

# iostat -xnz 1
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
2.8   17.3  149.4  127.6  0.0  1.3    0.0   66.0   0  12 c2t8d0
2.8   17.3  149.4  127.6  0.0  1.3    0.0   65.9   0  13 c2t9d0
2.8   17.3  149.3  127.6  0.0  1.3    0.0   66.1   0  13 c2t10d0
2.8   17.3  149.3  127.6  0.0  1.3    0.0   66.4   0  13 c2t11d0
2.8   17.3  149.5  127.6  0.0  1.3    0.0   66.5   0  13 c2t12d0
0.3    1.0    5.4  133.9  0.0  0.0    0.1   27.2   0   1 c1t1d0
0.5    0.3   26.8   16.5  0.0  0.0    0.1   11.1   0   0 c1t0d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0    1.0    0.0    8.0  0.0  0.0    0.0    8.9   0   1 c1t1d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   10.0    0.0    7.0  0.0  0.0    0.0    0.5   0   0 c2t8d0
0.0   10.0    0.0    7.5  0.0  0.0    0.0    0.5   0   1 c2t9d0
0.0   10.0    0.0    6.0  0.0  0.0    0.0    0.7   0   1 c2t10d0
0.0   10.0    0.0    7.0  0.0  0.0    0.0    0.3   0   0 c2t11d0
0.0   10.0    0.0    7.5  0.0  0.0    0.0    0.3   0   0 c2t12d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   67.6    0.0 1298.6  0.0  9.8    0.2  145.2   1  71 c2t8d0
0.0   64.8    0.0 1139.4  0.0  9.2    0.0  141.8   0  69 c2t9d0
0.0   59.2    0.0  898.9  0.0  8.6    0.0  144.9   0  68 c2t10d0
0.0   67.6    0.0 1379.4  0.0  9.5    0.0  140.0   0  68 c2t11d0
0.0   70.4    0.0 1257.3  0.0 11.4    0.0  162.1   0  73 c2t12d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   43.8    0.0 3068.5  0.0 34.9    0.0  796.0   0 100 c2t8d0
0.0   55.6    0.0 3891.9  0.0 34.7    0.0  624.9   0 100 c2t9d0
0.0   58.8    0.0 4211.9  0.0 33.4    0.0  568.2   0 100 c2t10d0
0.0   49.2    0.0 3388.6  0.0 34.5    0.0  702.3   0 100 c2t11d0
0.0   57.7    0.0 3805.3  0.0 34.3    0.0  594.0   0 100 c2t12d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   60.0    0.0 4279.6  0.0 35.0    0.0  583.2   0 100 c2t8d0
0.0   48.0    0.0 3423.7  0.0 35.0    0.0  729.1   0 100 c2t9d0
0.0   41.0    0.0 2910.3  0.0 35.0    0.0  853.6   0 100 c2t10d0
0.0   50.0    0.0 3552.2  0.0 35.0    0.0  699.9   0 100 c2t11d0
0.0   48.0    0.0 3423.7  0.0 35.0    0.0  729.1   0 100 c2t12d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
0.0   60.0    0.0 4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
0.0   55.0    0.0 3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
0.0   56.0    0.0 4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   52.0    0.0 3723.5  0.0 35.0    0.0  672.9   0 100 c2t8d0
0.0   43.0    0.0 3081.5  0.0 35.0    0.0  813.8   0 100 c2t9d0
0.0   46.0    0.0 3296.0  0.0 35.0    0.0  760.7   0 100 c2t10d0
0.0   48.0    0.0 3424.0  0.0 35.0    0.0  729.0   0 100 c2t11d0
0.0   62.0    0.0 4408.1  0.0 35.0    0.0  564.4   0 100 c2t12d0
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   60.0    0.0 4279.8  0.0 35.0    0.0  583.2   0 100 c2t8d0

Re: [zfs-discuss] JBOD performance

2007-12-16 Thread Frank Penczek
Hi,

On Dec 14, 2007 8:24 PM, Richard Elling [EMAIL PROTECTED] wrote:
 Frank Penczek wrote:
 
  The performance is slightly disappointing. Does anyone have
  a similar setup and can anyone share some figures?
  Any pointers to possible improvements are greatly appreciated.
 
 

 Use a faster processor or change to a mirrored configuration.
 raidz2 can become processor bound in the Reed-Soloman calculations
 for the 2nd parity set.  You should be able to see this in mpstat, and to
 a coarser grain in vmstat.
  -- richard


Thanks for the hint. When dd'ing to the pool, mpstat tells me:

# mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   46   0   67   518  413  5094   46   260   2721   3   0  96
  1   44   0   64  1765  141  5765   46   260   2561   3   0  96
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0   10   804  698 11304   69   520   2180   5   0  95
  17   0   10  3301  390 11895   79   640   1060   4   0  96
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   04   294  188  6013   56   590850   2   0  98
  10   04  1029  319  5931   57   610780   2   0  98
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   02   303  198  2382   33   100   1450   0   0 100
  10   04   283   74  2613   3080   1590   1   0  99
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  02   0  113  1055  885  813  137  120   209 328337  57   0  36
  1   90   2   74  3622  220 1328   74  118   49   18 199565  45   0  50
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   14   0  115   438  191  876  181  127   400 324257  54   0  39
  1   14   0  197  1513  453  671  118  132   540 239295  53   0  42
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0   98   901  679 1726  163  189   430 267185  48   0  47
  10   0  121  3722  508  843  171  194   320 299476  53   0  41
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0   50   418  185  905  162  109   210 318847  54   0  39
  10   0  135  1772  550  670  102  107   370 238825  55   0  40
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0   58   353  184  330  106  105   170 30134   28  47   0  26
  11   0   74   862  250  604  128  106   270 263126  45   0  50
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0   60  1087  851 1542  231  174   311 338007  61   0  32
  10   0  136  4191  444 1072  125  165   391 202734  52   0  44
...

Based on the 'idl' column I'd interpret these numbers as saying there
are CPU resources left - or is that me being naive?

Cheers,
  Frank


Re: [zfs-discuss] JBOD performance

2007-12-16 Thread Louwtjie Burger
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
 0.0   60.0    0.0 4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
 0.0   55.0    0.0 3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
 0.0   56.0    0.0 4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
 0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0

That service time is just terrible!


Re: [zfs-discuss] JBOD performance

2007-12-16 Thread James C. McPherson

hi Frank,

there is an interesting pattern here (at least, to my
untrained eyes) - your %b starts off quite low:


Frank Penczek wrote:

 ---
 dd'ing to NFS mount:
 [EMAIL PROTECTED]://tmp dd if=./file.tmp of=/home/fpz/file.tmp
 200000+0 records in
 200000+0 records out
 102400000 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s
 
 # iostat -xnz 1
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 2.8   17.3  149.4  127.6  0.0  1.3    0.0   66.0   0  12 c2t8d0
 2.8   17.3  149.4  127.6  0.0  1.3    0.0   65.9   0  13 c2t9d0
 2.8   17.3  149.3  127.6  0.0  1.3    0.0   66.1   0  13 c2t10d0
 2.8   17.3  149.3  127.6  0.0  1.3    0.0   66.4   0  13 c2t11d0
 2.8   17.3  149.5  127.6  0.0  1.3    0.0   66.5   0  13 c2t12d0
 0.3    1.0    5.4  133.9  0.0  0.0    0.1   27.2   0   1 c1t1d0
 0.5    0.3   26.8   16.5  0.0  0.0    0.1   11.1   0   0 c1t0d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0    1.0    0.0    8.0  0.0  0.0    0.0    8.9   0   1 c1t1d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   10.0    0.0    7.0  0.0  0.0    0.0    0.5   0   0 c2t8d0
 0.0   10.0    0.0    7.5  0.0  0.0    0.0    0.5   0   1 c2t9d0
 0.0   10.0    0.0    6.0  0.0  0.0    0.0    0.7   0   1 c2t10d0
 0.0   10.0    0.0    7.0  0.0  0.0    0.0    0.3   0   0 c2t11d0
 0.0   10.0    0.0    7.5  0.0  0.0    0.0    0.3   0   0 c2t12d0


then it jumps - roughly, quadrupling

 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   67.6    0.0 1298.6  0.0  9.8    0.2  145.2   1  71 c2t8d0
 0.0   64.8    0.0 1139.4  0.0  9.2    0.0  141.8   0  69 c2t9d0
 0.0   59.2    0.0  898.9  0.0  8.6    0.0  144.9   0  68 c2t10d0
 0.0   67.6    0.0 1379.4  0.0  9.5    0.0  140.0   0  68 c2t11d0
 0.0   70.4    0.0 1257.3  0.0 11.4    0.0  162.1   0  73 c2t12d0

then it maxes out and stays that way

 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   43.8    0.0 3068.5  0.0 34.9    0.0  796.0   0 100 c2t8d0
 0.0   55.6    0.0 3891.9  0.0 34.7    0.0  624.9   0 100 c2t9d0
 0.0   58.8    0.0 4211.9  0.0 33.4    0.0  568.2   0 100 c2t10d0
 0.0   49.2    0.0 3388.6  0.0 34.5    0.0  702.3   0 100 c2t11d0
 0.0   57.7    0.0 3805.3  0.0 34.3    0.0  594.0   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   60.0    0.0 4279.6  0.0 35.0    0.0  583.2   0 100 c2t8d0
 0.0   48.0    0.0 3423.7  0.0 35.0    0.0  729.1   0 100 c2t9d0
 0.0   41.0    0.0 2910.3  0.0 35.0    0.0  853.6   0 100 c2t10d0
 0.0   50.0    0.0 3552.2  0.0 35.0    0.0  699.9   0 100 c2t11d0
 0.0   48.0    0.0 3423.7  0.0 35.0    0.0  729.1   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
 0.0   60.0    0.0 4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
 0.0   55.0    0.0 3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
 0.0   56.0    0.0 4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
 0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   52.0    0.0 3723.5  0.0 35.0    0.0  672.9   0 100 c2t8d0
 0.0   43.0    0.0 3081.5  0.0 35.0    0.0  813.8   0 100 c2t9d0
 0.0   46.0    0.0 3296.0  0.0 35.0    0.0  760.7   0 100 c2t10d0
 0.0   48.0    0.0 3424.0  0.0 35.0    0.0  729.0   0 100 c2t11d0
 0.0   62.0    0.0 4408.1  0.0 35.0    0.0  564.4   0 100 c2t12d0
 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   60.0    0.0 4279.8  0.0 35.0    0.0  583.2   0 100 c2t8d0
 0.0   57.0    0.0 4065.8  0.0 35.0    0.0  613.9   0 100 c2t9d0
 0.0   59.0    0.0 4194.3  0.0 35.0    0.0  593.1   0 100 c2t10d0
 0.0   56.0    0.0 4023.3  0.0 35.0    0.0  624.9   0 100 c2t11d0
 0.0   48.0    0.0 3424.3  0.0 35.0    0.0  729.1   0 100 c2t12d0


drops back a fraction

 extended device statistics
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0   65.7    0.0 1385.0  0.0 14.5    0.0  220.8   0  90 c2t8d0
 0.9   68.4   39.8 1623.6  0.0 13.0    0.0  187.8   0  87 c2t9d0
 0.9   74.9   39.3 2054.6  0.0 16.7    0.0  219.6   0  94 c2t10d0
 0.9   70.3   39.3 1662.9  0.0 15.4    0.0  216.1   0  95 c2t11d0
 0.0   68.4    0.0 1736.0  0.0 14.9    0.0  217.9   0  87 c2t12d0
 extended 

Re: [zfs-discuss] JBOD performance

2007-12-15 Thread Robert Milkowski
Hello Peter,

Saturday, December 15, 2007, 7:45:50 AM, you wrote:

 Use a faster processor or change to a mirrored configuration.
 raidz2 can become processor bound in the Reed-Soloman calculations
 for the 2nd parity set.  You should be able to see this in mpstat, and to
 a coarser grain in vmstat.

PS Hmm. Is the OP's hardware *that* slow? (I don't know enough about the Sun
PS hardware models)

PS I have a 5-disk raidz2 (cheap SATA) here on my workstation, which is an X2
PS 3800+ (i.e., one of the earlier AMD dual-core offerings). Here's me dd:ing 
to
PS a file on FreeBSD on ZFS running on that hardware:

PS promraid 741G   387G      0    380      0  47.2M
PS promraid 741G   387G      0    336      0  41.8M
PS promraid 741G   387G      0    424    510  51.0M
PS promraid 741G   387G      0    441      0  54.5M
PS promraid 741G   387G      0    514      0  19.2M
PS promraid 741G   387G     34    192  4.12M  24.1M
PS promraid 741G   387G      0    341      0  42.7M
PS promraid 741G   387G      0    361      0  45.2M
PS promraid 741G   387G      0    350      0  43.9M
PS promraid 741G   387G      0    370      0  46.3M
PS promraid 741G   387G      1    423   134K  51.7M
PS promraid 742G   386G     22    329  2.39M  10.3M
PS promraid 742G   386G     28    214  3.49M  26.8M
PS promraid 742G   386G      0    347      0  43.5M
PS promraid 742G   386G      0    349      0  43.7M
PS promraid 742G   386G      0    354      0  44.3M
PS promraid 742G   386G      0    365      0  45.7M
PS promraid 742G   386G      2    460  7.49K  55.5M

PS At this point the bottleneck looks architectural rather than CPU. None of the
PS cores are saturated, and the CPU usage of the ZFS kernel threads is pretty
PS low.

PS I say architectural because writes to the underlying devices are not 
PS sustained; it drops to almost zero for certain periods (this is more visible
PS in iostat -x than it is in the zpool statistics). What I think is happening
PS is that ZFS is too late to evict data in the cache, thus blocking the writing
PS process. Once a transaction group with a bunch of data gets committed the
PS application unblocks, but presumably ZFS waits for a little while before
PS resuming writes.

PS Note that this is also being run on plain hardware; it's not even PCI Express.
PS During throughput peaks, but not constantly, the bottleneck is probably the
PS PCI bus.


Sequential writing problem with process throttling - there has been an open
bug for it for quite a while. Try lowering txg_time to 1s - it should
help a little bit.
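For reference, lowering the transaction group timer would look something like this on builds of that era (Solaris-specific sketch; the tunable was renamed in later builds, so verify the name on your release):

```shell
# Lower the txg sync interval from the default 5 seconds to 1 second
echo "txg_time/W0t1" | mdb -kw

# Persistent equivalent in /etc/system:
#   set zfs:txg_time = 1
```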

Can you also post iostat -xnz 1 while you're doing dd?
and zpool status



-- 
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] JBOD performance

2007-12-14 Thread Richard Elling
Frank Penczek wrote:

 The performance is slightly disappointing. Does anyone have
 a similar setup and can anyone share some figures?
 Any pointers to possible improvements are greatly appreciated.

   

Use a faster processor or change to a mirrored configuration.
raidz2 can become processor bound in the Reed-Soloman calculations
for the 2nd parity set.  You should be able to see this in mpstat, and to
a coarser grain in vmstat.
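As an illustration only, a mirrored alternative for the same five disks (device names taken from the zpool status posted elsewhere in the thread) might look like the sketch below; mirrors pair up, so with an odd disk count one disk is shown here as a hot spare:

```shell
# Hypothetical: recreate the 5-disk raidz2 pool as two 2-way mirrors
# plus a hot spare. WARNING: destroys all data in the existing pool.
zpool destroy storage_array
zpool create storage_array \
    mirror c2t8d0 c2t9d0 \
    mirror c2t10d0 c2t11d0 \
    spare c2t12d0
```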
 -- richard



Re: [zfs-discuss] JBOD performance

2007-12-14 Thread Louwtjie Burger
 The throughput when writing from a local disk to the
 zpool is around 30MB/s, when writing from a client

Err.. sorry, the internal storage would be good old 1Gbit FCAL disks @
10K rpm. Still, not the fastest around ;)


Re: [zfs-discuss] JBOD performance

2007-12-14 Thread Peter Schuller
 Use a faster processor or change to a mirrored configuration.
 raidz2 can become processor bound in the Reed-Soloman calculations
 for the 2nd parity set.  You should be able to see this in mpstat, and to
 a coarser grain in vmstat.

Hmm. Is the OP's hardware *that* slow? (I don't know enough about the Sun 
hardware models)

I have a 5-disk raidz2 (cheap SATA) here on my workstation, which is an X2 
3800+ (i.e., one of the earlier AMD dual-core offerings). Here's me dd:ing to 
a file on FreeBSD on ZFS running on that hardware:

promraid 741G   387G      0    380      0  47.2M
promraid 741G   387G      0    336      0  41.8M
promraid 741G   387G      0    424    510  51.0M
promraid 741G   387G      0    441      0  54.5M
promraid 741G   387G      0    514      0  19.2M
promraid 741G   387G     34    192  4.12M  24.1M
promraid 741G   387G      0    341      0  42.7M
promraid 741G   387G      0    361      0  45.2M
promraid 741G   387G      0    350      0  43.9M
promraid 741G   387G      0    370      0  46.3M
promraid 741G   387G      1    423   134K  51.7M
promraid 742G   386G     22    329  2.39M  10.3M
promraid 742G   386G     28    214  3.49M  26.8M
promraid 742G   386G      0    347      0  43.5M
promraid 742G   386G      0    349      0  43.7M
promraid 742G   386G      0    354      0  44.3M
promraid 742G   386G      0    365      0  45.7M
promraid 742G   386G      2    460  7.49K  55.5M

At this point the bottleneck looks architectural rather than CPU. None of the 
cores are saturated, and the CPU usage of the ZFS kernel threads is pretty 
low.

I say architectural because writes to the underlying devices are not 
sustained; it drops to almost zero for certain periods (this is more visible 
in iostat -x than it is in the zpool statistics). What I think is happening 
is that ZFS is too late to evict data in the cache, thus blocking the writing 
process. Once a transaction group with a bunch of data gets committed the 
application unblocks, but presumably ZFS waits for a little while before 
resuming writes.

Note that this is also being run on plain hardware; it's not even PCI Express. 
During throughput peaks, but not constantly, the bottleneck is probably the 
PCI bus.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org


