[zfs-discuss] set zfs:zfs_vdev_max_pending

2010-01-12 Thread Ed Spencer
We have a zpool made of 4 x 512GB iSCSI LUNs located on a Network Appliance filer.
We are seeing poor read performance from the zfs pool. 
The release of solaris we are using is:
Solaris 10 10/09 s10s_u8wos_08a SPARC

The server itself is a T2000

I was wondering how we can tell if the zfs_vdev_max_pending setting is impeding 
read performance of the zfs pool? (The pool consists of lots of small files).

And if it is impeding read performance, how do we go about finding a new value 
for this parameter?

Of course, I may misunderstand this parameter entirely and would be quite happy 
for a proper explanation!
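For reference, a minimal sketch of how the parameter can be inspected and changed, assuming the standard mdb(1) and /etc/system tuning mechanism (the value 10 below is only an illustration, not a recommendation):

```
* Check the current value on the running kernel:
*   # echo zfs_vdev_max_pending/D | mdb -k
*
* Try a new value live (no reboot needed):
*   # echo zfs_vdev_max_pending/W0t10 | mdb -kw
*
* Persist it across reboots by adding this line to /etc/system:
set zfs:zfs_vdev_max_pending = 10
```

Lowering it mainly matters when many LUNs share a few physical spindles; raising it mainly matters for wide stripes of dedicated disks.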

--
Ed
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs fragmentation

2009-08-11 Thread Ed Spencer
I've come up with a better name for the concept of file and directory
fragmentation: Filesystem Entropy. Over time, an active and volatile
filesystem moves from an organized state to a disorganized state,
resulting in backup difficulties.

Here are some stats which illustrate the issue:

First the development mail server:
==
(Jumbo frames, Nagle disabled, and tcp_xmit_hiwat/tcp_recv_hiwat set to
2097152)

Small file workload (copy from zfs on iscsi network to local ufs
filesystem)
# zpool iostat 10
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G      3      0   247K  59.7K
space      70.5G  29.0G    136      0  8.37M      0
space      70.5G  29.0G    115      0  6.31M      0
space      70.5G  29.0G    108      0  7.08M      0
space      70.5G  29.0G    105      0  3.72M      0
space      70.5G  29.0G    135      0  3.74M      0
space      70.5G  29.0G    155      0  6.09M      0
space      70.5G  29.0G    193      0  4.85M      0
space      70.5G  29.0G    142      0  5.73M      0
space      70.5G  29.0G    159      0  7.87M      0

Large file workload (CD and DVD ISOs)
# zpool iostat 10
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G      3      0   224K  59.8K
space      70.5G  29.0G    462      0  57.8M      0
space      70.5G  29.0G    427      0  53.5M      0
space      70.5G  29.0G    406      0  50.8M      0
space      70.5G  29.0G    430      0  53.8M      0
space      70.5G  29.0G    382      0  47.9M      0

The production mail server:
===
Mail system is running with 790 imap users logged in (low imap
workload).
Two backup streams are running.
Not using jumbo frames; Nagle enabled; tcp_xmit_hiwat/tcp_recv_hiwat set
to 2097152.
- we've never seen any effect from changing the iscsi transport
  parameters under this small file workload.

# zpool iostat 10
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      1.06T   955G     96     69  5.20M  2.69M
space      1.06T   955G    175    105  8.96M  2.22M
space      1.06T   955G    182     16  4.47M   546K
space      1.06T   955G    170     16  4.82M  1.85M
space      1.06T   955G    145    159  4.23M  3.19M
space      1.06T   955G    138     15  4.97M  92.7K
space      1.06T   955G    134     15  3.82M  1.71M
space      1.06T   955G    109    123  3.07M  3.08M
space      1.06T   955G    106     11  3.07M  1.34M
space      1.06T   955G    120     17  3.69M  1.74M

# prstat -mL
   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 12438 root       12 6.9 0.0 0.0 0.0 0.0  81 0.1 508  84  4K   0 save/1
 27399 cyrus      15 0.5 0.0 0.0 0.0 0.0  85 0.0  18  10 297   0 imapd/1
 20230 root      3.9 8.0 0.0 0.0 0.0 0.0  88 0.1 393  33  2K   0 save/1
 25913 root      0.5 3.3 0.0 0.0 0.0 0.0  96 0.0  22   2  1K   0 prstat/1
 20495 cyrus     1.1 0.2 0.0 0.0 0.5 0.0  98 0.0  14   3 191   0 imapd/1
  1051 cyrus     1.2 0.0 0.0 0.0 0.0 0.0  99 0.0  19   1  80   0 master/1
 24350 cyrus     0.5 0.5 0.0 0.0 1.4 0.0  98 0.0  57   1 484   0 lmtpd/1
 22645 cyrus     0.6 0.3 0.0 0.0 0.0 0.0  99 0.0  53   1 603   0 imapd/1
 24904 cyrus     0.3 0.4 0.0 0.0 0.0 0.0  99 0.0  66   0 863   0 imapd/1
 18139 cyrus     0.3 0.2 0.0 0.0 0.0 0.0  99 0.0  24   0 195   0 imapd/1
 21459 cyrus     0.2 0.3 0.0 0.0 0.0 0.0  99 0.0  54   0 635   0 imapd/1
 24891 cyrus     0.3 0.3 0.0 0.0 0.9 0.0  99 0.0  28   0 259   0 lmtpd/1
   388 root      0.2 0.3 0.0 0.0 0.0 0.0 100 0.0   1   1  48   0 in.routed/1
 21643 cyrus     0.2 0.3 0.0 0.0 0.2 0.0  99 0.0  49   7 540   0 imapd/1
 18684 cyrus     0.2 0.3 0.0 0.0 0.0 0.0 100 0.0  48   1 544   0 imapd/1
 25398 cyrus     0.2 0.2 0.0 0.0 0.0 0.0 100 0.0  47   0 466   0 pop3d/1
 23724 cyrus     0.2 0.2 0.0 0.0 0.0 0.0 100 0.0  47   0 540   0 imapd/1
 24909 cyrus     0.1 0.2 0.0 0.0 0.2 0.0  99 0.0  25   1 251   0 lmtpd/1
 16317 cyrus     0.2 0.2 0.0 0.0 0.0 0.0 100 0.0  37   1 495   0 imapd/1
 28243 cyrus     0.1 0.3 0.0 0.0 0.0 0.0 100 0.0  32   0 289   0 imapd/1
 20097 cyrus     0.1 0.2 0.0 0.0 0.3 0.0  99 0.0  26   5 253   0 lmtpd/1
Total: 893 processes, 1125 lwps, load averages: 1.14, 1.16, 1.16
 
-- 
Ed  



Re: [zfs-discuss] zfs fragmentation

2009-08-11 Thread Ed Spencer

On Tue, 2009-08-11 at 07:58, Alex Lam S.L. wrote:
> At a first glance, your production server's numbers are looking fairly
> similar to the small file workload results of your development
> server.
>
> I thought you were saying that the development server has faster performance?

The development server was running only one 'cp -pr' command.

The production mail server was running two concurrent backup jobs and,
of course, the mail system, with each job having the same throughput as
if it were a single job running. The single-threaded backup jobs do not
conflict with each other over performance.

If we ran 20 concurrent backup jobs, overall performance would scale up
quite a bit. (I would guess between 5 and 10 times the performance). (I
just read Mike's post and will do some 'concurrency' testing).

Users are currently evenly distributed over 5 filesystems (I previously
mentioned 7, but it's really 5 filesystems for users and 1 for system
data, totalling 6, plus one test filesystem).

We back up 2 filesystems on Tuesday, 2 on Thursday, and 2 on Saturday.
We back up to disk and then clone to tape. Our backup people can only
handle doing 2 filesystems per night.

Creating more filesystems to increase the parallelism of our backup is
one solution, but it's a major redesign of the mail system.

Adding a second server to halve the pool, and thereby halve the problem,
is another solution (and we would also create more filesystems at the
same time).

Moving the pool to an FC SAN or a JBOD may also increase performance.
(Fewer layers introduced by the appliance, thereby increasing
performance.)

I suspect that if we 'rsync' one of these filesystems to a second
server/pool, we would also see a performance increase equal to what we
see on the development server. (I don't know how zfs send and receive
work, so I don't know whether they would address this Filesystem Entropy
or specifically reorganize the files and directories.) However, when we
created a testfs filesystem in the zfs pool on the production server and
copied data to it, we saw the same performance as the other filesystems
in the same pool.

We will have to do something to address the problem. A combination of
what I just listed is our probable course of action. (Much testing will
have to be done to ensure our solution addresses the problem, because we
are not 100% sure what the cause of the performance degradation is.) I'm
also dealing with Network Appliance to see if there is anything we can
do at the filer end to increase performance, but I'm holding out little
hope.

But please, don't miss the point I'm trying to make: ZFS would benefit
from a utility or a background process that reorganizes files and
directories in the pool to optimize performance. A utility to deal with
Filesystem Entropy. Currently a zfs pool will live as long as the
lifetime of the disks it is on, without reorganization. That can be a
long, long time. Not to mention that slowly expanding the pool over time
contributes to the issue.

--
Ed





Re: [zfs-discuss] zfs fragmentation

2009-08-11 Thread Ed Spencer
Concurrency/parallelism testing.
I have 6 different filesystems populated with email data on our mail
development server.
I rebooted the server before beginning the tests.
The server is a T2000 (sun4v) machine, so it's ideally suited for this
type of testing.
The test was to tar (to /dev/null) each of the filesystems: launch one,
gather stats; launch another, gather stats; etc.
The underlying storage system is a Network Appliance. Our only one. In
production. Serving NFS, CIFS and iscsi. Other work the appliance is
doing may affect these tests, and vice versa :) . No one seemed to
notice I was running these tests.
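The launch-and-measure loop itself is simple. A self-contained sketch (small temporary directories stand in for the six mail filesystems, which is an assumption for illustration; on the real server each background tar would read one filesystem while 'zpool iostat 10' runs in another window):

```shell
#!/bin/sh
# Launch N concurrent read-only tar jobs, each streaming a directory
# tree to /dev/null, then wait for them all to finish.
set -e
WORK=$(mktemp -d)
N=3
i=1
while [ "$i" -le "$N" ]; do
    mkdir -p "$WORK/fs$i"
    # a little data so tar has something to read
    dd if=/dev/zero of="$WORK/fs$i/file" bs=1024 count=16 2>/dev/null
    i=$((i + 1))
done
i=1
while [ "$i" -le "$N" ]; do
    tar cf /dev/null -C "$WORK" "fs$i" &   # read-only job, as in the test
    i=$((i + 1))
done
wait                                       # all concurrent jobs complete
echo "launched $N concurrent tar jobs"
rm -rf "$WORK"
```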

After 6 concurrent tar's are running we are probably seeing benefits of
the ARC.
At certain points I included load averages and traffic stats for each of
the iscsi ethernet interfaces that are configured with MPXIO.

After the first 6 jobs, I launched duplicates of the 6. Then another 6,
etc.

At the end I included the zfs kernel statistics:

1 job
=
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G      0      0      0      0
space      70.5G  29.0G     19      0  1.04M      0
space      70.5G  29.0G    268      0  8.71M      0
space      70.5G  29.0G    196      0  11.3M      0
space      70.5G  29.0G    171      0  11.0M      0
space      70.5G  29.0G    182      0  5.01M      0
space      70.5G  29.0G    273      0  9.71M      0
space      70.5G  29.0G    292      0  8.91M      0
space      70.5G  29.0G    279      0  15.4M      0
space      70.5G  29.0G    219      0  11.3M      0
space      70.5G  29.0G    175      0  8.67M      0

2 jobs
==
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G    381      0  23.8M      0
space      70.5G  29.0G    422      0  28.0M      0
space      70.5G  29.0G    386      0  26.5M      0
space      70.5G  29.0G    380      0  22.9M      0
space      70.5G  29.0G    411      0  18.8M      0
space      70.5G  29.0G    393      0  20.7M      0
space      70.5G  29.0G    302      0  15.0M      0
space      70.5G  29.0G    267      0  15.6M      0
space      70.5G  29.0G    304      0  18.7M      0
space      70.5G  29.0G    534      0  19.7M      0
space      70.5G  29.0G    339      0  17.0M      0

3 jobs
==
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G    530      0  22.9M      0
space      70.5G  29.0G    428      0  16.3M      0
space      70.5G  29.0G    439      0  16.4M      0
space      70.5G  29.0G    511      0  22.1M      0
space      70.5G  29.0G    464      0  17.9M      0
space      70.5G  29.0G    371      0  12.1M      0
space      70.5G  29.0G    447      0  16.5M      0
space      70.5G  29.0G    379      0  15.5M      0

4 jobs
==
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G    434      0  22.0M      0
space      70.5G  29.0G    506      0  29.5M      0
space      70.5G  29.0G    424      0  21.3M      0
space      70.5G  29.0G    643      0  36.0M      0
space      70.5G  29.0G    688      0  31.1M      0
space      70.5G  29.0G    726      0  37.6M      0
space      70.5G  29.0G    652      0  24.8M      0
space      70.5G  29.0G    646      0  33.9M      0

5 jobs
==
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G    629      0  31.1M      0
space      70.5G  29.0G    774      0  45.8M      0
space      70.5G  29.0G    815      0  39.8M      0
space      70.5G  29.0G    895      0  44.4M      0
space      70.5G  29.0G    800      0  48.1M      0
space      70.5G  29.0G    857      0  51.8M      0
space      70.5G  29.0G    725      0  47.6M      0

6 jobs
==
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      70.5G  29.0G    924      0  58.8M      0
space      70.5G  29.0G    767      0  51.8M      0
space      70.5G  29.0G    862      0  48.4M      0
space      70.5G  29.0G    977      0  43.9M      0
space      70.5G  29.0G    954      0  53.7M      0
space      70.5G  29.0G    903      0  48.3M      0

# uptime
  2:19pm  up 15 min(s),  2 users,  load average: 1.44, 1.10, 0.67

26MB/s (1 minute average) on each iSCSI ethernet port

12 jobs
==
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----

Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Fri, 2009-08-07 at 19:33, Richard Elling wrote:

> This is very unlikely to be a fragmentation problem. It is a
> scalability problem and there may be something you can do about it
> in the short term.

You could be right.

Our test mail server, the exact same design and hardware (sun4v) but in
a smaller configuration (less memory and 4 x 25GB SAN LUNs), has a
backup/copy throughput of 30GB/hour. The data used for testing was
copied from our production mail server.

>> Adding another pool and copying all/some data over to it would only
>> be a short term solution.
>
> I'll have to disagree.

What is the point of a filesystem that can grow to such a huge size
without functionality built in to optimize data layout? Real-world
implementations of filesystems that are intended to live for
years/decades need this functionality, don't they?

Our mail system works well; only the backup doesn't perform well.
All the features of ZFS that make reads perform well (prefetch, ARC)
have little effect.

We think backup is quite important. We do quite a few restores of
months-old data. Snapshots help in the short term, but for longer-term
restores we need to go to tape.

Of course, as you can tell, I'm kinda stuck on this idea that file and
directory fragmentation is causing our issues with the backup. I don't
know how to analyze the pool to better understand the problem.

If we did chop the pool up into, let's say, 7 pools (one for each
current filesystem), then over time these 7 pools would grow and we
would end up with the same issues. That's why it seems to me to be a
short-term solution.

If our issues with zfs are scalability, then you could say zfs is not
scalable. Is that true?
(It certainly is if the solution is to create more pools!)

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
> Many of us here already tested our own systems and found that under
> some conditions ZFS was offering up only 30MB/second for bulk data
> reads regardless of how exotic our storage pool and hardware was.

Just so we are using the same units of measurement: backup/copy
throughput on our development mail server is 8.5MB/sec. The people
running our backups would be overjoyed with that performance.

However, backup/copy throughput on our production mail server is 2.25
MB/sec.

The underlying disks are 15000 RPM 146GB FC drives.
Our performance may be hampered somewhat because the luns are on a
Network Appliance accessed via iSCSI, but not to the extent that we are
seeing, and that does not account for the throughput difference between
the development and production pools.

When I talk about fragmentation it's not in the normal sense. I'm not
talking about blocks in a file not being sequential. I'm talking about
files in a single directory that end up spread across the entire
filesystem/pool.

My problem right now is diagnosing the performance issues. I can't
address them without understanding the underlying cause. There is a
lack of tools to help in this area. There is also a lack of acceptance
that I'm actually having a problem with zfs. It's frustrating.

Anyone know how to significantly increase the performance of a zfs
filesystem without causing any downtime to an Enterprise email system
used by 30,000 intolerant people, when you don't really know what is
causing the performance issues in the first place? (Yeah, it sucks to
be me!)

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 08:14, Mattias Pantzare wrote:

> Your scalability problem may be in your backup solution.
We've eliminated the backup system as being involved with the
performance issues. 

The servers are Solaris 10 with the OS on UFS filesystems. (In zfs
terms, the pool is old/mature.) Solaris has been patched to a fairly
current level.

Copying data from the zfs filesystem to the local ufs filesystem enjoys
the same throughput as the backup system.

The test was simple. Create a test filesystem on the zfs pool. Restore
production email data to it. Reboot the server. Back up the data (29
minutes for 15.8GB of data). Reboot the server. Copy the data from zfs
to ufs using a 'cp -pr ...' command, which also took 29 minutes.

And if anyone is interested, it only took 15 minutes to restore (write)
the 15.8GB of data over the network.
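For what it's worth, those timings work out to roughly the following rates (treating 15.8GB as decimal gigabytes, which is an assumption):

```shell
# Rough throughput from the timings above: 15.8 GB read in 29 minutes
# (backup/copy), and 15.8 GB written in 15 minutes (network restore).
awk 'BEGIN {
    gb = 15.8
    printf "backup/copy read: %.1f MB/s\n", gb * 1000 / (29 * 60)   # ~9.1
    printf "restore write:    %.1f MB/s\n", gb * 1000 / (15 * 60)   # ~17.6
}'
```

So writes into the pool run nearly twice as fast as sequential reads back out of it, which is the asymmetry this whole thread is about.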

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 15:12, Mike Gerdts wrote:

> The DBA's that I know use files that are at least hundreds of
> megabytes in size.  Your problem is very different.
Yes, definitely. 

I'm relating records in a table to my small files because our email
system treats the filesystem as a database.

And in the back of my mind I'm also thinking that you have to
rebuild/repair a database once in a while to improve performance.

And in my case, since the filesystem is the database, I want to do that
to zfs!

At least that's what I'm thinking. However, and I always come back to
this, I'm not certain what is causing my problem. I need certainty
before taking action on the production system.

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 15:20, Bob Friesenhahn wrote:

> An SSD slog backed by a SAS 15K JBOD array should perform much better
> than a big iSCSI LUN.

Now... yes. We implemented this pool years ago. I believe, back then,
the server would crash if a zfs drive failed. We decided to let the
netapp handle the disk redundancy. It's worked out well.

I've looked at those really nice Sun products adoringly. And a 7000
series appliance would also be a nice addition to our central NFS
service. Not to mention more cost effective than expanding our Network
Appliance. (We have researchers who are quite hungry for storage, and
NFS is always our first choice.)

We now have quite an investment in the current implementation. It's
difficult to move away from. The netapp is quite a reliable product.

We are quite happy with zfs and our implementation. We just need to
address our backup performance and improve it just a little bit!

We were almost lynched this spring because we encountered some pretty
severe zfs bugs. We are still running the IDR named "A wad of ZFS bug
fixes for Solaris 10 Update 6". It took over a month to resolve the
issues.

I work at a University, and final exams and year end occur at the same
time. I don't recommend having email problems during this time! People
are intolerant of email problems.

I live in hope that a Netapp OS update, or a solaris patch, or a zfs
patch, or an iscsi patch, or something will come along that improves
our performance just a bit so our backup people get off my back!

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 15:05, Mike Gerdts wrote:
> On Sat, Aug 8, 2009 at 12:51 PM, Ed Spencer ed_spen...@umanitoba.ca wrote:
>
>> On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
>>> Many of us here already tested our own systems and found that under
>>> some conditions ZFS was offering up only 30MB/second for bulk data
>>> reads regardless of how exotic our storage pool and hardware was.
>>
>> Just so we are using the same units of measurement: backup/copy
>> throughput on our development mail server is 8.5MB/sec. The people
>> running our backups would be overjoyed with that performance.
>>
>> However, backup/copy throughput on our production mail server is 2.25
>> MB/sec.
>>
>> The underlying disks are 15000 RPM 146GB FC drives.
>> Our performance may be hampered somewhat because the luns are on a
>> Network Appliance accessed via iSCSI, but not to the extent that we are
>> seeing, and it does not account for the throughput difference in the
>> development and production pools.
>
> NetApp filers run WAFL - Write Anywhere File Layout.  Even if ZFS
> arranged everything perfectly (however that is defined) WAFL would
> undo its hard work.
>
> Since you are using iSCSI, I assume that you have disabled the Nagle
> algorithm and increased tcp_xmit_hiwat and tcp_recv_hiwat.  If not,
> go do that now.
We've tried many different iscsi parameter changes on our development
server:
- Jumbo frames
- Disabling the Nagle algorithm
I'll double check next week on tcp_xmit_hiwat and tcp_recv_hiwat.

Nothing has made any real difference.
We are only using about 5% of the bandwidth on our IP SAN.

We use two cisco ethernet switches on the IP SAN. The iscsi initiators
use MPXIO in a round-robin configuration.

>> When I talk about fragmentation it's not in the normal sense. I'm not
>> talking about blocks in a file not being sequential. I'm talking about
>> files in a single directory that end up spread across the entire
>> filesystem/pool.
>
> It's tempting to think that if the files were in roughly the same area
> of the block device that ZFS sees that reading the files sequentially
> would at least trigger a read-ahead at the filer.  I suspect that even
> a moderate amount of file creation and deletion would cause the I/O
> pattern to be random enough (not purely sequential) that the back-end
> storage would not have a reasonable chance of recognizing it as a good
> time for read-ahead.  Further, since the backup application is
> probably in a loop of:
>
>     while there are more files in the directory
>         if next file mtime > last backup time
>             open file
>             read file contents, send to backup stream
>             close file
>         end if
>     end while
>
> In other words, other I/O operations are interspersed between the
> sequential data reads, some files are likely to be skipped, and there
> is latency introduced by writing to the data stream.  I would be
> surprised to see any file system do intelligent read-ahead here.  In
> other words, lots of small file operations make backups and especially
> restores go slowly.  More backup and restore streams will almost
> certainly help.  Multiplex the streams so that you can keep your tapes
> moving at a constant speed.

We back up to disk first and then put it to tape later.

> Do you have statistics on network utilization to ensure that you
> aren't stressing it?
>
> Have you looked at iostat data to be sure that you are seeing asvc_t +
> wsvc_t that supports the number of operations that you need to
> perform?  That is, if asvc_t + wsvc_t for a device adds up to 10 ms, a
> workload that waits for the completion of one I/O before issuing the
> next will max out at 100 iops.  Presumably ZFS should hide some of
> this from you[1], but it does suggest that each backup stream would be
> limited to about 100 files per second[2].  This is because the read
> request for one file does not happen before the close of the previous
> file[3].  Since cyrus stores each message as a separate file, this
> suggests that 2.5 MB/s corresponds to an average mail message size of
> 25 KB.
>
> 1. via metadata caching, read-ahead on file data reads, etc.
> 2. Assuming wsvc_t + asvc_t = 10 ms
> 3. Assuming that networker is about as smart as tar, zip, cpio, etc.
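Mike's per-file latency model is easy to sanity-check (the 10 ms service time and 25 KB average message size are his stated assumptions, not measurements from this pool):

```shell
# One serialized I/O per file: files/s = 1000 ms / service time,
# and stream bandwidth = files/s * average file size.
awk 'BEGIN {
    svc_ms = 10      # assumed wsvc_t + asvc_t per file
    avg_kb = 25      # assumed average mail message size
    fps = 1000 / svc_ms
    printf "%.0f files/s -> %.1f MB/s per backup stream\n",
           fps, fps * avg_kb / 1000
}'
```

That lands right on the ~2.25 MB/s per stream we see in production, which is why more concurrent streams help even though a single stream never will.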

There is a backup of a single filesystem in the pool going on right now:
# zpool iostat 5 5
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
---------  -----  -----  -----  -----  -----  -----
space      1.05T   965G     97     69  5.24M  2.71M
space      1.05T   965G    113     10  6.41M   996K
space      1.05T   965G    100    112  2.87M  1.81M
space      1.05T   965G    112      8  2.35M  35.9K
space      1.05T   965G    106      3  1.76M  55.1K

Here are examples:
# iostat -xpn 5 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   17.1   29.2  746.7  317.1  0.0  0.6    0.0   12.5   0  27 c4t60A98000433469764E4A2D456A644A74d0
   25.0   11.9  991.9  277.0  0.0  0.6    0.0   16.1   0  36
Re: [zfs-discuss] Fwd: zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 17:25, Mike Gerdts wrote:

> ndd -get /dev/tcp tcp_xmit_hiwat
> ndd -get /dev/tcp tcp_recv_hiwat
> grep tcp-nodelay /kernel/drv/iscsi.conf
# ndd -get /dev/tcp tcp_xmit_hiwat
2097152
# ndd -get /dev/tcp tcp_recv_hiwat
2097152
# grep tcp-nodelay /kernel/drv/iscsi.conf
#

> While backups are running (which is probably all the time given the
> backup rate)
>
> # look at service times
> iostat -xzn 10

Oh crap. Looks like there are no backup jobs running right now. It must
have just ended.
> # is networker cpu bound?
No. The server is barely tasked by either the email system or networker.
> prstat -mL
> Some indication of how many backup jobs run concurrently would
> probably help frame any future discussion.
I'll get more info on the backups next week when the full backups run.
 
-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-07 Thread Ed Spencer
Let me give a real-life example of what I believe is a fragmented zfs pool.

Currently the pool is 2 terabytes in size (55% used) and is made of 4 SAN
LUNs (512GB each).
The pool has never gotten close to being full. We increase the size of the
pool by adding 2 512GB LUNs about once a year or so.

The pool has been divided into 7 filesystems.

The pool is used for imap email data. The email system (cyrus) has
approximately 80,000 accounts, all located within the pool, evenly
distributed between the filesystems.

Each account has a directory associated with it. This directory is the
user's inbox. Additional mail folders are subdirectories. Mail is stored
as individual files.

We receive mail at a rate of 0-20MB/second, every minute of every hour of
every day of every week, etc.

Users receive mail constantly over time. They read it and then either
delete it or store it in a subdirectory/folder.

I imagine that my mail (located in a single subdirectory structure) is
spread over the entire pool because it has been received over time. I
believe the data is highly fragmented (from a file and directory
perspective).

The result of this is that backup throughput of a single filesystem in
this pool is about 8GB/hour.
We use EMC Networker for backups.

This is a problem. There are no utilities available to evaluate this type
of fragmentation.
There are no utilities to fix it.

ZFS, from the mail system perspective, works great.
Writes and random reads operate well.

Backup is a problem, and not just because of small files, but because of
small files scattered over the entire pool.

Adding another pool and copying all/some data over to it would only be a
short-term solution.

I believe zfs needs a feature that operates in the background and defrags
the pool to optimize sequential reads of the file and directory structure.

Ed


Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-12 Thread Ed Spencer
I find this thread both interesting and disturbing. I'm fairly new to
this list, so please excuse me if my comments/opinions are simplistic or
just incorrect.

I think there's been too much FC SAN bashing, so let me change the
example.

What if you buy a 7000 series server (complete with zfs) and set up an
IP SAN? You create a LUN and share it out to a Solaris 10 host.
On the Solaris host you create a ZFS pool with that iscsi LUN.

Now my understanding is that you will not be able to correct errors on
the zpool of the Solaris 10 machine, because zfs on the Solaris 10
machine is not doing the raid.

Another example would be if you were sharing out a lun to a vmware
server, from your iscsi san or fc san, and creating Solaris 10 virtual
machines with zfs booting.

Another example would be Solaris 10 booting from a zfs filesystem on a
hardware-mirrored pair of drives.

Now these are examples of standard implementations of machines in a
datacenter, specifically ones I have installed.

From following this thread I now feel that if I have uncorrectable data
errors on these zfs pools, there will be no way to easily repair the
pool.

I see no reason why, if I detect errors as I scrub the zfs pool, I
shouldn't be able to run a simple utility to fix the pool, as I would a
ufs filesystem, and then recover the corrupted files from tape.

I believe that for zfs to be used as a general-purpose filesystem, there
has to be support built into zfs for these standard data center
implementations; otherwise it will just become a specialized filesystem,
like Netapp's WAFL, and there are a lot more servers than storage
appliances in the datacenter.

I think this thread has put zfs in a negative light. I don't actually
believe that I will experience many of these problems in an
Enterprise-class data center, but still I don't look forward to having
to deal with the consequences of encountering these types of problems.

Maybe zfs is not ready to be considered a general purpose filesystem.

--
Ed Spencer

 



Re: [zfs-discuss] ZFS+NFS4 strange timestamps on file creation

2008-12-04 Thread Ed Spencer
Yes, I've seen them on nfs filesystems on Solaris 10 using a Netapp nfs
server.
Here's a link to a solution that I just implemented on a Solaris 10
server:
https://equoria.net/index.php/Value_too_large_for_defined_data_type

On Thu, 2008-12-04 at 15:31, Scott Williamson wrote:
> Has anyone seen files created on a linux client with negative or zero
> creation timestamps on zfs+nfs exported datasets?
-- 
Ed Spencer  http://home.cc.umanitoba.ca/~fastedy
UNIX System Administrator   Academic Computing and Networking
EMail: [EMAIL PROTECTED]  The University of Manitoba
Telephone: (204) 474-8311   Winnipeg, Manitoba, Canada R3T 2N2
