Hello zfs-discuss,

I've just done a quick test of the performance impact of SCSI cache
flushes on a 6540 disk array when using ZFS.

The configuration is: a V490 running S10U5 (137111-03) and 2x 6540 disk
arrays with 7.10.25.10 firmware; the host is dual-ported. ZFS does the
mirroring between the 6540s. There is no other load on the 6540s except
for this testing. Each LUN is a RAID-10 made of many disks on each 6540
(it doesn't really matter here).


 zpool status dslrp
  pool: dslrp
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Aug  5 10:49:28 2008
config:

        NAME                                       STATE     READ WRITE CKSUM
        dslrp                                      ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t600A0B800029B74600004245486B68EBd0  ONLINE       0     0     0
            c6t600A0B800029AF0000006CCC486B36ABd0  ONLINE       0     0     0

errors: No known data errors


I wrote a simple C program which, in a loop (N times), creates a file
with the O_DSYNC flag (synchronous writes), writes 255 bytes of data,
closes the file and removes it. Then I measured the total execution
time with SCSI cache flushes in ZFS disabled and enabled. The source
code for the program is attached at the end of this post.




# echo "zfs_nocacheflush/D" | mdb -k
zfs_nocacheflush:
zfs_nocacheflush:               1


# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 44.947136


real       44.950
user        0.349
sys        18.188


# dtrace -n fbt::*SYNCHRONIZE_CACHE*:entry'{@ = count();}' -n tick-1s'{printa(@);clear(@);}'
[not a single SYNCHRONIZE CACHE, which is expected]


# iostat -xnzC 1
[...]
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4297.0    0.0 17236.1  0.0  1.0    0.0    0.2   0  98 c6
    0.0 2149.5    0.0 8622.0  0.0  0.5    0.0    0.2   0  50 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2147.5    0.0 8614.0  0.0  0.5    0.0    0.2   1  49 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 3892.9    0.0 15571.7  0.0  1.0    0.0    0.3   0  98 c6
    0.0 1946.0    0.0 7783.9  0.0  0.6    0.0    0.3   0  57 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 1947.0    0.0 7787.9  0.0  0.4    0.0    0.2   0  41 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4548.1    0.0 18192.3  0.0  1.0    0.0    0.2   0  97 c6
    0.0 2274.0    0.0 9096.2  0.0  0.5    0.0    0.2   0  51 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2274.0    0.0 9096.2  0.0  0.5    0.0    0.2   1  46 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4632.8    0.0 18620.4  0.0  1.1    0.0    0.2   0  99 c6
    0.0 2316.9    0.0 9310.2  0.0  0.5    0.0    0.2   1  47 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2315.9    0.0 9310.2  0.0  0.6    0.0    0.2   1  52 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4610.2    0.0 18150.6  0.0  1.0    0.0    0.2   0  97 c6
    0.0 2304.1    0.0 9075.3  0.0  0.5    0.0    0.2   0  52 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2306.1    0.0 9075.3  0.0  0.5    0.0    0.2   1  45 
c6t600A0B800029B74600004245486B68EBd0
[...]



Now let's repeat the same test, but with ZFS sending SCSI cache flushes.


# echo "zfs_nocacheflush/W0" | mdb -kw
zfs_nocacheflush:               0x1             =       0x0

# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 53.809971


real       53.813
user        0.351
sys        22.107


# dtrace -n fbt::*SYNCHRONIZE_CACHE*:entry'{@ = count();}' -n tick-1s'{printa(@);clear(@);}'
[...]
CPU     ID                    FUNCTION:NAME
  3  93193                         :tick-1s
             6000

  3  93193                         :tick-1s
             7244

  3  93193                         :tick-1s
             6172

  3  93193                         :tick-1s
             7172
[...]

So now we are sending thousands of cache flushes per second as
expected.

# iostat -xnzC 1
[...]
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 8386.6    0.0 17050.2  0.0  1.0    0.0    0.1   0  84 c6
    0.0 4191.3    0.0 8525.1  0.0  0.5    0.0    0.1   0  43 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 4195.3    0.0 8525.1  0.0  0.5    0.0    0.1   0  41 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 7476.5    0.0 14953.0  0.0  0.9    0.0    0.1   0  87 c6
    0.0 3738.3    0.0 7476.5  0.0  0.4    0.0    0.1   0  44 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 3738.3    0.0 7476.5  0.0  0.4    0.0    0.1   0  42 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 8146.9    0.0 16345.8  0.0  0.8    0.0    0.1   0  83 c6
    0.0 4073.4    0.0 8172.9  0.0  0.4    0.0    0.1   0  42 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 4073.4    0.0 8172.9  0.0  0.4    0.0    0.1   0  41 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 7646.4    0.0 15288.8  0.0  0.8    0.0    0.1   0  80 c6
    0.0 3823.2    0.0 7644.4  0.0  0.4    0.0    0.1   0  40 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 3823.2    0.0 7644.4  0.0  0.4    0.0    0.1   0  40 
c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 6620.2    0.0 13240.4  0.0  0.8    0.0    0.1   0  81 c6
    0.0 3310.1    0.0 6620.2  0.0  0.4    0.0    0.1   0  41 
c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 3310.1    0.0 6620.2  0.0  0.4    0.0    0.1   0  40 
c6t600A0B800029B74600004245486B68EBd0
[...]


It looks like iostat also counts the SCSI cache flushes - that's why we
see more IOPS and a lower average I/O time (each SCSI flush is a NOP for
the array, so it pulls the average down).
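
A quick back-of-the-envelope check on the iostat samples above (my own
arithmetic, not an extra measurement): dividing kw/s by w/s gives the
average transfer size per I/O.

  flushes disabled:  17236.1 kw/s / 4297.0 w/s  ~= 4 KB per write
  flushes enabled:   17050.2 kw/s / 8386.6 w/s  ~= 2 KB per write

So with flushes enabled roughly every other "write" iostat reports is a
zero-length SYNCHRONIZE CACHE, which halves the apparent average I/O
size and service time while the actual data rate stays about the same.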


The total execution time for the test with SCSI flushes enabled in ZFS
is about 20% longer than with SCSI cache flushes disabled. It looks like
a 6540 disk array with firmware 7.10.25.10 does effectively ignore SCSI
flushes. Last time I checked on a 2530 array the difference was about
7x, not just 20%.

IIRC someone reported that 6540 performance was badly hit by ZFS
sending SCSI cache flushes. It looks like that has been fixed in the
recently released firmware, so in most environments there's not much
point in disabling cache flushes on 6540+ZFS anymore.

Although there is still a 20% hit, please keep in mind that the program
I used for testing is rather extreme, and in most environments the
performance overhead should be much smaller than 20%. Nevertheless, if
you want to squeeze every last bit of performance out of ZFS+6540 (or a
similar array with NVRAM), disabling cache flushes won't hurt.
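
If you do decide to disable them permanently, the mdb poke above does
not survive a reboot; the usual way is the tunable in /etc/system
(shown here as a sketch - only do this with NVRAM/battery-backed
arrays):

* tell ZFS not to send SCSI SYNCHRONIZE CACHE commands
set zfs:zfs_nocacheflush = 1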



Just out of curiosity, let's disable the ZIL (zil_disable=1):
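
(Set the usual way via mdb, something like the line below - written
from memory rather than copied from this session; note that zil_disable
is only picked up when a filesystem is mounted, so the dataset needs to
be remounted for it to take effect.)

# echo "zil_disable/W1" | mdb -kw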

# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 12.130675


real       12.134
user        0.328
sys        11.805

# ptime ./filesync-1 /tmp 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 3.098279


real        3.101
user        0.291
sys         2.809

That's interesting - in this test tmpfs is 4 times faster than ZFS with
the ZIL disabled (where basically everything happens in memory), and
disabling checksums for the test filesystem doesn't change it. Maybe it
is mostly due to metadata checksums?
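
(For reference, the checksum comparison was just the standard dataset
property toggle, along the lines of the commands below - the dataset
name is illustrative, adjust it to whatever filesystem backs the test
directory:)

# zfs set checksum=off dslrp/test
# zfs set checksum=on dslrp/test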





// simple and ugly program

# cat filesync-1.c

#include <stdio.h>
#include <stdlib.h>      /* exit(), atoi() */
#include <string.h>      /* strcpy(), strcat() */
#include <unistd.h>      /* write(), close(), unlink() */
#include <sys/time.h>    /* hrtime_t, gethrtime() */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
  char *path;
  char filename[256];
  int n_req;
  int i, fd;
  hrtime_t stopper_start, stopper_stop;

  if(argc != 3)
  {
    printf("Usage: %s path N\n", argv[0]);
    exit(1);
  }

  path = argv[1];
  n_req = atoi(argv[2]);

  stopper_start = gethrtime();

  i = 0;
  while (i++ < n_req)
  {
    /* create the file with synchronous writes, write 255 bytes from
       the filename buffer (the content doesn't matter), close and
       remove it */
    strcpy(filename, path);
    strcat(filename, "/filesync_test");
    fd = open(filename, O_CREAT|O_RDWR|O_DSYNC, 0644);
    if (fd < 0)
    {
      perror("open");
      exit(1);
    }
    write(fd, filename, 255);
    close(fd);
    unlink(filename);
  }

  stopper_stop = gethrtime();

  printf("Time in seconds to create and unlink %d files with O_DSYNC: %f\n\n", 
n_req, (float)(stopper_stop - stopper_start)/1000000000);

  exit(0);
}
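
Nothing special is needed to build it; something like the following
should do (gethrtime() lives in libc on Solaris, so no extra libraries
are required), run as shown above:

# cc -o filesync-1 filesync-1.c
# ./filesync-1 /dslrp/test/ 100000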



-- 
Best regards,
 Robert Milkowski                          mailto:[EMAIL PROTECTED]
                                     http://milek.blogspot.com
