Re: [zfs-discuss] ZFS/WAFL lawsuit

2007-09-06 Thread johansen-osdev
It's Columbia Pictures vs. Bunnell:

http://www.eff.org/legal/cases/torrentspy/columbia_v_bunnell_magistrate_order.pdf

The Register syndicated a Security Focus article that summarizes the
potential impact of the court decision:

http://www.theregister.co.uk/2007/08/08/litigation_data_retention/


-j

On Thu, Sep 06, 2007 at 08:14:56PM +0200, [EMAIL PROTECTED] wrote:
 
 
 It really is a shot in the dark at this point; you never know what
 will happen in court (take the example of the recent court decision that
 all data in RAM be held for discovery ?!WHAT, HEAD HURTS!?).  But at the
 end of the day, if you waited for a sure bet on any technology or
 potential patent disputes you would not implement anything, ever.
 
 
 Do you have a reference for "all data in RAM must be held"?  I guess we
 need to build COW RAM as well.
 
 Casper
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely long creat64 latencies on higly utilized zpools

2007-08-15 Thread johansen-osdev
You might also consider taking a look at this thread:

http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041760.html

Although I'm not certain, this sounds a lot like the other pool
fragmentation issues.
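
If you want to catch the slow creat64 calls without truss'ing every
process, a DTrace one-liner along these lines should do it (the 50ms
cutoff is just an example; adjust to taste):

    dtrace -n 'syscall::creat64:entry { self->ts = timestamp; }
        syscall::creat64:return /self->ts && timestamp - self->ts > 50000000/
        { @["slow creat64 (ns)"] = quantize(timestamp - self->ts); }
        syscall::creat64:return { self->ts = 0; }'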

-j

On Wed, Aug 15, 2007 at 01:11:40AM -0700, Yaniv Aknin wrote:
 Hello friends,
 
 I've recently seen a strange phenomenon with ZFS on Solaris 10u3, and was 
 wondering if someone may have more information.
 
 The system uses several zpools, each a bit under 10T, each containing one zfs 
 with lots and lots of small files (way too many, about 100m files and 75m 
 directories).
 
 I have absolutely no control over the directory structure and believe me I 
 tried to change it.
 
 Filesystem usage patterns are create and read, never delete and never rewrite.
 
 When volumes approach 90% usage, and under medium/light load (zpool iostat 
 reports 50mb/s and 750iops reads), some creat64 system calls take over 50 
 seconds to complete (observed with 'truss -D touch'). When doing manual 
 tests, I've seen similar times on unlink() calls (truss -D rm). 
 
 I'd like to stress this happens on /some/ of the calls, maybe every 100th 
 manual call (I scripted the test), which (along with normal system 
 operations) would probably be every 10,000th or 100,000th call.
 
 Other system parameters (memory usage, loadavg, process number, etc) appear 
 nominal. The machine is an NFS server, though the crazy latencies were 
 observed both local and remote.
 
 What would you suggest to further diagnose this? Has anyone seen trouble with 
 high utilization and medium load? (with or without insanely high filecount?)
 
 Many thanks in advance,
  - Yaniv
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is send/receive incremental

2007-08-08 Thread johansen-osdev
You can do it either way.  Eric Kustarz has a good explanation of how to
set up incremental send/receive on your laptop.  The description is on
his blog:

http://blogs.sun.com/erickustarz/date/20070612

The technique he uses is applicable to any ZFS filesystem.
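
In case it helps, here's a minimal sketch of the nightly cycle (pool,
filesystem, and host names are just examples):

    # once, to seed the remote side:
    zfs snapshot tank/home@mon
    zfs send tank/home@mon | ssh backuphost zfs receive backup/home

    # each night thereafter, only the changes move:
    zfs snapshot tank/home@tue
    zfs send -i tank/home@mon tank/home@tue | \
        ssh backuphost zfs receive backup/home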

-j

On Wed, Aug 08, 2007 at 04:44:16PM -0600, Peter Baumgartner wrote:
 
I'd like to send a backup of my filesystem offsite nightly using zfs
send/receive. Are those done incrementally so only changes move or
would a full copy get shuttled across every time?
--
Pete

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] si3124 controller problem and fix (fwd)

2007-07-17 Thread johansen-osdev
In an attempt to speed up progress on some of the si3124 bugs that Roger
reported, I've created a workspace with the fixes for:

   6565894 sata drives are not identified by si3124 driver
   6566207 si3124 driver loses interrupts.

I'm attaching a driver which contains these fixes as well as a diff of
the changes I used to produce them.

I don't have access to a si3124 chipset, unfortunately.

Would somebody be able to review these changes and try the new driver on
a si3124 card?

Thanks,

-j

On Tue, Jul 17, 2007 at 02:39:00AM -0700, Nigel Smith wrote:
 You can see the status of the bug here:
 
 http://bugs.opensolaris.org/view_bug.do?bug_id=6566207
 
 Unfortunately, it's showing no progress since 20th June.
 
 This fix really needs to be in place for S10u4 and snv_70.
 Thanks
 Nigel Smith
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


si3124.tar.gz
Description: application/tar-gz

--- usr/src/uts/common/io/sata/adapters/si3124/si3124.c ---

Index: usr/src/uts/common/io/sata/adapters/si3124/si3124.c
--- /ws/onnv-clone/usr/src/uts/common/io/sata/adapters/si3124/si3124.c  Mon Nov 13 23:20:01 2006
+++ /export/johansen/si-fixes/usr/src/uts/common/io/sata/adapters/si3124/si3124.c  Tue Jul 17 14:37:17 2007
@@ -22,11 +22,11 @@
 /*
  * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
  * Use is subject to license terms.
  */
 
-#pragma ident  "@(#)si3124.c   1.4 06/11/14 SMI"
+#pragma ident  "@(#)si3124.c   1.5 07/07/17 SMI"
 
 
 
 /*
  * SiliconImage 3124/3132 sata controller driver
@@ -381,11 +381,11 @@
 
 extern struct mod_ops mod_driverops;
 
 static  struct modldrv modldrv = {
mod_driverops, /* driverops */
-   "si3124 driver v1.4",
+   "si3124 driver v1.5",
sictl_dev_ops, /* driver ops */
 };
 
 static  struct modlinkage modlinkage = {
MODREV_1,
@@ -2808,10 +2808,13 @@
	si_portp = si_ctlp->sictl_ports[port];
	mutex_enter(&si_portp->siport_mutex);

	/* Clear Port Reset. */
	ddi_put32(si_ctlp->sictl_port_acc_handle,
+	    (uint32_t *)PORT_CONTROL_SET(si_ctlp, port),
+	    PORT_CONTROL_SET_BITS_PORT_RESET);
+	ddi_put32(si_ctlp->sictl_port_acc_handle,
	    (uint32_t *)PORT_CONTROL_CLEAR(si_ctlp, port),
	    PORT_CONTROL_CLEAR_BITS_PORT_RESET);
 
/*
 * Arm the interrupts for: Cmd completion, Cmd error,
@@ -3509,16 +3512,16 @@
port);
 
	if (port_intr_status & INTR_COMMAND_COMPLETE) {
		(void) si_intr_command_complete(si_ctlp, si_portp,
		    port);
-	}
-
+	} else {
		/* Clear the interrupts */
		ddi_put32(si_ctlp->sictl_port_acc_handle,
		    (uint32_t *)(PORT_INTERRUPT_STATUS(si_ctlp, port)),
		    port_intr_status & INTR_MASK);
+	}
 
/*
 * Note that we did not clear the interrupt for command
 * completion interrupt. Reading of slot_status takes care
 * of clearing the interrupt for command completion case.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [storage-discuss] NCQ performance

2007-05-29 Thread johansen-osdev
 When sequential I/O is done to the disk directly there is no performance
 degradation at all.  

All filesystems impose some overhead compared to the rate of raw disk
I/O.  It's going to be hard to store data on a disk unless some kind of
filesystem is used.  All the tests that Eric and I have performed show
regressions for multiple sequential I/O streams.  If you have data that
shows otherwise, please feel free to share.

 [I]t does not take any additional time in ldi_strategy(),
 bdev_strategy(), mv_rw_dma_start().  In some instance it actually
 takes less time.   The only thing that sometimes takes additional time
 is waiting for the disk I/O.

Let's be precise about what was actually observed.  Eric and I saw
increased service times for the I/O on devices with NCQ enabled when
running multiple sequential I/O streams.  Everything that we observed
indicated that it actually took the disk longer to service requests when
many sequential I/Os were queued.

-j


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-16 Thread johansen-osdev
 *sata_hba_list::list sata_hba_inst_t satahba_next | ::print 
 sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* | 
 ::grep .!=0 | ::print sata_cport_info_t cport_devp.cport_sata_drive | 
 ::print -a sata_drive_info_t satadrv_features_support satadrv_settings 
 satadrv_features_enabled

 This gives me "mdb: failed to dereference symbol: unknown
 symbol name."

You may not have the SATA module installed.  If you type:

::modinfo ! grep sata

and don't get any output, your sata driver is attached some other way.

My apologies for the confusion.

-K
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?

2007-05-16 Thread johansen-osdev
At Matt's request, I did some further experiments and have found that
this appears to be particular to your hardware.  This is not a general
32-bit problem.  I re-ran this experiment on a 1-disk pool using a 32
and 64-bit kernel.  I got identical results:

64-bit
==

$ /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   20.1
user0.0
sys 1.2

62 Mb/s

# /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   19.0
user0.0
sys 2.6

65 Mb/s

32-bit
==

/usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   20.1
user0.0
sys 1.7

62 Mb/s

# /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   19.1
user0.0
sys 4.3

65 Mb/s

-j

On Wed, May 16, 2007 at 09:32:35AM -0700, Matthew Ahrens wrote:
 Marko Milisavljevic wrote:
 now lets try:
 set zfs:zfs_prefetch_disable=1
 
 bingo!
 
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  609.00.0 77910.00.0  0.0  0.80.01.4   0  83 c0d0
 
 only 1-2% slower than dd from /dev/dsk. Do you think this is a general
 32-bit problem, or specific to this combination of hardware?
 
 I suspect that it's fairly generic, but more analysis will be necessary.
 
 Finally, should I file a bug somewhere regarding prefetch, or is this
 a known issue?
 
 It may be related to 6469558, but yes please do file another bug report. 
  I'll have someone on the ZFS team take a look at it.
 
 --matt
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?

2007-05-16 Thread johansen-osdev
Marko,
Matt and I discussed this offline some more and he had a couple of ideas
about double-checking your hardware.

It looks like your controller (or disks, maybe?) is having trouble with
multiple simultaneous I/Os to the same disk.  It looks like prefetch
aggravates this problem.

When I asked Matt what we could do to verify that it's the number of
concurrent I/Os that is causing performance to be poor, he had the
following suggestions:

set zfs_vdev_{min,max}_pending=1 and run with prefetch on, then
iostat should show 1 outstanding io and perf should be good.

or turn prefetch off, and have multiple threads reading
concurrently, then iostat should show multiple outstanding ios
and perf should be bad.
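
If it's useful, here's one way to set those tunables (this assumes
they're plain ints, as they are in vdev_queue.c):

    # /etc/system -- takes effect at the next reboot:
    set zfs:zfs_vdev_min_pending = 1
    set zfs:zfs_vdev_max_pending = 1

    # or on the live kernel (0t1 is decimal 1):
    echo 'zfs_vdev_min_pending/W 0t1' | mdb -kw
    echo 'zfs_vdev_max_pending/W 0t1' | mdb -kw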

Let me know if you have any additional questions.

-j

On Wed, May 16, 2007 at 11:38:24AM -0700, [EMAIL PROTECTED] wrote:
 At Matt's request, I did some further experiments and have found that
 this appears to be particular to your hardware.  This is not a general
 32-bit problem.  I re-ran this experiment on a 1-disk pool using a 32
 and 64-bit kernel.  I got identical results:
 
 64-bit
 ==
 
 $ /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k count=10000
 10000+0 records in
 10000+0 records out
 
 real   20.1
 user0.0
 sys 1.2
 
 62 Mb/s
 
 # /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000
 10000+0 records in
 10000+0 records out
 
 real   19.0
 user0.0
 sys 2.6
 
 65 Mb/s
 
 32-bit
 ==
 
 /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k count=10000
 10000+0 records in
 10000+0 records out
 
 real   20.1
 user0.0
 sys 1.7
 
 62 Mb/s
 
 # /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000
 10000+0 records in
 10000+0 records out
 
 real   19.1
 user0.0
 sys 4.3
 
 65 Mb/s
 
 -j
 
 On Wed, May 16, 2007 at 09:32:35AM -0700, Matthew Ahrens wrote:
  Marko Milisavljevic wrote:
  now lets try:
  set zfs:zfs_prefetch_disable=1
  
  bingo!
  
 r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   609.00.0 77910.00.0  0.0  0.80.01.4   0  83 c0d0
  
  only 1-2% slower than dd from /dev/dsk. Do you think this is a general
  32-bit problem, or specific to this combination of hardware?
  
  I suspect that it's fairly generic, but more analysis will be necessary.
  
  Finally, should I file a bug somewhere regarding prefetch, or is this
  a known issue?
  
  It may be related to 6469558, but yes please do file another bug report. 
   I'll have someone on the ZFS team take a look at it.
  
  --matt
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-15 Thread johansen-osdev
 Each drive is freshly formatted with one 2G file copied to it. 

How are you creating each of these files?

Also, would you please include a the output from the isalist(1) command?

 These are snapshots of iostat -xnczpm 3 captured somewhere in the
 middle of the operation.

Have you double-checked that this isn't a measurement problem by
measuring zfs with zpool iostat (see zpool(1M)) and verifying that
outputs from both iostats match?
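
For example, while the test is running (pool name is just a guess at
yours):

    # window 1, ZFS's view of the pool:
    zpool iostat mypool 3

    # window 2, the device view at the same interval:
    iostat -xnczpm 3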

 single drive, zfs file
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  258.30.0 33066.60.0 33.0  2.0  127.77.7 100 100 c0d1
 
 Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s /
 r/s gives 128K, as I would imagine it should.

Not sure.  If we can figure out why ZFS is slower than raw disk access
in your case, it may explain why you're seeing these results.

 What if we read a UFS file from the PATA disk and ZFS from SATA:
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  792.80.0 44092.90.0  0.0  1.80.02.2   1  98 c1d0
  224.00.0 28675.20.0 33.0  2.0  147.38.9 100 100 c0d0
 
 Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a
 number of times, not a fluke.

This could be cache interference.  ZFS and UFS use different caches.

How much memory is in this box?

 I have no idea what to make of all this, except that ZFS has a problem
 with this hardware/drivers that UFS and other traditional file systems
 don't. Is it a bug in the driver that ZFS is inadvertently exposing? A
 specific feature that ZFS assumes the hardware to have, but it doesn't? Who
 knows!

This may be a more complicated interaction than just ZFS and your
hardware.  There are a number of layers of drivers underneath ZFS that
may also be interacting with your hardware in an unfavorable way.

If you'd like to do a little poking with MDB, we can see the features
that your SATA disks claim they support.

As root, type mdb -k, and then at the > prompt that appears, enter the
following command (this is one very long line):

*sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t 
satahba_dev_port | ::array void* 32 | ::print void* | ::grep .!=0 | ::print 
sata_cport_info_t cport_devp.cport_sata_drive | ::print -a sata_drive_info_t 
satadrv_features_support satadrv_settings satadrv_features_enabled

This should show satadrv_features_support, satadrv_settings, and
satadrv_features_enabled for each SATA disk on the system.

The values for these variables are defined in:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/sata/impl/sata.h

this is the relevant snippet for interpreting these values:

/*
 * Device feature_support (satadrv_features_support)
 */
#define SATA_DEV_F_DMA			0x01
#define SATA_DEV_F_LBA28		0x02
#define SATA_DEV_F_LBA48		0x04
#define SATA_DEV_F_NCQ			0x08
#define SATA_DEV_F_SATA1		0x10
#define SATA_DEV_F_SATA2		0x20
#define SATA_DEV_F_TCQ			0x40	/* Non NCQ tagged queuing */

/*
 * Device features enabled (satadrv_features_enabled)
 */
#define SATA_DEV_F_E_TAGGED_QING	0x01	/* Tagged queuing enabled */
#define SATA_DEV_F_E_UNTAGGED_QING	0x02	/* Untagged queuing enabled */

/*
 * Drive settings flags (satadrv_settings)
 */
#define SATA_DEV_READ_AHEAD		0x0001	/* Read Ahead enabled */
#define SATA_DEV_WRITE_CACHE		0x0002	/* Write cache ON */
#define SATA_DEV_SERIAL_FEATURES	0x8000	/* Serial ATA feat. enabled */
#define SATA_DEV_ASYNCH_NOTIFY		0x2000	/* Asynch-event enabled */
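
To decode the output, AND the values against the flags above.  For
example, a (hypothetical) drive reporting:

    satadrv_features_support = 0x3f
    satadrv_settings = 0x8003
    satadrv_features_enabled = 0x2

would support DMA, LBA28, LBA48, NCQ, SATA1, and SATA2 (0x01 | 0x02 |
0x04 | 0x08 | 0x10 | 0x20 = 0x3f), would have read-ahead, the write
cache, and the serial ATA features turned on, and would have untagged
queuing enabled.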

This may give us more information if this is indeed a problem with
hardware/drivers supporting the right features.

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread johansen-osdev
This certainly isn't the case on my machine.

$ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real1.3
user0.0
sys 1.2

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   22.3
user0.0
sys 2.2

This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.

My pool is configured into a 46 disk RAID-0 stripe.  I'm going to omit
the zpool status output for the sake of brevity.

 What I am seeing is that ZFS performance for sequential access is
 about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
 around 70%. For workload consisting mostly of reading large files
 sequentially, it would seem then that ZFS is the wrong tool
 performance-wise. But, it could be just my setup, so I would
 appreciate more data points.

This isn't what we've observed in much of our performance testing.
It may be a problem with your config, although I'm not an expert on
storage configurations.  Would you mind providing more details about
your controller, disks, and machine setup?
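
Something like the following (run as root) usually covers the basics --
none of it is ZFS-specific, it just describes the box:

    prtdiag -v | head -20
    prtconf -D | grep -i sata
    iostat -En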

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread johansen-osdev
Marko,

I tried this experiment again using 1 disk and got nearly identical
times:

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   21.4
user0.0
sys 2.4

$ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real   21.0
user0.0
sys 0.7


 [I]t is not possible for dd to meaningfully access multiple-disk
 configurations without going through the file system. I find it
 curious that there is such a large slowdown by going through file
 system (with single drive configuration), especially compared to UFS
 or ext3.

Comparing a filesystem to raw dd access isn't a completely fair
comparison either.  Few filesystems actually layout all of their data
and metadata so that every read is a completely sequential read.

 I simply have a small SOHO server and I am trying to evaluate which OS to
 use to keep a redundant disk array. With unreliable consumer-level hardware,
 ZFS and the checksum feature are very interesting and the primary selling
 point compared to a Linux setup, for as long as ZFS can generate enough
 bandwidth from the drive array to saturate single gigabit ethernet.

I would take Bart's recommendation and go with Solaris on something like a
dual-core box with 4 disks.

 My hardware at the moment is the wrong choice for Solaris/ZFS - PCI 3114
 SATA controller on a 32-bit AthlonXP, according to many posts I found.

Bill Moore lists some controller recommendations here:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

 However, since dd over raw disk is capable of extracting 75+MB/s from this
 setup, I keep feeling that surely I must be able to get at least that much
 from reading a pair of striped or mirrored ZFS drives. But I can't - single
 drive or 2-drive stripes or mirrors, I only get around 34MB/s going through
 ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)

Maybe this is a problem with your controller?  What happens when you
have two simultaneous dd's to different disks running?  This would
simulate the case where you're reading from the two disks at the same
time.
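
Something like this, against the raw devices so ZFS is out of the
picture (disk names are just examples):

    dd if=/dev/dsk/c0d0 of=/dev/null bs=128k count=10000 &
    dd if=/dev/dsk/c0d1 of=/dev/null bs=128k count=10000 &
    wait

Then watch per-disk throughput in another window with iostat -xnz 3; if
each disk drops well below its single-stream rate, the controller is
probably the bottleneck.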

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-03 Thread johansen-osdev
A couple more questions here.

[mpstat]

 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
   00   0 3109  3616  316  1965   17   48   45   2450  85   0  15
   10   0 3127  3797  592  2174   17   63   46   1760  84   0  15
 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
   00   0 3051  3529  277  2012   14   25   48   2160  83   0  17
   10   0 3065  3739  606  1952   14   37   47   1530  82   0  17
 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
   00   0 3011  3538  316  2423   26   16   52   2020  81   0  19
   10   0 3019  3698  578  2694   25   23   56   3090  83   0  17
 
 # lockstat -kIW -D 20 sleep 30
 
 Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)
 
 Count indv cuml rcnt nsec Hottest CPU+PILCaller  
 ---
  2068  34%  34% 0.00 1767 cpu[0] deflate_slow
  1506  25%  59% 0.00 1721 cpu[1] longest_match   
  1017  17%  76% 0.00 1833 cpu[1] mach_cpu_idle   
   454   7%  83% 0.00 1539 cpu[0] fill_window 
   215   4%  87% 0.00 1788 cpu[1] pqdownheap  
 [snip]

What do you have zfs compression set to?  The gzip level is tunable,
according to zfs set, anyway:

PROPERTY   EDIT  INHERIT   VALUES
compression YES  YES   on | off | lzjb | gzip | gzip-[1-9]

You still have idle time in this lockstat (and mpstat).

What do you get for a lockstat -A -D 20 sleep 30?

Do you see anyone with long lock hold times, long sleeps, or excessive
spinning?

The largest numbers from mpstat are for interrupts and cross calls.
What does intrstat(1M) show?

Have you run dtrace to determine the most frequent cross-callers?

#!/usr/sbin/dtrace -s

sysinfo:::xcalls
{
	@a[stack(30)] = count();
}

END
{
	trunc(@a, 30);
}

is an easy way to do this.
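
Save it as xcalls.d; running it under -c keeps the trace window bounded,
and the END clause trims the output to the 30 hottest stacks:

    /usr/sbin/dtrace -s xcalls.d -c 'sleep 30'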

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread johansen-osdev
This seems a bit strange.  What's the workload, and also, what's the
output for:

    > ARC_mru::print size lsize
    > ARC_mfu::print size lsize
and
    > ARC_anon::print size

For obvious reasons, the ARC can't evict buffers that are in use.
Buffers that are available to be evicted should be on the mru or mfu
list, so this output should be instructive.

-j

On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:
 
 FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:
 
 
  > arc::print -tad
 {
 . . .
c02e29e8 uint64_t size = 0t10527883264
c02e29f0 uint64_t p = 0t16381819904
c02e29f8 uint64_t c = 0t1070318720
c02e2a00 uint64_t c_min = 0t1070318720
c02e2a08 uint64_t c_max = 0t1070318720
 . . .
 
 Perhaps c_max does not do what I think it does?
 
 Thanks,
 /jim
 
 
 Jim Mauro wrote:
 Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
 (update 3). All file IO is mmap(file), read memory segment, unmap, close.
 
 Tweaked the arc size down via mdb to 1GB. I used that value because
 c_min was also 1GB, and I was not sure if c_max could be larger than
 c_min... Anyway, I set c_max to 1GB.
 
 After a workload run:
  > arc::print -tad
 {
 . . .
   c02e29e8 uint64_t size = 0t3099832832
   c02e29f0 uint64_t p = 0t16540761088
   c02e29f8 uint64_t c = 0t1070318720
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t1070318720
 . . .
 
 size is at 3GB, with c_max at 1GB.
 
 What gives? I'm looking at the code now, but was under the impression
 c_max would limit ARC growth. Granted, it's not a factor of 10, and
 it's certainly much better than the out-of-the-box growth to 24GB
 (this is a 32GB x4500), so clearly ARC growth is being limited, but it
 still grew to 3X c_max.
 
 Thanks,
 /jim
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread johansen-osdev
Gar.  This isn't what I was hoping to see.  Buffers that aren't
available for eviction aren't listed in the lsize count.  It looks like
the MRU has grown to 10GB and most of this could be successfully
evicted.

The calculation for determining if we evict from the MRU is in
arc_adjust() and looks something like:

top_sz = ARC_anon.size + ARC_mru.size

Then if top_sz > arc.p and ARC_mru.lsize > 0, we evict the smaller of
ARC_mru.lsize and top_sz - arc.p.

In your previous message it looks like arc.p is > (ARC_mru.size +
ARC_anon.size).  It might make sense to double-check these numbers
together, so when you check the size and lsize again, also check arc.p.
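
Something like this from mdb -k should grab them all at once:

    > ARC_anon::print -d size
    > ARC_mru::print -d size lsize
    > arc::print -d p c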

How/when did you configure arc_c_max?  arc.p is supposed to be
initialized to half of arc.c.  Also, I assume that there's a reliable
test case for reproducing this problem?

Thanks,

-j

On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
 
 
  > ARC_mru::print -d size lsize
 size = 0t10224433152
 lsize = 0t10218960896
  > ARC_mfu::print -d size lsize
 size = 0t303450112
 lsize = 0t289998848
  > ARC_anon::print -d size
 size = 0
 
 
 So it looks like the MRU is running at 10GB...
 
 What does this tell us?
 
 Thanks,
 /jim
 
 
 
 [EMAIL PROTECTED] wrote:
 This seems a bit strange.  What's the workload, and also, what's the
 output for:
 
   
 ARC_mru::print size lsize
 ARC_mfu::print size lsize
 
 and
   
 ARC_anon::print size
 
 
 For obvious reasons, the ARC can't evict buffers that are in use.
 Buffers that are available to be evicted should be on the mru or mfu
 list, so this output should be instructive.
 
 -j
 
 On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:
   
 FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:
 
 
 
  > arc::print -tad
   
 {
 . . .
c02e29e8 uint64_t size = 0t10527883264
c02e29f0 uint64_t p = 0t16381819904
c02e29f8 uint64_t c = 0t1070318720
c02e2a00 uint64_t c_min = 0t1070318720
c02e2a08 uint64_t c_max = 0t1070318720
 . . .
 
 Perhaps c_max does not do what I think it does?
 
 Thanks,
 /jim
 
 
 Jim Mauro wrote:
 
 Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
 (update 3). All file IO is mmap(file), read memory segment, unmap, close.
 
 Tweaked the arc size down via mdb to 1GB. I used that value because
 c_min was also 1GB, and I was not sure if c_max could be larger than
 c_min... Anyway, I set c_max to 1GB.
 
 After a workload run:
   
  > arc::print -tad
 
 {
 . . .
  c02e29e8 uint64_t size = 0t3099832832
  c02e29f0 uint64_t p = 0t16540761088
  c02e29f8 uint64_t c = 0t1070318720
  c02e2a00 uint64_t c_min = 0t1070318720
  c02e2a08 uint64_t c_max = 0t1070318720
 . . .
 
 size is at 3GB, with c_max at 1GB.
 
 What gives? I'm looking at the code now, but was under the impression
 c_max would limit ARC growth. Granted, it's not a factor of 10, and
 it's certainly much better than the out-of-the-box growth to 24GB
 (this is a 32GB x4500), so clearly ARC growth is being limited, but it
 still grew to 3X c_max.
 
 Thanks,
 /jim
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread johansen-osdev
Something else to consider: depending upon how you set arc_c_max, you
may just want to set arc_c and arc_p at the same time.  If you try
setting arc_c_max, then setting arc_c to arc_c_max, and then setting
arc_p to arc_c / 2, do you still get this problem?
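
A rough sketch of doing that from mdb -kw (substitute the real field
addresses that ::print reports; 1GB is just an example value, and /Z
writes an 8-byte decimal (0t) quantity):

    > arc::print -a c_max c p
    > <c_max address>/Z 0t1073741824
    > <c address>/Z 0t1073741824
    > <p address>/Z 0t536870912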

-j

On Thu, Mar 15, 2007 at 05:18:12PM -0700, [EMAIL PROTECTED] wrote:
 Gar.  This isn't what I was hoping to see.  Buffers that aren't
 available for eviction aren't listed in the lsize count.  It looks like
 the MRU has grown to 10Gb and most of this could be successfully
 evicted.
 
 The calculation for determining if we evict from the MRU is in
 arc_adjust() and looks something like:
 
 top_sz = ARC_anon.size + ARC_mru.size
 
 Then if top_sz > arc.p and ARC_mru.lsize > 0, we evict the smaller of
 ARC_mru.lsize and top_sz - arc.p.
 
 In your previous message it looks like arc.p is > (ARC_mru.size +
 ARC_anon.size).  It might make sense to double-check these numbers
 together, so when you check the size and lsize again, also check arc.p.
 
 How/when did you configure arc_c_max?  arc.p is supposed to be
 initialized to half of arc.c.  Also, I assume that there's a reliable
 test case for reproducing this problem?
 
 Thanks,
 
 -j
 
 On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
  
  
   > ARC_mru::print -d size lsize
  size = 0t10224433152
  lsize = 0t10218960896
   > ARC_mfu::print -d size lsize
  size = 0t303450112
  lsize = 0t289998848
   > ARC_anon::print -d size
  size = 0
  
  
  So it looks like the MRU is running at 10GB...
  
  What does this tell us?
  
  Thanks,
  /jim
  
  
  
  [EMAIL PROTECTED] wrote:
  This seems a bit strange.  What's the workload, and also, what's the
  output for:
  

  ARC_mru::print size lsize
  ARC_mfu::print size lsize
  
  and

  ARC_anon::print size
  
  
  For obvious reasons, the ARC can't evict buffers that are in use.
  Buffers that are available to be evicted should be on the mru or mfu
  list, so this output should be instructive.
  
  -j
  
  On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:

  FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:
  
  
  
   > arc::print -tad

  {
  . . .
 c02e29e8 uint64_t size = 0t10527883264
 c02e29f0 uint64_t p = 0t16381819904
 c02e29f8 uint64_t c = 0t1070318720
 c02e2a00 uint64_t c_min = 0t1070318720
 c02e2a08 uint64_t c_max = 0t1070318720
  . . .
  
  Perhaps c_max does not do what I think it does?
  
  Thanks,
  /jim
  
  
  Jim Mauro wrote:
  
  Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
  (update 3). All file IO is mmap(file), read memory segment, unmap, close.
  
  Tweaked the arc size down via mdb to 1GB. I used that value because
  c_min was also 1GB, and I was not sure if c_max could be larger than
  c_min... Anyway, I set c_max to 1GB.
  
  After a workload run:

   > arc::print -tad
  
  {
  . . .
   c02e29e8 uint64_t size = 0t3099832832
   c02e29f0 uint64_t p = 0t16540761088
   c02e29f8 uint64_t c = 0t1070318720
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t1070318720
  . . .
  
  size is at 3GB, with c_max at 1GB.
  
  What gives? I'm looking at the code now, but was under the impression
  c_max would limit ARC growth. Granted, it's not a factor of 10, and
  it's certainly much better than the out-of-the-box growth to 24GB
  (this is a 32GB x4500), so clearly ARC growth is being limited, but it
  still grew to 3X c_max.
  
  Thanks,
  /jim
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread johansen-osdev
I suppose I should have been more forward about making my last point.
If the arc_c_max isn't set in /etc/system, I don't believe that the ARC
will initialize arc.p to the correct value.   I could be wrong about
this; however, next time you set c_max, set c to the same value as c_max
and set p to half of c.  Let me know if this addresses the problem or
not.
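
For reference, the /etc/system route looks like this (applied at boot,
before the ARC sizes itself; 0x40000000 = 1GB is just an example, and
this assumes bits recent enough to have the zfs_arc_max tunable):

    set zfs:zfs_arc_max = 0x40000000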

-j

 How/when did you configure arc_c_max?  
 Immediately following a reboot, I set arc.c_max using mdb,
 then verified reading the arc structure again.
 arc.p is supposed to be
 initialized to half of arc.c.  Also, I assume that there's a reliable
 test case for reproducing this problem?
   
 Yep. I'm using a x4500 in-house to sort out performance of a customer test
 case that uses mmap. We acquired the new DIMMs to bring the
 x4500 to 32GB, since the workload has a 64GB working set size,
 and we were clobbering a 16GB thumper. We wanted to see how doubling
 memory may help.
 
 I'm trying to clamp the ARC size because, for mmap-intensive workloads,
 it seems to hurt more than help (although, based on experiments up to this
 point, it's not hurting a lot).
 
 I'll do another reboot, and run it all down for you serially...
 
 /jim
 
 Thanks,
 
 -j
 
 On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
   
 
 ARC_mru::print -d size lsize
   
 size = 0t10224433152
 lsize = 0t10218960896
 
 ARC_mfu::print -d size lsize
   
 size = 0t303450112
 lsize = 0t289998848
 
 ARC_anon::print -d size
   
 size = 0
 
 So it looks like the MRU is running at 10GB...
 
 What does this tell us?
 
 Thanks,
 /jim
 
 
 
 [EMAIL PROTECTED] wrote:
 
 This seems a bit strange.  What's the workload, and also, what's the
 output for:
 
  
   
 ARC_mru::print size lsize
 ARC_mfu::print size lsize

 
 and
  
   
 ARC_anon::print size

 
 For obvious reasons, the ARC can't evict buffers that are in use.
 Buffers that are available to be evicted should be on the mru or mfu
 list, so this output should be instructive.
 
 -j
 
 On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:
  
   
 FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:
 
 

 
  > arc::print -tad
  
   
 {
 . . .
   c02e29e8 uint64_t size = 0t10527883264
   c02e29f0 uint64_t p = 0t16381819904
   c02e29f8 uint64_t c = 0t1070318720
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t1070318720
 . . .
 
 Perhaps c_max does not do what I think it does?
 
 Thanks,
 /jim
 
 
 Jim Mauro wrote:

 
 Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
 (update 3). All file IO is mmap(file), read memory segment, unmap, 
 close.
 
 Tweaked the arc size down via mdb to 1GB. I used that value because
 c_min was also 1GB, and I was not sure if c_max could be larger than
  c_min... Anyway, I set c_max to 1GB.
 
 After a workload run:
  
   
  > arc::print -tad

 
 {
 . . .
 c02e29e8 uint64_t size = 0t3099832832
 c02e29f0 uint64_t p = 0t16540761088
 c02e29f8 uint64_t c = 0t1070318720
 c02e2a00 uint64_t c_min = 0t1070318720
 c02e2a08 uint64_t c_max = 0t1070318720
 . . .
 
 size is at 3GB, with c_max at 1GB.
 
 What gives? I'm looking at the code now, but was under the impression
 c_max would limit ARC growth. Granted, it's not a factor of 10, and
 it's certainly much better than the out-of-the-box growth to 24GB
 (this is a 32GB x4500), so clearly ARC growth is being limited, but it
 still grew to 3X c_max.
 
 Thanks,
 /jim
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
   
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread johansen-osdev
 it seems there isn't an algorithm in ZFS that detects sequential writes;
 in a traditional fs such as ufs, that would trigger directio.

There is no directio for ZFS.  Are you encountering a situation in which
you believe directio support would improve performance?  If so, please
explain.

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS multi-threading

2007-02-08 Thread johansen-osdev
 Would the logic behind ZFS take full advantage of a heavily multicored
 system, such as on the Sun Niagara platform? Would it utilize of the
 32 concurrent threads for generating its checksums? Has anyone
 compared ZFS on a Sun Tx000, to that of a 2-4 thread x64 machine?

Pete and I are working on resolving ZFS scalability issues with Niagara and
StarCat right now.  I'm not sure if any official numbers about ZFS
performance on Niagara have been published.

As far as concurrent threads generating checksums goes, the system
doesn't work quite the way you have postulated.  The checksum is
generated in the ZIO_STAGE_CHECKSUM_GENERATE pipeline state for writes,
and verified in the ZIO_STAGE_CHECKSUM_VERIFY pipeline stage for reads.
Whichever thread happens to advance the pipeline to the checksum generate
stage is the thread that will actually perform the work.  ZFS does not
break the work of the checksum into chunks and have multiple CPUs
perform the computation.  However, it is possible to have concurrent
writes simultaneously in the checksum_generate stage.

More details about this can be found in zfs/zio.c and zfs/sys/zio_impl.h
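
If you're curious how the checksum work spreads across CPUs under a
given workload, an fbt sketch like this shows the distribution (this
assumes the stage function is named zio_checksum_generate, as it is in
zio.c):

    dtrace -n 'fbt:zfs:zio_checksum_generate:entry { @[cpu] = count(); }'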

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-24 Thread johansen-osdev
 And this feature is independent of whether or not the data is
 DMA'ed straight into the user buffer.

I suppose so; however, it seems like it would make more sense to
configure a dataset property that specifically describes the caching
policy that is desired.  When directio implies different semantics for
different filesystems, customers are going to get confused.
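
Something along these lines, say -- note that the property here is
purely hypothetical; nothing like it exists today:

    zfs set cachepolicy=none tank/db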

 The other feature is to avoid a bcopy by DMAing full
 filesystem block reads straight into the user buffer (and verifying
 the checksum after).  The I/O is high latency; bcopy adds a small
 amount.  The kernel memory can be freed/reused straight after
 the user read completes.  This is where I ask: how much CPU
 is lost to the bcopy in workloads that benefit from DIO?

Right, except that if we try to DMA into user buffers with ZFS there's a
bunch of other things we need the VM to do on our behalf to protect the
integrity of the kernel data that's living in user pages.  Assume you
have a high-latency I/O and you've locked some user pages for this I/O.
In a pathological case, when another thread tries to access the locked
pages and then also blocks,  it does so for the duration of the first
thread's I/O.  At that point, it seems like it might be easier to accept
the cost of the bcopy instead of blocking another thread.

I'm not even sure how to assess the impact of VM operations required to
change the permissions on the pages before we start the I/O.

 The quickest return on investment I see for the directio
 hint would be to tell ZFS not to grow the ARC when servicing
 such requests.

Perhaps if we had an option that specifies not to cache data from a
particular dataset, that would suffice.  I think you've filed a CR along
those lines already (6429855)?

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-23 Thread johansen-osdev
 Basically speaking - there needs to be some sort of strategy for
 bypassing the ARC or even parts of the ARC for applications that
 may need to advise the filesystem of either:
 1) the delicate nature of imposing additional buffering for their
 data flow
 2) already well optimized applications that need more adaptive
 cache in the application instead of the underlying filesystem or
 volume manager

This advice can't be sensibly delivered to ZFS via a Direct I/O
mechanism.  Anton's characterization of Direct I/O as "an optimization
which allows data to be transferred directly between user data buffers
and disk, without a memory-to-memory copy" is concise and accurate.
Trying to intuit advice from this is unlikely to be useful.  It would be
better to develop a separate mechanism for delivering advice about the
application to the filesystem.  (fadvise, perhaps?)

A DIO implementation for ZFS is more complicated than UFS's, and it
adversely impacts well-optimized applications.

I looked into this late last year when we had a customer who was
suffering from too much bcopy overhead.  Billm found another workaround
instead of bypassing the ARC.

The challenge for implementing DIO for ZFS is in dealing with access to
the pages mapped by the user application.  Since ZFS has to checksum all
of its data, the user's pages that are involved in the direct I/O cannot
be written to by another thread during the I/O.  If this policy isn't
enforced, it is possible for the data written to or read from disk to be
different from their checksums.

In order to protect the user pages while a DIO is in progress, we want
support from the VM that isn't presently implemented.  To prevent a page
from being accessed by another thread, we have to unmap the TLB/PTE
entries and lock the page.  There's a cost associated with this, as it
may be necessary to cross-call other CPUs.  Any thread that accesses the
locked pages will block.  While it's possible lock pages in the VM
today, there isn't a neat set of interfaces the filesystem can use to
maintain the integrity of the user's buffers.  Without an experimental
prototype to verify the design, it's impossible to say whether overhead
of manipulating the page permissions is more than the cost of bypassing
the cache.

What do you see as potential use cases for ZFS Direct I/O?  I'm having a
hard time imagining a situation in which this would be useful to a
customer.  The application would probably have to be single-threaded,
and if not, it would have to be pretty careful about how its threads
access buffers involved in I/O.

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-23 Thread johansen-osdev
 Note also that for most applications, the size of their IO operations
 would often not match the current page size of the buffer, causing
 additional performance and scalability issues.

Thanks for mentioning this, I forgot about it.

Since ZFS's default block size is configured to be larger than a page,
the application would have to issue page-aligned block-sized I/Os.
Anyone adjusting the block size would presumably be responsible for
ensuring that the new size is a multiple of the page size.  (If they
would want Direct I/O to work...)
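
Concretely, the check might look like this (the dataset name is just an
example):

    pagesize                        # typically 4096 on x86, 8192 on sparc
    zfs set recordsize=8k tank/dio  # a multiple of either page size

The application would then issue its I/O in page-aligned, 8k units.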

I believe UFS also has a similar requirement, but I've been wrong
before.

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: slow reads question...

2006-09-22 Thread johansen-osdev
Harley:

I had tried other sizes with much the same results, but
 hadn't gone as large as 128K.  With bs=128K, it gets worse:
 
 | # time dd if=zeros-10g of=/dev/null bs=128k count=102400
 | 81920+0 records in
 | 81920+0 records out
 | 
 | real2m19.023s
 | user0m0.105s
 | sys 0m8.514s

I may have done my math wrong, but if we assume that the real
time is the actual amount of time we spent performing the I/O (which may
be incorrect), haven't you done better here?

In this case you pushed 81920 128k records in ~139 seconds -- approx
75437 k/sec.

Using ZFS with 8k bs, you pushed 102400 8k records in ~68 seconds --
approx 12047 k/sec.

Using the raw device you pushed 102400 8k records in ~23 seconds --
approx 35617 k/sec.

I may have missed something here, but isn't this newest number the
highest performance so far?

What does iostat(1M) say about your disk read performance?
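
For instance, in another window while the dd is running:

    iostat -xnz 5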

Is there any other info I can provide which would help?

Are you just trying to measure ZFS's read performance here?

It might be interesting to change your outfile (of) argument and see if
we're actually running into some other performance problem.  If you
change of=/tmp/zeros, does performance improve or degrade?  Likewise, if
you write the file out to another disk (UFS, ZFS, whatever), does this
improve performance?

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: slow reads question...

2006-09-22 Thread johansen-osdev
Harley:

 Old 36GB drives:
 
 | # time mkfile -v 1g zeros-1g
 | zeros-1g 1073741824 bytes
 | 
 | real2m31.991s
 | user0m0.007s
 | sys 0m0.923s
 
 Newer 300GB drives:
 
 | # time mkfile -v 1g zeros-1g
 | zeros-1g 1073741824 bytes
 | 
 | real0m8.425s
 | user0m0.010s
 | sys 0m1.809s

This is a pretty dramatic difference.  What type of drives were your old
36GB drives?

I am wondering if there is something other than capacity
 and seek time which has changed between the drives.  Would a
 different scsi command set or features have this dramatic a
 difference?

I'm hardly the authority on hardware, but there are a couple of
possibilities.  Your newer drives may have a write cache.  It's also
quite likely that the newer drives have a faster rotational speed and
better seek time.

If you subtract the usr + sys time from the real time in these
measurements, I suspect the result is the amount of time you were
actually waiting for the I/O to finish.  In the first case, you spent
about 99% of your total time waiting for stuff to happen ((151.99 -
0.93) / 151.99), whereas in the second case it was only ~78% of your
overall time ((8.43 - 1.82) / 8.43).

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss