comments below...

Carsten Aulbert wrote:
> Hi all,
>
> Carsten Aulbert wrote:
>
>> More later.
>>
>
> OK, I'm completely puzzled right now (and sorry for this lengthy email).
> My first (and currently only) idea was that the size of the files is
> related to this effect, but that does not seem to be the case:
>
> (1) A 185 GB zfs file system was transferred yesterday at about 60 MB/s
> to two different servers. The histogram of file sizes looks like:
>
> 2822 files were investigated, total size is: 185.82 Gbyte
>
> Summary of file sizes [bytes]:
>    zero:              2
>       1 ->     2      0
>       2 ->     4      1
>       4 ->     8      3
>       8 ->    16     26
>      16 ->    32      8
>      32 ->    64      6
>      64 ->   128     29
>     128 ->   256     11
>     256 ->   512     13
>     512 ->  1024     17
>    1024 ->    2k     33
>      2k ->    4k     45
>      4k ->    8k   9044 ************
>      8k ->   16k     60
>     16k ->   32k     41
>     32k ->   64k     19
>     64k ->  128k     22
>    128k ->  256k     12
>    256k ->  512k      5
>    512k -> 1024k   1218 **
>   1024k ->    2M  16004 *********************
>      2M ->    4M  46202 ************************************************************
>      4M ->    8M      0
>      8M ->   16M      0
>     16M ->   32M      0
>     32M ->   64M      0
>     64M ->  128M      0
>    128M ->  256M      0
>    256M ->  512M      0
>    512M -> 1024M      0
>   1024M ->    2G      0
>      2G ->    4G      0
>      4G ->    8G      0
>      8G ->   16G      1
>
> (2) Currently a much larger file system is being transferred; the same
> script (even the same incarnation, i.e. the same process) has now been
> running for close to 22 hours:
>
> 28549 files were investigated, total size is: 646.67 Gbyte
>
> Summary of file sizes [bytes]:
>    zero:           4954 **************************
>       1 ->     2      0
>       2 ->     4      0
>       4 ->     8      1
>       8 ->    16      1
>      16 ->    32      0
>      32 ->    64      0
>      64 ->   128      1
>     128 ->   256      0
>     256 ->   512      9
>     512 ->  1024     71
>    1024 ->    2k      1
>      2k ->    4k   1095 ******
>      4k ->    8k   8449 *********************************************
>      8k ->   16k   2217 ************
>     16k ->   32k    503 ***
>     32k ->   64k      1
>     64k ->  128k      1
>    128k ->  256k      1
>    256k ->  512k      0
>    512k -> 1024k      0
>   1024k ->    2M      0
>      2M ->    4M      0
>      4M ->    8M     16
>      8M ->   16M      0
>     16M ->   32M      0
>     32M ->   64M  11218 ************************************************************
>     64M ->  128M      0
>    128M ->  256M      0
>    256M ->  512M      0
>    512M -> 1024M      0
>   1024M ->    2G      0
>      2G ->    4G      5
>      4G ->    8G      1
>      8G ->   16G      3
>     16G ->   32G      1
>
> When watching zpool iostat I get this (30 second average, NOT the first
> output):
>
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> atlashome   3.54T  17.3T    137      0  4.28M      0
>   raidz2     833G  6.00T      1      0  30.8K      0
>     c0t0d0      -      -      1      0  2.38K      0
>     c1t0d0      -      -      1      0  2.18K      0
>     c4t0d0      -      -      0      0  1.91K      0
>     c6t0d0      -      -      0      0  1.76K      0
>     c7t0d0      -      -      0      0  1.77K      0
>     c0t1d0      -      -      0      0  1.79K      0
>     c1t1d0      -      -      0      0  1.86K      0
>     c4t1d0      -      -      0      0  1.97K      0
>     c5t1d0      -      -      0      0  2.04K      0
>     c6t1d0      -      -      1      0  2.25K      0
>     c7t1d0      -      -      1      0  2.31K      0
>     c0t2d0      -      -      1      0  2.21K      0
>     c1t2d0      -      -      0      0  1.99K      0
>     c4t2d0      -      -      0      0  1.99K      0
>     c5t2d0      -      -      1      0  2.38K      0
>   raidz2    1.29T  5.52T     67      0  2.09M      0
>     c6t2d0      -      -     58      0   143K      0
>     c7t2d0      -      -     58      0   141K      0
>     c0t3d0      -      -     53      0   131K      0
>     c1t3d0      -      -     53      0   130K      0
>     c4t3d0      -      -     58      0   143K      0
>     c5t3d0      -      -     58      0   145K      0
>     c6t3d0      -      -     59      0   147K      0
>     c7t3d0      -      -     59      0   146K      0
>     c0t4d0      -      -     59      0   145K      0
>     c1t4d0      -      -     58      0   145K      0
>     c4t4d0      -      -     58      0   145K      0
>     c6t4d0      -      -     58      0   143K      0
>     c7t4d0      -      -     58      0   143K      0
>     c0t5d0      -      -     58      0   145K      0
>     c1t5d0      -      -     58      0   144K      0
>   raidz2    1.43T  5.82T     69      0  2.16M      0
>     c4t5d0      -      -     62      0   141K      0
>     c5t5d0      -      -     60      0   138K      0
>     c6t5d0      -      -     59      0   135K      0
>     c7t5d0      -      -     60      0   138K      0
>     c0t6d0      -      -     62      0   142K      0
>     c1t6d0      -      -     61      0   138K      0
>     c4t6d0      -      -     59      0   135K      0
>     c5t6d0      -      -     60      0   138K      0
>     c6t6d0      -      -     62      0   142K      0
>     c7t6d0      -      -     61      0   138K      0
>     c0t7d0      -      -     58      0   134K      0
>     c1t7d0      -      -     60      0   137K      0
>     c4t7d0      -      -     62      0   142K      0
>     c5t7d0      -      -     61      0   139K      0
>     c6t7d0      -      -     58      0   134K      0
>     c7t7d0      -      -     60      0   138K      0
> ----------  -----  -----  -----  -----  -----  -----
>
> Odd things:
>
> (1) The zpool is not equally striped across the raidz2-pools
>
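
A quick way to check this directly (see the expansion point in the next
comment): "zpool history" records every administrative command run against
a pool, so a raidz2 vdev that was added later would show up there. A minimal
sketch, using the pool name from the iostat output above:

    # List pool-level create/add events; a "zpool add ... raidz2" entry dated
    # after the bulk of the data was written would explain why one vdev holds
    # far less data than the others (and therefore serves far fewer reads).
    zpool history atlashome | egrep 'create|add'
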
Since you are reading, it depends on where the data was written.
Remember, ZFS dynamic striping != RAID-0.  I would expect something like
this if the pool was expanded at some point in time.

> (2) The disks should be able to perform much, much faster than they
> currently output data at -- I believe it's 2008 and not 1995.
>

X4500?  Those disks are good for about 75-80 random iops, which seems to
be about what they are delivering (the busier vdevs show roughly 60
reads/s per disk at only ~140 Kbytes/s each, i.e. small, seek-bound
reads).  The dtrace tool, iopattern, will show the random/sequential
nature of the workload.

> (3) The four cores of the X4500 are dying of boredom, i.e. idle >95%
> all the time.
>
> Has anyone a good idea where the bottleneck could be? I'm running out
> of ideas.
>

I would suspect the disks.  30-second samples are not very useful for
debugging this sort of thing -- even 1-second samples can be too coarse.
But you should take a look at 1-second samples to see whether there is a
consistent I/O workload.
 -- richard

> Cheers
>
> Carsten
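
P.S. A minimal way to collect the finer-grained samples suggested above
(the DTraceToolkit path below is an assumption -- adjust it to wherever
the toolkit is installed on your host):

    # 1-second samples instead of 30-second averages, per vdev and per disk:
    zpool iostat -v atlashome 1

    # Per-device service times and busy percentages at 1-second intervals
    # (-x extended stats, -n descriptive names, -z skip all-zero lines):
    iostat -xnz 1

    # iopattern from the DTraceToolkit reports the random vs. sequential mix
    # of the disk I/O, one summary line per interval:
    /opt/DTT/Bin/iopattern 1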