This is my experience with raid0 on 2.2.14 and the 2.4 test kernels. Below is an
e-mail I sent to another list; Glenn probably cc'ed it here as well (around 24/07).
The main problem seems to be kswapd eating resources, which is not totally fixed
(some improvement from 2.3.99 to 2.4.0-test5, but not sufficient).
If you run "vmstat 3" (and top, pressing P to sort by CPU usage) during heavy I/O,
after several seconds you see the block in/block out figures drop and stabilize at
poor numbers.
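For reference, something like the following is what I mean (a sketch; /raid/bigfile
is just a placeholder path on the md0 filesystem):

# terminal 1: generate heavy sequential I/O on the array (placeholder path)
dd if=/dev/zero of=/raid/bigfile bs=1024k count=512

# terminal 2: watch the bi/bo (block in/block out) columns every 3 seconds
vmstat 3

# terminal 3: run top and press P to sort by CPU usage, to watch kswapd
top
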
Using tagged or untagged queueing makes no difference on my system: the performance
doesn't change in either 2.2 or 2.4. The mail refers to test4, but the numbers are
similar in test5 as well.
Does anyone know the progress on this problem in the 2.4 tree?
A curiosity: I have not worked on it much, but yesterday I compiled a stock 2.2.16
kernel and it seems to show bad performance similar to the 2.4 kernels. Has anyone
else seen this with 2.2.16? Could it be Red Hat-specific patches that speed up raid0
in the kernel they ship with release 6.2?
Bye
Gianluca


My mail:
The system:

MB: Supermicro P6SBU (Adaptec 7890 on board)
CPU: 1 pentium III 500 MHz
Mem: 256Mb

1x9.1GB IBM DNES-309170W disk on the fast/SE channel
4x18.2GB IBM DNES-318350W disks on the ultra2 channel
The 18.2GB disks are in software raid0. Below is the /etc/raidtab file:

raiddev /dev/md0
          raid-level      0
          nr-raid-disks   4
          persistent-superblock 1
          chunk-size     128
          device          /dev/sdb1
          raid-disk       0
          device          /dev/sdc1
          raid-disk       1
          device          /dev/sdd1
          raid-disk       2
          device          /dev/sde1
          raid-disk       3
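
For completeness, the array was created with the usual raidtools steps, roughly as
below (a sketch; the mke2fs line is just an example, not necessarily the exact
filesystem options I used):

# create the array described in /etc/raidtab (destroys data on the partitions)
mkraid /dev/md0

# verify that all four disks are active
cat /proc/mdstat

# put a filesystem on it (example only)
mke2fs /dev/md0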

Output of dmesg related to the SCSI configuration (from the 2.4.0-test4 boot):

md.c: sizeof(mdp_super_t) = 4096
(scsi0) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 0/14/0
(scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi0) Downloading sequencer code... 392 instructions downloaded
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.1/5.2.0
       <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
scsi : 1 host.
(scsi0:0:5:0) Synchronous at 10.0 Mbyte/sec, offset 15.
  Vendor: SONY      Model: SDT-9000          Rev: 0400
  Type:   Sequential-Access                  ANSI SCSI revision: 02
(scsi0:0:6:0) Synchronous at 40.0 Mbyte/sec, offset 31.
  Vendor: IBM       Model: DNES-309170W      Rev: SA30
  Type:   Direct-Access                      ANSI SCSI revision: 03
Detected scsi disk sda at scsi0, channel 0, id 6, lun 0
(scsi0:0:8:0) Synchronous at 80.0 Mbyte/sec, offset 31.
  Vendor: IBM       Model: DNES-318350W      Rev: SA30
  Type:   Direct-Access                      ANSI SCSI revision: 03
Detected scsi disk sdb at scsi0, channel 0, id 8, lun 0
(scsi0:0:9:0) Synchronous at 80.0 Mbyte/sec, offset 31.
  Vendor: IBM       Model: DNES-318350W      Rev: SA30
  Type:   Direct-Access                      ANSI SCSI revision: 03
Detected scsi disk sdc at scsi0, channel 0, id 9, lun 0
(scsi0:0:10:0) Synchronous at 80.0 Mbyte/sec, offset 31.
  Vendor: IBM       Model: DNES-318350W      Rev: SA30
  Type:   Direct-Access                      ANSI SCSI revision: 03
Detected scsi disk sdd at scsi0, channel 0, id 10, lun 0
(scsi0:0:12:0) Synchronous at 80.0 Mbyte/sec, offset 31.
  Vendor: IBM       Model: DNES-318350W      Rev: SA30
  Type:   Direct-Access                      ANSI SCSI revision: 03
Detected scsi disk sde at scsi0, channel 0, id 12, lun 0
scsi : detected 5 SCSI disks total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
Partition check:
 sda: sda1 sda2 < sda5 sda6 sda7 >
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 35843670 [17501 MB] [17.5 GB]
 sdb: sdb1
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 35843670 [17501 MB] [17.5 GB]
 sdc: sdc1
SCSI device sdd: hdwr sector= 512 bytes. Sectors= 35843670 [17501 MB] [17.5 GB]
 sdd: sdd1
SCSI device sde: hdwr sector= 512 bytes. Sectors= 35843670 [17501 MB] [17.5 GB]
 sde: sde1

[snipped]

(read) sdb1's sb offset: 17920384 [events: 00000085]
(read) sdc1's sb offset: 17920384 [events: 00000085]
(read) sdd1's sb offset: 17920384 [events: 00000085]
(read) sde1's sb offset: 17920384 [events: 00000085]
autorun ...
considering sde1 ...
  adding sde1 ...
  adding sdd1 ...
  adding sdc1 ...
  adding sdb1 ...
created md0
bind<sdb1,1>
bind<sdc1,2>
bind<sdd1,3>
bind<sde1,4>
running: <sde1><sdd1><sdc1><sdb1>
now!
sde1's event counter: 00000085
sdd1's event counter: 00000085
sdc1's event counter: 00000085
sdb1's event counter: 00000085
raid0 personality registered
md0: max total readahead window set to 2048k
md0: 4 data-disks, max readahead per data-disk: 512k
raid0: looking at sdb1
raid0:   comparing sdb1(17920384) with sdb1(17920384)
raid0:   END
raid0:   ==> UNIQUE
raid0: 1 zones
raid0: looking at sdc1
raid0:   comparing sdc1(17920384) with sdb1(17920384)
raid0:   EQUAL
raid0: looking at sdd1
raid0:   comparing sdd1(17920384) with sdb1(17920384)
raid0:   EQUAL
raid0: looking at sde1
raid0:   comparing sde1(17920384) with sdb1(17920384)
raid0:   EQUAL
raid0: FINAL 1 zones
zone 0
 checking sdb1 ... contained as device 0
  (17920384) is smallest!.
 checking sdc1 ... contained as device 1
 checking sdd1 ... contained as device 2
 checking sde1 ... contained as device 3
 zone->nb_dev: 4, size: 71681536
current zone offset: 17920384
done.
raid0 : md_size is 71681536 blocks.
raid0 : conf->smallest->size is 71681536 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 8 bytes for hash.
md: updating md0 RAID superblock on device
sde1 [events: 00000086](write) sde1's sb offset: 17920384
sdd1 [events: 00000086](write) sdd1's sb offset: 17920384
sdc1 [events: 00000086](write) sdc1's sb offset: 17920384
sdb1 [events: 00000086](write) sdb1's sb offset: 17920384
.
... autorun DONE.
Detected scsi tape st0 at scsi0, channel 0, id 5, lun 0
st: bufsize 32768, wrt 30720, max init. buffers 4, s/g segs 16.


These are the outputs of bonnie++ version 1.00, compiled on the 2.2.14 kernel
(Red Hat 6.2):
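The runs were done more or less like this (a sketch; /raid is a placeholder mount
point for md0, and the sizes simply match the figures below):

# 1000MB sequential tests; -n 30 corresponds to the "files 30" column below
bonnie++ -d /raid -s 1000 -n 30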


kernel 2.2.14 no tagged
Version  1.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine          MB K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Unknown        1000  8400  97 56908  70 21380  48  8475  96 58199  44   nan -2147483648
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 30   175  96   627  99  6451  99   182  99   806  99   722  91
Unknown,1000,8400,97,56908,70,21380,48,8475,96,58199,44,nan,-2147483648,30,175,96,627,99,6451,99,182,99,806,99,722,91

kernel 2.4.0-test4 no tagged
Version  1.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine          MB K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Unknown        1000  8249  97 51642  37 10498  18  6190  72 17248  19   nan -2147483648
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 30   174  99 +++++  93  9417  93   180  99 +++++ 104  1282  98
Unknown,1000,8249,97,51642,37,10498,18,6190,72,17248,19,nan,-2147483648,30,174,99,+++++,93,9417,93,180,99,+++++,104,1282,98

What do you make of these numbers? In particular, the big difference in sequential
rewrite (21380 K/sec at 48% CPU in 2.2 vs 10498 K/sec at 18% in 2.4) and in
sequential block input (58199 K/sec at 44% in 2.2 vs 17248 K/sec at 19% in 2.4)?

With dd or cp of big files, 2.4 stays at about 1/3 of the performance of 2.2.14
(because of the poor sequential input performance?), e.g. 62 seconds for a 512MB dd
with a 1MB block size versus 23 seconds on 2.2.14, copying from the raid0 array back
onto itself. The CPU load is 45% on 2.2 versus 18% on 2.4.
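The dd test was essentially this (a sketch; the file names are placeholders, both
on the md0 filesystem):

# copy a 512MB file from the raid0 array back onto itself with a 1MB block size
time dd if=/raid/src.img of=/raid/dst.img bs=1024k count=512
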
Analyzing vmstat here, the kswapd overload problem does not seem as prominent, but
the performance gap remains.
Is this a deliberate design shift toward a multi-user/multi-processor environment,
or simply bad performance?
Thanks in advance for your clarifications.
Gianluca


PS: tell me if I can be of any help testing things on my hardware.



----- Original Message -----
From: Neil Brown <[EMAIL PROTECTED]>
To: Nils Rennebarth <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, July 31, 2000 5:43 AM
Subject: Re: raid and 2.4 kernels


> On Thursday July 27, [EMAIL PROTECTED] wrote:
> > On Wed, Jul 26, 2000 at 08:12:47PM -0700, Gregory Leblanc wrote:
> > > > Given the code at the moment, I am highly confident that linear, raid0
> > > > and raid1 should be just as fast in 2.4 as in 2.2.
> > > > There are some issues with raid5 that I am looking into. I don't
> > > > know that they affect speed much, though they might.
> > I doubt it, see below
>
> I'm ready to be proved wrong ... you may well have succeeded :-)
>
> >
> > > Could you be a little more specific?  Speed comparisons on disk access?
> > > Then you can't compare RAID with no RAID effectively.  You could compare the
> > > speed of 2.2/2.4 RAID, and 2.2/2.4 no RAID, but comparisons across would
> > > seem to be meaningless.  Later,
> > Ok, so here is what I did.
> >
> > Machine (you might have seen this before, sorry for repetitions)
> > AMD Athlon 650, 256MB RAM, Abit KA7 Mainboard, VIA Chipset, system on
> > Fujitsu 20GB IDE disk, 3 Promise PDC20262 UDMA Controller, 6 IBM
> > DTLA-307045 46GB disks for data.
> >
> > It runs kernel 2.4.0-test1 (first 3 tests) and 2.4.0-test5-pre3 (rest)
> > with Andre Hedrick's IDE patch and the latest reiserfs.
> >
> > These are bonnie results:
> >
> >               -------Sequential Output-------- ---Sequential Input-- --Random--
> >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> > fujitsu   500  2243 39.7  3400  9.4  1422 54.9  1945 75.4  2795 66.3 182.7 29.7
> > single    500  5025 95.7 20218 81.1 11182 40.5  5214 90.9 26707 48.9 213.2 18.7
> > raid0, 3  500  4804 83.3 17275 42.7 11067 50.7  5393 92.6 37242 65.2 361.4 28.1
> > raid0, 4  500  4941 81.8 20920 48.7 13068 66.4  5283 91.0 35318 55.3 394.8 27.6
> > raid5, 6  500  3009 52.0  3004  4.6  2589  3.6  4411 70.4 18982 13.9 360.5 22.7
> >
> > where
> > - "fujitsu"  is the system disk
> > - "single"   is a filesystem on a single IBM disk
> > - "raid0, 3" is a RAID0 array of 3 IBM disks
> > - "raid0, 4" is a RAID0 array of 4 IBM disks
> > - "raid5, 6" is a RAID5 array of 6 IBM disks
> >
> > In between the disks have been freshly partitioned (1 big primary only)
> > mkraid'ed and given a fresh reiserfs.
> >
> > It could of course be a combination of reiserfs, Andre Hedrick's IDE patches
> > and raid. I expected the raid 0 with n disks to get a bit less than
> > n times the block read performance of 1 disk, and raid5 to have block write
> > performance of a bit less than single disk, block read performance much
> > better than for a single disk. Are these expectations unrealistic?
> >
>
> I think your expectations of raid0 may be a little over optimistic.
>
>
> raid0 will only get close to 'n' times a single disc when you have a
> number of separate threads accessing the device, otherwise there are
> fewer opportunities for multiple drives to be accessed at once.
> I believe that bonnie is single-threaded, so it is unlikely to drive a
> raid0 array optimally.
> Possibly you could try running 3 bonnies in parallel, and compare that
> to three parallel bonnies on three separate drives.
>
> So your read timings look believable - raid0 is faster but not
> stunningly faster.  The extra speed probably comes from read-ahead,
> which adds an element of multi-threading.
>
> Your write times are a bit disappointing though.  The write-behind
> that the kernel does should provide plenty of opportunity to get all
> the drives writing at once.  Cluster size  can affect this  a bit, but
> I wouldn't expect it to matter that much.  I might do some similar tests
> myself...
>
> For raid-5, your numbers are disappointing, but not actually very
> surprising now that I think about it.
> The way the raid5 code is at the moment, only one request can be
> outstanding for each 4k wide stripe.  Also, requests are not processed
> immediately, but are queued, and have to wait for the raid5d thread to
> wake up, which is probably a context switch away at least.
>
> I hope to fix this one day, but whether it will be before 2.4.0-final
> comes out or not, I don't know.
>
> Thanks for the numbers.
>
> NeilBrown
>
