I am extremely inexperienced with software RAID, so please don't flame me
if this message comes across as evidence of outright idiocy.  I prefer to
think the line between idiocy and adventurousness is just very thin.

I built a custom kernel using the 2.2.14 source and applying the most
recent patch, raid0145-19990824-2.2.11.  Although that patch was targeted
officially at 2.2.11, it applied fairly cleanly, rejecting only some
architecture-specific files related to SPARC and PPC, and the raid0.c
module.  Since I am not even compiling the raid0.c module and am running
on Intel x86, this seemed like a reasonable result.  The kernel builds and
runs fairly well in most respects that I can see.  Besides, it was the
most recent source patch at the places mentioned in the HOWTO; is there
anything better available?

I then used this custom kernel as a replacement on the installation disk
for the Debian "Potato" distribution.  I put the userland raidtools onto a
floppy disk, which I can mount manually after booting.  I also wrote a
short raidtab onto the floppy which almost exactly follows the example in
the HOWTO for a RAID-1 set, using type 0xFD /dev/hda1 and /dev/hdc1
partitions which had been created manually with fdisk.  Both /dev/hda and
/dev/hdc are identical 30 GB Western Digital drives, partitioned
identically.  I used mkraid successfully and I can use raidstart and
raidstop with no problems.  The contents of /proc/mdstat look good.

If I allow the initial mirroring to complete, the /dev/md0 device works
normally as far as I can tell.  It can be mounted, files can be read and
written on it, and so on.  During the remirroring process, however, every
step short of mounting succeeds: I can run mke2fs and e2fsck, and e2fsck
reports the filesystem as clean when that would be expected.  Any attempt
to mount the partition during the remirroring process fails with the
unhelpful:

        Kernel PANIC: B_FREE inserted into queues.

This is particularly irritating, since it causes a complete lockup and
the remirror starts again from scratch on reboot.
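
As a stopgap, mounting could be held off until the remirror has finished.
A minimal sketch of such a check in Python follows.  The sample text is
made up for illustration and is not copied from this machine, and the
assumption that an in-progress remirror shows up as the word "resync" or
"recovery" in the md0 entry of /proc/mdstat should be checked against the
real output.  The function name is hypothetical.

```python
import re

# Illustrative text only: an assumed /proc/mdstat layout, not real output.
SAMPLE_MDSTAT = """Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0] 29302464 blocks [2/2] [UU] resync=12% finish=45.3min
unused devices: <none>
"""


def resync_in_progress(mdstat_text, device="md0"):
    """Return True if the entry for `device` mentions a resync or recovery.

    Assumption: an entry starts on an unindented line of the form
    "<name> : ...", and any continuation lines are indented.
    """
    in_entry = False
    for line in mdstat_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line: start of a new entry (or an unrelated header).
            in_entry = line.split(":")[0].strip() == device
        if in_entry and re.search(r"\b(resync|recovery)\b", line):
            return True
    return False


def safe_to_mount(path="/proc/mdstat", device="md0"):
    """Read the status file and report whether no remirror is running."""
    with open(path) as status_file:
        return not resync_in_progress(status_file.read(), device)
```

Calling safe_to_mount() before mount would only avoid the panic, not
explain it, so it is a workaround rather than a fix.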

The ext2 fs code seems to be solid.  The md0 device and its ext2 fs work
fine as long as the remirror is not in progress, or at least I think so. 
I can mount an ext2 fs floppy disk (which I created as a test) during the
remirror of md0 with no trouble.

I really have too little experience with software RAID to know what I am
looking at here.  The kernel panic is not helpful, and it generates
nothing by way of stack trace or other information besides the one line
message.  I am at a complete loss even to guess which subsystem it is
coming from, since the queue manager is touched by everything.  I have
even been wondering if the problem could be hardware, triggered by
something like writing to both IDE channels in quick succession or DMA
mismanagement.  The system in question is a Pentium 166 MHz with 64 MB RAM
using an Asus P55TP4N motherboard (upgraded to the latest BIOS), which has
the Intel Triton chipset (82371 and 82437) handling the IDE channels.

Although the system runs when the mirror is synchronized, this instability
during remirror greatly troubles me, and it essentially defeats the whole
purpose of using RAID.  I also have doubts about whether the system is
truly stable in this configuration even with the mirror synchronized, or
whether it is working by coincidence and some set of circumstances or
sequence of events will cause a kernel panic during normal operation.

Can you give me any insight on this?   I am very new to software RAID.

-- Mike

