[EMAIL PROTECTED] (Robert Creager) writes:
> When grilled further on (Wed, 7 Jan 2004 18:06:08 -0500),
> Andrew Sullivan <[EMAIL PROTECTED]> confessed:
>
>> We have lately had a couple of cases where machines either locked
>> up, slowed down to the point of complete unusability, or died
>> completely while using jfs. We are _not_ sure that jfs is in fact
>> the culprit. In one case, a kernel panic appeared to be referring
>> to the jfs kernel module, but I can't be sure as I lost the output
>> immediately thereafter. Yesterday, we had a problem of data
>> corruption on a failed jfs volume.
>>
>> None of this is to say that jfs is in fact to blame, nor even that,
>> if it is, it does not have something to do with the age of our
>> installations, &c. (these are all RH 8). In fact, I suspect
>> hardware in both cases. But I thought I'd mention it just in case
>> other people are seeing strange behaviour, on the principle of
>> "better safe than sorry."
>
> Interestingly enough, I'm using JFS on a new scsi disk with Mandrake
> 9.1 and was having similar problems. I was generating heavy disk
> usage through database and astronomical data reductions. My machine
> (dual AMD) would suddenly hang. No new jobs would run, just
> increase the load, until I reboot the machine.
>
> I solved my problems by creating a 128Mb ram disk (using EXT2) for
> the temp data produced my reduction runs.
>
> I believe JFS was to blame, not hardware, but you never know...

Interesting. When this happened to us "consistently," the factors
that coincided were:

1. Heavy DB updates taking place on JFS filesystems;

2. SMP (we suspected Xeon hyperthreading as a possible factor, but
   shut it off and still saw the same problem...);

3. Copying, via scp, a file > 2GB in size onto the system, which
   appeared to be the catalyst.

The third piece was particularly interesting: the file would get
copied over successfully, and the scp process would hang (to the
point of "kill -9" being unable to touch it) immediately thereafter.

At that point, processes on the system that were accessing files on
the hung-up filesystem were locked, also unkillable by "kill -9."

That's certainly consistent with JFS being involved in the problem,
whether it was the root cause or not...
-- 
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)