Jeff Dike wrote:
> On Fri, May 04, 2007 at 07:30:36PM +0200, Jan Ploski wrote:
>
>>I am experimenting with UML in a HPC cluster. What I do is basically start
>>up 60 instances all at once, a bunch of instances on each hardware node,
>>using the resource manager TORQUE. Each instance gets a different umid.
>>The instances are configured to boot up, execute a job and halt after
>>that. Most of the times it works very well. However, every now and then
>>some instance of the 60 will get stuck with the infamous "INIT: Id 0
>>respawning too fast" message at boot and consequently neither run the job
>>nor terminate.
>>
>>So far I have found mentions of two possible causes for this problem: 1)
>>wrong name of the tty device in inittab 2) /lib/tls problem. Neither
>>applies in my case (/dev/tty0 is correct, and I have already renamed
>>/lib/tls, just in case).
>
>
> These would cause problems all the time, not sporadically as you're seeing.
>
>>As I can reproduce the problem "statistically" (quite reliably in the
>>cluster context) but not at will when running a single instance from the
>>command line, my question is: how should I proceed about troubleshooting
>>it? Are there any locations in the UML kernel code where I could insert
>>some debug statements (or maybe delays? maybe the problem is
>>timing-related somehow?) to gather useful diagnostic information?
>
>
> Is it possible that it is caused by confusion about how quickly real
> time is progressing compared to how much computation is happening in
> that time? By default, UML will match its time to the host, with the
> effect that, on a busy system, it will see time progressing quickly
> compared to the work it's doing.
>
> If so, then disable CONFIG_UML_REAL_TIME_CLOCK, and use
> 2.6.21-rc7-mm2, which has a fix in this area, and see if that makes
> any difference.

Jeff,

I'm having trouble applying the 2.6.21-rc7-mm2 patch against the 2.6.21
sources: many hunks (though not all) are rejected when I run
patch -p1 < 2.6.21-rc7-mm2, and the kernel does not compile afterwards.
I have never used -mm kernels before, and Google did not help me identify
my mistake. Can you give me a hint about how, and against which source
tree, to apply this patch?

Thanks!
-JPL

_______________________________________________
User-mode-linux-user mailing list
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user
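
For reference, a minimal sketch of how the base tree for an -mm patch can
be worked out from its name. It assumes the usual kernel.org convention
for 2.6-era kernels, which is not stated in this thread: an -mm patch
applies on top of the mainline release named before the -mm suffix, and an
-rc patch applies on top of the previous full release. The function name
mm_patch_plan and the returned tree and patch names are illustrative only.

```python
import re


def mm_patch_plan(mm_name):
    """Return the source tree and patches to apply, in order, for an -mm patch.

    Assumption: kernel.org naming such as 2.6.21-rc7-mm2 or 2.6.20-mm1.
    The first list item is the tree to unpack; the rest are patches for
    `patch -p1`, applied in the order given.
    """
    m = re.fullmatch(r"(\d+\.\d+)\.(\d+)(-rc\d+)?-mm\d+", mm_name)
    if m is None:
        raise ValueError("not an -mm patch name: %r" % mm_name)
    series, level, rc = m.group(1), int(m.group(2)), m.group(3)

    if rc is None:
        # -mm patch cut against a full release: apply it directly.
        return ["linux-%s.%d" % (series, level), mm_name]
    if level == 0:
        raise ValueError("cannot derive the previous release for %r" % mm_name)

    # -mm patch cut against a release candidate: the -rc patch itself
    # goes on top of the previous full release first.
    return [
        "linux-%s.%d" % (series, level - 1),
        "patch-%s.%d%s" % (series, level, rc),
        mm_name,
    ]


print(mm_patch_plan("2.6.21-rc7-mm2"))
```

Under these assumptions, 2.6.21-rc7-mm2 would go on top of a 2.6.21-rc7
tree (2.6.20 plus the 2.6.21-rc7 patch) rather than on the final 2.6.21
sources, which could account for the rejected hunks.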
