Jeff Dike wrote:
> On Fri, May 04, 2007 at 07:30:36PM +0200, Jan Ploski wrote:
> 
>>I am experimenting with UML in an HPC cluster. What I do is basically start 
>>up 60 instances all at once, a bunch of instances on each hardware node, 
>>using the resource manager TORQUE. Each instance gets a different umid. 
>>The instances are configured to boot up, execute a job and halt after 
>>that. Most of the time it works very well. However, every now and then 
>>some instance of the 60 will get stuck with the infamous "INIT: Id 0 
>>respawning too fast" message at boot and consequently neither run the job 
>>nor terminate.
>>
>>So far I have found mentions of two possible causes for this problem: 1) 
>>wrong name of the tty device in inittab 2) /lib/tls problem. Neither 
>>applies in my case (/dev/tty0 is correct, and I have already renamed 
>>/lib/tls, just in case).
> 
> 
> These would cause problems all the time, not sporadically as you're seeing.
> 
>>As I can reproduce the problem "statistically" (quite reliably in the 
>>cluster context) but not at will when running a single instance from the 
>>command line, my question is: how should I go about troubleshooting 
>>it? Are there any locations in the UML kernel code where I could insert 
>>some debug statements (or maybe delays? maybe the problem is 
>>timing-related somehow?) to gather useful diagnostic information?
> 
> 
> Is it possible that it is caused by confusion about how quickly real
> time is progressing compared to how much computation is happening in
> that time?  By default, UML will match its time to the host, with the
> effect that, on a busy system, it will see time progressing quickly
> compared to the work it's doing.
> 
> If so, then disable CONFIG_UML_REAL_TIME_CLOCK, and use
> 2.6.21-rc7-mm2, which has a fix in this area, and see if that makes
> any difference.

Jeff,

I'm having trouble applying the 2.6.21-rc7-mm2 patch to the 2.6.21 
sources: many (though not all) hunks are rejected when I run 
patch -p1 < 2.6.21-rc7-mm2, and the kernel does not compile afterwards. 
I have never used -mm kernels before, and Google did not help me identify 
my mistake. Can you give me a hint about how, and against which base 
tree, this patch should be applied? 
Thanks!

-JPL

_______________________________________________
User-mode-linux-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user
