Public bug reported: Upstart sometimes aborts on a stateful re-execution triggered by "telinit u":

  job.c:1977: Assertion failed in job_deserialise: job->kill_process
  Caught abort, core dumped
  init:job.c:1977: Assertion failed in job_deserialise: job->kill_process
  [ 69.668199] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

The attached file (sessions.json) is a salvaged dump of the Upstart state that triggers the assertion failure; the problem evidently occurs while processing the following piece:

  [...]
  "name": "",
  "path": "\/com\/ubuntu\/Upstart\/jobs\/ureadahead\/_",
  "goal": "JOB_STOP",
  "state": "JOB_KILLED",
  [...]
  "kill_timer": { "timeout": 180, "due": 245 },
  "kill_process": "PROCESS_MAIN",
  [...]

The issue has been caught in the package ubuntu-1.12.1 (Ubuntu 14.04) and is caused by the following code:

  [init/job.c]
  1954         json_kill_timer = json_object_object_get (json, "kill_timer");
  1955
  1956         if (json_kill_timer) {
  [...]
  1973                 nih_local NihTimer *kill_timer = job_deserialise_kill_timer (json_kill_timer);
  1974                 if (! kill_timer)
  1975                         goto error;
  1976
  1977                 nih_assert (job->kill_process);
  1978                 job_process_set_kill_timer (job, job->kill_process,
  1979                                             kill_timer->timeout);
  1980                 job_process_adj_kill_timer (job, kill_timer->due);
  1981         }

The assertion (job->kill_process) fails in the routine job_deserialise() if the deserialised job has an associated kill timer and the field kill_process == PROCESS_MAIN.

It seems the issue might still affect the trunk as well: there are no similar checks in the routines job_process_kill() and job_serialise(). If the Upstart state is serialised after job_process_kill() has run but before the job kill timer fires, the resulting state representation cannot be restored, because job->kill_timer is non-NULL and job->kill_process is not PROCESS_INVALID, which is the result of the job_process_set_kill_timer() operation.

Probably the assertion in question should read (job->kill_process != PROCESS_INVALID) if job_process_set_kill_timer() is assumed to operate correctly.

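To illustrate the difference between the two forms of the check, here is a minimal standalone sketch. It does not use the Upstart sources: the enum below is a mock, and its numeric values (PROCESS_INVALID = -1, PROCESS_MAIN = 0) are an assumption inferred from the observed failure, where a kill_process of PROCESS_MAIN evaluates as false. The helper names truthiness_check() and explicit_check() are hypothetical and exist only for this example.

```c
#include <stdbool.h>

/* Mock of Upstart's ProcessType. The numeric values are an assumption
 * inferred from the failure: PROCESS_MAIN must evaluate as false for
 * nih_assert (job->kill_process) to trip on it. */
typedef enum {
	PROCESS_INVALID = -1,
	PROCESS_MAIN = 0,
	PROCESS_PRE_START
} ProcessType;

/* Mirrors the condition currently asserted at job.c:1977:
 * the value is tested for being non-zero. */
static bool
truthiness_check (ProcessType kill_process)
{
	return kill_process ? true : false;
}

/* Mirrors the condition proposed in this report:
 * the value is compared against the "no process" marker. */
static bool
explicit_check (ProcessType kill_process)
{
	return kill_process != PROCESS_INVALID;
}
```

Under these assumptions, truthiness_check (PROCESS_MAIN) is false although a kill timer for the main process is a legitimate state, while truthiness_check (PROCESS_INVALID) is true although that is the one value the assertion is presumably meant to reject. explicit_check() accepts PROCESS_MAIN and rejects only PROCESS_INVALID.
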
Unfortunately the issue is extremely difficult to reproduce, so additional diagnostics might be difficult to perform, and adding them might itself eliminate the race that triggers the issue.

** Affects: upstart (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "Serialised Upstart state dump"
   https://bugs.launchpad.net/bugs/1514609/+attachment/4515781/+files/sessions.json

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1514609

Title:
  Deserialising a job with the attribute "kill_timer" and "kill_process"="PROCESS_MAIN" results in abort

Status in upstart package in Ubuntu:
  New