On Fri, Feb 19, 2016 at 07:11:52AM +0000, [email protected] wrote:
> Hi,
>
> My job gets aborted after a while with exit status 135
>
> failed 100 : assumedly after job
> exit_status 135
>
> 02/17/2016 11:25:10|qmaster|master1|W|job 428284.1 failed on host test1
> assumedly after job because: job 428284.1 died through signal BUS (7)
>
> Job submitted with same resources sometimes get succeeded.
>
> I tried to increase the h_vmem size and submit again but I face the same
> result,
>
> Can you please help me in finding the reason for this kind of behavior.

I don't think the SIGBUS is likely to be Grid Engine related per se. A quick Google search suggests that one possible cause of SIGBUS errors is trying to access part of an mmap'd file that no longer exists because the file has been truncated since it was mmap'd.
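To see this failure mode in isolation, here is a minimal Python sketch, assuming a Linux node (signal numbers differ on other platforms). A child process maps a file, the file is then truncated underneath the mapping, and the next read from the mapping kills the child with SIGBUS. It also decodes the reported status: shell-style exit statuses above 128 mean 128 plus the signal number, so 135 is 128 + 7, and signal 7 is SIGBUS on Linux, which agrees with the qmaster message. The helper names (`describe_exit_status`, `run_truncated_mmap_demo`) and the file name `shared.dat` are illustrative only.

```python
import os
import resource
import signal
import subprocess
import sys
import tempfile
import textwrap

# Child program: map a file, shrink the file underneath the mapping, then read
# a byte whose page is no longer backed by the file.
CHILD = textwrap.dedent("""
    import mmap, os, sys
    path = sys.argv[1]
    size = mmap.PAGESIZE * 4
    with open(path, "wb") as f:
        f.write(b"x" * size)
    with open(path, "r+b") as f:
        m = mmap.mmap(f.fileno(), size)
    os.truncate(path, 0)  # stands in for another job rewriting the shared file
    _ = m[size - 1]       # page now lies beyond EOF: the kernel raises SIGBUS
    print("no fault")
""")


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status, where values above 128 mean 128 + signal."""
    if status > 128:
        sig = status - 128
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {sig} ({name})"
    return f"exited with code {status}"


def run_truncated_mmap_demo() -> int:
    """Run the child and return its subprocess return code (negative = signal)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, path],
            capture_output=True,
            # Disable core dumps so the deliberate crash leaves no core file.
            preexec_fn=lambda: resource.setrlimit(resource.RLIMIT_CORE, (0, 0)),
        )
        return proc.returncode


if __name__ == "__main__":
    rc = run_truncated_mmap_demo()
    # subprocess reports death by signal N as -N; a shell would report 128 + N.
    status = 128 - rc if rc < 0 else rc
    print(status, describe_exit_status(status))
```

Running the child in a separate process keeps the deliberate crash from taking down the caller, which is also roughly what happens to the job under the execution daemon.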
This is the sort of thing that would be more likely to show up on a cluster, where you might have multiple copies of a program running that all manipulate the same file on a shared file system. If this is the case, and you can identify the problem file, you might be able to avoid the error by working on a private copy of the file rather than the shared one.

William
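A minimal sketch of the private-copy workaround, again in Python and assuming the problem file has been identified: copy it into job-private scratch space and map the copy instead of the shared original. Grid Engine normally points `$TMPDIR` at a per-job directory, so `os.environ.get("TMPDIR", tempfile.gettempdir())` is one plausible scratch location, but check your queue configuration. The function `map_private_copy` and the file names are hypothetical. Once the copy exists, truncating the shared file no longer affects the mapping.

```python
import mmap
import os
import shutil
import tempfile


def map_private_copy(shared_path: str, scratch_dir: str) -> mmap.mmap:
    """Copy shared_path into scratch_dir and map the copy read-only.

    Other jobs can truncate or rewrite the shared original without
    invalidating pages of this mapping, because it is backed by the copy.
    """
    local_path = os.path.join(scratch_dir, os.path.basename(shared_path))
    shutil.copyfile(shared_path, local_path)
    with open(local_path, "rb") as f:
        # Length 0 maps the whole (non-empty) file.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as shared_dir, \
            tempfile.TemporaryDirectory() as scratch_dir:
        shared_path = os.path.join(shared_dir, "input.dat")
        with open(shared_path, "wb") as f:
            f.write(b"payload" * 1000)
        m = map_private_copy(shared_path, scratch_dir)
        os.truncate(shared_path, 0)  # another job truncates the shared file
        tail = m[-7:]                # safe: the private copy is untouched
        m.close()
        return tail


if __name__ == "__main__":
    print(demo())  # b'payload'
```

The copy costs time and local disk space, so it is mainly worth doing for files that other jobs may modify while this one is running.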
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
