Hi Heinrich, On Mon, 5 Jul 2021 at 11:31, Heinrich Schuchardt <xypron.g...@gmx.de> wrote: > > On 7/4/21 10:15 PM, Simon Glass wrote: > > Hi Heinrich, > > > > On Sun, 6 Jun 2021 at 15:35, Heinrich Schuchardt <xypron.g...@gmx.de> wrote: > >> > >> Am 6. Juni 2021 20:07:31 MESZ schrieb Sean Anderson <sean...@gmail.com>: > >>> On 6/6/21 1:57 PM, Heinrich Schuchardt wrote: > >>>> On 6/6/21 7:52 PM, Sean Anderson wrote: > >>>>> On 6/6/21 1:28 PM, Heinrich Schuchardt wrote: > >>>>>> On 6/6/21 6:44 PM, Simon Glass wrote: > >>>>>>> Hi Heinrich, > >>>>>>> > >>>>>>> On Mon, 22 Mar 2021 at 18:56, Simon Glass <s...@chromium.org> > >>> wrote: > >>>>>>>> > >>>>>>>> Hi Heinrich, > >>>>>>>> > >>>>>>>> On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt > >>>>>>>> <xypron.g...@gmx.de> wrote: > >>>>>>>>> > >>>>>>>>> On 22.03.21 06:21, Simon Glass wrote: > >>>>>>>>>> At present if sandbox crashes it prints a message and tries to > >>>>>>>>>> exit. But > >>>>>>>>>> with the recently introduced signal handler, it often seems to > >>> get > >>>>>>>>>> stuck > >>>>>>>>>> in a loop until the stack overflows: > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> > >>>>>>>>>> Segmentation violation > >>>>>>>>>> ... > >>>>>>>>> > >>>>>>>>> Hello Simon, > >>>>>>>>> > >>>>>>>>> do you have a reproducible example? I never have seen this. > >>>>>>>> > >>>>>>>> https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433 > >>>>>>>> > >>>>>>>> You need to run that commit with pytest though...it does not > >>> happen > >>>>>>>> when run directly. > >>>>>>>> > >>>>>>>> BTW this sems to expose some rather nasty bug in dlmalloc or how > >>> it is > >>>>>>>> used. I notice that as soon as the first test is run, the 'top' > >>> value > >>>>>>>> in dlmalloc is outside the range of the malloc pool, which seems > >>>>>>>> wrong. I wonder if there is something broken with how > >>>>>>>> dm_test_pre_run() and dm_test_post_run() work. > >>>>>>>> > >>>>>>>>> > >>>>>>>>> Corrupting gd could cause an endless recursive loop, as these > >>> lines > >>>>>>>>> follow printing the observed string: > >>>>>>>>> > >>>>>>>>> printf("pc = 0x%lx, ", pc); > >>>>>>>>> printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off); > >>>>>>>> > >>>>>>>> Yes I suspect printf() is dead. > >>>>>>>> > >>>>>>>>> > >>>>>>>>> If we remove SA_NODEFER from the signal mask in > >>> arch/sandbox/cpu/os.c, > >>>>>>>>> recursion cannot occur anymore. If a segmentation violation > >>> occurs > >>>>>>>>> inside the handler it will be delegated to the default handler. > >>>>>>>>> > >>>>>>>>> Furthermore we could consider removing the signal handler at the > >>> start > >>>>>>>>> of os_signal_action(). > >>>>>>>> > >>>>>>>> The issue is that if you get a segfault you really don't know if > >>> you > >>>>>>>> can continue and do anything else. > >>>>>>>> > >>>>>>>> What is the goal with the signal handler? I don't think the user > >>> can > >>>>>>>> do anything about it. > >>>>>> > >>>>>> Hello Simon, > >>>>>> > >>>>>> the signal handler prints out the crash location and this makes > >>>>>> analyzing problems much easier. It proved valuable to me several > >>> times. > >>>>> > >>>>> Can't you just rerun with gdb? > >>>> > >>>> This would require that the problem is easily reproducible which may > >>> not > >>>> be the case. > >>> > >>> Hm, perhaps you could keep track of how many times we've tried to catch > >>> a signal, and bail if this is the second time around. E.g. something > >>> like > >>> > >> > >> Removing SA_NODEFER from the signal mask will let the OS kick in if an > >> exception occurs in a signal handler. > >> > >> No counter is needed. > > > > Yes that is correct. > > > > However I am still going to apply this patch, since I would prefer > > that the default behaviour is to exit with a signal, rather than > > continue. It indicates some sort of error (except if we are running a > > strange test), so it seems wrong to try to continue. It may produce > > other issues and sandbox is in an unknown state. > > > > I am happy to discuss another way / patch for doing what you need. > > > Your patch is not needed to exit if a signal occurs. > If the U-Boot sandbox resets or exits is controlled by SANDBOX_CRASH_RESET. > > As said, to solve your problem just remove SA_NODEFER from the signal mask.
Perhaps we should discuss this on a contributor call. This thread is getting too long...! Regards, Simon