On 7/4/21 10:15 PM, Simon Glass wrote:
Hi Heinrich,

On Sun, 6 Jun 2021 at 15:35, Heinrich Schuchardt <xypron.g...@gmx.de> wrote:

Am 6. Juni 2021 20:07:31 MESZ schrieb Sean Anderson <sean...@gmail.com>:
On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
On 6/6/21 7:52 PM, Sean Anderson wrote:
On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
On 6/6/21 6:44 PM, Simon Glass wrote:
Hi Heinrich,

On Mon, 22 Mar 2021 at 18:56, Simon Glass <s...@chromium.org>
wrote:

Hi Heinrich,

On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt
<xypron.g...@gmx.de> wrote:

On 22.03.21 06:21, Simon Glass wrote:
At present if sandbox crashes it prints a message and tries to
exit. But
with the recently introduced signal handler, it often seems to
get
stuck
in a loop until the stack overflows:

Segmentation violation

Segmentation violation

Segmentation violation

Segmentation violation

Segmentation violation

Segmentation violation

Segmentation violation
...

Hello Simon,

do you have a reproducible example? I never have seen this.

https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433

You need to run that commit with pytest though...it does not
happen
when run directly.

BTW this sems to expose some rather nasty bug in dlmalloc or how
it is
used. I notice that as soon as the first test is run, the 'top'
value
in dlmalloc is outside the range of the malloc pool, which seems
wrong. I wonder if there is something broken with how
dm_test_pre_run() and dm_test_post_run() work.


Corrupting gd could cause an endless recursive loop, as these
lines
follow printing the observed string:

          printf("pc = 0x%lx, ", pc);
          printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);

Yes I suspect printf() is dead.


If we remove SA_NODEFER from the signal mask in
arch/sandbox/cpu/os.c,
recursion cannot occur anymore. If a segmentation violation
occurs
inside the handler it will be delegated to the default handler.

Furthermore we could consider removing the signal handler at the
start
of os_signal_action().

The issue is that if you get a segfault you really don't know if
you
can continue and do anything else.

What is the goal with the signal handler? I don't think the user
can
do anything about it.

Hello Simon,

the signal handler prints out the crash location and this makes
analyzing problems much easier. It proved valuable to me several
times.

Can't you just rerun with gdb?

This would require that the problem is easily reproducible which may
not
be the case.

Hm, perhaps you could keep track of how many times we've tried to catch
a signal, and bail if this is the second time around. E.g. something
like


Removing SA_NODEFER from the signal mask will let the OS kick in if an 
exception occurs in a signal handler.

No counter is needed.

Yes that is correct.

However I am still going to apply this patch, since I would prefer
that the default behaviour is to exit with a signal, rather than
continue. It indicates some sort of error (except if we are running a
strange test), so it seems wrong to try to continue. It may produce
other issues and sandbox is in an unknown state.

I am happy to discuss another way / patch for doing what you need.


Your patch is not needed to exit if a signal occurs.
If the U-Boot sandbox resets or exits is controlled by SANDBOX_CRASH_RESET.

As said, to solve your problem just remove SA_NODEFER from the signal mask.

Best regards

Heinrich

Reply via email to