> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Saturday, 29 January 2022 02.11
> 
> The glibc backtrace_symbols() calls malloc which makes it
> dangerous to use rte_dump_stack() in a signal handler that
> is handling errors that maybe due to memory corruption.

Yes. We have experienced that problem with backtrace_symbols(); so as a 
workaround, our failure signal handler dumps all other information first, and 
calls backtrace_symbols() last, in case it crashes.

> 
> Instead, use dladdr() to lookup up symbols incrementally.

I took a brief look at the dladdr() source code, and it looks good to me.

> 
> The format of the messages is based on what X org server
> has been doing for many years. It changes from bottom up
> to top down order.

Good idea. Seems more logical.

> 
> Bugzilla ID: 929
> Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
> ---
>  lib/eal/linux/eal_debug.c | 45 ++++++++++++++++++++++++++++-----------
>  1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/eal/linux/eal_debug.c b/lib/eal/linux/eal_debug.c
> index 64dab4e0da24..bf232f72f402 100644
> --- a/lib/eal/linux/eal_debug.c
> +++ b/lib/eal/linux/eal_debug.c
> @@ -4,6 +4,7 @@
> 
>  #ifdef RTE_BACKTRACE
>  #include <execinfo.h>
> +#include <dlfcn.h>
>  #endif
>  #include <stdarg.h>
>  #include <signal.h>
> @@ -18,26 +19,44 @@
> 
>  #define BACKTRACE_SIZE 256
> 
> -/* dump the stack of the calling core */
> +/* Dump the stack of the calling core
> + *
> + * Note: this requires some careful usage in order to
> + * stay safe in case where called from a signal
> + * handler and the malloc pool may be corrupted.
> + */
>  void rte_dump_stack(void)
>  {
>  #ifdef RTE_BACKTRACE
>       void *func[BACKTRACE_SIZE];
> -     char **symb = NULL;
> -     int size;
> +     int i, size;
> 
>       size = backtrace(func, BACKTRACE_SIZE);
> -     symb = backtrace_symbols(func, size);
> -
> -     if (symb == NULL)
> -             return;
> 
> -     while (size > 0) {
> -             rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> -                     "%d: [%s]\n", size, symb[size - 1]);
> -             size --;
> +     for (i = 0; i < size; i++) {
> +             void *pc = func[i];
> +             const char *fname;
> +             Dl_info info;
> +
> +             if (dladdr(pc, &info) == 0) {
> +                     rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> +                             "%d: ?? [%p]\n", i, pc);
> +                     continue;
> +             }
> +
> +             fname = (info.dli_fname && *info.dli_fname) ?
> info.dli_fname : "(vdso)";
> +             if (info.dli_saddr != NULL)
> +                     rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> +                             "%d: %s (%s+%#tx) [%p]\n",
> +                             i, fname, info.dli_sname,
> +                             (ptrdiff_t)((uintptr_t)pc -
> (uintptr_t)info.dli_saddr),
> +                             pc);
> +             else
> +                     rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> +                             "%d: %s (%p+%#tx) [%p]\n",
> +                             i, fname, info.dli_fbase,
> +                             (ptrdiff_t)((uintptr_t)pc -
> (uintptr_t)info.dli_fbase),
> +                             pc);
>       }
> -
>       free(symb);

Probably something is lost in formatting here, but free(symb) must also be 
removed.

>  #endif /* RTE_BACKTRACE */
>  }
> --
> 2.34.1
> 

Great improvement, Stephen!

Acked-by: Morten Brørup <m...@smartsharesystems.com>

Reply via email to