[Bug 1826811] Re: Valgrind unhandled instruction 0xD5380000 on Aarch64

Bug Watch Updater Mon, 09 Sep 2019 13:01:30 -0700

Launchpad has imported 15 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=1464211.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2017-06-22T16:23:16+00:00 fweimer wrote:

+++ This bug was initially created as a clone of Bug #1464085 +++

valgrind currently does not know anything about the CPUID flag added to
the HWCAP auxv entry in kernel 4.11.  It passes this flag through to
applications, but it will then choke when the application uses it, like
this:

ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==924== valgrind: Unrecognised instruction at address 0x11f548.
==924==    at 0x11F548: init_cpu_features (cpu-features.c:32)
==924==    by 0x11F548: dl_platform_init (dl-machine.h:241)
==924==    by 0x11F548: _dl_sysdep_start (dl-sysdep.c:231)
==924==    by 0x10981B: _dl_start_final (rtld.c:412)
==924==    by 0x109AAB: _dl_start (rtld.c:520)

The crashing instruction is the mrs in the glibc startup code, which
means that currently no applications run under valgrind:

  if (hwcap & HWCAP_CPUID)
    {
      register uint64_t id = 0;
      asm volatile ("mrs %0, midr_el1" : "=r"(id));
      cpu_features->midr_el1 = id;
    }
  else
    cpu_features->midr_el1 = 0;

Perhaps valgrind should mask all the HWCAP bits it knows nothing about.

Workaround: Run with “LD_HWCAP_MASK=1”.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/0

------------------------------------------------------------------------
On 2017-06-23T10:52:41+00:00 mjw wrote:

See also upstream https://bugs.kde.org/show_bug.cgi?id=381556
arm64: Handle feature registers access on 4.11 Linux kernel or later

For now worked around in valgrind valgrind-3.13.0-3.fc27 as suggested in
the original description of this bug:

--- a/coregrind/m_initimg/initimg-linux.c
+++ b/coregrind/m_initimg/initimg-linux.c
@@ -703,6 +703,12 @@ Addr setup_client_stack( void*  init_sp,
                   (and anything above) are not supported by Valgrind. */
                auxv->u.a_val &= VKI_HWCAP_S390_TE - 1;
             }
+#           elif defined(VGP_arm64_linux)
+            {
+               /* Linux 4.11 started populating this for arm64, but we
+                  currently don't support any. */
+               auxv->u.a_val = 0;
+            }
 #           endif
             break;
 #        if defined(VGP_ppc64be_linux) || defined(VGP_ppc64le_linux)

Keeping this bug open to see how upstream resolves this.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/1

------------------------------------------------------------------------
On 2017-06-29T20:11:01+00:00 updates wrote:

valgrind-3.13.0-4.fc26 has been submitted as an update to Fedora 26.
https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/2

------------------------------------------------------------------------
On 2017-06-30T20:25:29+00:00 updates wrote:

valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 testing repository. If 
problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: 
https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/3

------------------------------------------------------------------------
On 2017-07-07T23:05:15+00:00 updates wrote:

valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 stable
repository. If problems still persist, please make note of it in this
bug report.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/4

------------------------------------------------------------------------
On 2018-06-13T19:15:57+00:00 rclark wrote:

(In reply to Mark Wielaard from comment #1)
> See also upstream https://bugs.kde.org/show_bug.cgi?id=381556
> arm64: Handle feature registers access on 4.11 Linux kernel or later
> 
> For now worked around in valgrind valgrind-3.13.0-3.fc27 as suggested in the
> original description of this bug:
> 
> --- a/coregrind/m_initimg/initimg-linux.c
> +++ b/coregrind/m_initimg/initimg-linux.c
> @@ -703,6 +703,12 @@ Addr setup_client_stack( void*  init_sp,
>                    (and anything above) are not supported by Valgrind. */
>                 auxv->u.a_val &= VKI_HWCAP_S390_TE - 1;
>              }
> +#           elif defined(VGP_arm64_linux)
> +            {
> +               /* Linux 4.11 started populating this for arm64, but we
> +                  currently don't support any. */
> +               auxv->u.a_val = 0;
> +            }
>  #           endif
>              break;
>  #        if defined(VGP_ppc64be_linux) || defined(VGP_ppc64le_linux)
> 
> Keeping this bug open to see how upstream resolves this.

hmm, I just saw the same issue on rawhide (valgrind 1:3.13.0-18.fc29).. did a 
patch get lost from the spec file?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/5

------------------------------------------------------------------------
On 2018-06-13T19:24:33+00:00 mjw wrote:

(In reply to Rob Clark from comment #5)
> hmm, I just saw the same issue on rawhide (valgrind 1:3.13.0-18.fc29).. did
> a patch get lost from the spec file?

The patch (valgrind-3.13.0-arm64-hwcap.patch) is there (and still the
same, no change upstream), and applied. Is the issue exactly the same as
in the description? Could you paste the command line and the valgrind
error message?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/6

------------------------------------------------------------------------
On 2018-06-13T19:36:20+00:00 rclark wrote:

cmdline:

  valgrind --leak-check=yes ./deqp-gles31 --deqp-case=dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1

(debugging some dEQP test crashes in mesa/freedreno)

output (without LD_HWCAP_MASK=1 which works around the issue) (also
attached):

==32073== Memcheck, a memory error detector
==32073== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==32073== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==32073== Command: ./deqp-gles31 --deqp-visibility=hidden --deqp-case=dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1 --deqp-log-filename=results/dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1.qpa
==32073== 
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==32073== valgrind: Unrecognised instruction at address 0x40150cc.
==32073==    at 0x40150CC: init_cpu_features (cpu-features.c:72)
==32073==    by 0x40150CC: dl_platform_init (dl-machine.h:208)
==32073==    by 0x40150CC: _dl_sysdep_start (dl-sysdep.c:231)
==32073==    by 0x40018C3: _dl_start_final (rtld.c:411)
==32073==    by 0x4001B3F: _dl_start (rtld.c:520)
==32073==    by 0x4001047: ??? (in /usr/lib64/ld-2.27.9000.so)
==32073== Your program just tried to execute an instruction that Valgrind
==32073== did not recognise.  There are two possible reasons for this.
==32073== 1. Your program has a bug and erroneously jumped to a non-code
==32073==    location.  If you are running Memcheck and you just saw a
==32073==    warning about a bad jump, it's probably your program's fault.
==32073== 2. The instruction is legitimate but Valgrind doesn't handle it,
==32073==    i.e. it's Valgrind's fault.  If you think this is the case or
==32073==    you are not sure, please let us know and we'll try to fix it.
==32073== Either way, Valgrind will now raise a SIGILL signal which will
==32073== probably kill your program.
==32073== 
==32073== Process terminating with default action of signal 4 (SIGILL): dumping core
==32073==  Illegal opcode at address 0x40150CC
==32073==    at 0x40150CC: init_cpu_features (cpu-features.c:72)
==32073==    by 0x40150CC: dl_platform_init (dl-machine.h:208)
==32073==    by 0x40150CC: _dl_sysdep_start (dl-sysdep.c:231)
==32073==    by 0x40018C3: _dl_start_final (rtld.c:411)
==32073==    by 0x4001B3F: _dl_start (rtld.c:520)
==32073==    by 0x4001047: ??? (in /usr/lib64/ld-2.27.9000.so)

valgrind: m_coredump/coredump-elf.c:506 (fill_fpu): Assertion 'Unimplemented functionality' failed.
valgrind: valgrind

host stacktrace:
==32073==    at 0x3803E0FC: show_sched_status_wrk (m_libcassert.c:378)
==32073==    by 0x3803E22B: report_and_quit (m_libcassert.c:449)
==32073==    by 0x3803E387: vgPlain_assert_fail (m_libcassert.c:515)
==32073==    by 0x380706FB: fill_fpu.isra.4 (coredump-elf.c:506)
==32073==    by 0x380708CF: dump_one_thread (coredump-elf.c:563)
==32073==    by 0x380708CF: make_elf_coredump (coredump-elf.c:667)
==32073==    by 0x380708CF: vgPlain_make_coredump (coredump-elf.c:748)
==32073==    by 0x3805654F: default_action (m_signals.c:1937)
==32073==    by 0x3805654F: deliver_signal (m_signals.c:1997)
==32073==    by 0x38056D0B: vgPlain_synth_sigill (m_signals.c:2106)
==32073==    by 0x380982DB: vgPlain_scheduler (scheduler.c:1577)
==32073==    by 0x380A939F: thread_wrapper (syswrap-linux.c:103)
==32073==    by 0x380A939F: run_a_thread_NORETURN (syswrap-linux.c:156)
==32073==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 32073)
==32073==    at 0x40150CC: init_cpu_features (cpu-features.c:72)
==32073==    by 0x40150CC: dl_platform_init (dl-machine.h:208)
==32073==    by 0x40150CC: _dl_sysdep_start (dl-sysdep.c:231)
==32073==    by 0x40018C3: _dl_start_final (rtld.c:411)
==32073==    by 0x4001B3F: _dl_start (rtld.c:520)
==32073==    by 0x4001047: ??? (in /usr/lib64/ld-2.27.9000.so)

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/7

------------------------------------------------------------------------
On 2018-06-13T19:36:52+00:00 rclark wrote:

Created attachment 1451010
valgrind output

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/8

------------------------------------------------------------------------
On 2018-06-13T19:47:16+00:00 fweimer wrote:

That's from the midr_el1 read:

  /* If there was no useful tunable override, query the MIDR if the kernel
     allows it.  */
  if (midr == UINT64_MAX)
    {
      if (hwcap & HWCAP_CPUID)
        asm volatile ("mrs %0, midr_el1" : "=r"(midr));
      else
        midr = 0;
    }

So it looks like we get the wrong (host) hwcap value without masking.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/9

------------------------------------------------------------------------
On 2018-06-13T19:48:17+00:00 fweimer wrote:

It might be helpful to run “LD_SHOW_AUXV=1 /bin/true” with and without
valgrind.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/10

------------------------------------------------------------------------
On 2018-06-13T20:05:40+00:00 rclark wrote:

so, quick disclaimer: I'm running a non-standard kernel atm. If any
kernel config/etc could affect this, I can retry w/ a vanilla kernel
(but not immediately, and possibly not on the same device)

(In reply to Florian Weimer from comment #10)
> It might be helpful to run “LD_SHOW_AUXV=1 /bin/true” with and without
> valgrind.

[robclark@db820c:~]$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0xffff81924000
AT_HWCAP:        8ff
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0xaaaac8ba2040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0xffff818f6000
AT_FLAGS:        0x0
AT_ENTRY:        0xaaaac8ba38d0
AT_UID:          1000
AT_EUID:         1000
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_RANDOM:       0xfffff1883f68
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64
[robclark@db820c:~]$ 
[robclark@db820c:~]$ LD_SHOW_AUXV=1 valgrind --leak-check=yes /bin/true
AT_SYSINFO_EHDR: 0xffff9eb51000
AT_HWCAP:        8ff
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x400040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0xffff9eb23000
AT_FLAGS:        0x0
AT_ENTRY:        0x4011d0
AT_UID:          1000
AT_EUID:         1000
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_RANDOM:       0xffffc66278c8
AT_EXECFN:       /usr/local/bin/valgrind
AT_PLATFORM:     aarch64
==1668== Memcheck, a memory error detector
==1668== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1668== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==1668== Command: /bin/true
==1668== 
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==1668== valgrind: Unrecognised instruction at address 0x40150cc.
==1668==    at 0x40150CC: init_cpu_features (cpu-features.c:72)
==1668==    by 0x40150CC: dl_platform_init (dl-machine.h:208)
==1668==    by 0x40150CC: _dl_sysdep_start (dl-sysdep.c:231)
==1668==    by 0x40018C3: _dl_start_final (rtld.c:411)
==1668==    by 0x4001B3F: _dl_start (rtld.c:520)
==1668==    by 0x4001047: ??? (in /usr/lib64/ld-2.27.9000.so)
==1668== Your program just tried to execute an instruction that Valgrind
==1668== did not recognise.  There are two possible reasons for this.
==1668== 1. Your program has a bug and erroneously jumped to a non-code
==1668==    location.  If you are running Memcheck and you just saw a
==1668==    warning about a bad jump, it's probably your program's fault.
==1668== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1668==    i.e. it's Valgrind's fault.  If you think this is the case or
==1668==    you are not sure, please let us know and we'll try to fix it.
==1668== Either way, Valgrind will now raise a SIGILL signal which will
==1668== probably kill your program.
==1668== 
==1668== Process terminating with default action of signal 4 (SIGILL): dumping core
==1668==  Illegal opcode at address 0x40150CC
==1668==    at 0x40150CC: init_cpu_features (cpu-features.c:72)
==1668==    by 0x40150CC: dl_platform_init (dl-machine.h:208)
==1668==    by 0x40150CC: _dl_sysdep_start (dl-sysdep.c:231)
==1668==    by 0x40018C3: _dl_start_final (rtld.c:411)
==1668==    by 0x4001B3F: _dl_start (rtld.c:520)
==1668==    by 0x4001047: ??? (in /usr/lib64/ld-2.27.9000.so)

valgrind: m_coredump/coredump-elf.c:506 (fill_fpu): Assertion 'Unimplemented functionality' failed.
valgrind: valgrind

host stacktrace:
==1668==    at 0x3803E0FC: show_sched_status_wrk (m_libcassert.c:378)
==1668==    by 0x3803E22B: report_and_quit (m_libcassert.c:449)
==1668==    by 0x3803E387: vgPlain_assert_fail (m_libcassert.c:515)
==1668==    by 0x380706FB: fill_fpu.isra.4 (coredump-elf.c:506)
==1668==    by 0x380708CF: dump_one_thread (coredump-elf.c:563)
==1668==    by 0x380708CF: make_elf_coredump (coredump-elf.c:667)
==1668==    by 0x380708CF: vgPlain_make_coredump (coredump-elf.c:748)
==1668==    by 0x3805654F: default_action (m_signals.c:1937)
==1668==    by 0x3805654F: deliver_signal (m_signals.c:1997)
==1668==    by 0x38056D0B: vgPlain_synth_sigill (m_signals.c:2106)
==1668==    by 0x380982DB: vgPlain_scheduler (scheduler.c:1577)
==1668==    by 0x380A939F: thread_wrapper (syswrap-linux.c:103)
==1668==    by 0x380A939F: run_a_thread_NORETURN (syswrap-linux.c:156)
==1668==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 1668)
==1668==    at 0x40150CC: init_cpu_features (cpu-features.c:72)
==1668==    by 0x40150CC: dl_platform_init (dl-machine.h:208)
==1668==    by 0x40150CC: _dl_sysdep_start (dl-sysdep.c:231)
==1668==    by 0x40018C3: _dl_start_final (rtld.c:411)
==1668==    by 0x4001B3F: _dl_start (rtld.c:520)
==1668==    by 0x4001047: ??? (in /usr/lib64/ld-2.27.9000.so)

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/11

------------------------------------------------------------------------
On 2018-06-13T20:14:44+00:00 mjw wrote:

hohum, so that shows the HWCAP of valgrind itself, which then execs
/bin/true and crashes before showing the auxv. Maybe try:

 LD_HWCAP_MASK=1 LD_SHOW_AUXV=1 valgrind -q /bin/true

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/12

------------------------------------------------------------------------
On 2018-06-14T11:50:06+00:00 rclark wrote:

heh, so this makes my problem a bit more obvious.. at one point in the
past I had built my own valgrind (in /usr/local/bin which was ahead of
/usr/bin in $PATH).. so in fact the problem all along was not with
fedora's valgrind but pebkac ;-)

/me reaches for brown paper bag

------
[robclark@db820c:~]$ LD_HWCAP_MASK=1 LD_SHOW_AUXV=1 valgrind -q /bin/true
AT_SYSINFO_EHDR: 0xffffb56ca000
AT_HWCAP:        8ff
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x400040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0xffffb569c000
AT_FLAGS:        0x0
AT_ENTRY:        0x4011d0
AT_UID:          1000
AT_EUID:         1000
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_RANDOM:       0xffffd156b538
AT_EXECFN:       /usr/local/bin/valgrind
AT_PLATFORM:     aarch64
AT_HWCAP:        8ff
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x108040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0x4000000
AT_FLAGS:        0x0
AT_ENTRY:        0x1098d0
AT_UID:          1000
AT_EUID:         1000
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_RANDOM:       0xfff000fda
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/13

------------------------------------------------------------------------
On 2018-06-14T12:40:40+00:00 mjw wrote:

(In reply to Rob Clark from comment #13)
> heh, so this makes my problem a bit more obvious.. at one point in the past
> I had built my own valgrind (in /usr/local/bin which was ahead of /usr/bin
> in $PATH).. so in fact the problem all along was not with fedora's valgrind
> but pebkac ;-)
> 
> /me reaches for brown paper bag

No worries. Thanks for walking through it with us.
If there is any reason in the future to build an upstream valgrind, please let
me know. I am happy to backport any fixes to the fedora package.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/comments/14

** Changed in: valgrind (Fedora)
       Status: Unknown => Fix Released

** Changed in: valgrind (Fedora)
   Importance: Unknown => Undecided

** Bug watch added: KDE Bug Tracking System #381556
   https://bugs.kde.org/show_bug.cgi?id=381556

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1826811

Title:
  Valgrind unhandled instruction 0xD5380000 on Aarch64

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1826811/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
