Re: high latency when cpu is fully loaded
On 2019年11月19日 14:49, Jan Kiszka wrote: On 19.11.19 02:01, chensong via Xenomai wrote: Dear experts, i'm new in xenomai, i got an issue, here is the detail: Main processor architect: ARM64 phytium ft2000ahk Kernel release number: 4.14.4 cmdline:BOOT_IMAGE=/Image-tmp root=UUID=9fea0634-a9c9-4e9f-906c-9c36b7249822 console=ttyS1,115200 earlyprintk=uart8250-32bit,0x28001000 rw rootdelay=10 KEYBOARDTYPE=pc KEYTABLE=us security= xenomai release number:3.1-devel xenomai configuration: kylin@kylin-os:~/workspace/code/nudt-hgj-xenomai-tjrd$ grep configure config.status # Generated by configure. # Compiler output produced by configure, useful for debugging # configure, is in config.log if it exists. configured by ./configure, generated by GNU Autoconf 2.69, ac_configure_extra_args= ac_configure_extra_args="$ac_configure_extra_args --silent" set X /bin/bash './configure' '--with-core=cobalt' '--enable-smp' '--enable-pshared' $ac_configure_extra_args --no-create --no-recursion configure_time_dlsearch_path='/lib /usr/lib /lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu/mesa-egl /usr/lib/aarch64-linux-gnu/mesa /usr/local/lib ' configure_time_lt_sys_library_path='' for var in reload_cmds old_postinstall_cmds old_postuninstall_cmds old_archive_cmds extract_expsyms_cmds old_archive_from_new_cmds old_archive_from_expsyms_cmds archive_cmds archive_expsym_cmds module_cmds module_expsym_cmds export_symbols_cmds prelink_cmds postlink_cmds postinstall_cmds postuninstall_cmds finish_cmds sys_lib_search_path_spec configure_time_dlsearch_path configure_time_lt_sys_library_path; do # on some systems where configure will not decide to define it. # Let's still pretend it is `configure' which instantiates (i.e., don't configure_input='Generated from '` `' by configure.' configure_input="$ac_file. 
$configure_input" case $configure_input in #( ac_sed_conf_input=`$as_echo "$configure_input" | *) ac_sed_conf_input=$configure_input;; s|@configure_input@|$ac_sed_conf_input|;t t $as_echo "/* $configure_input */" \ $as_echo "/* $configure_input */" \ # Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: : \${LT_SYS_LIBRARY_PATH="$configure_time_lt_sys_library_path"} sys_lib_dlsearch_path_spec=$lt_configure_time_dlsearch_path # Explicit LT_SYS_LIBRARY_PATH set during ./configure time. configure_time_lt_sys_library_path=$lt_configure_time_lt_sys_library_path OR: kylin@kylin-os:~$ xeno-config --info Xenomai version: Xenomai/cobalt v3.1-devel -- # () Linux kylin-os 4.14.4.kylin.rt-1118-ipipe-trace+ #2 SMP PREEMPT Mon Nov 18 18:28:17 CST 2019 aarch64 aarch64 aarch64 GNU/Linux Kernel parameters: BOOT_IMAGE=/Image-tmp root=UUID=9fea0634-a9c9-4e9f-906c-9c36b7249822 console=ttyS1,115200 earlyprintk=uart8250-32bit,0x28001000 rw rootdelay=10 KEYBOARDTYPE=pc KEYTABLE=us security= I-pipe release #2 detected Cobalt core 3.1-devel detected Compiler: gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6kord1~16.04.10) Build args: --prefix=/usr --includedir=/usr/include/xenomai --mandir=/usr/share/man --with-testdir=/usr/lib/xenomai/testsuite --enable-smp --build aarch64-linux-gnu build_alias=aarch64-linux-gnu Desktop: kylin 4.0.2 (ubuntu likely desktop) Issue description: latency and cyclictest work fine in my system in most of cases, the worst latency is around 100us ~ 200us. however, when i ran a script to increase the cpu load in the system, the worst latency reached 2000us ~ 5000us or even worse. Basically, the script forks 6 processes by default and each process applies a four-pages buffer and keeps writing without any breath, no warning or error messages in dmesg. 
>> below is the script:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>>
>> #define PAGE_SIZE 4096
>> #define TEST_THREADS 6
>>
>> unsigned int test_threads;
>>
>> void do_thread_test(void)
>> {
>> 	void *mm;
>> 	char i = 0;
>>
>> 	printf("mem test thread start \n");
>> 	mm = malloc(PAGE_SIZE * 4); /* four pages = 16 KiB */
>> 	while(1) {
>> 		for (i = 0; i<100; i++)
>> 			memset(mm, i, PAGE_SIZE * 4);
>> 	}
>
> You cannot run Xenomai threads at 100% on Linux. You need to leave some
> time for the rest of the system to do housekeeping. That explains the
> "deadlock" you see. If you turn on CONFIG_XENO_OPT_WATCHDOG, it will
> detect such mistakes and kick the task out of RT.
>
> Jan

"do_thread_test" is not an RT task. It runs in the Linux domain, yet the latency test running in the Xenomai domain was affected.

/Song
Re: Deadlock during debugging
On 18.11.19 18:31, Lange Norbert wrote:
> -----Original Message-----
> From: Jan Kiszka
> Sent: Monday, 18 November 2019 18:22
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: Deadlock during debugging
>
>> On 18.11.19 17:24, Lange Norbert via Xenomai wrote:
>>> One more,
>>>
>>> Note that there seem to be quite different reports, from a recursive
>>> fault to some threads getting marked as "runaway".
>>>
>>> I can reproduce the issue now easily, but it's proprietary software
>>> I can't pass around.
>>
>> Understood. Will try to read something from the traces. This is a
>> regression over 3.0 now, correct?
>
> No, can't say that. I had various recurring issues with 4.9, 4.14 and
> 4.19 kernels, as well as 3.0 and 3.1.
> It's hard to narrow down and often just vanished after a while, and my
> only gut feeling is that condition variables are involved.
> I also have a couple of cobalt threads *not* pinned to a single cpu.

I'm only talking about the crash during debug - one issue after the other.

Jan

> At least I can now say it's a single app causing the issue, not using
> rtnet or having additional cobalt applications running.
>
> Since I can easily reproduce the issue, I will now try using debian's
> gcc-8, to rule out troubles with the toolchain.
>
> Norbert.

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: high latency when cpu is fully loaded
On 19.11.19 02:01, chensong via Xenomai wrote:
> Dear experts,
>
> i'm new in xenomai, i got an issue, here is the detail:
>
> [...]
>
> Issue description:
> latency and cyclictest work fine in my system in most of cases, the worst latency is around 100us ~ 200us. however, when i ran a script to increase the cpu load in the system, the worst latency reached 2000us ~ 5000us or even worse.
>
> Basically, the script forks 6 processes by default and each process applies a four-pages buffer and keeps writing without any breath, no warning or error messages in dmesg.
>
> below is the script:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/types.h>
>
> #define PAGE_SIZE 4096
> #define TEST_THREADS 6
>
> unsigned int test_threads;
>
> void do_thread_test(void)
> {
> 	void *mm;
> 	char i = 0;
>
> 	printf("mem test thread start \n");
> 	mm = malloc(PAGE_SIZE * 4); /* four pages = 16 KiB */
> 	while(1) {
> 		for (i = 0; i<100; i++)
> 			memset(mm, i, PAGE_SIZE * 4);
> 	}

You cannot run Xenomai threads at 100% on Linux. You need to leave some time for the rest of the system to do housekeeping. That explains the "deadlock" you see. If you turn on CONFIG_XENO_OPT_WATCHDOG, it will detect such mistakes and kick the task out of RT.

Jan
high latency when cpu is fully loaded
Dear experts, i'm new in xenomai, i got an issue, here is the detail: Main processor architect: ARM64 phytium ft2000ahk Kernel release number: 4.14.4 cmdline:BOOT_IMAGE=/Image-tmp root=UUID=9fea0634-a9c9-4e9f-906c-9c36b7249822 console=ttyS1,115200 earlyprintk=uart8250-32bit,0x28001000 rw rootdelay=10 KEYBOARDTYPE=pc KEYTABLE=us security= xenomai release number:3.1-devel xenomai configuration: kylin@kylin-os:~/workspace/code/nudt-hgj-xenomai-tjrd$ grep configure config.status # Generated by configure. # Compiler output produced by configure, useful for debugging # configure, is in config.log if it exists. configured by ./configure, generated by GNU Autoconf 2.69, ac_configure_extra_args= ac_configure_extra_args="$ac_configure_extra_args --silent" set X /bin/bash './configure' '--with-core=cobalt' '--enable-smp' '--enable-pshared' $ac_configure_extra_args --no-create --no-recursion configure_time_dlsearch_path='/lib /usr/lib /lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu/mesa-egl /usr/lib/aarch64-linux-gnu/mesa /usr/local/lib ' configure_time_lt_sys_library_path='' for var in reload_cmds old_postinstall_cmds old_postuninstall_cmds old_archive_cmds extract_expsyms_cmds old_archive_from_new_cmds old_archive_from_expsyms_cmds archive_cmds archive_expsym_cmds module_cmds module_expsym_cmds export_symbols_cmds prelink_cmds postlink_cmds postinstall_cmds postuninstall_cmds finish_cmds sys_lib_search_path_spec configure_time_dlsearch_path configure_time_lt_sys_library_path; do # on some systems where configure will not decide to define it. # Let's still pretend it is `configure' which instantiates (i.e., don't configure_input='Generated from '` `' by configure.' configure_input="$ac_file. 
$configure_input" case $configure_input in #( ac_sed_conf_input=`$as_echo "$configure_input" | *) ac_sed_conf_input=$configure_input;; s|@configure_input@|$ac_sed_conf_input|;t t $as_echo "/* $configure_input */" \ $as_echo "/* $configure_input */" \ # Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: : \${LT_SYS_LIBRARY_PATH="$configure_time_lt_sys_library_path"} sys_lib_dlsearch_path_spec=$lt_configure_time_dlsearch_path # Explicit LT_SYS_LIBRARY_PATH set during ./configure time. configure_time_lt_sys_library_path=$lt_configure_time_lt_sys_library_path OR: kylin@kylin-os:~$ xeno-config --info Xenomai version: Xenomai/cobalt v3.1-devel -- # () Linux kylin-os 4.14.4.kylin.rt-1118-ipipe-trace+ #2 SMP PREEMPT Mon Nov 18 18:28:17 CST 2019 aarch64 aarch64 aarch64 GNU/Linux Kernel parameters: BOOT_IMAGE=/Image-tmp root=UUID=9fea0634-a9c9-4e9f-906c-9c36b7249822 console=ttyS1,115200 earlyprintk=uart8250-32bit,0x28001000 rw rootdelay=10 KEYBOARDTYPE=pc KEYTABLE=us security= I-pipe release #2 detected Cobalt core 3.1-devel detected Compiler: gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6kord1~16.04.10) Build args: --prefix=/usr --includedir=/usr/include/xenomai --mandir=/usr/share/man --with-testdir=/usr/lib/xenomai/testsuite --enable-smp --build aarch64-linux-gnu build_alias=aarch64-linux-gnu Desktop: kylin 4.0.2 (ubuntu likely desktop) Issue description: latency and cyclictest work fine in my system in most of cases, the worst latency is around 100us ~ 200us. however, when i ran a script to increase the cpu load in the system, the worst latency reached 2000us ~ 5000us or even worse. Basically, the script forks 6 processes by default and each process applies a four-pages buffer and keeps writing without any breath, no warning or error messages in dmesg. 
below is the script:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define PAGE_SIZE 4096
#define TEST_THREADS 6

unsigned int test_threads;

/* Load generator: rewrite a small buffer forever. */
void do_thread_test(void)
{
	void *mm;
	char i = 0;

	printf("mem test thread start \n");
	mm = malloc(PAGE_SIZE * 4); /* four pages = 16 KiB */
	while(1) {
		for (i = 0; i<100; i++)
			memset(mm, i, PAGE_SIZE * 4);
	}
}

/* Fork one load process per requested "thread". */
void thread_test(void)
{
	pid_t pid[TEST_THREADS];
	int i = 0;

	printf("begin start\n");
	for (i = 0; i< test_threads; i++) {
		pid[i] = fork();
		if (pid[i] == 0)
			do_thread_test();
	}
}

int get_cpu_idle_info(int *idle, int *total)
{
	FILE *fp;
	int var[5][7];
	char name[5][20];
	char *line = NULL; /* getline() allocates the buffer when this is NULL */
	size_t len = 0;
	ssize_t read;
	int i = 0;

	if ((fp = fopen("/proc/stat", "r")) == NULL) {
		printf("open /proc/stat err !\n");
		return -1;
	}
	while((read = getline(&line, &len, fp)) != -1) {
		// fgets(buff, sizeof(buff), fp);
		sscanf(line, "%s %u %u %u %u %u %u %u", &name[i][0], &var[i][0], &var[i][1], &var[i][2], &var[i][3],
RE: Deadlock during debugging
One more,

Note that there seem to be quite different reports, from a recursive fault to some threads getting marked as "runaway".

I can reproduce the issue now easily, but it's proprietary software I can't pass around.

Norbert

[ 226.354729] I-pipe: Detected stalled head domain, probably caused by a bug. [ 226.354729] A critical section may have been left unterminated. [ 226.370156] CPU: 1 PID: 0 Comm: swapper/2 Tainted: GW 4.19.84-xenod8-static #1 [ 226.370160] CPU: 2 PID: 732 Comm: fup.fast Tainted: GW 4.19.84-xenod8-static #1 [ 226.378775] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.16 01/31/2019 [ 226.387475] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.16 01/31/2019 [ 226.396782] I-pipe domain: Linux [ 226.406089] I-pipe domain: Linux [ 226.409320] RIP: 0010:do_idle+0xaf/0x140 [ 226.412549] Call Trace: [ 226.416476] Code: 85 92 00 00 00 e8 51 f5 04 00 e8 bc 65 03 00 e8 77 36 7c 00 f0 80 4d 02 20 9c 58 f6 c4 02 74 7e e8 66 2d 07 00 48 85 c0 74 6b <0f> 0b e8 0a 42 07 00 e8 45 68 03 00 9c 58 f6 c4 02 0f 85 79 ff ff [ 226.418936] dump_stack+0x8c/0xc0 [ 226.437687] RSP: 0018:932cc00afef8 EFLAGS: 00010002 [ 226.441009] ipipe_root_only.cold+0x11/0x32 [ 226.446240] ipipe_stall_root+0xe/0x60 [ 226.450424] RAX: 0001 RBX: 0002 RCX: 000b [ 226.454182] __ipipe_trap_prologue+0x2ae/0x2f0 [ 226.461319] RDX: a3fc RSI: 8f63f99c8208 RDI: [ 226.465767] ?
__ipipe_complete_domain_migration+0x40/0x40 [ 226.472899] RBP: 8f63f815a7c0 R08: R09: 0002e248 [ 226.478386] invalid_op+0x26/0x51 [ 226.485518] R10: 00015800 R11: 003480cf3801 R12: 8f63f815a7c0 [ 226.488839] RIP: 0010:xnthread_suspend+0x3ef/0x540 [ 226.495973] R13: R14: R15: [ 226.500766] Code: 58 12 00 00 4c 89 e7 e8 ef ca ff ff 41 83 8c 24 c4 11 00 00 01 e9 82 fd ff ff 0f 0b 48 83 bf 58 12 00 00 00 0f 84 49 fc ff ff <0f> 0b 0f 0b 9c 58 f6 c4 02 0f 84 85 fd ff ff fa bf 00 00 00 80 e8 [ 226.507900] FS: () GS:8f63f980() knlGS: [ 226.52] RSP: 0018:932cc083bd60 EFLAGS: 00010082 [ 226.534755] CS: 0010 DS: ES: CR0: 80050033 [ 226.539986] CR2: 7ff8dca27000 CR3: 000174c54000 CR4: 003406e0 [ 226.545738] RAX: 932cc0617e30 RBX: 00025090 RCX: [ 226.552870] Call Trace: [ 226.560005] RDX: RSI: 0002 RDI: 932cc0616240 [ 226.562461] cpu_startup_entry+0x6f/0x80 [ 226.569590] RBP: 932cc0617e08 R08: 932cc0617e08 R09: 0005cc88 [ 226.573520] start_secondary+0x169/0x1b0 [ 226.580655] R10: R11: R12: 932cc0616240 [ 226.584585] secondary_startup_64+0xa4/0xb0 [ 226.591716] R13: R14: R15: 932cc0617e08 [ 226.595905] ---[ end trace aa5dc96dbf303c58 ]--- [ 226.603042] xnsynch_sleep_on+0x117/0x2d0 [ 226.611670] __cobalt_cond_wait_prologue+0x29f/0x950 [ 226.616647] ? 
__cobalt_cond_wait_prologue+0x950/0x950 [ 226.621798] CoBaLt_cond_wait_prologue+0x23/0x30 [ 226.626425] handle_head_syscall+0xe1/0x370 [ 226.630618] ipipe_fastcall_hook+0x14/0x20 [ 226.634724] ipipe_handle_syscall+0x57/0xe0 [ 226.638920] do_syscall_64+0x4b/0x500 [ 226.642598] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 226.647660] RIP: 0033:0x77f9c134 [ 226.651244] Code: 8b 73 04 49 89 dc e8 fb ef ff ff 48 89 de 48 8b 5c 24 10 45 31 c0 b9 23 00 00 10 48 8d 54 24 44 45 31 d2 48 89 df 89 c8 0f 05 <8b> 7c 24 2c 31 f6 49 89 c5 89 c5 e8 cc ef ff ff 4c 89 ff e8 74 e9 [ 226.670014] RSP: 002b:7fffe1a6bb10 EFLAGS: 0246 ORIG_RAX: 1023 [ 226.677599] RAX: ffda RBX: 74d91c78 RCX: 77f9c134 [ 226.684744] RDX: 7fffe1a6bb54 RSI: 74d91c48 RDI: 74d91c78 [ 226.691885] RBP: 7fffe1a6bc30 R08: R09: [ 226.699027] R10: R11: 0246 R12: 74d91c48 [ 226.706166] R13: R14: 0001 R15: 7fffe1a6bb60 [ 226.713325] I-pipe tracer log (100 points): [ 226.717520] |*+func0 ipipe_trace_panic_freeze+0x0 (ipipe_root_only+0xcf) [ 226.726114] |*+func0 ipipe_root_only+0x0 (ipipe_stall_root+0xe) [ 226.733926] |*+func -1 ipipe_stall_root+0x0 (__ipipe_trap_prologue+0x2ae) [ 226.742431] |# func -2 ipipe_trap_hook+0x0 (__ipipe_notify_trap+0x98) [ 226.750590] |# func -3 __ipipe_notify_trap+0x0 (__ipipe_trap_prologue+0x7f) [ 226.759268] |# func -3 __ipipe_trap_prologue+0x0 (invalid_op+0x26) [ 226.767167] |# func -5 xnthread_suspend+0x0 (xnsynch_sleep_on+0x117) [
RE: Deadlock during debugging
New crash, same thing with ipipe panic trace (the decoded log does not add information to the relevant parts). Is the dump_stack function itself trashing the stack? [ 168.411205] [Xenomai] watchdog triggered on CPU #1 -- runaway thread 'main' signaled [ 209.176742] [ cut here ] [ 209.181381] xnthread_relax() failed for thread aboard_runner[790] [ 209.181389] BUG: Unhandled exception over domain Xenomai at 0x7fed - switching to ROOT [ 209.196451] CPU: 0 PID: 790 Comm: aboard_runner Tainted: GW 4.19.84-xenod8-static #1 [ 209.205588] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.16 01/31/2019 [ 209.214900] I-pipe domain: Linux [ 209.218137] Call Trace: [ 209.220593] dump_stack+0x8c/0xc0 [ 209.223919] __ipipe_trap_prologue.cold+0x1f/0x5e [ 209.228629] invalid_op+0x26/0x51 [ 209.231952] RIP: 0010:xnthread_relax+0x46d/0x4a0 [ 209.236576] Code: f6 83 c2 11 00 00 01 75 0e 48 8b 03 48 85 c0 74 33 8b 90 c0 04 00 00 48 8d b3 5c 14 00 00 48 c7 c7 90 00 8b 9a e8 02 02 ef ff <0f> 0b e9 42 fd ff ff 89 c6 48 c7 c7 c4 f8 a3 9a e8 2e 71 f3 ff e9 [ 209.255347] RSP: 0018:9a0e4074fd90 EFLAGS: 00010286 [ 209.260586] RAX: RBX: 9a0e4065aa40 RCX: 000b [ 209.267728] RDX: 5129 RSI: 902a794791f8 RDI: 007800c0 [ 209.274869] RBP: 9a0e4074fe68 R08: 007800c0 R09: 0002e248 [ 209.282013] R10: 9bb72040 R11: 9bb3209c R12: 9bbfdc80 [ 209.289157] R13: 902a76da8000 R14: 0001 R15: 0292 [ 209.296299] ? xnthread_prepare_wait+0x20/0x20 [ 209.300752] ? trace+0x59/0x8d [ 209.303814] ? 
__cobalt_clock_nanosleep+0x540/0x540 [ 209.308700] handle_head_syscall+0x307/0x370 [ 209.312979] ipipe_fastcall_hook+0x14/0x20 [ 209.317083] ipipe_handle_syscall+0x57/0xe0 [ 209.321280] do_syscall_64+0x4b/0x500 [ 209.324950] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 209.330011] RIP: 0033:0x77f9bd68 [ 209.333598] Code: 89 fb bf 01 00 00 00 48 83 ec 18 48 8d 74 24 0c e8 bd f3 ff ff b9 19 00 00 10 48 63 f5 48 63 fb 4d 89 ea 4c 89 e2 89 c8 0f 05 <8b> 7c 24 0c 48 89 c3 31 f6 e8 9a f3 ff ff 48 83 c4 18 89 d8 f7 d8 [ 209.352370] RSP: 002b:7fffe7d0 EFLAGS: 0246 ORIG_RAX: 1019 [ 209.359954] RAX: fe00 RBX: 0001 RCX: 77f9bd68 [ 209.367098] RDX: 7fffe820 RSI: 0001 RDI: 0001 [ 209.374237] RBP: 0001 R08: 0001 R09: 0014 [ 209.381381] R10: 7fffe820 R11: 0246 R12: 7fffe820 [ 209.388524] R13: 7fffe820 R14: R15: [ 209.395665] I-pipe tracer log (100 points): [ 209.399857] | #func0 ipipe_trace_panic_freeze+0x0 (__ipipe_trap_prologue+0x237) [ 209.409056] | +func0 ipipe_root_only+0x0 (ipipe_stall_root+0xe) [ 209.416862] | +func -1 ipipe_stall_root+0x0 (__ipipe_trap_prologue+0x2ae) [ 209.425365] |+ func -2 ipipe_trap_hook+0x0 (__ipipe_notify_trap+0x98) [ 209.433523] |+ func -3 __ipipe_notify_trap+0x0 (__ipipe_trap_prologue+0x7f) [ 209.442199] |+ func -4 __ipipe_trap_prologue+0x0 (invalid_op+0x26) [ 209.450097] |+ end 0x8001 -5 __ipipe_spin_unlock_irqrestore+0x4f (<>) [ 209.458425] |# func -6 __ipipe_spin_unlock_irqrestore+0x0 (__ipipe_log_printk+0x69) [ 209.467797] |+ begin 0x8001-10 __ipipe_spin_lock_irqsave+0x5e (<>) [ 209.475693] + func -10 __ipipe_spin_lock_irqsave+0x0 (__ipipe_log_printk+0x22) [ 209.484630] + func -10 __ipipe_log_printk+0x0 (__warn_printk+0x6c) [ 209.492525] |+ end 0x8001-11 do_vprintk+0xf6 (<>) [ 209.499120] |+ begin 0x8001-11 do_vprintk+0x106 (<>) [ 209.505799] + func -12 do_vprintk+0x0 (__warn_printk+0x6c) [ 209.513000] + func -12 vprintk+0x0 (__warn_printk+0x6c) [ 209.519939] |+ end 0x8001-12 ipipe_raise_irq+0x70 (<>) [ 209.526969] |+ func -13 
__ipipe_set_irq_pending+0x0 (__ipipe_dispatch_irq+0xad) [ 209.535905] |+ func -14 __ipipe_dispatch_irq+0x0 (ipipe_raise_irq+0x7e) [ 209.544148] |+ begin 0x8001-14 ipipe_raise_irq+0x64 (<>) [ 209.551178] + func -15 ipipe_raise_irq+0x0 (__ipipe_log_printk+0x84) [ 209.559250] |+ end 0x8001-15 __ipipe_spin_unlock_irqrestore+0x4f (<>) [ 209.567581] |# func -15 __ipipe_spin_unlock_irqrestore+0x0 (__ipipe_log_printk+0x69) [ 209.576951] |+ begin 0x8001-17 __ipipe_spin_lock_irqsave+0x5e (<>) [ 209.584847] + func -18 __ipipe_spin_lock_irqsave+0x0 (__ipipe_log_printk+0x22) [
Deadlock during debugging
Hello, Here's one of my deadlocks, the output seems interleaved from 2 concurrent dumps, I ran the crashlog through decode_stacktrace.sh. I got to this, after enabling a breakpoint in gdb (execution did stop there), setting another breakpoint and hitting continue. [ 135.414273] CPU: 1 PID: 0 Comm: swapper/2 Tainted: GW 4.19.84-xeno8-static #1 [ 135.414275] I-pipe: Detected stalled head domain, probably caused by a bug. [ 135.414275] A critical section may have been left unterminated. [ 135.414287] CPU: 2 PID: 798 Comm: fup.fast Tainted: GW 4.19.84-xeno8-static #1 [ 135.422810] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.16 01/31/2019 [ 135.436373] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.16 01/31/2019 [ 135.444984] I-pipe domain: Linux [ 135.454290] I-pipe domain: Linux [ 135.463598] RIP: 0010:rcu_nmi_exit+0x140/0x150 [ 135.466825] Call Trace: [ 135.470057] Code: 45 89 f0 4c 89 f9 4c 89 e2 4c 89 ee ff d0 48 8b 03 48 85 c0 75 e2 48 8b 45 08 4c 8d 78 fe e9 5b ff ff ff 0f 0b e9 ee fe ff ff <0f> 0b e9 f8 [ 135.474513] dump_stack+0x8c/0xc0 [ 135.476950] RSP: 0018:a3513bb03f18 EFLAGS: 00010046 [ 135.495720] ipipe_stall_root+0xc/0x30 [ 135.504264] __ipipe_trap_prologue+0x209/0x210 [ 135.508011] RAX: 000573f4 RBX: 00019480 RCX: 001f [ 135.512458] invalid_op+0x26/0x51 [ 135.519592] RDX: RSI: 50523fbe RDI: 0001 [ 135.522914] RIP: 0010:xnthread_suspend+0x3d5/0x4e0 [ 135.530050] RBP: a3513ba99480 R08: 0001 R09: [ 135.534843] Code: 58 12 00 00 4c 89 e7 e8 f9 cf ff ff 41 83 8c 24 c4 11 00 00 01 e9 92 fd ff ff 0f 0b 48 83 bf 58 12 00 00 00 0f 84 63 fc ff ff <0f> 0b 0f 0b [ 135.541979] R10: a35139832440 R11: 0424 R12: [ 135.560746] RSP: 0018:bddd0073fd60 EFLAGS: 00010082 [ 135.567878] R13: 0022 R14: R15: [ 135.580241] FS: () GS:a3513ba8() knlGS: [ 135.580246] RAX: bddd005fbe30 RBX: 00025090 RCX: [ 135.588336] CS: 0010 DS: ES: CR0: 80050033 [ 135.595477] RDX: RSI: 0002 RDI: bddd005fa240 [ 135.601225] CR2: 
7f8899c36a10 CR3: 00017b31c000 CR4: 003406e0 [ 135.608362] RBP: bddd005fbe08 R08: bddd005fbe08 R09: [ 135.615500] Call Trace: [ 135.622637] R10: R11: R12: bddd005fa240 [ 135.625085] ---[ end trace adb8b44963759cc1 ]--- [ 135.632220] R13: R14: R15: bddd005fbe08 [ 135.636851] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:941 rcu_nmi_enter+0xe4/0xf0 [ 135.643982] xnsynch_sleep_on+0x102/0x260 [ 135.651634] Modules linked in: [ 135.655649] __cobalt_cond_wait_prologue+0x295/0x8c0 [ 135.655653] rt_igb [ 135.658713] ? __cobalt_cond_wait_prologue+0x8c0/0x8c0 [ 135.663677] plusb [ 135.665781] CoBaLt_cond_wait_prologue+0x23/0x30 [ 135.670918] usbnet [ 135.672936] handle_head_syscall+0xe1/0x370 [ 135.677555] mii [ 135.679658] ipipe_fastcall_hook+0x14/0x20 [ 135.685687] ipipe_handle_syscall+0x4a/0xa0 [ 135.689784] CPU: 1 PID: 0 Comm: swapper/2 Tainted: GW 4.19.84-xeno8-static #1 [ 135.693971] do_syscall_64+0x41/0x3d0 [ 135.702495] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.16 01/31/2019 [ 135.706160] ? 
__ipipe_handle_irq+0xb7/0x200 [ 135.715464] I-pipe domain: Linux [ 135.719738] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 135.722972] RIP: 0010:rcu_nmi_enter+0xe4/0xf0 [ 135.728025] RIP: 0033:0x77f9c134 [ 135.732386] Code: 48 85 c0 75 d9 48 8b 6b 08 eb 9a e8 b6 a9 ff ff 48 8b 6b 08 41 bd 01 00 00 00 4c 8b 35 5d cb 23 01 4c 8d 7d 01 e9 72 ff ff ff <0f> 0b e9 44 [ 135.735966] Code: 8b 73 04 49 89 dc e8 fb ef ff ff 48 89 de 48 8b 5c 24 10 45 31 c0 b9 23 00 00 10 48 8d 54 24 44 45 31 d2 48 89 df 89 c8 0f 05 <8b> 7c 24 29 [ 135.754730] RSP: 0018:a3513bb03f38 EFLAGS: 00010082 [ 135.773496] RSP: 002b:7fffe82dab10 EFLAGS: 0246 [ 135.778728] ORIG_RAX: 1023 [ 135.783954] RAX: 00019480 RBX: a3513ba99480 RCX: a3513ba9c008 [ 135.787792] RAX: ffda RBX: 74127c78 RCX: 77f9c134 [ 135.794927] RDX: a3513ba9c000 RSI: 0001 RDI: 1140 [ 135.802065] RDX: 7fffe82dab54 RSI: 74127c48 RDI: 74127c78 [ 135.809201] RBP: fffe R08: a3513ba9c228 R09: 0045 [ 135.816337] RBP: 7fffe82dac30 R08: R09: [ 135.823470] R10: a35139832440 R11: 0424 R12: 9e7af080 [ 135.830604] R10: R11: 0246 R12: 74127c48 [ 135.837738] R13: 00045000 R14:
Re: [PATCH 3/3] boilerplate/avl: fix NULL link representation in pshared mode - take #2
On 18.11.19 09:20, Philippe Gerum wrote:
> On 11/18/19 9:12 AM, Jan Kiszka wrote:
>> On 17.11.19 12:41, Philippe Gerum via Xenomai wrote:
>>> Since zero is the offset pointing at the AVL tree anchor, it cannot be
>>> used for representing a NULL link. Use (ptrdiff_t)-1 instead.
>>>
>>> Signed-off-by: Philippe Gerum
>>> ---
>>>  include/boilerplate/avl-inner.h | 8
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/boilerplate/avl-inner.h b/include/boilerplate/avl-inner.h
>>> index 8e4de8487..9c0576213 100644
>>> --- a/include/boilerplate/avl-inner.h
>>> +++ b/include/boilerplate/avl-inner.h
>>> @@ -105,14 +105,14 @@ shavlh_link(const struct shavl *const avl,
>>>  	    const struct shavlh *const holder, unsigned int dir)
>>>  {
>>>  	ptrdiff_t offset = holder->link[avl_type2index(dir)].offset;
>>> -	return offset ? (void *)avl + offset : NULL;
>>> +	return offset == (ptrdiff_t)-1 ? NULL : (void *)avl + offset;
>>>  }
>>>
>>>  static inline void
>>>  shavlh_set_link(struct shavl *const avl, struct shavlh *lhs,
>>>  		int dir, struct shavlh *rhs)
>>>  {
>>> -	ptrdiff_t offset = rhs ? (void *)rhs - (void *)avl : 0;
>>> +	ptrdiff_t offset = rhs ? (void *)rhs - (void *)avl : (ptrdiff_t)-1;
>>>  	lhs->link[avl_type2index(dir)].offset = offset;
>>>  }
>>>
>>> @@ -120,13 +120,13 @@ static inline
>>>  struct shavlh *shavl_end(const struct shavl *const avl, int dir)
>>>  {
>>>  	ptrdiff_t offset = avl->end[avl_type2index(dir)].offset;
>>> -	return offset ? (void *)avl + offset : NULL;
>>> +	return offset == (ptrdiff_t)-1 ? NULL : (void *)avl + offset;
>>>  }
>>>
>>>  static inline void
>>>  shavl_set_end(struct shavl *const avl, int dir, struct shavlh *holder)
>>>  {
>>> -	ptrdiff_t offset = holder ? (void *)holder - (void *)avl : 0;
>>> +	ptrdiff_t offset = holder ? (void *)holder - (void *)avl : (ptrdiff_t)-1;
>>>  	avl->end[avl_type2index(dir)].offset = offset;
>>>  }
>>>
>>
>> Thanks, all applied to next.
>>
>> But, again, please always add a proper "From:" line to the commit so that
>> I do not need to manually edit all of them to avoid the infamous
>> "Philippe Gerum via Xenomai " entries. TIA.
>
> This is the output of git send-email. Will check.

The trick I'm using for scripted submission is to run format-patch with --from set to a dummy value. That forces git format-patch to add the "From:" line. When sending this, I'm using the correct email of course.

Jan
Re: [PATCH 3/3] boilerplate/avl: fix NULL link representation in pshared mode - take #2
On 11/18/19 9:12 AM, Jan Kiszka wrote:
> On 17.11.19 12:41, Philippe Gerum via Xenomai wrote:
>> Since zero is the offset pointing at the AVL tree anchor, it cannot be
>> used for representing a NULL link. Use (ptrdiff_t)-1 instead.
>>
>> Signed-off-by: Philippe Gerum
>> ---
>>  include/boilerplate/avl-inner.h | 8
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/boilerplate/avl-inner.h b/include/boilerplate/avl-inner.h
>> index 8e4de8487..9c0576213 100644
>> --- a/include/boilerplate/avl-inner.h
>> +++ b/include/boilerplate/avl-inner.h
>> @@ -105,14 +105,14 @@ shavlh_link(const struct shavl *const avl,
>>  	    const struct shavlh *const holder, unsigned int dir)
>>  {
>>  	ptrdiff_t offset = holder->link[avl_type2index(dir)].offset;
>> -	return offset ? (void *)avl + offset : NULL;
>> +	return offset == (ptrdiff_t)-1 ? NULL : (void *)avl + offset;
>>  }
>>
>>  static inline void
>>  shavlh_set_link(struct shavl *const avl, struct shavlh *lhs,
>>  		int dir, struct shavlh *rhs)
>>  {
>> -	ptrdiff_t offset = rhs ? (void *)rhs - (void *)avl : 0;
>> +	ptrdiff_t offset = rhs ? (void *)rhs - (void *)avl : (ptrdiff_t)-1;
>>  	lhs->link[avl_type2index(dir)].offset = offset;
>>  }
>>
>> @@ -120,13 +120,13 @@ static inline
>>  struct shavlh *shavl_end(const struct shavl *const avl, int dir)
>>  {
>>  	ptrdiff_t offset = avl->end[avl_type2index(dir)].offset;
>> -	return offset ? (void *)avl + offset : NULL;
>> +	return offset == (ptrdiff_t)-1 ? NULL : (void *)avl + offset;
>>  }
>>
>>  static inline void
>>  shavl_set_end(struct shavl *const avl, int dir, struct shavlh *holder)
>>  {
>> -	ptrdiff_t offset = holder ? (void *)holder - (void *)avl : 0;
>> +	ptrdiff_t offset = holder ? (void *)holder - (void *)avl : (ptrdiff_t)-1;
>>  	avl->end[avl_type2index(dir)].offset = offset;
>>  }
>>
>
> Thanks, all applied to next.
>
> But, again, please always add a proper "From:" line to the commit so that I
> do not need to manually edit all of them to avoid the infamous "Philippe
> Gerum via Xenomai " entries. TIA.
>

This is the output of git send-email. Will check.

-- 
Philippe.
Re: [PATCH 3/3] boilerplate/avl: fix NULL link representation in pshared mode - take #2
On 17.11.19 12:41, Philippe Gerum via Xenomai wrote:
> Since zero is the offset pointing at the AVL tree anchor, it cannot be
> used for representing a NULL link. Use (ptrdiff_t)-1 instead.
>
> Signed-off-by: Philippe Gerum
> ---
>  include/boilerplate/avl-inner.h | 8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/boilerplate/avl-inner.h b/include/boilerplate/avl-inner.h
> index 8e4de8487..9c0576213 100644
> --- a/include/boilerplate/avl-inner.h
> +++ b/include/boilerplate/avl-inner.h
> @@ -105,14 +105,14 @@ shavlh_link(const struct shavl *const avl,
>  	    const struct shavlh *const holder, unsigned int dir)
>  {
>  	ptrdiff_t offset = holder->link[avl_type2index(dir)].offset;
> -	return offset ? (void *)avl + offset : NULL;
> +	return offset == (ptrdiff_t)-1 ? NULL : (void *)avl + offset;
>  }
>
>  static inline void
>  shavlh_set_link(struct shavl *const avl, struct shavlh *lhs,
>  		int dir, struct shavlh *rhs)
>  {
> -	ptrdiff_t offset = rhs ? (void *)rhs - (void *)avl : 0;
> +	ptrdiff_t offset = rhs ? (void *)rhs - (void *)avl : (ptrdiff_t)-1;
>  	lhs->link[avl_type2index(dir)].offset = offset;
>  }
>
> @@ -120,13 +120,13 @@ static inline
>  struct shavlh *shavl_end(const struct shavl *const avl, int dir)
>  {
>  	ptrdiff_t offset = avl->end[avl_type2index(dir)].offset;
> -	return offset ? (void *)avl + offset : NULL;
> +	return offset == (ptrdiff_t)-1 ? NULL : (void *)avl + offset;
>  }
>
>  static inline void
>  shavl_set_end(struct shavl *const avl, int dir, struct shavlh *holder)
>  {
> -	ptrdiff_t offset = holder ? (void *)holder - (void *)avl : 0;
> +	ptrdiff_t offset = holder ? (void *)holder - (void *)avl : (ptrdiff_t)-1;
>  	avl->end[avl_type2index(dir)].offset = offset;
>  }
>

Thanks, all applied to next.

But, again, please always add a proper "From:" line to the commit so that I do not need to manually edit all of them to avoid the infamous "Philippe Gerum via Xenomai " entries. TIA.

Jan