On Tue, Oct 27, 2020 at 07:01:34AM +0100, Jan Kiszka wrote:
> On 27.10.20 06:23, Fino Meng wrote:
> >>>>>>> I also tested hackbench:
> >>>>>>>
> >>>>>>> while true ; do sudo taskset -c 1 hackbench -s 512 -l 200 -g 20 -f 50 
> >>>>>>> -P ; done
> >>>>>>>
> >>>>>>> It outputs errors, but the board stays alive.
> >>>>>>>
> >>>>>>
> >>>>>> Will check. Was that with my FPU fixes in place already?
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> Yes. Without the FPU fixes, the board hangs once hackbench is
> >>>>> started.
> >>>>
> >>>> How long did it run to trigger? Anything happening in parallel? What do
> >>>> the errors look like? Currently running, nothing happened so far.
> >>>>
> >>>> Maybe you can also retry with ipipe-x86-5.4.y.
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> Sounds good, I will pull the latest code. My board's errors look like
> >>> this; nothing ran in parallel, only hackbench.
> >>>
> >>> [ 3711.348060] RIP: 0033:0x7f4a7edc9471
> >>> [ 3711.354108] Code: 00 00 75 05 48 83 c4 58 c3 e8 0b 4d ff ff 66 2e 0f 
> >>> 1f 84 00 00 00 00 00 90 8b 05 da ef 00 00 85 c0 75 16 b8 01 00 00 00 0f 
> >>> 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
> >>> [ 3711.377358] RSP: 002b:00007ffe59265888 EFLAGS: 00000246 ORIG_RAX: 
> >>> 0000000000000001
> >>> [ 3711.388126] RAX: ffffffffffffffda RBX: 0000000000000200 RCX: 
> >>> 00007f4a7edc9471
> >>> [ 3711.398415] RDX: 0000000000000200 RSI: 00007ffe59265890 RDI: 
> >>> 0000000000000014
> >>> [ 3711.408711] RBP: 00007ffe59265ae0 R08: 00007ffe592657e0 R09: 
> >>> 00007f4a7edd42f0
> >>> [ 3711.419019] R10: fffffffffffff8d7 R11: 0000000000000246 R12: 
> >>> 00007ffe59265890
> >>> [ 3711.429338] R13: 000000000000000c R14: 00005644c8742a20 R15: 
> >>> 0000000000000000
> >>> [ 3711.439678] hackbench       R  running task        0  2381    627 
> >>> 0x00000000
> >>> [ 3711.449928] Call Trace:
> >>> [ 3711.455031]  __schedule+0x34d/0x790
> >>> [ 3711.461305]  ? try_to_wake_up+0x8b/0x6b0
> >>> [ 3711.468067]  ? ___preempt_schedule+0x16/0x20
> >>> [ 3711.475219]  preempt_schedule_common+0x74/0x80
> >>> [ 3711.482568]  ___preempt_schedule+0x16/0x20
> >>> [ 3711.489531]  _raw_spin_unlock_irqrestore+0x36/0x40
> >>> [ 3711.497268]  __wake_up_common_lock+0x92/0xc0
> >>> [ 3711.504295]  sock_def_readable+0x41/0x80
> >>> [ 3711.510830]  unix_stream_sendmsg+0x231/0x3c0
> >>> [ 3711.517743]  sock_sendmsg+0x5b/0x60
> >>> [ 3711.523763]  sock_write_iter+0x97/0x100
> >>> [ 3711.530167]  new_sync_write+0x11b/0x1b0
> >>> [ 3711.536554]  vfs_write+0xa5/0x1a0
> >>> [ 3711.542337]  ksys_write+0x59/0xd0
> >>> [ 3711.548100]  do_syscall_64+0x66/0x180
> >>> [ 3711.554232]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [ 3711.561932] RIP: 0033:0x7f4a7edc9471
> >>> [ 3711.567977] Code: 00 00 75 05 48 83 c4 58 c3 e8 0b 4d ff ff 66 2e 0f 
> >>> 1f 84 00 00 00 00 00 90 8b 05 da ef 00 00 85 c0 75 16 b8 01 00 00 00 0f 
> >>> 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
> >>> [ 3711.591216] RSP: 002b:00007ffe59265888 EFLAGS: 00000246 ORIG_RAX: 
> >>> 0000000000000001
> >>> [ 3711.601984] RAX: ffffffffffffffda RBX: 0000000000000200 RCX: 
> >>> 00007f4a7edc9471
> >>> [ 3711.612276] RDX: 0000000000000200 RSI: 00007ffe59265890 RDI: 
> >>> 000000000000001a
> >>> [ 3711.622577] RBP: 00007ffe59265ae0 R08: 00007ffe592657e0 R09: 
> >>> 00007f4a7edd42f0
> >>> [ 3711.632885] R10: fffffffffffff8d7 R11: 0000000000000246 R12: 
> >>> 00007ffe59265890
> >>> [ 3711.643206] R13: 0000000000000012 R14: 00005644c8742a20 R15: 
> >>> 0000000000000000
> >>> [ 3711.653541] hackbench       R  running task        0  2382    627 
> >>> 0x00000000
> >>> [ 3711.663797] Call Trace:
> >>> [ 3711.668897]  __schedule+0x34d/0x790
> >>> [ 3711.675165]  ? try_to_wake_up+0x8b/0x6b0
> >>> [ 3711.681924]  ? ___preempt_schedule+0x16/0x20
> >>> [ 3711.689085]  preempt_schedule_common+0x74/0x80
> >>> [ 3711.696427]  ___preempt_schedule+0x16/0x20
> >>> [ 3711.703377]  _raw_spin_unlock_irqrestore+0x36/0x40
> >>> [ 3711.710985]  __wake_up_common_lock+0x92/0xc0
> >>> [ 3711.717901]  sock_def_readable+0x41/0x80
> >>> [ 3711.724414]  unix_stream_sendmsg+0x231/0x3c0
> >>> [ 3711.731306]  sock_sendmsg+0x5b/0x60
> >>> [ 3711.737311]  sock_write_iter+0x97/0x100
> >>> [ 3711.743693]  new_sync_write+0x11b/0x1b0
> >>> [ 3711.750061]  vfs_write+0xa5/0x1a0
> >>> [ 3711.755829]  ksys_write+0x59/0xd0
> >>> [ 3711.761574]  do_syscall_64+0x66/0x180
> >>> [ 3711.767709]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [ 3711.775409] RIP: 0033:0x7f4a7edc9471
> >>>
> >>>
> >>
> >> Could you send me your config if the issue persists with the latest 
> >> version?
> >>
> >> TIA,
> >> Jan
> >>
> > 
> > The latest ipipe-x86 + xenomai-next behaves much better than my previous
> > build, but still prints a similar error.
> > 
> > "hackbench -s 512 -l 200 -g 20 -f 50 -P", which runs just once, doesn't
> > give an error.
> > 
> > "while true; do taskset -c 1 hackbench -s 512 -l 200 -g 20 -f 50 -P;
> > done" does give the error. It keeps forking and the system load grows
> > steadily; the only way to stop it is to keep pressing Ctrl-C. We use this
> > script as a torture test.
> > 
> > The error appears in dmesg after the script has run for some time. The test
> > hardware is a UP Xtreme board (WHL8365UE).
> > 
> > I tested this script on Debian 10's stock 4.19 kernel; no such error
> > appears in dmesg.
> 
> I'm also getting this, but first an OOM. I gave 4G to that machine, do
> you have more?
> 

I have 8G on board.

> Does the issue also happen with the same kernel when I-pipe is off?

Well, I popped off the ipipe and xenomai patches and built a vanilla
5.4.72 kernel; the script prints the same error there too. So maybe the
issue is not within the ipipe/xenomai code.
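
For comparing kernels (vanilla vs. I-pipe), a time-bounded variant of the
torture loop may be easier to reproduce than an endless one that has to be
stopped with Ctrl-C. Below is a minimal sketch, assuming a POSIX shell. The
helper names count_traces and run_for are made up for illustration and are
not part of hackbench or Xenomai, and counting "Call Trace:" lines is only a
rough indicator, since the dmesg ring buffer can wrap under load.

```shell
#!/bin/sh
# Hypothetical helpers for a bounded hackbench torture run.

# Count "Call Trace:" lines in kernel-log text read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_traces() {
    grep -c 'Call Trace:' || true
}

# Run a command repeatedly until DURATION seconds have passed,
# then print how many times it was started.
# Usage: run_for DURATION CMD [ARGS...]
run_for() {
    duration=$1
    shift
    end=$(( $(date +%s) + duration ))
    runs=0
    while [ "$(date +%s)" -lt "$end" ]; do
        # Ignore the command's output and exit status; only the
        # kernel log is of interest here.
        "$@" >/dev/null 2>&1 || true
        runs=$((runs + 1))
    done
    echo "$runs"
}

# Example (not executed here; dmesg may need root):
#   before=$(dmesg | count_traces)
#   run_for 600 taskset -c 1 hackbench -s 512 -l 200 -g 20 -f 50 -P
#   after=$(dmesg | count_traces)
#   echo "new traces: $((after - before))"
```

Running the same bounded loop on each kernel build would give roughly
comparable numbers, which might help narrow down where the error first
shows up.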

BR fino

> Turning on debugging knobs now.
> 
> Jan
> 
> -- 
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux
