Re: lmw/stmw and corrupted frame on G5

Mark Kettenis Wed, 14 Oct 2015 05:17:15 -0700

> Date: Tue, 13 Oct 2015 18:10:50 +0200
> From: Martin Pieuchot <m...@openbsd.org>
> 
> On my G5s it is impossible to "c[ontinue]" execution after breaking into
> ddb(4).  Doing so always result in a:
> 
>   panic: trap 9300 at 101000 (ddb_trap+0x40) lr 0x1b
> 
> Note that 0x9300 == 0x8000 | 0x1300, so it seems that EXC_BPT that is
> set to enter ddb(8) has not been cleared (or better say the previous
> value has not been properly restored).
> 
> I tracked down the problem to the "stmw" instruction in ddb_trap as
> adding an "isync" right after this instruction "fixes" the problem:
> 
> @@ -1264,6 +1264,7 @@ _C_LABEL(ddb_trap):
>       isync
>       GET_CPUINFO(%r3)
>       stmw    %r28,CI_DDBSAVE(%r3)
> +     isync
>  
> 
> I'm far from being a PowerPC expert so I'm really interested in hearing
> what others think of this issue.  Nonetheless the 970FX user manual says
> about lmw/stmw:
> 
>   "The architecture allows these instructions to be interrupted by
>    certain types of asynchronous interrupts (external interrupts,
>    decrementer interrupts, machine check interrupts, and system reset
>    interrupts). In these  cases, for the load multiple  instructions,
>    all of the registers that were to be updated will have an undefined
>    value, and the instruction must be completely restarted to achieve
>    the full effect (that is, no partial restart  capability is supported).
>    For the store multiple instructions, some of the storage locations
>    referenced by the instruction may have been updated. However, to 
>    guarantee full completion of the store multiple instruction, they must
>    also be completely restarted."
> 
> But given the fact that these are microcoded instructions, apparently
> slower and obviously non-safe, I'd head towards replacing them with
> multiple stw/lwz.


I don't think those instructions really are unsafe.  The text you cite
just says that they are not really atomic and might be interrupted
somewhere in the middle, at which point you simply can't know which of
the multiple loads or stores actually happened.

> Diff below also fixes the issue for me.  If this is a sensible approach
> I'll try to get rid of all lmw/stmw.  I'll obviously keep the socppc 
> version in sync.
> 
> Comments?  Ok?

While adding MP support to ddb on sparc64 I notice that ddb was
inserting breakpoints even if you only entered ddb and used continue
again.  So I wonder if there is a cache synchronization issue of some
sort and by adding those additional instructions you just manage to
avoid it.

> Index: macppc/locore.S
> ===================================================================
> RCS file: /cvs/src/sys/arch/macppc/macppc/locore.S,v
> retrieving revision 1.50
> diff -u -p -r1.50 locore.S
> --- macppc/locore.S   29 Jul 2015 18:52:44 -0000      1.50
> +++ macppc/locore.S   9 Oct 2015 16:41:16 -0000
> @@ -692,7 +692,10 @@ nop32_7s:
>       mtmsrd  %r1
>  nop32_7e:
>       GET_CPUINFO(%r1)
> -     stmw    %r28,CI_DDBSAVE(%r1)    /* free r28-r31 */
> +     stw     %r28,(CI_DDBSAVE+0)(%r1) /* free r28 */
> +     stw     %r29,(CI_DDBSAVE+4)(%r1) /* free r29 */
> +     stw     %r30,(CI_DDBSAVE+8)(%r1) /* free r30 */
> +     stw     %r31,(CI_DDBSAVE+12)(%r1) /* free r31 */
>       mflr    %r28                    /* save LR */
>       mfcr    %r29                    /* save CR */
>       GET_CPUINFO(%r30)
> @@ -1263,7 +1266,10 @@ _C_LABEL(ddb_trap):
>       mtmsr   %r3                             /* disable interrupts */
>       isync
>       GET_CPUINFO(%r3)
> -     stmw    %r28,CI_DDBSAVE(%r3)
> +     stw     %r28,(CI_DDBSAVE+0)(%r3) /* save r28 */
> +     stw     %r29,(CI_DDBSAVE+4)(%r3) /* save r29 */
> +     stw     %r30,(CI_DDBSAVE+8)(%r3) /* save r30 */
> +     stw     %r31,(CI_DDBSAVE+12)(%r3) /* save r31 */
>  
>       /*
>        * If we are already running in interrupt context, the CPU
> 
>

Re: lmw/stmw and corrupted frame on G5

Reply via email to