HI Julien,

> -----Original Message-----
> From: Julien Grall <jul...@xen.org>
> Sent: 2021年1月8日 19:56
> To: Wei Chen <wei.c...@arm.com>; xen-devel@lists.xenproject.org;
> sstabell...@kernel.org
> Cc: Bertrand Marquis <bertrand.marq...@arm.com>; Penny Zheng
> <penny.zh...@arm.com>; Jiamei Xie <jiamei....@arm.com>; nd
> <n...@arm.com>
> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> Arm64
> 
> Hi Wei,
> 
> On 08/01/2021 06:21, Wei Chen wrote:
> > Per the discussion [1] on the mailing list, we'd better to
> > have a barrier after reading CNTPCT in get_cycles. If there
> > is not any barrier there. When get_cycles being used in some
> > seqlock critical context in the future, the seqlock can be
> > speculated potentially.
> >
> > We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
> >      arm64: arch_timer: Ensure counter register reads occur with seqlock 
> > held
> >
> >      When executing clock_gettime(), either in the vDSO or via a system 
> > call,
> >      we need to ensure that the read of the counter register occurs within
> >      the seqlock reader critical section. This ensures that updates to the
> >      clocksource parameters (e.g. the multiplier) are consistent with the
> >      counter value and therefore avoids the situation where time appears to
> >      go backwards across multiple reads.
> >
> >      Extend the vDSO logic so that the seqlock critical section covers the
> >      read of the counter register as well as accesses to the data page. 
> > Since
> >      reads of the counter system registers are not ordered by memory barrier
> >      instructions, introduce dependency ordering from the counter read to a
> >      subsequent memory access so that the seqlock memory barriers apply to
> >      the counter access in both the vDSO and the system call paths.
> >
> >      Cc: <sta...@vger.kernel.org>
> >      Cc: Marc Zyngier <marc.zyng...@arm.com>
> >      Tested-by: Vincenzo Frascino <vincenzo.frasc...@arm.com>
> >      Link: https://lore.kernel.org/linux-arm-
> kernel/alpine.deb.2.21.1902081950260.1...@nanos.tec.linutronix.de/
> >      Reported-by: Thomas Gleixner <t...@linutronix.de>
> >      Signed-off-by: Will Deacon <will.dea...@arm.com>
> >
> > While we are not aware of such use in Xen, it would be best to add the
> > barrier to avoid any suprise.
> >
> > In order to reduce the impact of new barrier, we perfer to
> > use enforce order instead of ISB [2].
> >
> > Currently, enforce order is not applied to arm32 as this is
> > not done in Linux at the date of this patch. If this is done
> > in Linux it will need to be also done in Xen.
> >
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2020-
> 12/msg00181.html
> > [2] https://lkml.org/lkml/2020/3/13/645
> >
> > Signed-off-by: Wei Chen <wei.c...@arm.com>
> > ---
> > v1 -> v2:
> > 1. Update commit message to refer Linux commit.
> > 2. Change u64 to uint64_t
> > ---
> >   xen/include/asm-arm/time.h | 43
> ++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 41 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> > index 5c4529ebb5..6b8fd839dd 100644
> > --- a/xen/include/asm-arm/time.h
> > +++ b/xen/include/asm-arm/time.h
> > @@ -11,9 +11,26 @@
> >
> >   typedef uint64_t cycles_t;
> >
> > -static inline cycles_t get_cycles(void)
> > +/*
> > + * Ensure that reads of the counter are treated the same as memory reads
> > + * for the purposes of ordering by subsequent memory barriers.
> > + */
> > +#if defined(CONFIG_ARM_64)
> > +#define read_cntpct_enforce_ordering(val) do { \
> > +    uint64_t tmp, _val = (val);                \
> > +                                               \
> > +    asm volatile(                              \
> > +    "eor %0, %1, %1\n"                         \
> > +    "add %0, sp, %0\n"                         \
> > +    "ldr xzr, [%0]"                            \
> > +    : "=r" (tmp) : "r" (_val));                \
> > +} while (0)
> > +#else
> > +#define read_cntpct_enforce_ordering(val) do {} while (0)
> > +#endif
> > +
> > +static inline cycles_t read_cntpct_stable(void)
> 
> OOI, is there any particular reason to create a new helper?
> 

Yes, I try to reduce the #if defined(CONFIG_ARM_64) chunks. I think
if we introduce an empty helper for Arm32, we can reduce the other
chunk inside get_cycles. In addition, if we need to do the same work
for Arm32 in the future, we may not need to modify get_cycles.

> >   {
> > -    isb();
> >       /*
> >        * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
> >        * can return a wrong value when the counter crosses a 32bit boundary.
> > @@ -36,6 +53,28 @@ static inline cycles_t get_cycles(void)
> >       }
> >   }
> >
> > +static inline cycles_t get_cycles(void)
> > +{
> > +    cycles_t cnt;
> > +
> > +    isb();
> > +    cnt = read_cntpct_stable();
> > +
> > +    /*
> > +     * If there is not any barrier here. When get_cycles being used in
> > +     * some seqlock critical context in the future, the seqlock can be
> > +     * speculated potentially.
> > +     *
> > +     * To prevent seqlock from being speculated silently, we add a barrier
> > +     * here defensively. Normally, we just need an ISB here is enough, but
> > +     * considering the minimum performance cost. We prefer to use enforce
> > +     * order here.
> > +     */
> 
> I thought, we agreed to remove the comment. Did I miss anything?
> 
> I can fix this one on commit if there is no need for a new revision.
> 

Ah, sorry I forget that. If we don't need a new revision, please help me to
remove it.

Thanks.

> Cheers,
> 
> > +    read_cntpct_enforce_ordering(cnt);
> > +
> > +    return cnt;
> > +}
> > +
> >   /* List of timer's IRQ */
> >   enum timer_ppi
> >   {
> >
> 
> --
> Julien Grall

Reply via email to