Re: TLB tiredown by ASID bump

Matt Thomas Wed, 05 Jan 2011 23:30:22 -0800

On Jan 5, 2011, at 9:36 PM, Toru Nishimura wrote:

> Matt Thomas made a comment;
> 
>> The ASID generational stuff has a downside in that valid entries will be 
>> thrown away.  For mips (and booke) I use
>> a different algorithm which eliminates the overhead of
>> discarding all the TLB entries when you run out of ASIDs.
> 
> It's a good move to pursue efficient ASID management
> schemes since it's the key area for runtime VM/TLB activity.
> 
> Matt points out that losing valid entries is a problem when ASID
> generation is going to get bumped.  I think, however, it'd be
> forgiven that tbia() operation, to discard all entries but global
> or locked, discards "live entries" since TLB size is still small
> enough.  Some CPU architectures do it  with a single
> special instruction or others do at-most 64 time loop to discard.
> TLB is a cache for VA->PA translation and the management
> scheme always provokes "efficiency v.s. correctness"
> arguments.  It's a matter of implementation tradeoff, I believe.
> 
> BTW, how do you approach to implement a remote TLB
> shootdown?


It depends on the reason for the shootdown.  Might as well include the
comments of pmap_tlb.c here:

/*
 * Manages address spaces in a TLB.
 *
 * Normally there is a 1:1 mapping between a TLB and a CPU.  However, some
 * implementations may share a TLB between multiple CPUs (really CPU thread
 * contexts).  This requires the TLB abstraction to be separated from the
 * CPU abstraction.  It also requires that the TLB be locked while doing
 * TLB activities.
 *
 * For each TLB, we track the ASIDs in use in a bitmap and a list of pmaps
 * that have a valid ASID.
 *
 * We allocate ASIDs in increasing order until we have exhausted the supply,
 * then reinitialize the ASID space, and start allocating again at 1.  When
 * allocating from the ASID bitmap, we skip any ASID that has a corresponding
 * bit set in the ASID bitmap.  Eventually this causes the ASID bitmap to fill
 * and, when completely filled, a reinitialization of the ASID space.
 *
 * To reinitialize the ASID space, the ASID bitmap is reset and then the ASIDs
 * of non-kernel TLB entries get recorded in the ASID bitmap.  If the entries
 * in TLB consume more than half of the ASID space, all ASIDs are invalidated,
 * the ASID bitmap is recleared, and the list of pmaps is emptied.  Otherwise,
 * (the normal case), any ASID present in the TLB (even those which are no
 * longer used by a pmap) will remain active (allocated) and all other ASIDs
 * will be freed.  If the size of the TLB is much smaller than the ASID space,
 * this algorithm completely avoids TLB invalidation.
 *
 * For multiprocessors, we also have to deal with TLB invalidation requests from
 * other CPUs, some of which are dealt with by the reinitialization of the ASID
 * space.  Whereas above we keep the ASIDs of those pmaps which have active
 * TLB entries, this type of reinitialization preserves the ASIDs of any
 * "onproc" user pmap and all other ASIDs will be freed.  We must do this
 * since we can't change the current ASID.
 *
 * Each pmap has two bitmaps: pm_active and pm_onproc.  Each bit in pm_active
 * indicates whether that pmap has an allocated ASID for a CPU.  Each bit in
 * pm_onproc indicates that pmap's ASID is active (equal to the ASID in COP 0
 * register EntryHi) on a CPU.  The bit number comes from the CPU's cpu_index().
 * Even though these bitmaps contain the bits for all CPUs, the bits that
 * correspond to the bits belonging to the CPUs sharing a TLB can only be
 * manipulated while holding that TLB's lock.  Atomic ops must be used to
 * update them since multiple CPUs may be changing different sets of bits at
 * the same time, but these sets never overlap.
 *
 * When a change to the local TLB may require a change in the TLB's of other
 * CPUs, we try to avoid sending an IPI if at all possible.  For instance, if
 * we are updating a PTE and that PTE previously was invalid and therefore
 * couldn't support an active mapping, there's no need for an IPI since there
 * can be no TLB entry to invalidate.  The other case is when we change a PTE
 * to be modified: we just update the local TLB.  If another TLB has a stale
 * entry, a TLB MOD exception will be raised on that CPU and that will cause
 * its local TLB to be updated.
 *
 * We never need to update a non-local TLB if the pmap doesn't have a valid
 * ASID for that TLB.  If it does have a valid ASID but isn't currently "onproc",
 * we simply reset its ASID for that TLB and at the time it goes "onproc" it
 * will allocate a new ASID and any existing TLB entries will be orphaned.
 * Only in the case that pmap has an "onproc" ASID do we actually have to send
 * an IPI.
 *
 * Once we have determined that we must send an IPI to shoot down a TLB, we need
 * to send it to one of the CPUs that share that TLB.  We choose the lowest
 * numbered CPU that has the pmap's ASID "onproc".  In reality, any CPU sharing
 * that TLB would do, but interrupting an active CPU seems best.
 *
 * A TLB might have multiple shootdowns active concurrently.  The shootdown
 * logic compresses these into a few cases:
 *      0) nobody needs to have its TLB entries invalidated
 *      1) one ASID needs to have its TLB entries invalidated
 *      2) more than one ASID needs to have its TLB entries invalidated
 *      3) the kernel needs to have its TLB entries invalidated
 *      4) the kernel and one or more ASIDs need their TLB entries invalidated.
 *
 * And for each case we do:
 *      0) nothing,
 *      1) if that ASID is still "onproc", we invalidate the TLB entries for
 *         that single ASID.  If not, just reset the pmap's ASID to invalid
 *         and let it be allocated the next time it goes "onproc",
 *      2) we reinitialize the ASID space (preserving any "onproc" ASIDs) and
 *         invalidate all non-wired non-global TLB entries,
 *      3) we invalidate all of the non-wired global TLB entries,
 *      4) we reinitialize the ASID space (again preserving any "onproc" ASIDs)
 *         and invalidate all non-wired TLB entries.
 *
 * As you can see, shootdowns are not concerned with addresses, just address
 * spaces.  Since the number of TLB entries is usually quite small, this avoids
 * a lot of overhead for not much gain.
 */
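
To make the allocation and reinitialization paragraphs of the comment concrete,
here is a rough sketch, not the actual pmap_tlb.c code.  Every name in it
(struct tlb_info, tlb_asid_alloc(), and so on) and both sizes are hypothetical;
the TLB is modelled as a plain array, and the per-TLB list of pmaps and the
locking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, picked small so both reinit paths are easy to hit. */
#define NUM_ASIDS   64      /* ASID 0 is treated as reserved (kernel) */
#define TLB_ENTRIES 48

struct tlb_entry {
    bool    valid;
    bool    global;         /* global entries match every ASID */
    uint8_t asid;
};

struct tlb_info {
    uint32_t asid_bitmap[NUM_ASIDS / 32];
    unsigned asid_hint;                 /* next ASID to try */
    unsigned asids_free;
    struct tlb_entry tlb[TLB_ENTRIES];  /* stand-in for the hardware TLB */
};

bool asid_inuse(const struct tlb_info *ti, unsigned asid)
{
    assert(asid < NUM_ASIDS);
    return (ti->asid_bitmap[asid / 32] >> (asid % 32)) & 1u;
}

void asid_mark(struct tlb_info *ti, unsigned asid)
{
    assert(asid < NUM_ASIDS);
    ti->asid_bitmap[asid / 32] |= 1u << (asid % 32);
}

void tlb_asid_init(struct tlb_info *ti)
{
    memset(ti->asid_bitmap, 0, sizeof(ti->asid_bitmap));
    asid_mark(ti, 0);
    ti->asid_hint = 1;
    ti->asids_free = NUM_ASIDS - 1;
}

/*
 * Rebuild the bitmap from the ASIDs still present in the TLB.  Returns true
 * if the TLB held more than half of the ASID space and was flushed instead.
 */
bool tlb_asid_reinitialize(struct tlb_info *ti)
{
    unsigned used = 0;
    bool flushed = false;

    tlb_asid_init(ti);
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *e = &ti->tlb[i];
        if (e->valid && !e->global && !asid_inuse(ti, e->asid)) {
            asid_mark(ti, e->asid);     /* keep this ASID allocated */
            used++;
        }
    }
    if (used > NUM_ASIDS / 2) {
        /* Too many live ASIDs: drop all user entries and start clean. */
        for (unsigned i = 0; i < TLB_ENTRIES; i++)
            if (!ti->tlb[i].global)
                ti->tlb[i].valid = false;
        tlb_asid_init(ti);
        used = 0;
        flushed = true;
    }
    ti->asids_free = NUM_ASIDS - 1 - used;
    return flushed;
}

/* Hand out ASIDs in increasing order, skipping any whose bit is set. */
unsigned tlb_asid_alloc(struct tlb_info *ti)
{
    if (ti->asids_free == 0)
        tlb_asid_reinitialize(ti);

    unsigned asid = ti->asid_hint;
    while (asid_inuse(ti, asid))
        asid = (asid + 1 < NUM_ASIDS) ? asid + 1 : 1;
    asid_mark(ti, asid);
    ti->asids_free--;
    ti->asid_hint = (asid + 1 < NUM_ASIDS) ? asid + 1 : 1;
    return asid;
}
```

With a TLB much smaller than the ASID space, the flush branch is never taken,
which is the point of the scheme: running out of ASIDs costs a scan of the TLB
rather than the loss of its live entries.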

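The IPI-avoidance rules and the way concurrent shootdowns collapse into cases
0-4 can be sketched the same way.  Again this is only an illustration under
assumptions: the enum names, struct pmap_sketch, and the three functions are
made up for this mail, 32-bit masks stand in for the real CPU sets, and the TLB
lock and atomic ops that the comment requires are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum shootdown_action { SHOOT_NONE, SHOOT_RESET_ASID, SHOOT_SEND_IPI };

/* Pending invalidation state of one TLB, matching cases 0-4 above. */
enum tlb_inval { INV_NOBODY, INV_ONE, INV_ALLUSER, INV_ALLKERNEL, INV_ALL };

struct pmap_sketch {
    uint32_t pm_active;   /* bit n: pmap has an ASID allocated for CPU n */
    uint32_t pm_onproc;   /* bit n: pmap's ASID is current on CPU n */
};

/*
 * Decide what a remote TLB needs for this pmap.  tlb_cpus is the mask of
 * CPUs sharing that TLB.  On SHOOT_SEND_IPI, *target is the lowest numbered
 * CPU of that TLB that has the pmap "onproc".
 */
enum shootdown_action
shootdown_decide(struct pmap_sketch *pm, uint32_t tlb_cpus, unsigned *target)
{
    assert(tlb_cpus != 0);
    uint32_t onproc = pm->pm_onproc & tlb_cpus;

    if ((pm->pm_active & tlb_cpus) == 0)
        return SHOOT_NONE;            /* no valid ASID there: nothing to do */

    if (onproc == 0) {
        /* Not running there: drop the ASID, orphaning its TLB entries. */
        pm->pm_active &= ~tlb_cpus;
        return SHOOT_RESET_ASID;
    }

    unsigned cpu = 0;
    while ((onproc & (1u << cpu)) == 0)
        cpu++;
    *target = cpu;
    return SHOOT_SEND_IPI;
}

/* Fold a request for one user pmap into what is already pending. */
enum tlb_inval merge_user(enum tlb_inval pending, bool same_pmap)
{
    switch (pending) {
    case INV_NOBODY:    return INV_ONE;
    case INV_ONE:       return same_pmap ? INV_ONE : INV_ALLUSER;
    case INV_ALLUSER:   return INV_ALLUSER;
    case INV_ALLKERNEL: return INV_ALL;
    case INV_ALL:       return INV_ALL;
    }
    return INV_ALL;
}

/* Fold a kernel invalidation request into what is already pending. */
enum tlb_inval merge_kernel(enum tlb_inval pending)
{
    if (pending == INV_NOBODY || pending == INV_ALLKERNEL)
        return INV_ALLKERNEL;
    return INV_ALL;
}
```

Because the pending state only ever moves toward INV_ALL, any number of
concurrent requests for one TLB can be served by a single IPI handler run.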