On Jan 5, 2011, at 9:36 PM, Toru Nishimura wrote:

> Matt Thomas made a comment;
>
>> The ASID generational stuff has a downside in that valid entries will be
>> thrown away.  For mips (and booke) I use
>> a different algorithm which eliminates the overhead of
>> discarding all the TLB entries when you run out of ASIDs.
>
> It's a good move to pursue efficient ASID management
> schemes since it's the key area for runtime VM/TLB activity.
>
> Matt points out that losing valid entries is a problem when the ASID
> generation is going to get bumped.  I think, however, it is
> forgivable that a tbia() operation, which discards all entries but
> global or locked ones, also discards "live entries", since the TLB
> is still small enough.  Some CPU architectures do it with a single
> special instruction; others loop at most 64 times to discard.
> The TLB is a cache for VA->PA translation and the management
> scheme always provokes "efficiency vs. correctness"
> arguments.  It's a matter of implementation tradeoff, I believe.
>
> BTW, how do you approach implementing a remote TLB
> shootdown?

It depends on the reason for the shootdown.  Might as well include the
comment from pmap_tlb.c here:

/*
 * Manages address spaces in a TLB.
 *
 * Normally there is a 1:1 mapping between a TLB and a CPU.  However, some
 * implementations may share a TLB between multiple CPUs (really CPU thread
 * contexts).  This requires the TLB abstraction to be separated from the
 * CPU abstraction.  It also requires that the TLB be locked while doing
 * TLB activities.
 *
 * For each TLB, we track the ASIDs in use in a bitmap and a list of pmaps
 * that have a valid ASID.
 *
 * We allocate ASIDs in increasing order until we have exhausted the supply,
 * then reinitialize the ASID space, and start allocating again at 1.  When
 * allocating from the ASID bitmap, we skip any ASID that has its
 * corresponding bit set in the ASID bitmap.  Eventually this causes the ASID
 * bitmap to fill and, when completely filled, a reinitialization of the ASID
 * space.
 *
 * To reinitialize the ASID space, the ASID bitmap is reset and then the ASIDs
 * of non-kernel TLB entries get recorded in the ASID bitmap.  If the entries
 * in the TLB consume more than half of the ASID space, all ASIDs are
 * invalidated, the ASID bitmap is cleared again, and the list of pmaps is
 * emptied.  Otherwise (the normal case), any ASID present in the TLB (even
 * those which are no longer used by a pmap) will remain active (allocated)
 * and all other ASIDs will be freed.  If the size of the TLB is much smaller
 * than the ASID space, this algorithm completely avoids TLB invalidation.
 *
 * For multiprocessors, we also have to deal with TLB invalidation requests
 * from other CPUs, some of which are handled by reinitializing the ASID
 * space.  Whereas above we keep the ASIDs of those pmaps which have active
 * TLB entries, this type of reinitialization preserves the ASIDs of any
 * "onproc" user pmap and all other ASIDs will be freed.  We must do this
 * since we can't change the current ASID.
 *
 * Each pmap has two bitmaps: pm_active and pm_onproc.
 * Each bit in pm_active indicates whether that pmap has an allocated ASID
 * for a CPU.  Each bit in pm_onproc indicates that pmap's ASID is active
 * (equal to the ASID in COP 0 register EntryHi) on a CPU.  The bit number
 * comes from the CPU's cpu_index().  Even though these bitmaps contain the
 * bits for all CPUs, the bits that correspond to the CPUs sharing a TLB can
 * only be manipulated while holding that TLB's lock.  Atomic ops must be
 * used to update them since multiple CPUs may be changing different sets of
 * bits at the same time, but these sets never overlap.
 *
 * When a change to the local TLB may require a change in the TLBs of other
 * CPUs, we try to avoid sending an IPI if at all possible.  For instance, if
 * we are updating a PTE and that PTE previously was invalid and therefore
 * couldn't support an active mapping, there's no need for an IPI since there
 * can be no TLB entry to invalidate.  The other case is when we change a PTE
 * to be modified: we just update the local TLB.  If another TLB has a stale
 * entry, a TLB MOD exception will be raised and that will cause the local
 * TLB to be updated.
 *
 * We never need to update a non-local TLB if the pmap doesn't have a valid
 * ASID for that TLB.  If it does have a valid ASID but isn't currently
 * "onproc", we simply reset its ASID for that TLB and at the time it goes
 * "onproc" it will allocate a new ASID and any existing TLB entries will be
 * orphaned.  Only in the case that the pmap has an "onproc" ASID do we
 * actually have to send an IPI.
 *
 * Once we have determined we must send an IPI to shoot down a TLB, we need
 * to send it to one of the CPUs that share that TLB.  We choose the lowest
 * numbered CPU that has one of the pmap's ASIDs "onproc".  In reality, any
 * CPU sharing that TLB would do, but interrupting an active CPU seems best.
 *
 * A TLB might have multiple shootdowns active concurrently.
 * The shootdown logic compresses these into a few cases:
 *   0) nobody needs to have its TLB entries invalidated
 *   1) one ASID needs to have its TLB entries invalidated
 *   2) more than one ASID needs to have its TLB entries invalidated
 *   3) the kernel needs to have its TLB entries invalidated
 *   4) the kernel and one or more ASIDs need their TLB entries invalidated.
 *
 * And for each case we do:
 *   0) nothing,
 *   1) if that ASID is still "onproc", we invalidate the TLB entries for
 *      that single ASID.  If not, just reset the pmap's ASID to invalid
 *      and let it be allocated the next time it goes "onproc",
 *   2) we reinitialize the ASID space (preserving any "onproc" ASIDs) and
 *      invalidate all non-wired non-global TLB entries,
 *   3) we invalidate all of the non-wired global TLB entries,
 *   4) we reinitialize the ASID space (again preserving any "onproc" ASIDs)
 *      and invalidate all non-wired TLB entries.
 *
 * As you can see, shootdowns are not concerned with addresses, just address
 * spaces.  Since the number of TLB entries is usually quite small, this avoids
 * a lot of overhead for not much gain.
 */
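
To make the "compresses these into a few cases" part concrete, here is a
minimal sketch of how concurrent requests could be folded into cases 0-4.
It is not the code from pmap_tlb.c: the names inv_op, struct pending and
pending_merge() are made up for illustration, and a pmap is represented
only by an opaque pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the values mirror cases 0-4 in the comment above. */
enum inv_op {
	INV_NOBODY = 0,	/* 0) nothing to invalidate */
	INV_ONE,	/* 1) exactly one user ASID */
	INV_ALLUSER,	/* 2) more than one user ASID */
	INV_KERNEL,	/* 3) the kernel only */
	INV_ALL		/* 4) the kernel plus one or more user ASIDs */
};

/* Pending shootdown state for one TLB (assumed to be held under its lock). */
struct pending {
	enum inv_op op;
	const void *victim;	/* the single pmap when op == INV_ONE */
};

/*
 * Fold one more shootdown request into the pending state.
 * "pm" identifies the requesting pmap; "is_kernel" says whether the
 * request targets the kernel's (global) entries.
 */
static void
pending_merge(struct pending *p, const void *pm, bool is_kernel)
{
	if (is_kernel) {
		/* Kernel alone is case 3; kernel plus any user is case 4. */
		if (p->op == INV_NOBODY || p->op == INV_KERNEL)
			p->op = INV_KERNEL;
		else
			p->op = INV_ALL;
		p->victim = NULL;
		return;
	}

	switch (p->op) {
	case INV_NOBODY:
		/* First user request: remember which pmap it was. */
		p->op = INV_ONE;
		p->victim = pm;
		break;
	case INV_ONE:
		/* A second, different pmap escalates to "all users". */
		if (p->victim != pm) {
			p->op = INV_ALLUSER;
			p->victim = NULL;
		}
		break;
	case INV_KERNEL:
		/* Kernel was already pending; now a user as well. */
		p->op = INV_ALL;
		break;
	case INV_ALLUSER:
	case INV_ALL:
		/* Already as broad as it gets for user ASIDs. */
		break;
	}
}
```

The point of the design is that the state only ever widens, so the IPI
handler has a single small enum to act on no matter how many requests
piled up before it ran.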