On 06/18, Frederic Weisbecker wrote:
>
> On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> >
> > Simplest example,
> >
> >     for_each_possible_cpu(cpu)
> >             total_count += per_cpu(per_cpu_count, cpu);
> >
> > Every per_cpu() access likely means a cache miss. Not to mention we
> > need additional math to calculate the address of each CPU's counter.
> >
> >     for_each_possible_cpu(cpu)
> >             total_count += bootmem_or_kmalloc_array[cpu];
> >
> > is much better in this respect.
> >
> > And note also that per_cpu_count above can share the cacheline with
> > another "hot" per-cpu variable.
>
> Ah I see, that's good to know.
>
> But these variables are supposed to only be touched from slow path
> (perf events syscall, ptrace breakpoints creation, etc...), right?
> So this is probably not a problem?

Yes, sure. But please note that this can also penalize other CPUs.
For example, toggle_bp_slot() writes to per_cpu(nr_cpu_bp_pinned),
and this invalidates the cacheline, which can contain another
per-cpu variable.

But let me clarify: I agree this is all minor, and I am not trying to
say this change will actually improve performance.

The main point of this patch is to make the code look a bit better,
and you seem to agree. The changelog mentions s/percpu/array/ only
as a potential change which obviously needs more discussion; I didn't
mean that we should necessarily do it.

Although yes, personally I really dislike per-cpu in this case, but
of course this is subjective and I won't argue ;)

Oleg.
