date:20080202

Re: Memory allocation performance

2008-02-02 Thread Julian Elischer

Robert Watson wrote:
> be a good time to try to revalidate that. Basically, the goal would be to make the pcpu cache FIFO as much as possible as that maximizes the

you mean FILO or LIFO right?

> chances that the newly allocated object already has lines in the cache. It's a fairly trivial twea

Re: Memory allocation performance

2008-02-02 Thread Peter Jeremy

On Sat, Feb 02, 2008 at 11:31:31AM +0200, Alexander Motin wrote:
> To check UMA dependency I have made a trivial one-element cache which in my test case avoids two of four allocations per packet.

You should be able to implement this lockless using atomic(9). I haven't verified it, but
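For readers following the one-element cache idea, here is a minimal userland sketch of how such a cache might be made lockless. It is not the patch discussed in the thread: it uses C11 `<stdatomic.h>` as a stand-in for the kernel's atomic(9) primitives, `malloc`/`free` as a stand-in for the UMA zone, and the names `slot_cache`, `cache_alloc`, and `cache_free` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-slot cache in front of a slower backing allocator. */
struct slot_cache {
    _Atomic(void *) slot;   /* cached object, or NULL when the slot is empty */
    size_t size;            /* object size passed to the backing allocator */
};

/* Allocate: take the cached object if there is one, else fall back. */
static void *cache_alloc(struct slot_cache *c)
{
    /* Atomically swap the slot with NULL; we own whatever was there. */
    void *p = atomic_exchange(&c->slot, (void *)NULL);
    if (p != NULL)
        return p;
    return malloc(c->size);     /* stand-in for the real allocator */
}

/* Free: park the object in the slot if it is empty, else really free it. */
static void cache_free(struct slot_cache *c, void *p)
{
    void *expected = NULL;

    assert(p != NULL);
    /* Store p only if the slot still holds NULL. */
    if (!atomic_compare_exchange_strong(&c->slot, &expected, p))
        free(p);                /* slot already occupied */
}
```

Because the slot holds a single pointer that is never dereferenced by the cache itself, a plain exchange on the allocation side and a compare-and-swap on the free side appear to be sufficient here; a multi-element lock-free list would need more care.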

Re: Memory allocation performance

2008-02-02 Thread Alexander Motin

Robert Watson wrote:
> Basically, the goal would be to make the pcpu cache FIFO as much as possible as that maximizes the chances that the newly allocated object already has lines in the cache.

Why FIFO? I think LIFO (stack) should be better for this goal as the last freed object has more cha
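The LIFO point can be illustrated with a small sketch, assuming a fixed-size array used as a stack. This is not UMA's actual bucket code; `bucket`, `bucket_free`, `bucket_alloc`, and `BUCKET_CAP` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_CAP 4    /* hypothetical capacity, chosen for the example */

/* A per-CPU-style cache of free objects, kept as a stack. */
struct bucket {
    void *items[BUCKET_CAP];
    size_t count;           /* number of cached objects */
};

/* Free path: push the object on top. Returns -1 if the bucket is full. */
static int bucket_free(struct bucket *b, void *obj)
{
    assert(obj != NULL);
    if (b->count == BUCKET_CAP)
        return -1;
    b->items[b->count++] = obj;
    return 0;
}

/* Alloc path: pop the most recently freed object, or NULL if empty. */
static void *bucket_alloc(struct bucket *b)
{
    if (b->count == 0)
        return NULL;
    return b->items[--b->count];
}
```

With this layout the next allocation returns the object that was freed last, which is the one most likely to still have lines in the CPU cache.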

Re: Memory allocation performance

2008-02-02 Thread Max Laier

On Sat, 2.02.2008, 23:05, Alexander Motin wrote:
> Robert Watson wrote:
>> Hence my request for drilling down a bit on profiling -- the question I'm asking is whether profiling shows things running or taking time that shouldn't be.
>
> I have not yet understood why it happens, but hwpm

Re: Memory allocation performance

2008-02-02 Thread Robert Watson

On Sun, 3 Feb 2008, Alexander Motin wrote:
> Robert Watson wrote:
>> Basically, the goal would be to make the pcpu cache FIFO as much as possible as that maximizes the chances that the newly allocated object already has lines in the cache.
>
> Why FIFO? I think LIFO (stack) should be better for this

Re: Memory allocation performance

2008-02-02 Thread Robert Watson

On Sat, 2 Feb 2008, Kris Kennaway wrote:
> Alexander Motin wrote:
>> Robert Watson wrote:
>>> Hence my request for drilling down a bit on profiling -- the question I'm asking is whether profiling shows things running or taking time that shouldn't be.
>>
>> I have not yet understood why it happens, bu

Re: Memory allocation performance

2008-02-02 Thread Kris Kennaway

Alexander Motin wrote:
> Robert Watson wrote:
>> Hence my request for drilling down a bit on profiling -- the question I'm asking is whether profiling shows things running or taking time that shouldn't be.
>
> I have not yet understood why it happens, but hwpmc shows a huge number of "p4-resource-

Re: Memory allocation performance

2008-02-02 Thread Alexander Motin

Robert Watson wrote:
> Hence my request for drilling down a bit on profiling -- the question I'm asking is whether profiling shows things running or taking time that shouldn't be.

I have not yet understood why it happens, but hwpmc shows a huge number of "p4-resource-stall"s in UMA functions

Re: Memory allocation performance

2008-02-02 Thread Peter Jeremy

On Sat, Feb 02, 2008 at 09:56:42PM +0200, Alexander Motin wrote:
> Peter Jeremy wrote:
>> On Sat, Feb 02, 2008 at 11:31:31AM +0200, Alexander Motin wrote:
>>> To check UMA dependency I have made a trivial one-element cache which in my test case avoids two of four allocations per packe

Re: Memory allocation performance

2008-02-02 Thread Alexander Motin

Peter Jeremy wrote:
> On Sat, Feb 02, 2008 at 11:31:31AM +0200, Alexander Motin wrote:
>> To check UMA dependency I have made a trivial one-element cache which in my test case avoids two of four allocations per packet.
>
> You should be able to implement this lockless using atomic(9). I have

Re: Memory allocation performance

2008-02-02 Thread Joseph Koshy

> Thanks, I have already found this. The only problem was that by default it counts cycles only when both logical cores are active, while one of my cores was halted.

Did you try the 'active' event modifier: "p4-global-power-events,active=any"?

> Sampling on this, the profiler showed results clos

Re: Memory allocation performance

2008-02-02 Thread Joseph Koshy

> I have tried it for measuring the number of instructions. But I doubt that instruction count is the right counter for performance measurement, as different instructions may have very different execution times depending on many factors, like cache misses and current memory traffic. I have trie

Re: Memory allocation performance

2008-02-02 Thread Alexander Motin

Joseph Koshy wrote:
> You cannot sample with the TSC since the TSC does not interrupt the CPU. For CPU cycles you would probably want to use "p4-global-power-events"; see pmc(3).

Thanks, I have already found this. The only problem was that by default it counts cycles only when both logical co

Re: Memory allocation performance

2008-02-02 Thread Alexander Motin

Robert Watson wrote:
> I guess the question is: where are the cycles going? Are we suffering excessive cache misses in managing the slabs? Are you effectively "cycling through" objects rather than using a smaller set that fits better in the cache?

In my test setup only several objects from zo

Re: Memory allocation performance

2008-02-02 Thread Robert Watson

On Sat, 2 Feb 2008, Alexander Motin wrote:
> Robert Watson wrote:
>> I guess the question is: where are the cycles going? Are we suffering excessive cache misses in managing the slabs? Are you effectively "cycling through" objects rather than using a smaller set that fits better in the cache?


15 matches
