For the record, I tried out a custom allocator for requests. I measured how
long it took to run twolf smred on ARM/gem5.opt with no modification, with
a custom allocator that keeps around all deleted requests for reuse, and
with one that keeps around only one. I was not able to measure any consistent
>
> As a side note, tcmalloc has bitten me a number of times here. I've run
> valgrind with tcmalloc and seen "0 bytes allocated and 0 bytes deleted"
> after waiting for hours too many times. I have finally learned to recompile
> everything using "--without-tcmalloc" before running valgrind.
>
>
Couple more comments below.
On Thu, Nov 29, 2018 at 3:08 PM Gabe Black wrote:
> I looked at the code as is, and Packets themselves aren't ever inherited by
> anything, and they maintain a stack (I think) of sender state objects which
> doesn't affect the size of the packet. The sender state is
I looked at the code as is, and Packets themselves aren't ever inherited by
anything, and they maintain a stack (I think) of sender state objects which
doesn't affect the size of the packet. The sender state is inherited in a
bunch of places since that is often customized. As far as compressed
Hello,
Polymorphism could provide a good cleanup (just random possibilities/examples:
Split between source types (CPU, Mem, GPU...), or between cmd types), but maybe
it is unrealistic, since these classes are all over. Regarding compressed
packets, they would still be "easily" implemented
Hey all,
The bigger opportunity is probably around packets as sometimes we are
allocating a number of them per request.
A custom allocator would be a great contribution if it would make gem5
substantially faster. But if it proved to make things only marginally
better, I agree with Jason, we would
Hey Gabe,
Are you thinking that a custom allocator would make a difference in terms
of memory footprint or in terms of performance (or both)?
A couple of thoughts:
- I'm hesitant to put the final keyword on Packet. I think we could see
some code cleanup by making Packet a polymorphic object
I was just wondering whether it would make sense to have custom allocators
defined for Request and Packet types which would keep around a pool of them
rather than defaulting to the normal allocator. I suspect since both types
of objects are allocated very frequently this could save a lot of heap
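
To make the pooling idea concrete, here is a minimal sketch of a per-class
allocator that keeps freed blocks on a free list and hands them back on the
next allocation. PooledRequest, its addr field, and pooledCount() are
hypothetical names used only for illustration; this is not gem5's Request
class. The sketch assumes single-threaded use and never returns pooled
blocks to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a frequently allocated type (not gem5's Request).
class PooledRequest
{
  public:
    explicit PooledRequest(std::uint64_t addr = 0) : addr_(addr) {}
    std::uint64_t addr() const { return addr_; }

    // Reuse a previously freed block when one is available.
    static void *operator new(std::size_t size)
    {
        // Only exact-size allocations use the pool, so a derived class
        // with a different size falls through to the global allocator.
        if (size == sizeof(PooledRequest) && head() != nullptr) {
            assert(count() > 0);
            FreeNode *node = head();
            head() = node->next;
            --count();
            return node;
        }
        return ::operator new(size);
    }

    // Keep the freed block for later reuse instead of releasing it.
    static void operator delete(void *p, std::size_t size)
    {
        if (p == nullptr)
            return;
        if (size != sizeof(PooledRequest)) {
            ::operator delete(p);
            return;
        }
        // The object is already destroyed, so its storage can hold the link.
        FreeNode *node = static_cast<FreeNode *>(p);
        node->next = head();
        head() = node;
        ++count();
    }

    // Number of blocks currently waiting in the pool.
    static std::size_t pooledCount() { return count(); }

  private:
    struct FreeNode { FreeNode *next; };

    // Function-local statics avoid needing out-of-class definitions.
    static FreeNode *&head() { static FreeNode *h = nullptr; return h; }
    static std::size_t &count() { static std::size_t c = 0; return c; }

    std::uint64_t addr_;
};

// A freed block must be large enough to store the free-list link.
static_assert(sizeof(PooledRequest) >= sizeof(void *),
              "object too small to hold a free-list pointer");
```

With this in place, "new PooledRequest(...)" and "delete" work as usual, and
a delete followed by a new reuses the same block. A variant that keeps only
one block would simply hand the block to ::operator delete whenever the pool
is already non-empty.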