Re: [Valgrind-users] Perf cache misses vs. Cachegrind simulated cache misses through PARSEC benchmark suite

Jason Palaszewski Fri, 07 Dec 2012 12:16:12 -0800

Josef,

The difference with ferret is in the D1 misses. Even with the native
input set, there are almost no LL misses on this machine. I'm not sure
whether ferret runs purely in user space or also triggers work in the
kernel. As for prefetching, my understanding from the Valgrind
documentation is that Cachegrind does not model hardware prefetching.
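
One way to picture how D1 counts can diverge, following Josef's
explanation in the quoted message below that consecutive loads to the
same line can each count as a miss on real hardware: the toy model here
is only a sketch. The 64-byte line size, the one-load-per-cycle issue
rate, the `fill_latency` parameter, the unbounded cache capacity, and
both function names are assumptions made for illustration; this is not
how Cachegrind or the CPU is actually implemented.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


def count_misses_synchronous(addresses):
    """Simulator-style counting: a line is resident right after its first miss."""
    resident = set()  # lines already brought in (capacity is unbounded here)
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in resident:
            misses += 1
            resident.add(line)
    return misses


def count_misses_asynchronous(addresses, fill_latency):
    """Toy hardware-style counting: one load issues per cycle, and a load to
    a line whose fill is still in flight is also counted as a miss."""
    ready_at = {}  # line -> cycle at which its fill completes
    misses = 0
    for cycle, addr in enumerate(addresses):
        line = addr // LINE_SIZE
        if line not in ready_at:
            ready_at[line] = cycle + fill_latency  # start the fill
            misses += 1
        elif cycle < ready_at[line]:
            misses += 1  # fill not finished yet, so this load misses too
    return misses


# Eight consecutive 8-byte loads that all fall within one 64-byte line.
loads = [8 * i for i in range(8)]
print(count_misses_synchronous(loads))       # 1
print(count_misses_asynchronous(loads, 4))   # 4
print(count_misses_asynchronous(loads, 10))  # 8
```

In this model the same access stream yields one miss under synchronous
counting and up to one miss per load when the fill latency exceeds the
length of the burst, which is the kind of gap described for L1.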


Another, larger issue is that even a benchmark as simple as
blackscholes shows only 1,523 I1 misses in the Cachegrind simulation,
while Perf reports 54,259,977 I1 load misses when it runs on the real
hardware. I would like to ask about this discrepancy as well.

Thanks,

Jason

On Fri, Dec 7, 2012 at 3:06 PM, Josef Weidendorfer
<[email protected]> wrote:
> Am 07.12.2012 20:46, schrieb Jason Palaszewski:
>
>> Hi, I'm trying to compare the number of cache misses (D1, I1, and LL)
>> that Perf reports on the hardware itself with the number Cachegrind
>> simulates. The server machine has two Sandy Bridge Intel Xeon E5-2430
>> CPUs, and the PARSEC 3.0 suite (compiled in the gcc-serial
>> configuration, single threaded) is run through Cachegrind to obtain
>> the simulated D1, I1, and LL misses. The same benchmark binaries are
>> then run under Perf, counting D1 load and store misses as well as I1
>> misses. The ratio of Perf misses to Cachegrind misses stays within
>> about a factor of 1-2x. However, some benchmarks, such as ferret,
>> show far more misses under Perf than under Cachegrind.
>
>
> Are these LL misses or L1 misses? For L1 misses, you may observe many
> more misses because real caches are asynchronous, i.e. consecutive loads
> to the same line will give as many misses as loads, while in Cachegrind
> all loads after the first miss will be hits.
>
> Hm. Is ferret pure user-space, or does it trigger work in the kernel?
> It may be that the kernel side evicts a lot of data from the cache,
> and this becomes visible via user-level cache misses.
>
> Another possibility is that hardware prefetching is too clever, and
> evicts lines to enable prefetching of data which is not actually used.
>
> Josef
>
>
>> Has anyone else done analysis on Perf results vs. Cachegrind simulated
>> results by running benchmarks on both of these? The machines are also
>> running RedHat 5. Thanks for any information.
>>
>>
>> ------------------------------------------------------------------------------
>> LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
>> Remotely access PCs and mobile devices and provide instant support
>> Improve your efficiency, and focus on delivering more value-add services
>> Discover what IT Professionals Know. Rescue delivers
>> http://p.sf.net/sfu/logmein_12329d2d
>> _______________________________________________
>> Valgrind-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/valgrind-users
>>
>

