On 5/27/20 Paul FLOYD wrote:
Well, no real surprises. This is with a testcase that runs standalone in about 5 seconds and under DHAT in about 200 seconds (so a reasonable slowdown of 40x).

# Overhead  Command          Shared Object       Symbol
# ........  ...............  ..................  .........................
    29.11%  dhat-amd64-linu  dhat-amd64-linux    [.] interval_tree_Cmp
    21.13%  dhat-amd64-linu  perf-26905.map      [.] 0x00000010057a25f8
    13.32%  dhat-amd64-linu  dhat-amd64-linux    [.] vgPlain_lookupFM
     9.56%  dhat-amd64-linu  dhat-amd64-linux    [.] dh_handle_read
     8.83%  dhat-amd64-linu  dhat-amd64-linux    [.] vgPlain_nextIterFM
     4.66%  dhat-amd64-linu  dhat-amd64-linux    [.] check_for_peak
     1.85%  dhat-amd64-linu  dhat-amd64-linux    [.] vgPlain_disp_cp_xindir
     1.32%  dhat-amd64-linu  [kernel.kallsyms]   [k] 0xffffffff8103ec0a
     1.00%  dhat-amd64-linu  dhat-amd64-linux    [.] dh_handle_write
To me this suggests two things:

1) Investigate the coding of the 4 or 5 highest-use subroutines (interval_tree_Cmp, vgPlain_lookupFM, dh_handle_read, vgPlain_nextIterFM).

2) See whether DHAT might recognize and use higher-level abstractions than MemoryRead and MemoryWrite of individual addresses. Similar to memcheck intercepting and analyzing strlen (etc.) as a complete concept instead of as its individual Reads and Writes, perhaps DHAT could intercept (and/or recognize) vector linear search, vector addition, vector partial sum, other BLAS routines, etc., and then analyze the algorithm as a whole.

_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users
