anirudh2290 commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496661805
 
 
   > ## Discussion
   > * [x]  Preprocessor Directives (1. 2.).
   >   
   >   * The reason why I included the GPU memory profiler as header flags 
rather than compilation flags is because in the former implementation, when 
users decide to switch on/off the GPU memory profiler, **the build system can 
automatically help them figure out the build dependency**, whereas in the 
latter implementation they need to compile the entire MXNet from scratch.
   > * [x]  Separate Preprocessor Directives (3.).
   >   
   >   * As their names indicate, **storage tagging** and **GPU memory profiling** are two separate things: **storage tagging** is equally applicable to CPU memory profiling (or other devices), while **GPU memory profiling** is only one example that benefits from it. That is why I keep them separate.
   >   * I would like to propose **PERMANENTLY adding storage tagging** to the C++ backend, as I believe it may be a useful feature in the future, and I do not think propagating an extra string from the Python frontend to the C++ backend will hurt performance because (1) most such strings are small, and (2) storage tagging only happens once, at the beginning of training.
   
   I think we can have an additional tag parameter in ndarray, storage, and resource. That should be fine; I don't see this additional parameter on these different API calls becoming a bottleneck.
   
   I think we should do this PR incrementally, though. The first phase should allow only the API changes gated by MXNET_ENABLE_STORAGE_TAGGING. Once that is merged, we can monitor our internal benchmarks for performance changes and move on to the additional changes. I am open to alternative suggestions.
   
   > * [ ]  Profiler API Integration (5. 6. 7. 8.)
   >   
   >   * The GPU memory profiler is different from the existing profilers in many ways:
   >     (1) It does not use the `chrome://tracing` visualization backend, because it needs to accept user input defining the **keyword dictionaries for grouping storage tags** (also, I do not see a good way of visualizing bar charts with `chrome://tracing`).
   >     (2) Because it requires user input, users must first look into the memory profiler logs to see what contributes most to the memory footprint; this is why those logs are stored as `.csv`, since they need to be digested by the users first.
   >     (3) The current profiler APIs are geared more toward performance profiling, which in my opinion differs from storage profiling (e.g., in storage profiling you do not really need the `pause()` and `resume()` API calls).
   >   * Based on these points, I decided to keep the GPU memory profiler separate from the existing profiler APIs, because I do not see a good way of integrating them.
   
   I think the MXNet profiler features are not limited to profiling performance; they also cover memory. The current profiler code provides counter data structures that you can increment, and it collects all the stats behind the scenes.
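   To make the counter idea concrete, here is a minimal, self-contained sketch of a memory counter that aggregates current and peak usage behind the scenes. This is not the actual MXNet profiler class (see the links below for those); it only illustrates the increment/aggregate pattern described above.

```cpp
#include <atomic>
#include <cstdint>

// Minimal sketch (not MXNet's actual profiler classes) of a counter that
// tracks current and peak memory usage as allocations come and go.
class MemoryCounter {
 public:
  void Increment(std::int64_t bytes) {
    std::int64_t cur = current_.fetch_add(bytes) + bytes;
    std::int64_t peak = peak_.load();
    // Lock-free update of the peak: retry until no larger value raced in.
    while (cur > peak && !peak_.compare_exchange_weak(peak, cur)) {}
  }
  void Decrement(std::int64_t bytes) { current_.fetch_sub(bytes); }
  std::int64_t Current() const { return current_.load(); }
  std::int64_t Peak() const { return peak_.load(); }

 private:
  std::atomic<std::int64_t> current_{0};
  std::atomic<std::int64_t> peak_{0};
};
```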
   
   You can find examples of how to use the profiler for your custom use cases here:
   
https://github.com/apache/incubator-mxnet/blob/master/src/profiler/storage_profiler.h
   and here:
   
https://github.com/anirudh2290/mxnet/blob/memory_profiler_poc2/src/profiler/pool_memory_profiler.h
   
https://github.com/anirudh2290/mxnet/blob/memory_profiler_poc2/src/storage/pooled_storage_manager.h#L160
   
   It seems the place where your use case differs is that you want to dump in a different format. You can add support for an additional format in 
https://github.com/apache/incubator-mxnet/blob/master/src/profiler/aggregate_stats.cc#L46
   
   and expose that additional format in the frontend dumps API.
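   An additional dump format could be as simple as the sketch below, which emits per-tag byte counts as CSV. The function name and the flat `tag -> bytes` map are illustrative assumptions, not the actual `AggregateStats` layout.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch: serialize aggregate memory stats as CSV, as an
// additional dump format alongside the existing table/json output.
std::string DumpCsv(const std::map<std::string, std::int64_t>& bytes_by_tag) {
  std::ostringstream os;
  os << "storage_tag,bytes\n";  // header row
  for (const auto& kv : bytes_by_tag) {
    os << kv.first << "," << kv.second << "\n";
  }
  return os.str();
}
```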
