nicklhy opened a new issue #15218: memory leak of ImageRecordIter ?
URL: https://github.com/apache/incubator-mxnet/issues/15218
 
 
   Hi all,
   I have a very large dataset (over 20 million images) that has been split and 
converted into multiple RecordIO files, so I wrote a custom data iterator that 
repeatedly opens these RecordIO files during training. However, memory usage 
keeps going up and the process eventually crashes.
   The memory leak seems to be caused by `mx.io.ImageRecordIter`. A simple 
script to reproduce it is below. It creates and deletes the iterator 50 times 
without reading any batches, and prints the process RSS after each round.
   ```python
   import os
   import psutil
   import mxnet as mx
   
   def sizeof_fmt(num, suffix='B'):
       for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
           if abs(num) < 1024.0:
               return "%3.1f%s%s" % (num, unit, suffix)
           num /= 1024.0
       return "%.1f%s%s" % (num, 'Yi', suffix)
   
   pid = os.getpid()
   
   for i in range(50):
       # create the iterator and drop it right away; no batch is ever read
       rec = mx.io.ImageRecordIter(
           path_imgidx='data/imagenet_rec/rec/train.idx',
           path_imgrec='data/imagenet_rec/rec/train.rec',
           preprocess_threads=4,
           data_shape=(3, 224, 224),
           batch_size=64
       )
       del rec
       # get the memory usage of this process
       process = psutil.Process(pid)
       mem_bytes = process.memory_info().rss
       print('Memory usage of round '+str(i)+': '+sizeof_fmt(mem_bytes))
   ```
   The output looks like this (RSS grows from 1.6GiB to 3.4GiB over the 50 rounds):
   ```
   $ python3 test.py
   Memory usage of round 0: 1.6GiB
   Memory usage of round 1: 1.7GiB
   Memory usage of round 2: 1.7GiB
   Memory usage of round 3: 1.7GiB
   Memory usage of round 4: 1.8GiB
   Memory usage of round 5: 1.8GiB
   Memory usage of round 6: 1.8GiB
   Memory usage of round 7: 1.9GiB
   Memory usage of round 8: 1.9GiB
   Memory usage of round 9: 1.9GiB
   Memory usage of round 10: 2.0GiB
   Memory usage of round 11: 2.0GiB
   Memory usage of round 12: 2.0GiB
   Memory usage of round 13: 2.1GiB
   Memory usage of round 14: 2.1GiB
   Memory usage of round 15: 2.2GiB
   Memory usage of round 16: 2.2GiB
   Memory usage of round 17: 2.2GiB
   Memory usage of round 18: 2.3GiB
   Memory usage of round 19: 2.3GiB
   Memory usage of round 20: 2.3GiB
   Memory usage of round 21: 2.4GiB
   Memory usage of round 22: 2.4GiB
   Memory usage of round 23: 2.4GiB
   Memory usage of round 24: 2.5GiB
   Memory usage of round 25: 2.5GiB
   Memory usage of round 26: 2.6GiB
   Memory usage of round 27: 2.6GiB
   Memory usage of round 28: 2.6GiB
   Memory usage of round 29: 2.7GiB
   Memory usage of round 30: 2.7GiB
   Memory usage of round 31: 2.7GiB
   Memory usage of round 32: 2.8GiB
   Memory usage of round 33: 2.8GiB
   Memory usage of round 34: 2.9GiB
   Memory usage of round 35: 2.9GiB
   Memory usage of round 36: 2.9GiB
   Memory usage of round 37: 3.0GiB
   Memory usage of round 38: 3.0GiB
   Memory usage of round 39: 3.0GiB
   Memory usage of round 40: 3.1GiB
   Memory usage of round 41: 3.1GiB
   Memory usage of round 42: 3.1GiB
   Memory usage of round 43: 3.2GiB
   Memory usage of round 44: 3.2GiB
   Memory usage of round 45: 3.2GiB
   Memory usage of round 46: 3.3GiB
   Memory usage of round 47: 3.3GiB
   Memory usage of round 48: 3.4GiB
   Memory usage of round 49: 3.4GiB
   ```
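
To put a rough number on the growth, the log above can be parsed and the first and last rounds compared. This is only a sketch: `parse_log` and `growth_per_round` are hypothetical helper names, the `LOG` string holds just a few lines copied from the output above, and because the printed values are rounded to 0.1GiB the result is a coarse estimate (somewhere around 38 MiB per create/delete cycle).

```python
import re

# A few lines copied from the output above; paste the full log to use every round.
LOG = """\
Memory usage of round 0: 1.6GiB
Memory usage of round 10: 2.0GiB
Memory usage of round 25: 2.5GiB
Memory usage of round 40: 3.1GiB
Memory usage of round 49: 3.4GiB
"""

# Binary prefixes as printed by sizeof_fmt() in the reproduction script.
UNITS = {'': 1, 'Ki': 1024, 'Mi': 1024 ** 2, 'Gi': 1024 ** 3, 'Ti': 1024 ** 4}

LINE_RE = re.compile(r'round (\d+): ([\d.]+)(|Ki|Mi|Gi|Ti)B')


def parse_log(text):
    """Return a list of (round_index, rss_bytes) tuples found in the log text."""
    points = []
    for match in LINE_RE.finditer(text):
        round_idx = int(match.group(1))
        rss_bytes = float(match.group(2)) * UNITS[match.group(3)]
        points.append((round_idx, rss_bytes))
    return points


def growth_per_round(points):
    """Average RSS growth in bytes per round, from the first and last points."""
    (first_round, first_rss), (last_round, last_rss) = points[0], points[-1]
    return (last_rss - first_rss) / (last_round - first_round)


points = parse_log(LOG)
rate = growth_per_round(points)
print('approx. growth per round: %.1f MiB' % (rate / 1024 ** 2))
```

Using only the endpoints keeps the estimate simple; since the growth in the log looks close to linear, a least-squares fit over all 50 rounds should give a similar figure.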
