nicklhy opened a new issue #15218: memory leak of ImageRecordIter ?
URL: https://github.com/apache/incubator-mxnet/issues/15218

Hi all,

I have a very big dataset (over 20 million pictures) that has been split and converted into multiple RecordIO files, so I wrote a custom data iterator that repeatedly opens these RecordIO files during training. However, the memory usage keeps going up and the process eventually crashes. It seems the memory leak is caused by `mx.io.ImageRecordIter`.

A simple script to reproduce it:

```python
import os

import psutil
import mxnet as mx


def sizeof_fmt(num, suffix='B'):
    # Format a byte count with binary (IEC) prefixes, e.g. 1.6GiB.
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)


pid = os.getpid()
for i in range(50):
    # Create an iterator over the same .rec/.idx pair, then delete it immediately.
    rec = mx.io.ImageRecordIter(
        path_imgidx='data/imagenet_rec/rec/train.idx',
        path_imgrec='data/imagenet_rec/rec/train.rec',
        preprocess_threads=4,
        data_shape=(3, 224, 224),
        batch_size=64
    )
    del rec
    # get the memory usage (resident set size) of this process
    process = psutil.Process(pid)
    mem_bytes = process.memory_info().rss
    print('Memory usage of round ' + str(i) + ': ' + sizeof_fmt(mem_bytes))
```

The output is below. Resident memory grows steadily from 1.6GiB in round 0 to 3.4GiB in round 49, even though each iterator is deleted right after it is created:

```
$ python3 test.py
Memory usage of round 0: 1.6GiB
Memory usage of round 1: 1.7GiB
Memory usage of round 2: 1.7GiB
Memory usage of round 3: 1.7GiB
Memory usage of round 4: 1.8GiB
Memory usage of round 5: 1.8GiB
Memory usage of round 6: 1.8GiB
Memory usage of round 7: 1.9GiB
Memory usage of round 8: 1.9GiB
Memory usage of round 9: 1.9GiB
Memory usage of round 10: 2.0GiB
Memory usage of round 11: 2.0GiB
Memory usage of round 12: 2.0GiB
Memory usage of round 13: 2.1GiB
Memory usage of round 14: 2.1GiB
Memory usage of round 15: 2.2GiB
Memory usage of round 16: 2.2GiB
Memory usage of round 17: 2.2GiB
Memory usage of round 18: 2.3GiB
Memory usage of round 19: 2.3GiB
Memory usage of round 20: 2.3GiB
Memory usage of round 21: 2.4GiB
Memory usage of round 22: 2.4GiB
Memory usage of round 23: 2.4GiB
Memory usage of round 24: 2.5GiB
Memory usage of round 25: 2.5GiB
Memory usage of round 26: 2.6GiB
Memory usage of round 27: 2.6GiB
Memory usage of round 28: 2.6GiB
Memory usage of round 29: 2.7GiB
Memory usage of round 30: 2.7GiB
Memory usage of round 31: 2.7GiB
Memory usage of round 32: 2.8GiB
Memory usage of round 33: 2.8GiB
Memory usage of round 34: 2.9GiB
Memory usage of round 35: 2.9GiB
Memory usage of round 36: 2.9GiB
Memory usage of round 37: 3.0GiB
Memory usage of round 38: 3.0GiB
Memory usage of round 39: 3.0GiB
Memory usage of round 40: 3.1GiB
Memory usage of round 41: 3.1GiB
Memory usage of round 42: 3.1GiB
Memory usage of round 43: 3.2GiB
Memory usage of round 44: 3.2GiB
Memory usage of round 45: 3.2GiB
Memory usage of round 46: 3.3GiB
Memory usage of round 47: 3.3GiB
Memory usage of round 48: 3.4GiB
Memory usage of round 49: 3.4GiB
```
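To help tell uncollected Python objects apart from memory held on the native side, the reproduction loop can be wrapped in a small helper that forces a garbage-collection pass after every `del` and fits a straight line to the RSS readings. This is a minimal sketch under stated assumptions, not part of MXNet: `rss_growth_per_round` and `current_rss_bytes` are hypothetical helpers, the RSS probe reads `/proc/self/statm` (Linux only) instead of using `psutil`, and the `gc.collect()` call is an extra diagnostic step that the original script does not perform.

```python
import gc
import os


def current_rss_bytes():
    """Resident set size of this process in bytes (Linux only, hypothetical helper).

    Reads the second field of /proc/self/statm (resident pages) as a
    dependency-free stand-in for psutil.Process(pid).memory_info().rss.
    """
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_growth_per_round(make_obj, rounds=10, read_rss=current_rss_bytes):
    """Create and destroy an object `rounds` times; return RSS growth in MiB per round.

    `make_obj` is any zero-argument factory (assumption: e.g. a lambda wrapping
    mx.io.ImageRecordIter(...)). gc.collect() runs after each `del` so that
    Python reference cycles are not mistaken for a native leak.
    """
    if rounds < 2:
        raise ValueError("need at least two rounds to estimate a growth rate")
    readings = []
    for _ in range(rounds):
        obj = make_obj()
        del obj
        gc.collect()
        readings.append(read_rss() / 2**20)  # bytes -> MiB
    # Least-squares slope of RSS (MiB) against the round index 0..rounds-1.
    n = len(readings)
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

For example, passing a lambda that builds `mx.io.ImageRecordIter` with the same arguments as in the script above would rerun the experiment with an explicit collection after each deletion. For reference, the readings reported above rise from 1.6GiB to 3.4GiB over 49 rounds, i.e. roughly 37 MiB per round (the values are rounded to 0.1GiB, so this is only an estimate). A rate that stays clearly above zero even with `gc.collect()` would suggest the memory is retained outside the Python garbage collector's reach.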
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services