James Coder created ARROW-17634:
-----------------------------------

             Summary: pyarrow.fs import reserves large amount of memory
                 Key: ARROW-17634
                 URL: https://issues.apache.org/jira/browse/ARROW-17634
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 9.0.0
            Reporter: James Coder
It seems that in version 9.0.0, `import pyarrow.fs` reserves 1+ GB (close to 2 GB) of virtual memory; this was not the case in 8.0.0.

Test code:

```python
def memory_snapshot(label=''):
    from util.System import System
    rss = System.process_rss_gigabytes()
    vms = System.process_gigabytes()
    _max = System.process_max_gigabytes()
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))

memory_snapshot()
import pyarrow
print(pyarrow.__version__)
memory_snapshot()
import pyarrow.fs
memory_snapshot()
```

8.0.0 output:
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
8.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
```

9.0.0 output:
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
9.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
```

Digging further into what happens during the import, it appears that `initialize_s3` is the culprit:

```
before s3 initialize
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
after s3 initialize
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
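Since `util.System` is the reporter's internal helper and not part of any public library, here is a hypothetical stdlib-only sketch of an equivalent snapshot for reproducing this on Linux: peak RSS comes from `resource.getrusage`, and current RSS/virtual size are read from `/proc/self/status` (the `VmRSS`/`VmSize` fields and the kilobyte units are Linux-specific assumptions):

```python
import resource

def memory_snapshot(label=''):
    # Peak RSS: on Linux, ru_maxrss is reported in kilobytes.
    max_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 ** 2)
    rss_gb = vms_gb = 0.0
    # Current RSS and virtual size from the Linux proc filesystem,
    # also reported in kilobytes.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                rss_gb = int(line.split()[1]) / (1024 ** 2)
            elif line.startswith('VmSize:'):
                vms_gb = int(line.split()[1]) / (1024 ** 2)
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB"
          % (label, rss_gb, vms_gb, max_gb))
    return rss_gb, vms_gb, max_gb
```

Calling this before and after `import pyarrow.fs` should show the virtual-size jump described above.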