[ https://issues.apache.org/jira/browse/ARROW-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646282#comment-17646282 ]
James Coder commented on ARROW-17634:
-------------------------------------

This seems to be resolved in 10.0.1:
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
10.0.1
Memory snapshot (); rss=0.1 vms=0.5 max=0.5 GB
Memory snapshot (); rss=0.1 vms=0.5 max=0.5 GB
```

> pyarrow.fs import reserves large amount of memory
> --------------------------------------------------
>
>                 Key: ARROW-17634
>                 URL: https://issues.apache.org/jira/browse/ARROW-17634
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 9.0.0
>            Reporter: James Coder
>            Priority: Major
>
> It seems that in version 9.0.0, `import pyarrow.fs` reserves over 1 GB (close to 2 GB) of virtual memory; this was not present in 8.0.0.
> Test code:
> {code:python}
> def memory_snapshot(label=''):
>     from util.System import System
>     rss = System.process_rss_gigabytes()
>     vms = System.process_gigabytes()
>     _max = System.process_max_gigabytes()
>     print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))
>
> memory_snapshot()
> import pyarrow
> print(pyarrow.__version__)
> memory_snapshot()
> import pyarrow.fs
> memory_snapshot()
> {code}
> 8.0.0 output:
> {code}
> Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
> 8.0.0
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> {code}
> 9.0.0 output:
> {code}
> Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
> 9.0.0
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
> {code}
> Digging further into what happens during import, it seems `initialize_s3` is the culprit:
> {code}
> before s3 initialize
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> after s3 initialize
> Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
> {code}
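For anyone who wants to reproduce this without the reporter's internal `util.System` helper, here is a minimal, self-contained sketch of the same measurement; it assumes `psutil` is installed and uses the standard-library `resource` module, so the peak figure applies to Linux/macOS only:

{code:python}
# Minimal sketch of the reporter's test without util.System; assumes psutil
# is installed. Peak RSS comes from resource.getrusage, so Linux/macOS only.
import resource

import psutil


def memory_snapshot(label=''):
    info = psutil.Process().memory_info()
    rss = info.rss / 1e9   # resident set size, GB
    vms = info.vms / 1e9   # virtual memory size, GB
    # ru_maxrss is kilobytes on Linux (bytes on macOS); assume Linux here.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB"
          % (label, rss, vms, peak))


memory_snapshot('start')
import pyarrow
print(pyarrow.__version__)
memory_snapshot('after import pyarrow')
import pyarrow.fs
memory_snapshot('after import pyarrow.fs')
{code}

On an affected 9.0.0 install, the vms column should show the same jump from roughly 0.5 GB to 2+ GB at the last snapshot that the reporter observed above.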