[ https://issues.apache.org/jira/browse/PARQUET-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Szadovszky reassigned PARQUET-1261:
-----------------------------------------

    Assignee: Robert Kruszewski

> Parquet-format interns strings when reading filemetadata
> --------------------------------------------------------
>
>                 Key: PARQUET-1261
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1261
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Robert Kruszewski
>            Assignee: Robert Kruszewski
>            Priority: Major
>
> Parquet-format interns strings when deserializing metadata. References I
> could find suggest this was done early on to reduce memory pressure. Java
> (and the JVM in particular) has come a long way since then, and interning is
> generally discouraged; see
> [https://shipilev.net/jvm-anatomy-park/10-string-intern/] for a good
> explanation. Moreover, since Java 8 string deduplication has been
> implemented at the GC level, per [http://openjdk.java.net/jeps/192]. During
> our usage and testing we found that interning causes significant GC pressure
> in long-running applications because of the larger GC root set.
> This issue proposes removing interning, given that it is questionable whether
> it should be used on modern JVMs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
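To make the proposed change concrete, here is a minimal Java sketch of the two behaviours. It is not parquet-format's actual deserialization code: the class `MetadataStrings` and the helper `readString` are hypothetical, standing in for whatever decodes a UTF-8 string field from serialized file metadata. It only illustrates identity semantics (interned copies are one shared instance held in the JVM string table, plain copies are independent objects); it does not measure GC pressure.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the parquet-format implementation.
public class MetadataStrings {

    // Hypothetical helper: decode a UTF-8 string field read from metadata.
    // With intern == true the canonical copy from the JVM string table is
    // returned; with intern == false a fresh, independent String is returned.
    static String readString(byte[] utf8, boolean intern) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return intern ? s.intern() : s;
    }

    public static void main(String[] args) {
        byte[] field = "user_id".getBytes(StandardCharsets.UTF_8);

        // Interned: both reads resolve to the same canonical instance.
        String a = readString(field, true);
        String b = readString(field, true);
        System.out.println("interned, same instance: " + (a == b));

        // Not interned: distinct objects with equal contents, which the
        // collector can reclaim like any other object once unreachable.
        String c = readString(field, false);
        String d = readString(field, false);
        System.out.println("plain, same instance: " + (c == d));
        System.out.println("plain, equal contents: " + c.equals(d));
    }
}
```

Dropping the `intern()` call keeps string equality intact, since code comparing metadata strings should use `equals` rather than `==`. For the duplicate-memory concern, running on G1 with Java 8u20 or later and `-XX:+UseG1GC -XX:+UseStringDeduplication` enables the JEP 192 deduplication referenced above, which lets the collector share the backing arrays of equal but distinct strings without an explicit `intern()`.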