[ https://issues.apache.org/jira/browse/SPARK-32149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Zsolt Piros updated SPARK-32149:
---------------------------------------
    Affects Version/s:     (was: 3.0.1)
                       3.1.0

> Improve file path name normalisation at block resolution within the external shuffle service
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32149
>                 URL: https://issues.apache.org/jira/browse/SPARK-32149
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: Attila Zsolt Piros
>            Priority: Major
>
> In the external shuffle service, during block resolution the file paths
> (for disk-persisted RDDs and for shuffle blocks) are normalized by custom
> Spark code that uses an OS-dependent regexp. This code duplicates its
> package-private JDK counterpart.
> As the custom code is not a perfect match for the JDK code, the two methods
> can even produce slightly different (but semantically equal) paths.
> The reason for this redundant transformation is the interning of the
> normalized path to save some heap, which only works if both methods result
> in the same string.
> Having checked the JDK code, I believe there is a better solution that
> matches the JDK code exactly, because it uses that package-private method.
> Moreover, based on some benchmarking, this new method also seems to be more
> performant.
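
A minimal Java sketch of the idea in the description, not the actual Spark patch. It assumes the package-private JDK normalization is reached indirectly by constructing a java.io.File (whose constructor normalizes the path it is given) and reading the result back with getPath(), which is then interned. The class name PathInterningSketch, the method normalizeAndIntern and the sample path are hypothetical.

```java
import java.io.File;

final class PathInterningSketch {

  // Hypothetical helper: let java.io.File apply the JDK's own, OS-specific
  // normalization instead of a hand-written regexp, then intern the result
  // so that equal paths share a single String instance on the heap.
  static String normalizeAndIntern(String rawPath) {
    return new File(rawPath).getPath().intern();
  }

  public static void main(String[] args) {
    // Illustrative path containing a doubled separator.
    String sep = File.separator;
    String raw = "tmp" + sep + sep + "blockmgr" + sep + "0c" + sep + "shuffle_0_0_0.data";

    String first = normalizeAndIntern(raw);
    // A distinct String object with the same content, as separate lookups would produce.
    String second = normalizeAndIntern(new String(raw));

    System.out.println(first);
    // Reference equality holds because both results were interned.
    System.out.println(first == second);
    // A File built from the normalized string reports an equal path.
    System.out.println(new File(first).getPath().equals(first));
  }
}
```

Because the string comes from java.io.File itself, it matches what a File created from it will report, which is the condition the description gives for the interning to save heap.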