[ https://issues.apache.org/jira/browse/SPARK-18273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aniket Bhatnagar resolved SPARK-18273.
--------------------------------------
    Resolution: Not A Problem

Glob patterns can be passed instead of full paths to reduce the number of paths passed to the load method.

> DataFrameReader.load takes a lot of time to start the job if a lot of file/dir paths are pass
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18273
>                 URL: https://issues.apache.org/jira/browse/SPARK-18273
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.1
>            Reporter: Aniket Bhatnagar
>            Priority: Minor
>
> If the paths Seq parameter contains a lot of elements, then
> DataFrameReader.load takes a long time to start the job because it checks
> whether each path exists using fs.exists. There should be a boolean
> configuration option to disable the path existence check, and it should be
> passed as a parameter to the DataSource.resolveRelation call.
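
A minimal sketch of the workaround named in the resolution, written in Python for illustration: sibling paths that share a parent directory are collapsed into one Hadoop-style `{a,b}` glob, so fewer entries are passed to load. The helper name `collapse_to_globs` and the sample paths are hypothetical and not part of Spark. The sketch assumes the file or directory names contain no glob metacharacters such as commas or braces.

```python
import posixpath
from collections import defaultdict


def collapse_to_globs(paths):
    """Collapse paths that share a parent directory into one glob per parent.

    Hypothetical helper, not part of Spark. Assumes the last path component
    contains no glob metacharacters such as ',', '{', '}', '*' or '?'.
    """
    by_parent = defaultdict(list)
    for path in paths:
        # Split off the last component; a trailing slash is ignored.
        parent, name = posixpath.split(path.rstrip("/"))
        by_parent[parent].append(name)

    globs = []
    for parent, names in sorted(by_parent.items()):
        names = sorted(set(names))
        if len(names) == 1:
            # A single child needs no alternation.
            globs.append(posixpath.join(parent, names[0]))
        else:
            # Hadoop glob alternation: {a,b,c} matches any listed name.
            globs.append(posixpath.join(parent, "{" + ",".join(names) + "}"))
    return globs


paths = [
    "/data/events/2016-11-01",
    "/data/events/2016-11-02",
    "/data/logs/a",
]
patterns = collapse_to_globs(paths)
```

In PySpark the result could then be passed on with something like `spark.read.format("parquet").load(patterns)`, assuming a live `SparkSession` named `spark`. How much startup time this saves depends on how many parents the paths share, and has not been measured here.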