Cheng Lian created SPARK-8125:
---------------------------------

Summary: Accelerate ParquetRelation2 metadata discovery
Key: SPARK-8125
URL: https://issues.apache.org/jira/browse/SPARK-8125
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 1.4.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

For large Parquet tables (e.g., with thousands of partitions), discovering Parquet metadata for schema merging and generating splits for Spark jobs can be very slow. We need to accelerate this process. One possible solution is to do the discovery via a distributed Spark job.
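
To make the distributed-discovery idea concrete, here is a minimal sketch of the pattern, not the implementation proposed for Spark. It assumes the file list is split into chunks, each task reads the footers in its chunk and merges their schemas locally, and the driver merges the partial results. A local thread pool stands in for a Spark job, and `FAKE_FOOTERS`, `read_footer_schema`, `merge_schemas`, `merge_chunk`, and `discover_schema` are hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Parquet footers: file path -> {column: type}.
FAKE_FOOTERS = {
    "t/p=1/part-0.parquet": {"id": "int64", "name": "string"},
    "t/p=2/part-0.parquet": {"id": "int64", "score": "double"},
    "t/p=3/part-0.parquet": {"id": "int64", "name": "string"},
}


def read_footer_schema(path):
    """Hypothetical footer reader; a real one would open the Parquet file."""
    return FAKE_FOOTERS[path]


def merge_schemas(left, right):
    """Union two schemas, rejecting columns whose types conflict."""
    merged = dict(left)
    for column, dtype in right.items():
        if merged.setdefault(column, dtype) != dtype:
            raise ValueError(f"conflicting types for column {column!r}")
    return merged


def merge_chunk(paths):
    """Worker task: read every footer in the chunk and merge locally."""
    schema = {}
    for path in paths:
        schema = merge_schemas(schema, read_footer_schema(path))
    return schema


def discover_schema(paths, num_tasks=2):
    """Split paths into chunks, merge each in parallel, then merge partials."""
    chunks = [paths[i::num_tasks] for i in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(merge_chunk, chunks))
    schema = {}
    for partial in partials:
        schema = merge_schemas(schema, partial)
    return schema


merged = discover_schema(sorted(FAKE_FOOTERS))
print(merged)
```

Because each task returns only a small merged schema rather than every footer, the driver's work would grow with the number of tasks instead of the number of files, which is the property that matters for tables with thousands of partitions.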