Cheng Lian created SPARK-8125:
---------------------------------

             Summary: Accelerate ParquetRelation2 metadata discovery
                 Key: SPARK-8125
                 URL: https://issues.apache.org/jira/browse/SPARK-8125
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Cheng Lian
            Assignee: Cheng Lian
            Priority: Blocker


For large Parquet tables (e.g., with thousands of partitions), it can be very
slow to discover Parquet metadata for schema merging and for generating splits
for Spark jobs. We need to accelerate this process. One possible solution is
to do the discovery via a distributed Spark job.
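
To illustrate the idea of distributing the discovery, here is a minimal sketch
in plain Python. It is not Spark code and not the ParquetRelation2
implementation: a thread pool stands in for executors, and FAKE_FOOTERS,
read_footer_schema, merge_schemas, discover_chunk, and discover_schema are all
hypothetical names made up for this example. Each worker reads the footers for
its share of the files and merges them locally, so the driver only has to
combine one partial schema per worker.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical stand-in for Parquet footers: file path -> {column: type}.
# A real implementation would read each file's footer from storage.
FAKE_FOOTERS = {
    "t/p=1/part-0.parquet": {"id": "int64", "name": "string"},
    "t/p=2/part-0.parquet": {"id": "int64", "score": "double"},
    "t/p=3/part-0.parquet": {"id": "int64", "name": "string", "ts": "timestamp"},
}


def read_footer_schema(path):
    """Return the schema recorded in one file's footer (simulated here)."""
    return dict(FAKE_FOOTERS[path])


def merge_schemas(left, right):
    """Union two schemas; fail if a column appears with two different types."""
    merged = dict(left)
    for column, dtype in right.items():
        if merged.setdefault(column, dtype) != dtype:
            raise ValueError(
                f"conflicting types for {column!r}: {merged[column]} vs {dtype}"
            )
    return merged


def discover_chunk(paths):
    """Worker-side step: read and merge the footers for one chunk of files."""
    return reduce(merge_schemas, (read_footer_schema(p) for p in paths), {})


def discover_schema(paths, num_workers=2):
    """Driver-side step: split the file list, fan out, merge partial schemas."""
    chunks = [paths[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(discover_chunk, chunks))
    return reduce(merge_schemas, partials, {})


if __name__ == "__main__":
    print(discover_schema(sorted(FAKE_FOOTERS)))
```

Merging inside each worker is the point of the design: the footer reads, which
dominate the cost for tables with thousands of partitions, happen in parallel,
and the single-threaded final merge touches only num_workers small schemas.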



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
