[ https://issues.apache.org/jira/browse/SPARK-48649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-48649:
-----------------------------------

    Assignee: Ivan Sadikov

> Add "ignoreInvalidPartitionPaths" and 
> "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring 
> invalid partition paths
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48649
>                 URL: https://issues.apache.org/jira/browse/SPARK-48649
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Ivan Sadikov
>            Assignee: Ivan Sadikov
>            Priority: Major
>              Labels: pull-request-available
>
> When a table directory contains an invalid partition directory, for example:
> {code:java}
> table/
>   invalid/...
>   part=1/...
>   part=2/...
>   part=3/...{code}
> a SQL query that reads all of the partitions fails with:
> {code:java}
> java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
>  table
>  table/invalid {code}
>  
> I propose adding a data source option and a Spark SQL config to ignore invalid
> partition paths. The config will be disabled by default to preserve the current
> behaviour.
> {code:java}
> spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true"){code}
> {code:java}
> spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(...){code}
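To show why the layout above trips the assertion, and what ignoring invalid paths would mean, here is a simplified, hypothetical Python model of directory-based partition discovery. It is not Spark's implementation: the functions `parse_partition` and `discover_partitions`, and the rule of skipping leaf directories that do not resolve back to the table root, are assumptions made for this sketch. In this model, a leaf directory whose name is not `key=value` (such as `invalid`) yields no partition columns and becomes its own base path, so the two distinct base paths `table` and `table/invalid` produce the conflict.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model of directory-based partition discovery.
# NOT Spark's implementation: no type inference, path escaping, or case handling.


def parse_partition(leaf: str, base: str) -> Tuple[Optional[Dict[str, str]], str]:
    """Walk from `leaf` up towards `base`, collecting `key=value` directory names.

    Returns (partition values or None, directory treated as this leaf's base path).
    """
    parts = leaf.split("/")
    columns: List[Tuple[str, str]] = []
    while "/".join(parts) != base:
        name = parts[-1]
        if "=" in name:
            key, value = name.split("=", 1)
            columns.append((key, value))
        elif columns:
            break  # a plain directory above partition directories ends the walk
        if len(parts) == 1:
            break  # reached the top without meeting `base`
        parts = parts[:-1]
    if not columns:
        # No partition columns: the leaf itself becomes a (suspicious) base path.
        return None, leaf
    # Columns were collected innermost-first; reverse to get outermost-first order.
    return dict(reversed(columns)), "/".join(parts)


def discover_partitions(
    leaves: List[str], base: str, ignore_invalid: bool = False
) -> Dict[str, Dict[str, str]]:
    """Map each partitioned leaf directory to its partition values."""
    parsed = [(leaf, *parse_partition(leaf, base)) for leaf in leaves]
    if all(values is None for _, values, _ in parsed):
        return {}  # nothing looks partitioned, so there is no conflict to check
    if ignore_invalid:
        # Hypothetical policy: skip leaves that did not resolve back to `base`.
        parsed = [p for p in parsed if p[2] == base]
    base_paths = sorted({stop for _, _, stop in parsed})
    if len(base_paths) > 1:
        raise AssertionError(
            "Conflicting directory structures detected. Suspicious paths: "
            + ", ".join(base_paths)
        )
    return {leaf: values for leaf, values, _ in parsed if values is not None}


if __name__ == "__main__":
    leaves = ["table/invalid", "table/part=1", "table/part=2", "table/part=3"]
    try:
        discover_partitions(leaves, "table")
    except AssertionError as err:
        # Conflicting directory structures detected. Suspicious paths: table, table/invalid
        print(err)
    # With the flag on, only the part=N directories remain.
    print(discover_partitions(leaves, "table", ignore_invalid=True))
```

With `ignore_invalid=False` the model reproduces the failure described in the issue; with it set, `table/invalid` is dropped and the three `part=N` directories are returned, which mirrors the intent of the proposed option while keeping the default behaviour unchanged.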



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
