[ https://issues.apache.org/jira/browse/SPARK-45908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-45908:
-----------------------------------
    Labels: pull-request-available  (was: )

> write empty parquet file while using partitioned write
> -------------------------------------------------------
>
>                 Key: SPARK-45908
>                 URL: https://issues.apache.org/jira/browse/SPARK-45908
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Paride Casulli
>            Priority: Minor
>              Labels: pull-request-available
>
> Hi,
> I'm currently using PySpark, and if I try to write an empty DataFrame to
> Parquet in a partitioned way, no file is written to the target folder:
>
> df.write.mode("overwrite").partitionBy("BUSINESS_DATE").parquet("/data_dir/"+stg+"/ISS/exchange/WORK_ISSR_EOD_EXT_SETTLEMENT_CA_"+se)
>
> This creates a problem because I have another job that reads the file,
> cannot infer the schema, and raises an error. I implemented a workaround
> like this:
>
> # implemented to handle empty data as well
> def write_partitioned_df(df, partition_col, partition_val, save_path):
>     df.write.mode("overwrite").partitionBy(partition_col).parquet(save_path)
>     if df.isEmpty():
>         df = df.drop(partition_col)
>         df.write.mode("overwrite").parquet(save_path + "/" + partition_col + "=" + partition_val)
>
> This writes an empty Parquet file to the target folder, but it would be
> great to have an option in the write function that avoids this custom
> implementation. I have seen other users asking for this feature on
> Stack Overflow.
>
> Thank you very much
> Paride



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
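
Editor's note: for readers who want to reproduce the behaviour and the
reporter's workaround end to end, below is a minimal self-contained sketch.
The schema, save path, and partition value are hypothetical placeholders,
not taken from the original report; only write_partitioned_df follows the
reporter's code.

# Minimal sketch of the workaround above. Assumes PySpark >= 3.3
# (for DataFrame.isEmpty); schema and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("empty-partitioned-write").getOrCreate()

# An empty DataFrame with a known schema, standing in for the real data.
schema = StructType([
    StructField("BUSINESS_DATE", StringType()),
    StructField("PRICE", DoubleType()),
])
empty_df = spark.createDataFrame([], schema)

def write_partitioned_df(df, partition_col, partition_val, save_path):
    # Normal partitioned write; when df is empty this leaves no data
    # files under save_path, which is the behaviour the issue reports.
    df.write.mode("overwrite").partitionBy(partition_col).parquet(save_path)
    if df.isEmpty():
        # Fall back: write the empty schema (minus the partition column)
        # into a manually named partition directory, so a later reader of
        # save_path can still infer both the schema and the partition.
        df = df.drop(partition_col)
        df.write.mode("overwrite").parquet(
            save_path + "/" + partition_col + "=" + partition_val
        )

write_partitioned_df(empty_df, "BUSINESS_DATE", "2023-11-13",
                     "/tmp/settlement_ca")

# Reading the folder back now succeeds and recovers BUSINESS_DATE as a
# partition column instead of failing with "Unable to infer schema".
spark.read.parquet("/tmp/settlement_ca").printSchema()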