I have a PySpark method that applies `explode_outer` to every array column of a DataFrame:
```python
from pyspark.sql.functions import explode_outer
from pyspark.sql.types import ArrayType


def explode_column(df, column):
    # Replace the column with its exploded version in place, preserving column order.
    # explode_outer keeps rows whose array is null or empty (as nulls).
    select_cols = list(df.columns)
    col_position = select_cols.index(column)
    select_cols[col_position] = explode_outer(column).alias(column)
    return df.select(select_cols)


def explode_all_arrays(df):
    # Re-scan the schema after every explosion; nested arrays
    # (array of array) need more than one pass.
    still_has_arrays = True
    exploded_df = df
    while still_has_arrays:
        still_has_arrays = False
        for f in exploded_df.schema.fields:
            if isinstance(f.dataType, ArrayType):
                print(f"Exploding: {f}")
                still_has_arrays = True
                exploded_df = explode_column(exploded_df, f.name)
    return exploded_df
```

When there is a small number of columns to explode, it works perfectly. But on large DataFrames (~200 columns, ~40 of them exploded), the result can no longer be written as Parquet after the method finishes. Even a tiny amount of data (400 KB) fails, and not during the method itself but at the write step. I tried writing the DataFrame both as a table and as a Parquet file; both fail. Registering it as a temp view works, though. Any tips?
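For reference, the write attempts look roughly like this (the output path and table name here are placeholders, not my real targets):

```python
result = explode_all_arrays(df)

# Both of these fail on the wide DataFrame:
result.write.mode("overwrite").parquet("/tmp/exploded")
result.write.mode("overwrite").saveAsTable("exploded_table")

# This, however, works:
result.createOrReplaceTempView("exploded_view")
```

Thank you, henrique.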