Hi,

I'm still trying to decide whether to store my data as a deeply nested or
a flat Parquet file.

The main reason for storing the nested file is that it keeps the data in
its raw format, with no information loss.

I have two questions:

1. Is it always necessary to flatten a nested DataFrame in order to build
a machine learning model? (I'd rather not use the explode function, since
there is only one response per row.)

2. Could anyone point me to a few examples of dealing with deeply nested
(say, 5 levels deep) DataFrames in PySpark?
