Hi,

I'm still trying to decide whether to store my data as a deeply nested or
a flat Parquet file.

The main reason for storing the nested file is that it keeps the data in
its raw format, with no information loss.

I have two questions:

1. Is it always necessary to flatten a nested DataFrame in order to build
a machine learning model? (I'd rather not use the explode function, since
there is only one response per row.)

2. Could anyone point me to a few examples of dealing with deeply nested
(say, 5 levels deep) DataFrames in PySpark?
