I am creating a DataFrame from Parquet files. The schema comes from the Parquet files themselves, so I do not know it beforehand. What I want to do is group the entire DataFrame into buckets based on a column:
    val df = sqlContext.read.parquet("/path/to/files")
    val groupedBuckets: DataFrame[String, Array[Row]] = df.groupBy($"columnName")

I know this does not compile, because a DataFrame's groupBy is only meant to feed aggregate functions. I also cannot convert my DataFrame to a Dataset, because I do not have a case class describing the schema. The only option I see is converting the DataFrame to an RDD[Row] and dealing with the types by hand, which is ugly and difficult. Is there a better way? Can I convert a DataFrame to a Dataset without a predefined case class?

Brandon
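P.S. For reference, here is roughly what the RDD[Row] workaround I mentioned looks like. This is just a sketch, assuming the grouping column is named "columnName" and holds strings (both assumptions, since I don't know the schema up front):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

val df = sqlContext.read.parquet("/path/to/files")

// Look up the grouping column's position in the inferred schema,
// since we don't have a case class to address fields by name.
val keyIdx = df.schema.fieldIndex("columnName")

// Drop to the RDD layer: key each Row by the grouping column,
// then gather all rows sharing a key into one bucket.
val buckets: RDD[(String, Iterable[Row])] =
  df.rdd
    .map(row => (row.getString(keyIdx), row))
    .groupByKey()
```

Every downstream access on those Rows needs the same index-and-cast dance, which is what makes this so unpleasant.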