Hi,

During my aggregation I end up with the following schema:
    Row(Row(val1,val2), Row(val1,val2,val3...))

    val values = Seq(
      (Row(10, 11), Row(10, 2, 11)),
      (Row(10, 11), Row(10, 2, 11)),
      (Row(20, 11), Row(10, 2, 11))
    )

The first Row in each tuple is the key used to group the relevant records for aggregation.

I used the following to create the dataset:

    val s = StructType(Seq(
      StructField("x", IntegerType, true),
      StructField("y", IntegerType, true)
    ))

    val s1 = StructType(Seq(
      StructField("u", IntegerType, true),
      StructField("v", IntegerType, true),
      StructField("z", IntegerType, true)
    ))

    val ds = sparkSession.sqlContext.createDataset(
      sparkSession.sparkContext.parallelize(values)
    )(Encoders.tuple(RowEncoder(s), RowEncoder(s1)))

Is this the correct way of representing this? How do I create the dataset and row encoder for such a use case, so that I can do groupByKey on it?

Regards
Sandeep
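
P.S. For comparison, here is a minimal sketch of a typed alternative to the Row-based version, assuming the key and value columns are fixed and known up front. The case classes Key and Value, the object name, and the local master setting are hypothetical names I made up for illustration; the field names mirror the two schemas above. It is not meant as the definitive way to do this, only as one shape the groupByKey call could take.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes mirroring the two StructTypes above.
case class Key(x: Int, y: Int)
case class Value(u: Int, v: Int, z: Int)

object GroupByKeySketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupByKey-sketch")
      .getOrCreate()

    // Brings the implicit encoders for case classes and tuples into scope.
    import spark.implicits._

    val values = Seq(
      (Key(10, 11), Value(10, 2, 11)),
      (Key(10, 11), Value(10, 2, 11)),
      (Key(20, 11), Value(10, 2, 11))
    )

    // Dataset[(Key, Value)], built without hand-written Row encoders.
    val ds = values.toDS()

    // Group on the first element of each tuple, then count records per key.
    val counts = ds.groupByKey(_._1).count()
    counts.show()

    spark.stop()
  }
}
```

With case classes the encoders come from spark.implicits._, so no explicit RowEncoder or Encoders.tuple call is needed. That only helps when the schema is known at compile time, which may not hold for my real job.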