I ended up post-processing the result in Hive with a dynamic-partition
INSERT query to get a table partitioned by the key.
Looking further, it seems that dynamic-partition INSERT is supported in Spark SQL
and works well in Spark SQL versions 1.2.0 and later:
https://issues.apache.org/jira/browse/SPARK-3007
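For reference, a minimal sketch of what that dynamic-partition INSERT looks like from Spark SQL (1.2.0+, per SPARK-3007). The table and column names here are made up for illustration, and it assumes an existing SparkContext `sc` and Hive tables already defined:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Hive requires these settings before a dynamic-partition insert
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

// Rows are routed to partitions based on the value of `directory_name`
hiveContext.sql(
  """INSERT OVERWRITE TABLE my_partitioned_table PARTITION (directory_name)
    |SELECT val0, val1, directory_name FROM my_staging_table""".stripMargin)
```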
Is there an efficient way to save an RDD with saveAsTextFile such that
the data gets shuffled into separate directories according to a key?
(My end goal is to wrap the result in a multi-partitioned Hive table.)
Suppose you have:
case class MyData(val0: Int, val1: String, directory_name: String)
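One known workaround (not saveAsTextFile itself, but saveAsHadoopFile) is a custom MultipleTextOutputFormat that routes each record to a subdirectory named after its key. A sketch, assuming the MyData case class above with a String directory_name field; the output path and class names are illustrative:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

class KeyBasedOutput[K, V] extends MultipleTextOutputFormat[K, V] {
  // Write each record under <outputDir>/<key>/part-NNNNN
  override def generateFileNameForKeyValue(key: K, value: V, name: String): String =
    key.toString + "/" + name
  // Suppress the key in the written line so only the value is stored
  override def generateActualKey(key: K, value: V): K =
    NullWritable.get().asInstanceOf[K]
}

val data = sc.parallelize(Seq(MyData(1, "a", "dir1"), MyData(2, "b", "dir2")))
data.map(d => (d.directory_name, s"${d.val0},${d.val1}"))
    .saveAsHadoopFile("/tmp/output", classOf[String], classOf[String],
      classOf[KeyBasedOutput[String, String]])
```

Each distinct key becomes a directory under the output path, which lines up with the layout Hive expects for partitions.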