Let's assume that I have code like the following:

    val sqlQuery = "select * from whse.table_a cluster by user_id"
    val df = hc.sql(sqlQuery)
My understanding is that the CLUSTER BY clause will partition the data frame by user_id and also sort the rows within each partition (which would be very useful for performing joins later). Is that true?

Second question: if I save *df* to a Hive table right after the query, will Spark remember the partitioning when I reload that table from Hive? I am currently using Spark 1.3.1.

Thanks
--
Cesar Flores
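For context, my understanding (from the Hive docs) is that CLUSTER BY user_id is shorthand for DISTRIBUTE BY user_id plus SORT BY user_id, so I would expect the query above to behave like the sketch below. The second table name is made up for illustration; the saveAsTable/table usage is my reading of the 1.3 DataFrame API:

```scala
// Sketch of my understanding -- not tested on a real cluster.
// CLUSTER BY user_id should be equivalent to:
val equivalentQuery =
  "select * from whse.table_a distribute by user_id sort by user_id"
val df2 = hc.sql(equivalentQuery)

// Save to Hive and reload: does the reloaded data frame keep the
// partitioning/sorting, or must it be clustered again before a join?
df2.saveAsTable("whse.table_a_clustered") // hypothetical table name
val reloaded = hc.table("whse.table_a_clustered")
```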