Let's assume that I have code like the following:

    val sqlQuery = "select * from whse.table_a cluster by user_id"
    val df = hc.sql(sqlQuery)
My understanding is that the CLUSTER BY clause will partition the data frame by user_id and also sort the rows within each partition (which would be very useful for performing joins later). Is that true?

Second question: if I save *df* to a Hive table right after the query, will Spark remember the partitioning when I reload that table from Hive? I am currently using Spark 1.3.1.

Thanks
--
Cesar Flores
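For context, my understanding (from the Hive docs) is that CLUSTER BY user_id is shorthand for DISTRIBUTE BY user_id plus SORT BY user_id, so I would expect the query above to behave like the sketch below. The second table name is made up for illustration; the saveAsTable/table usage is my reading of the 1.3 DataFrame API:

```scala
// Sketch of my understanding -- not tested on a real cluster.
// CLUSTER BY user_id should be equivalent to:
val equivalentQuery =
  "select * from whse.table_a distribute by user_id sort by user_id"
val df2 = hc.sql(equivalentQuery)

// Save to Hive and reload: does the reloaded data frame keep the
// partitioning/sorting, or must it be clustered again before a join?
df2.saveAsTable("whse.table_a_clustered") // hypothetical table name
val reloaded = hc.table("whse.table_a_clustered")
```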