Hi all:

I wonder if there is a way to save a table in order to optimize joins on it at a
later time.

For example if I do something like:

import org.apache.spark.sql.functions.col

val df = anotherDF.repartition(col("id")) // some data frame, repartitioned by the "id" column
df.registerTempTable("tableAlias")

hiveContext.sql(
  """INSERT INTO whse.someTable
     SELECT * FROM tableAlias"""
)

Will the partition information ("id") be stored in whse.someTable, so that
when that table is queried in a second Spark job the information can be used,
for example to optimize joins?
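For context, this is roughly what the second job would look like (the table
and column names below are just placeholders following the example above);
calling explain() on the join is one way to check whether Spark still inserts
a shuffle (an Exchange step in the physical plan) on the saved side:

// Hypothetical second job: read the saved table back and join on "id".
val saved = hiveContext.table("whse.someTable")
val other = hiveContext.table("whse.otherTable") // placeholder for some other table

val joined = saved.join(other, "id")

// If the partitioning from the first job were preserved, the plan should
// avoid an extra Exchange (shuffle) on the saved side.
joined.explain()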

If this approach does not work, can you suggest one that does?


Thanks
-- 
Cesar Flores
