Hi all:

I wonder if there is a way to save a table in order to optimize joins on it at a
later time.

For example if I do something like:

import org.apache.spark.sql.functions.col

val df = anotherDF.repartition(col("id")) // some data frame, repartitioned by the "id" column
df.registerTempTable("tableAlias")

hiveContext.sql(
  """INSERT INTO whse.someTable
     SELECT * FROM tableAlias"""
)

Will the partition information ("id") be stored in whse.someTable, so that
when that table is queried in a second Spark job the information can be used,
for example to optimize joins?
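For context, this is roughly what the second job would look like (the table
and column names below are just placeholders following the example above);
calling explain() on the join is one way to check whether Spark still inserts
a shuffle (an Exchange step in the physical plan) on the saved side:

// Hypothetical second job: read the saved table back and join on "id".
val saved = hiveContext.table("whse.someTable")
val other = hiveContext.table("whse.otherTable") // placeholder for some other table

val joined = saved.join(other, "id")

// If the partitioning from the first job were preserved, the plan should
// avoid an extra Exchange (shuffle) on the saved side.
joined.explain()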

If this approach does not work, can you suggest one that does?


Thanks
-- 
Cesar Flores
