You will need to create a Hive parquet table that points to the data and run "ANALYZE TABLE tableName COMPUTE STATISTICS noscan" so that we have statistics on the size.
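Concretely, the two steps above might look like the following HiveQL sketch (the table, columns, and path are placeholders, not names from the thread):

```sql
-- Hypothetical external table pointing at existing parquet files.
CREATE EXTERNAL TABLE small_dim (id INT, name STRING)
STORED AS PARQUET
LOCATION '/data/small_dim';

-- Record size statistics without scanning the data. Spark SQL uses
-- these statistics to decide whether the table is small enough
-- (below spark.sql.autoBroadcastJoinThreshold) to broadcast.
ANALYZE TABLE small_dim COMPUTE STATISTICS NOSCAN;
```

With the statistics in place, joins against `small_dim` should plan as a broadcast join automatically rather than a ShuffledHashJoin.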
On Tue, Mar 31, 2015 at 9:36 PM, Jitesh chandra Mishra <jitesh...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for your response. I am running 1.2.1.
>
> Is there any workaround to achieve the same with 1.2.1?
>
> Thanks,
> Jitesh
>
> On Wed, Apr 1, 2015 at 12:25 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> In Spark 1.3 I would expect this to happen automatically when the parquet
>> table is small (< 10mb, configurable with
>> spark.sql.autoBroadcastJoinThreshold).
>> If you are running 1.3 and not seeing this, can you show the code you are
>> using to create the table?
>>
>> On Tue, Mar 31, 2015 at 3:25 AM, jitesh129 <jitesh...@gmail.com> wrote:
>>
>>> How can we implement a BroadcastHashJoin for Spark with Python?
>>>
>>> My Spark SQL inner joins are taking a lot of time since they are performing
>>> a ShuffledHashJoin.
>>>
>>> The tables on which the join is performed are stored as parquet files.
>>>
>>> Please help.
>>>
>>> Thanks and regards,
>>> Jitesh
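For the 1.2.1 workaround Jitesh asks about, a common approach at the RDD level is a manual map-side join: collect the small table to the driver, ship it to the executors with `sc.broadcast`, and look rows up in a dict instead of shuffling both sides. The core lookup logic is sketched below in plain Python, with no SparkContext, so the idea is visible on its own; all names are illustrative:

```python
# Manual map-side ("broadcast") join: build an in-memory hash map from
# the small table, then enrich each row of the large table locally --
# no shuffle of the large side is needed.
#
# In PySpark the dict would be wrapped with broadcast_var = sc.broadcast(lookup)
# and the list comprehension replaced by large_rdd.flatMap(...) reading
# broadcast_var.value inside the closure.

def broadcast_join(large_rows, small_rows):
    # Small side becomes a dict keyed on the join column.
    lookup = dict(small_rows)
    # Inner join: keep only large-side rows whose key exists in the dict.
    return [(key, payload, lookup[key])
            for key, payload in large_rows
            if key in lookup]

if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    large = [(1, "x"), (2, "y"), (3, "z")]
    # (3, "z") has no match on the small side and is dropped.
    print(broadcast_join(large, small))
```

This trades memory on each executor (the whole small table must fit) for avoiding the shuffle, which is exactly the trade-off the built-in BroadcastHashJoin makes in 1.3+.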