To: Jitesh chandra Mishra
Cc: user
Subject: Re: Broadcasting a parquet file using spark and python
You will need to create a hive parquet table that points to the data and run
"ANALYZE TABLE tableName noscan" so that we have statistics on the size.
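A minimal sketch of the step described above, for readers following along in Python. Assumptions not in the thread: `sqlContext` is a `HiveContext` (the Spark 1.2/1.3-era entry point), the table name, columns, and path are hypothetical placeholders, `STORED AS PARQUET` is Hive 0.13+ syntax, and the full HiveQL form of the statistics statement is `ANALYZE TABLE ... COMPUTE STATISTICS noscan`:

```python
# Sketch, not verbatim from the thread: register existing parquet data as an
# external Hive table, then gather size statistics without scanning the rows.
# Assumptions: `sqlContext` is a HiveContext (Spark 1.2/1.3-era API); the
# table name, columns, and path are hypothetical placeholders.

def register_and_analyze(sqlContext, table, path):
    # Point a Hive table at the existing parquet files (no data is copied).
    sqlContext.sql(
        "CREATE EXTERNAL TABLE IF NOT EXISTS {t} (id INT, name STRING) "
        "STORED AS PARQUET LOCATION '{p}'".format(t=table, p=path))
    # noscan collects table-level statistics (such as totalSize) from file
    # metadata only, which is what the broadcast-join planner looks at.
    sqlContext.sql(
        "ANALYZE TABLE {t} COMPUTE STATISTICS noscan".format(t=table))
```

Once the statistics exist, a join against this table should be planned as a broadcast join whenever the reported size falls under the configured threshold.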
> Regards,
>
> Shuai
>
> *From:* Michael Armbrust [mailto:mich...@databricks.com]
> *Sent:* Wednesday, April 01, 2015 2:01 PM
> *To:* Jitesh chandra Mishra
> *Cc:* user
> *Subject:* Re: Broadcasting a parquet file using spark and python
>
On Tue, Mar 31, 2015 at 9:36 PM, Jitesh chandra Mishra wrote:
> Hi Michael,
>
> Thanks for your response. I am running 1.2.1.
>
> Is there any workaround to achieve the same with 1.2.1?
>
> Thanks,
> Jitesh
On Wed, Apr 1, 2015 at 12:25 AM, Michael Armbrust wrote:
> In Spark 1.3 I would expect this to happen automatically when the parquet
> table is small (< 10mb, configurable with
> spark.sql.autoBroadcastJoinThreshold).
>
> If you are running 1.3 and not seeing this, can you show the code you are
> using to create the table?
On Tue, Mar 31, 2015 at 3:25 AM, jitesh129
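For reference, the threshold mentioned above is an ordinary session setting. A minimal sketch, assuming a `SQLContext`/`HiveContext` named `sqlContext` (not shown in the thread); the property name comes from Michael's message, and the 10 MB figure is its default in this era:

```python
# Sketch: configuring the automatic broadcast threshold discussed above.
# spark.sql.autoBroadcastJoinThreshold is measured in bytes; 10 MB is the
# default noted in the thread, and -1 disables automatic broadcast joins.

DEFAULT_THRESHOLD = 10 * 1024 * 1024  # 10 MB

def broadcast_threshold_conf(threshold_bytes=DEFAULT_THRESHOLD):
    """Key/value pair to pass to sqlContext.setConf(key, value)."""
    return ("spark.sql.autoBroadcastJoinThreshold", str(threshold_bytes))

# Usage (assumed context): raise the limit to 50 MB so a larger dimension
# table still qualifies for broadcasting.
# sqlContext.setConf(*broadcast_threshold_conf(50 * 1024 * 1024))
```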