RE: Broadcasting a parquet file using spark and python

2015-12-07 Thread Shuai Zheng
> To: Jitesh chandra Mishra
> Cc: user
> Subject: Re: Broadcasting a parquet file using spark and python
>
> You will need to create a hive parquet table that points to the data and run "ANALYZE TABLE tableName noscan" so that we have statistics on the size. On Tue, Mar 31, 2015 at 9:36 PM, Jite…

Re: Broadcasting a parquet file using spark and python

2015-12-05 Thread Michael Armbrust
> Regards,
> Shuai
>
> *From:* Michael Armbrust [mailto:mich...@databricks.com]
> *Sent:* Wednesday, April 01, 2015 2:01 PM
> *To:* Jitesh chandra Mishra
> *Cc:* user
> *Subject:* Re: Broadcasting a parquet file using spark and python

RE: Broadcasting a parquet file using spark and python

2015-12-04 Thread Shuai Zheng
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, April 01, 2015 2:01 PM
To: Jitesh chandra Mishra
Cc: user
Subject: Re: Broadcasting a parquet file using spark and python

> You will need to create a hive parquet table that points to the data and run "ANALYZE…

Re: Broadcasting a parquet file using spark and python

2015-04-01 Thread Michael Armbrust
You will need to create a Hive parquet table that points to the data and run "ANALYZE TABLE tableName noscan" so that we have statistics on the size.

On Tue, Mar 31, 2015 at 9:36 PM, Jitesh chandra Mishra wrote:
> Hi Michael,
>
> Thanks for your response. I am running 1.2.1.
>
> Is there any wor…
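For context, a minimal HiveQL sketch of the workaround described above. The table name and location are placeholders, and note that Hive's full syntax for the quoted statistics command is ANALYZE TABLE ... COMPUTE STATISTICS NOSCAN:

```sql
-- Point an external Hive table at the existing Parquet data
-- (table name and path are hypothetical).
CREATE EXTERNAL TABLE small_dim
STORED AS PARQUET
LOCATION '/data/small_dim';

-- Record table-size statistics from file metadata without scanning
-- the rows, so Spark SQL can see the table is small enough to broadcast.
ANALYZE TABLE small_dim COMPUTE STATISTICS NOSCAN;
```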

Re: Broadcasting a parquet file using spark and python

2015-03-31 Thread Jitesh chandra Mishra
Hi Michael,

Thanks for your response. I am running 1.2.1. Is there any workaround to achieve the same with 1.2.1?

Thanks,
Jitesh

On Wed, Apr 1, 2015 at 12:25 AM, Michael Armbrust wrote:
> In Spark 1.3 I would expect this to happen automatically when the parquet
> table is small (< 10mb, confi…

Re: Broadcasting a parquet file using spark and python

2015-03-31 Thread Michael Armbrust
In Spark 1.3 I would expect this to happen automatically when the parquet table is small (< 10mb, configurable with spark.sql.autoBroadcastJoinThreshold). If you are running 1.3 and not seeing this, can you show the code you are using to create the table?

On Tue, Mar 31, 2015 at 3:25 AM, jitesh129…
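A sketch of what this looks like from PySpark on Spark 1.3, for reference. The paths, table names, and join column are hypothetical, and this is an untested sketch of the 1.3-era API (parquetFile was later superseded by read.parquet), not a definitive recipe:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="auto-broadcast-join")
sqlContext = SQLContext(sc)

# Tables whose estimated size falls below this threshold (default 10 MB)
# are broadcast to every executor instead of being shuffled.
# Here the threshold is raised to 50 MB as an example.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
                   str(50 * 1024 * 1024))

# Hypothetical Parquet paths; parquetFile() is the Spark 1.3-era reader.
small = sqlContext.parquetFile("/data/small_dim")
large = sqlContext.parquetFile("/data/large_fact")

# With size statistics available and the small side under the threshold,
# Spark SQL should plan this as a broadcast join automatically.
joined = large.join(small, large.key == small.key)
```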