Try grouping sets.
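Something like the sketch below. This is a minimal sketch, assuming Spark 2.x SQL syntax and the table/column names from your mail; GROUPING SETS computes all four (col1, colN) aggregations in one pass and returns them already unioned:

// Sketch only -- assumes the parquet data is registered as "table",
// with columns col1..col5 as in the question.
// Each output row has NULLs in the columns outside its grouping set;
// grouping_id() (Spark 2.0+) tells you which set produced the row.
val grouped = sqlContext.sql("""
  SELECT col1, col2, col3, col4, col5, count(*) AS cnt, grouping_id() AS gid
  FROM table
  GROUP BY col1, col2, col3, col4, col5
  GROUPING SETS ((col1, col2), (col1, col3), (col1, col4), (col1, col5))
""")

Compared with four separate group-bys plus a union, this should need only a single scan of the parquet data. The output shape differs a bit from your union, though: rows carry NULLs in the columns outside their grouping set, so you may want to filter on gid if you need the exact 3-column layout.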
On Sun, Feb 19, 2017 at 8:23 AM, Patrick <titlibat...@gmail.com> wrote:

> Hi,
>
> I have read 5 columns from parquet into a data frame. My queries on the
> parquet table are of the following form:
>
> val df1 = sqlContext.sql("select col1, col2, count(*) from table group by col1, col2")
> val df2 = sqlContext.sql("select col1, col3, count(*) from table group by col1, col3")
> val df3 = sqlContext.sql("select col1, col4, count(*) from table group by col1, col4")
> val df4 = sqlContext.sql("select col1, col5, count(*) from table group by col1, col5")
>
> And then I need to union the results from df1 to df4 into a single df.
>
> So basically, only the second column is changing. Is there an efficient
> way to write the above queries in Spark SQL instead of writing 4 different
> queries (or a loop) and doing a union to get the result?
>
> Thanks

--
Best Regards,
Ayan Guha