Re: BroadcastJoin failed on partitioned parquet table

2018-10-01 Thread Wenchen Fan
I'm not sure if Spark 1.6 is still maintained. Can you try a Spark 2.x version and see if the problem still exists?
On Sun, Sep 30, 2018 at 4:14 PM 白也诗无敌 <445484...@qq.com> wrote:
> Besides, I have tried the ANALYZE statement. It is of no use because I need the
> single partition but get the total table s

Re: Pyspark Partitioning

2018-10-01 Thread Gourav Sengupta
Hi, the simplest option is to create UDFs for these different functions, then use a CASE statement (or similar) in SQL and pass it on. But this is low-tech. If you have conditions based on record values that are even more granular, why not use a single UDF and then let conditions handle it

Re: Time-Series Forecasting

2018-10-01 Thread Mina Aslani
Thank you very much, really appreciate the information.
Kindest regards, Mina
On Sat, Sep 29, 2018 at 9:42 PM Peyman Mohajerian wrote:
> Here's a blog on Flint:
> https://databricks.com/blog/2018/09/11/introducing-flint-a-time-series-library-for-apache-spark.html
> I don't have an opinion about

Unable to read multiple JSON.Gz File.

2018-10-01 Thread Mahender Sarangam
I’m trying to read multiple .json.gz files from a Blob storage path using the Scala code below, but I’m unable to read the data from the files or print the schema. If the files are not compressed as .gz, we are able to read all of them into the DataFrame. I’ve even tried giving *.gz but n