Re: Spark SQL performance issue.

2015-04-23 Thread Arush Kharbanda
Hi, can you share your Web UI, showing the task-level breakdown? I can see many things that can be improved.
1. JavaRDD rdds = ...rdds.cache(); -> this caching is not needed, as you are not reading the RDD for any action.
2. Instead of collecting as a list, if you can save as a text file, it would be b

Re: Spark SQL performance issue.

2015-04-23 Thread Nikolay Tikhonov
> Why are you caching both the RDD and the table?
I try to cache all the data to avoid bad performance on the first query. Is that right?
> Which stage of the job is slow?
The query is run many times on one sqlContext, and each query execution takes 1 second.
2015-04-23 11:33 GMT+03:00 ayan guha:
> Quick q

Re: Spark SQL performance issue.

2015-04-23 Thread ayan guha
Quick questions: Why are you caching both the RDD and the table? Which stage of the job is slow?
On 23 Apr 2015 17:12, "Nikolay Tikhonov" wrote:
> Hi,
> I have a Spark SQL performance issue. My code contains a simple JavaBean:
>
> public class Person implements Externalizable {
>     private int id;
>