> Currently it takes around 1.5 hours for me just to cache in the partition
> information and after that I can see that the job gets queued in the SPARK
> UI.

I guess you mean the stage of getting the split info. I suspect it might be an issue with your cluster (or the metadata store); usually it doesn't take such a long time to compute the splits.

On Wed, Dec 16, 2015 at 8:06 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi,
>
> I have a HIVE table with a few thousand partitions (based on date and time).
> It takes a long time to run it for the first time and then subsequently it
> is fast.
>
> Is there a way to store the cache of partition lookups so that every time
> I start a new SPARK instance (cannot keep my personal server running
> continuously), I can immediately restore back the temptable in hiveContext
> without asking it to go again and cache the partition lookups?
>
> Currently it takes around 1.5 hours for me just to cache in the partition
> information and after that I can see that the job gets queued in the SPARK
> UI.
>
> Regards,
> Gourav

--
Best Regards

Jeff Zhang