>>> Currently it takes around 1.5 hours for me just to cache in the
partition information and after that I can see that the job gets queued in
the SPARK UI.
I guess you mean the stage that computes the split information. I suspect the
problem is in your cluster (or the metadata store); computing splits does not
usually take that long.

On Wed, Dec 16, 2015 at 8:06 AM, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> I have a HIVE table with a few thousand partitions (based on date and time).
> It takes a long time to run the first time, and subsequently it is fast.
>
> Is there a way to store the cache of partition lookups so that every time
> I start a new SPARK instance (I cannot keep my personal server running
> continuously), I can immediately restore the temptable in hiveContext
> without asking it to go and cache the partition lookups again?
>
> Currently it takes around 1.5 hours for me just to cache in the partition
> information and after that I can see that the job gets queued in the SPARK
> UI.
>
> Regards,
> Gourav
>



-- 
Best Regards

Jeff Zhang
