How are these small RDDs created? Could the blockage be in computing them
rather than in caching them?
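
One way to check: force the computation once without persist, then persist
and materialize, and compare the two timings. A minimal sketch (smallRdd is
just a hypothetical stand-in for one of your cached RDDs):

    import org.apache.spark.storage.StorageLevel

    // Stand-in for one of the small (~300 KB) RDDs in question.
    val smallRdd = sc.parallelize(1 to 100000).map(i => (i, i * 2))

    // Time the compute path alone (no caching involved).
    var t0 = System.nanoTime()
    smallRdd.count()
    println(s"compute only:    ${(System.nanoTime() - t0) / 1e9} s")

    // Persist, then materialize again; the first action after persist()
    // recomputes the RDD *and* stores its blocks, so the extra time here
    // is roughly the caching overhead.
    smallRdd.persist(StorageLevel.MEMORY_ONLY)
    t0 = System.nanoTime()
    smallRdd.count()
    println(s"compute + cache: ${(System.nanoTime() - t0) / 1e9} s")

If the second timing is the one that occasionally jumps to ~40s, the time is
going into block storage rather than into computing the RDD.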

Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>



On Thu, Aug 23, 2018 at 6:38 PM, Guillermo Ortiz <konstt2...@gmail.com>
wrote:

> I use Spark caching with the persist method. I have several RDDs that I
> cache, but some of them are pretty small (about 300 KB). Most of the time
> it works well and the whole job takes about 1s, but sometimes it takes
> about 40s to store 300 KB in the cache.
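>
> Roughly, the pattern is this (rdd is a stand-in for one of the small RDDs,
> not my exact code):
>
>     import org.apache.spark.storage.StorageLevel
>
>     // rdd stands in for one of the small (~300 KB) cached RDDs
>     rdd.persist(StorageLevel.MEMORY_ONLY)
>     rdd.count()  // the first action after persist() actually stores the blocks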
>
> If I go to SparkUI -> Cache, I can see the percentage increasing up to 83%
> (250 KB), and then it stalls for a while. If I check the event timeline in
> the Spark UI, I can see that when this happens there is a node where tasks
> take a very long time. It can be any node in the cluster; it's not always
> the same one.
>
> In the Spark executor logs I can see that it takes about 40s to store
> 3.7 KB when this problem occurs (note the 40s gap between the timestamps):
>
>     INFO  2018-08-23 12:46:58 Logging.scala:54 - org.apache.spark.storage.BlockManager: Found block rdd_1705_23 locally
>     INFO  2018-08-23 12:47:38 Logging.scala:54 - org.apache.spark.storage.memory.MemoryStore: Block rdd_1692_7 stored as bytes in memory (estimated size 3.7 KB, free 1048.0 MB)
>     INFO  2018-08-23 12:47:38 Logging.scala:54 - org.apache.spark.storage.BlockManager: Found block rdd_1692_7 locally
>
> I have tried MEMORY_ONLY, MEMORY_ONLY_SER, and so on, with the same
> results. I have also checked disk IO (although with MEMORY_ONLY I guess it
> shouldn't matter) and I can't see any problem. This happens randomly, but
> in roughly 25% of the jobs.
>
> Any idea about what could be happening?
>
