RE: RDD caching, memory & network input

2015-01-28 Thread Andrianasolo Fanilo
= PredictionReader.getFeatures(…).cache

Where getFeatures() loads the file then parses it.

From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
Sent: Wednesday, January 28, 2015 17:12
To: Andrianasolo Fanilo
Cc: user@spark.apache.org
Subject: Re: RDD caching, memory & network input

Hi Fanilo, How

Re: RDD caching, memory & network input

2015-01-28 Thread Sandy Ryza
Hi Fanilo,

How many cores are you using per executor? Are you aware that you can combat the "container is running beyond physical memory limits" error by bumping the spark.yarn.executor.memoryOverhead property? Also, are you caching the parsed version or the text?

-Sandy

On Wed, Jan 28, 2015 a
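
For readers following the thread, here is a minimal sketch of the two suggestions above: raising spark.yarn.executor.memoryOverhead and caching the parsed records rather than the raw text. It is not the code from the original application. The app name, input path, overhead value, and comma-separated parsing are all hypothetical placeholders, and the property name is the one used by Spark on YARN at the time of this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("caching-sketch") // hypothetical app name
      // Extra off-heap headroom per executor container, in megabytes.
      // 1024 is an illustrative value, not a recommendation.
      .set("spark.yarn.executor.memoryOverhead", "1024")
    val sc = new SparkContext(conf)

    // Hypothetical input path; each line is assumed to hold comma-separated numbers.
    val text = sc.textFile("hdfs:///path/to/input")

    // Parse once, then cache the parsed form so later actions skip the parsing step.
    val parsed = text.map(line => line.split(',').map(_.toDouble))
    parsed.persist(StorageLevel.MEMORY_ONLY_SER) // serialized storage trades CPU for a smaller footprint

    println(parsed.count()) // first action materializes the cache
    sc.stop()
  }
}
```

The same property can also be passed on the command line with spark-submit --conf spark.yarn.executor.memoryOverhead=1024, which avoids hard-coding it in the application.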