Assuming task memory x number of cores does not exceed ~5g, and the block cache
manager ratio does not have some really weird setting, the next best thing
to look at is the initial task split size. I don't think that in the release you
are looking at the driver manages initial off-DFS splits satisfactorily (that
is, in any way at all). Basically, you may want smaller splits, i.e. more tasks
than what DFS gives you from the beginning. These apps tend to run a bit
better when splits do not exceed 100...500k non-zero elements.
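As a rough illustration (this is not the spark-itemsimilarity driver itself, just a
hand-rolled Spark sketch; the path, partition counts, and memory fraction below are
made-up values), you can ask for more, smaller splits at read time instead of taking
whatever DFS hands you:

import org.apache.spark.{SparkConf, SparkContext}

object SmallSplitsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("small-splits-sketch")
      // illustrative values: keep executor heaps modest and lean on more tasks instead
      .set("spark.default.parallelism", "400")
      // the Spark 1.x block/cache memory ratio knob; 0.3 here is just an example
      .set("spark.storage.memoryFraction", "0.3")

    val sc = new SparkContext(conf)

    // Request many more partitions than the DFS block count up front
    // (path and count are hypothetical).
    val lines = sc.textFile("hdfs:///data/filein.txt", minPartitions = 400)

    // Or shrink splits after the fact; repartition shuffles the data to do so.
    val smallSplits = lines.repartition(800)

    println("partitions: " + smallSplits.partitions.length)
    sc.stop()
  }
}

The same idea applies if you drive the job programmatically: make sure each task ends
up with a slice of the matrix well under the ~100...500k non-zero element range before
any of the heavy operators run.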

I think Pat has done some stop-gap measure on current master for that
(which I don't believe is a truly optimal thing to do, though).

On Mon, Jul 20, 2015 at 1:40 PM, Rodolfo Viana <rodolfodelimavi...@gmail.com> wrote:

> I’m trying to run Mahout 0.10 with Spark 1.1.1.
> I have input files with 8k, 10M, 20M, 25M.
>
> So far I have run with the following configuration:
>
> 8k with 1,2,3 slaves
> 10M with 1, 2, 3 slaves
> 20M with 1,2,3 slaves
>
> But when I try to run
> bin/mahout spark-itemsimilarity --master spark://node1:7077 --input
> filein.txt --output out --sparkExecutorMem 6g
>
> with 25M I got this error:
>
> java.lang.OutOfMemoryError: Java heap space
>
> or
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>
> Is that normal? When I was running 20M I didn’t get any error, and now
> I only have 5M more.
>
> Any ideas why this is happening?
>
> --
> Rodolfo de Lima Viana
> Undergraduate in Computer Science at UFCG
>
