As I said, the memory to worry about is for the driver, the client code that
launches Spark executor tasks. The driver runs on a single machine, so using
more machines will not help. Increase your driver memory with export
MAHOUT_HEAPSIZE=6000 in your environment, or JAVA_MAX_HEAP, or if you are using ...
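For example, a minimal sketch of that export (the value is an assumption;
bin/mahout has historically read MAHOUT_HEAPSIZE as a megabyte count):

    # Raise the heap of the Mahout driver/client JVM before launching.
    # Assumption: the launch script treats this as megabytes (6000 -> -Xmx6000m);
    # adjust if your release expects a "6g"-style value instead.
    export MAHOUT_HEAPSIZE=6000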
That should be plenty of memory on your executors, but is that where you are
running low? This may be a low heap on your driver/client code.
Increase driver memory by setting MAHOUT_HEAPSIZE=6g or some such when
launching the driver; I think the default is 4g. If you are using Yarn, the
answer ...
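One way to check whether the driver heap really is the limit (a generic JVM
diagnostic, not something from this thread) is to inspect the -Xmx the driver
JVM actually received:

    # List running JVMs with their JVM arguments; the Mahout driver's -Xmx
    # shows the heap it was launched with (the grep pattern is an assumption).
    jps -lvm | grep -i mahout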
Assuming task memory x number of cores does not exceed ~5g, and the block
cache manager ratio does not have some really weird setting, the next best
thing to look at is the initial task split size. I don't think that in the
release you are looking at the driver manages initial off-DFS splits
satisfactorily.
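If split size is the problem, one hedged workaround is to cap the initial
input split size through Spark's Hadoop configuration pass-through; the
property name and the 64 MB value below are illustrative assumptions, not
settings confirmed in this thread:

    # In $SPARK_HOME/conf/spark-defaults.conf (Spark 1.x):
    # cap each initial HDFS split at ~64 MB so the driver creates more,
    # smaller tasks instead of a few oversized ones.
    spark.hadoop.mapred.max.split.size   67108864
    # Optionally raise default parallelism toward your total core count.
    spark.default.parallelism            24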
I’m trying to run Mahout 0.10 with Spark 1.1.1.
I have input files of 8k, 10M, 20M, and 25M.
So far I have run with the following configurations:
8k with 1, 2, and 3 slaves
10M with 1, 2, and 3 slaves
20M with 1, 2, and 3 slaves
But when I try to run
bin/mahout spark-itemsimilarity --master spark://node1:7077 --input ...
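For reference, a complete invocation might look like the sketch below; the
HDFS paths are hypothetical placeholders, not values from this thread:

    # Hypothetical end-to-end run against the standalone master on node1
    # (assumes MAHOUT_HEAPSIZE was already exported as above).
    bin/mahout spark-itemsimilarity \
      --master spark://node1:7077 \
      --input hdfs://node1:9000/data/input-25M.txt \
      --output hdfs://node1:9000/data/itemsimilarity-output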