No. I am running Spark on YARN on a 3-node testing cluster.

My guess is that, given the number of splits made by a hundred trees of depth 30 (close to 100 * 2^30 in the fully grown worst case), either the executors or the driver die of OOM while trying to store all the split metadata. I guess that the same
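To put rough numbers on that guess, here is a back-of-envelope sketch in Scala (it counts nodes in fully grown binary trees, which is an upper bound, not Spark's actual bookkeeping):

val numTrees = 100
val maxDepth = 30
// A fully grown binary tree of depth 30 has 2^30 - 1 internal (split) nodes
val splitsPerTree = (1L << maxDepth) - 1
val worstCaseSplits = numTrees * splitsPerTree // Int * Long promotes to Long
println(f"Worst-case splits across the forest: $worstCaseSplits%,d") // ~1.07e11

Even at a few tens of bytes of metadata per node, fully materialized that would be on the order of terabytes, far more than an 8x5G cluster can hold.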
Are you running locally? I found exactly the same issue.
Two solutions:
- reduce the data size (e.g. downsample; see the sketch below)
- run on EMR
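For the first option, a quick downsampling sketch ("training" is a placeholder name for your DataFrame, and the fraction is only an example):

val sampled = training.sample(withReplacement = false, fraction = 0.1, seed = 42L)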
HTH
On 10 Jan 2017 10:07 am, "Julio Antonio Soto" wrote:
Hi,
I am running into OOM problems while training a Spark ML
RandomForestClassifier (maxDepth of 30, 32 maxBins, 100 trees).
My dataset is arguably pretty big given the executor count and size (8 executors x 5 GB each), with approximately 20M rows and 130 features.
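The setup boils down to something like the following minimal sketch (the "training" DataFrame and the "features"/"label" column names are placeholders, and the last three setters are Spark's memory-related knobs for deep trees, shown as possible mitigations rather than as part of the original code):

import org.apache.spark.ml.classification.RandomForestClassifier

// Configuration described above; "training", "features" and "label" are
// placeholder names, not taken from this thread.
val rf = new RandomForestClassifier()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setNumTrees(100)
  .setMaxDepth(30)   // 30 is also the maximum depth Spark supports
  .setMaxBins(32)
  // Possible mitigations, not part of the original setup:
  .setMaxMemoryInMB(512)      // default is 256; larger per-node aggregation buffers
  .setCacheNodeIds(true)      // caching node IDs is meant to help deeper trees
  .setCheckpointInterval(10)  // requires spark.sparkContext.setCheckpointDir(...)

val model = rf.fit(training)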
The "fun fact" is that a single