Thanks Piotr for your feedback!
I did look into sparkit-learn yesterday but couldn't confirm that it
contains a RandomForestClassifier. I would need to ask the customer to
download it for me, as I don't have permission to do that myself. May I
please get your help whe
Hi Debu,
I have not worked with pyspark yet and cannot resolve your error,
but have you tried sparkit-learn?
https://github.com/lensacom/sparkit-learn
It seems to be a package that combines pyspark with sklearn, and it also
has a RandomForest and other classifiers:
(SparkRandomForestClassifier,
Hi Piotr,
Yes, I did use n_jobs = -1 as well, but the code didn't run
successfully. On my output screen, I got the following message instead
of the JoblibMemoryError:
16/12/08 22:12:26 INFO YarnExtensionServices: In shutdown hook for
org.apache.spark.scheduler.cluster.YarnEx
Hi Debu,
it seems that you ran out of memory.
Try using fewer processes.
I don't think that n_jobs = 1000 will perform as you wish.
Setting n_jobs to -1 uses one worker per core on your system.
Greets,
Piotr
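[Editor's note: a minimal sketch of the n_jobs advice above, using toy data that is not from the original thread. n_jobs=-1 starts one worker per CPU core; a small positive value caps the number of parallel processes and thus the peak memory.]

```python
# Sketch: controlling RandomForestClassifier parallelism via n_jobs.
# The dataset here is illustrative only (make_classification toy data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_jobs=2 limits training to two parallel processes; n_jobs=-1 would use
# every core, and very large values like 1000 only add process overhead.
clf = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```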
On 09.12.2016 08:16, Debabrata Ghosh wrote:
Hi All,
Greetings !
I am getting JoblibMemoryError while executing a scikit-learn
RandomForestClassifier code. Here is my algorithm in short:
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import train_test_split
import pandas as pd
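[Editor's note: the quoted code is cut off after its imports. A minimal runnable version under the same imports might look like the sketch below; the DataFrame is a hypothetical stand-in for the original (unknown) data. Note that sklearn.cross_validation was removed in scikit-learn 0.20, so the sketch imports train_test_split from sklearn.model_selection instead.]

```python
# Hedged sketch of an end-to-end RandomForestClassifier run, assuming a
# toy DataFrame in place of the original data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # was sklearn.cross_validation

# Hypothetical toy data standing in for the original dataset.
df = pd.DataFrame({
    "f1": range(100),
    "f2": [i % 7 for i in range(100)],
    "label": [i % 2 for i in range(100)],
})

X_train, X_test, y_train, y_test = train_test_split(
    df[["f1", "f2"]], df["label"], test_size=0.3, random_state=42
)

# n_jobs=-1 uses one worker per core, as discussed in the thread.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```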