Following the code snippets in this thread <https://github.com/dmlc/xgboost/issues/1698>, I got a working version of XGBoost on PySpark. But one issue I am still facing is the following:

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/dummy_package/xgboost/xgboost.py", line 92, in __init__
    self._java_obj = self._new_java_obj("ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator", self.uid)
  File "/Users/ultrauser/Downloads/spark/python/pyspark/ml/wrapper.py", line 61, in _new_java_obj
    java_obj = getattr(java_obj, name)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/py4j/java_gateway.py", line 1598, in __getattr__
    raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
py4j.protocol.Py4JError: ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator does not exist in the JVM

Exception ignored in:
Traceback (most recent call last):
  File "/Users/ultrauser/Downloads/spark/python/pyspark/ml/wrapper.py", line 105, in __del__
    SparkContext._active_spark_context._gateway.detach(self._java_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/py4j/java_gateway.py", line 2000, in detach
    java_object._detach()
AttributeError: 'NoneType' object has no attribute '_detach'

From what I read on StackOverflow and elsewhere, this looks like an issue of jar locations. There are two jar files needed for this code to work:

  xgboost4j-0.72.jar
  xgboost4j-spark-0.72.jar

But I am not sure how to proceed. This is what I have tried so far:

1. Placing the xgboost jar files in /Library/Java/Extensions.

2. Setting the environment variable before starting PySpark:

  import os
  os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /Users/ultrauser/Downloads/xgboost4j-0.72.jar, /Users/ultrauser/Downloads/xgboost4j-spark-0.72.jar pyspark-shell'

3. Placing the jar files in $SPARK_HOME/jars.

But the error still persists. Is there something I am missing here?
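One thing worth checking (an assumption on my part, not a confirmed fix): Spark's --jars option expects a comma-separated list with no space after the comma, and the string in attempt 2 above has one. A space makes Spark treat the second path as a separate, unrecognized argument, so the second jar is silently never loaded. A minimal sketch of building the argument string safely, using the same (machine-specific) paths from the post:

```python
import os

# These paths are taken from the original post; adjust to wherever the
# jars actually live on your machine.
jars = [
    "/Users/ultrauser/Downloads/xgboost4j-0.72.jar",
    "/Users/ultrauser/Downloads/xgboost4j-spark-0.72.jar",
]

# Join with a bare comma -- no space -- as Spark requires for --jars.
submit_args = "--jars {} pyspark-shell".format(",".join(jars))

# This must be set BEFORE the first pyspark import in the process;
# once the JVM gateway has started, changing it has no effect.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args

print(submit_args)
```

The same caveat applies on the command line: `pyspark --jars a.jar,b.jar` works, while `--jars a.jar, b.jar` does not.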
-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/