(+user@spark. Please copy user@ so other people can see and help.)

The error message means you have an MLlib jar on the classpath, but it
didn't contain ALS$StandardNNLSSolver. So either the modified jar was not
deployed to the workers, or an unmodified MLlib jar sits in front of the
modified one on the classpath. You can check the worker logs to see the
classpath used to launch the worker, and then inspect the MLlib jars on
that classpath.
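For example, something along these lines (the log and jar locations below
are only placeholders; they vary by deployment) will show which jar the
workers actually load and whether it contains the class:

# find the classpath the worker (or YARN container) was launched with
grep -i classpath /path/to/worker-logs/*.out

# list the contents of each Spark/MLlib jar found on that classpath
jar tf /path/to/spark-assembly.jar | grep -F 'ALS$StandardNNLSSolver'

-Xiangrui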

On Sun, Jul 10, 2016 at 10:18 PM Alger Remirata <abremirat...@gmail.com>
wrote:

> Hi Xiangrui,
>
> We have the modified jars deployed on both the master and slave nodes.
>
> What do you mean by this line?: 1. The unmodified Spark jars were not on
> the classpath (already existed on the cluster or pulled in by other
> packages).
>
> How would I check that the unmodified Spark jars are not on the classpath?
> We entirely replaced the contents of the SPARK_HOME directory; the newly
> built customized Spark is now what SPARK_HOME contains.
>
> Thanks,
>
> Alger
>
> On Fri, Jul 8, 2016 at 1:32 PM, Xiangrui Meng <m...@databricks.com> wrote:
>
>> This seems like a deployment or dependency issue. Please check the
>> following:
>> 1. The unmodified Spark jars were not on the classpath (already existed
>> on the cluster or pulled in by other packages).
>> 2. The modified jars were indeed deployed to both master and slave nodes.
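>>
>> For item 2, one quick sanity check (the host name and jar path below are
>> only placeholders) is to compare checksums of the deployed jar on the
>> master and every slave; they should all match the jar you built:
>>
>> # on the master
>> md5sum /opt/spark/lib/spark-assembly*.jar
>> # on each slave; the checksum should match the master's
>> ssh slave-node 'md5sum /opt/spark/lib/spark-assembly*.jar'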
>>
>> On Tue, Jul 5, 2016 at 12:29 PM Alger Remirata <abremirat...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> First of all, we would like to thank you for developing Spark. It helps
>>> us a lot with our data science tasks.
>>>
>>> I have a question. We have built a customized Spark using the following
>>> command:
>>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive
>>> -Phive-thriftserver -DskipTests clean package
>>>
>>> In the custom Spark we built, we've added a new Scala file (package)
>>> called StandardNNLS; however, we get an error saying:
>>>
>>> Name: org.apache.spark.SparkException
>>> Message: Job aborted due to stage failure: Task 21 in stage 34.0 failed
>>> 4 times, most recent failure: Lost task 21.3 in stage 34.0 (TID 2547,
>>> 192.168.60.115): java.lang.ClassNotFoundException:
>>> org.apache.spark.ml.recommendation.ALS$StandardNNLSSolver
>>>
>>> StandardNNLSSolver is defined in another Scala file called
>>> StandardNNLS.scala, since we replaced the original NNLS solver with
>>> StandardNNLS. Do you have any idea about the error? Is there a config
>>> file we need to edit to add the classpath? Even if we insert the added
>>> code directly into ALS.scala instead of creating another file like
>>> StandardNNLS.scala, the inserted code is not recognized; it still fails
>>> with the same ClassNotFoundException.
>>>
>>> However, when we run this on our local machine rather than on the Hadoop
>>> cluster, it works. We don't know whether the error is caused by using mvn
>>> to build the custom Spark or by something in how we communicate with the
>>> Hadoop cluster.
>>>
>>> We would like to ask for some ideas on how to solve this problem. We
>>> could create a separate package that does not depend on Apache Spark, but
>>> that approach is very slow. As of now, we are still learning Scala and
>>> Spark. Using the Apache Spark utilities makes the code faster; however,
>>> if we build a separate package outside Apache Spark, we have to
>>> re-implement the utilities that are private in Apache Spark. So it is
>>> better to work within Apache Spark and insert the code we need.
>>>
>>> Thanks,
>>>
>>> Alger
>>>
>>
>
