Hadoop 2's artifact is hadoop-common rather than hadoop-core, but I
assume you looked for that too. To answer your earlier question, no,
Spark works with both Hadoop 1 and Hadoop 2 and is source-compatible
with both. It can't be binary-compatible with both at once though. The
code you cite is correct; there is no bug there.
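
For reference, firstAvailableClass in that file is just a try-in-order
lookup; roughly (paraphrasing, not quoting the exact Spark source):

  private def firstAvailableClass(first: String, second: String): Class[_] =
    try {
      // hadoop2 / hadoop2-yarn class, if it is on the classpath
      Class.forName(first)
    } catch {
      case _: ClassNotFoundException =>
        // otherwise fall back to the hadoop1 class
        Class.forName(second)
    }

Whichever Hadoop is actually on the classpath wins, so the reflection
itself is not the problem.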

Your first error definitely indicates you have the wrong version of
Hadoop on the client side. It's not matching your HDFS version. And
the second suggests you are mixing code compiled for different
versions of Hadoop. I think you need to check what version of Hadoop
your Spark is compiled for. For example, I saw a reference to CDH 5.2,
which is based on Hadoop 2.5, but then you're showing that you are
running an old Hadoop 1.x HDFS. There seem to be a number of possible
incompatibilities here.
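
One quick way to check, assuming you can run spark-shell against the
Spark build in question, is to print the Hadoop version that ends up on
its classpath:

  // from spark-shell: shows which Hadoop version this Spark build
  // was compiled against / is running with
  println(org.apache.hadoop.util.VersionInfo.getVersion)

Compare that with the output of "hadoop version" on a cluster node; if
one says 2.x and the other 1.x, that alone would explain these errors.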

On Fri, Jan 23, 2015 at 11:38 PM, ey-chih chow <eyc...@hotmail.com> wrote:
> Sorry, I still did not quite get your resolution.  In my jar, there are the
> following three related classes:
>
> org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class
> org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl$DummyReporter.class
> org/apache/hadoop/mapreduce/TaskAttemptContext.class
>
> I think the first two come from hadoop2 and the third from hadoop1.  I would
> like to get rid of the first two.  I checked my source code.  It does have a
> place using the class (or interface in hadoop2) TaskAttemptContext.
> Do you mean I should make a separate jar for this portion of code and build it
> with hadoop1 to get rid of the dependency?  An alternative would be to modify
> the code in SparkHadoopMapReduceUtil.scala and put it into my own source code
> to bypass the problem (a sketch of that bypass follows below).  Any comment on
> this?  Thanks.
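>
> For concreteness, this is roughly what I mean by that bypass, built against
> hadoop1 only (where TaskAttemptContext is a concrete class, so no reflection
> is needed); just a sketch, not tested:
>
>   import org.apache.hadoop.conf.Configuration
>   import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}
>
>   // hadoop1-only stand-in for SparkHadoopMapReduceUtil.newTaskAttemptContext
>   def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext =
>     new TaskAttemptContext(conf, attemptId)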
>
> ________________________________
> From: eyc...@hotmail.com
> To: so...@cloudera.com
> CC: user@spark.apache.org
> Subject: RE: spark 1.1.0 save data to hdfs failed
> Date: Fri, 23 Jan 2015 11:17:36 -0800
>
>
> Thanks.  I looked at the dependency tree.  I did not see any dependent
> hadoop-core jar from hadoop2.  However, the jar built by Maven has this
> class:
>
>  org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class
>
> Do you know why?
>
>
>
>
> ________________________________
> Date: Fri, 23 Jan 2015 17:01:48 +0000
> Subject: RE: spark 1.1.0 save data to hdfs failed
> From: so...@cloudera.com
> To: eyc...@hotmail.com
>
> Are you receiving my replies? I have suggested a resolution. Look at the
> dependency tree next.
>
> On Jan 23, 2015 2:43 PM, "ey-chih chow" <eyc...@hotmail.com> wrote:
>
> I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it
> is broken in the following code:
>
>   def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
>     val klass = firstAvailableClass(
>         "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl",  // hadoop2, hadoop2-yarn
>         "org.apache.hadoop.mapreduce.TaskAttemptContext")           // hadoop1
>     val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
>     ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
>   }
>
> In other words, it involves hadoop2, hadoop2-yarn, and hadoop1.  Any
> suggestion on how to resolve it?
>
> Thanks.
>
>
>
>> From: so...@cloudera.com
>> Date: Fri, 23 Jan 2015 14:01:45 +0000
>> Subject: Re: spark 1.1.0 save data to hdfs failed
>> To: eyc...@hotmail.com
>> CC: user@spark.apache.org
>>
>> These are all definitely symptoms of mixing incompatible versions of
>> libraries.
>>
>> I'm not suggesting you haven't excluded Spark / Hadoop, but this is
>> not the only way Hadoop deps get into your app. See my suggestion
>> about investigating the dependency tree.
>>
>> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow <eyc...@hotmail.com> wrote:
>> > Thanks. But I think I already marked all the Spark and Hadoop deps as
>> > provided. Why is the cluster's version not used?
>> >
>> > Anyway, as I mentioned in the previous message, after changing
>> > hadoop-client to version 1.2.1 in my Maven deps, I got past that
>> > exception and hit another one, as indicated below. Any suggestion on
>> > this?
>> >
>> > =================================
>> >
>> > Exception in thread "main" java.lang.reflect.InvocationTargetException
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >     at java.lang.reflect.Method.invoke(Method.java:606)
>> >     at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
>> >     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
>> > Caused by: java.lang.IncompatibleClassChangeError: Implementing class
>> >     at java.lang.ClassLoader.defineClass1(Native Method)
>> >     at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>> >     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>> >     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>> >     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> >     at java.lang.Class.forName0(Native Method)
>> >     at java.lang.Class.forName(Class.java:191)
>> >     at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
>> >     at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
>> >     at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
>> >     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
>> >     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
>> >     at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
>> >     at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
>> >     ... 6 more
>> >

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
