Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

Antony Mayi Wed, 07 Jan 2015 05:50:13 -0800

This is the official Cloudera-built stack, CDH 5.3.0. I have not modified 
anything, and I presume they are pretty good at building it, so I still 
suspect that the classpath is now being resolved in a different way?
thx, Antony.


On Wednesday, 7 January 2015, 18:55, Sean Owen <so...@cloudera.com> wrote:

Problems like this are always due to having code compiled for Hadoop 1.x run 
against Hadoop 2.x, or vice versa. Here, you compiled for 1.x but at runtime 
Hadoop 2.x is used.
A common cause is actually bundling Spark / Hadoop classes with your app, when 
the app should just use the Spark / Hadoop provided by the cluster. It could 
also be that you're pairing Spark compiled for Hadoop 1.x with a 2.x cluster.
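
One way to check which Hadoop API the driver JVM actually sees is to ask it how org.apache.hadoop.mapreduce.JobContext is defined: it is a class in Hadoop 1.x (MR1) and an interface in Hadoop 2.x. The sketch below is only an illustration, not something from this thread. The helper names are hypothetical, `sc` is assumed to be a live pyspark SparkContext, and `sc._jvm` is a private py4j attribute that may change between Spark versions. It inspects the driver only, not the executors.

```python
def describe_jobcontext(is_interface):
    """Map the shape of JobContext to the Hadoop API line it implies."""
    if is_interface:
        return "Hadoop 2.x API (JobContext is an interface)"
    return "Hadoop 1.x / MR1 API (JobContext is a class)"


def inspect_jobcontext(sc):
    """Ask the driver JVM how JobContext is defined and which jar provided it.

    Assumption: `sc` is a running pyspark SparkContext; `sc._jvm` is the
    py4j view of the driver JVM (a private attribute).
    """
    loader = sc._jvm.java.lang.Thread.currentThread().getContextClassLoader()
    cls = loader.loadClass("org.apache.hadoop.mapreduce.JobContext")
    # Standard Java reflection: where was this class loaded from?
    source = cls.getProtectionDomain().getCodeSource()
    location = source.getLocation().toString() if source is not None else "unknown"
    return describe_jobcontext(cls.isInterface()), location
```

From a pyspark shell, `print(inspect_jobcontext(sc))` would show what the runtime provides. If it reports the Hadoop 2.x API while the error says "class was expected", the code that failed (here the HBase TableInputFormatBase) was most likely compiled against Hadoop 1.x.
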
On Wed, Jan 7, 2015 at 9:38 AM, Antony Mayi <antonym...@yahoo.com.invalid> 
wrote:

Hi,
I am using newAPIHadoopRDD to load an RDD from HBase (using pyspark running as 
yarn-client), pretty much the standard case demonstrated in 
hbase_inputformat.py from the examples. The thing is that when trying the very 
same code on Spark 1.2 I am getting the error below, which, based on similar 
cases on other forums, suggests an incompatibility between MR1 and MR2.
Why would this now start happening? Is it due to some change in resolving 
the classpath, which now picks up MR2 jars first while before it was MR1?
Is there any workaround for this?
thanks, Antony.
The error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
    at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
    at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
    at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
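
On the classpath-ordering question above: the JVM uses the first classpath entry that provides a class, so listing every entry that contains org.apache.hadoop.mapreduce.JobContext, in order, shows whether more than one Hadoop line is present and which one comes first. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and sorting the jars matched by a `dir/*` wildcard is an assumption made here for repeatable output, since the JVM does not guarantee an order within a wildcard.

```python
import glob
import os
import zipfile


def expand_classpath(classpath, sep=os.pathsep):
    """Split a classpath string into its entries, keeping their order."""
    entries = []
    for entry in classpath.split(sep):
        if not entry:
            continue
        if entry.endswith("*"):
            # JVM-style wildcard: every .jar in that directory.
            # Sorted here only to make the output deterministic.
            pattern = os.path.join(os.path.dirname(entry), "*.jar")
            entries.extend(sorted(glob.glob(pattern)))
        else:
            entries.append(entry)
    return entries


def jars_providing(classpath, class_name, sep=os.pathsep):
    """Return the classpath entries that contain class_name, first match first."""
    resource = class_name.replace(".", "/") + ".class"
    hits = []
    for entry in expand_classpath(classpath, sep):
        if os.path.isdir(entry):
            # A directory entry holds loose .class files.
            if os.path.isfile(os.path.join(entry, resource)):
                hits.append(entry)
        elif zipfile.is_zipfile(entry):
            # A jar is a zip archive; look for the class file inside it.
            with zipfile.ZipFile(entry) as jar:
                if resource in jar.namelist():
                    hits.append(entry)
    return hits
```

Feeding it the classpath string the driver was actually started with (for example the output of `hadoop classpath`, if that matches what Spark uses on your cluster) and the class name `org.apache.hadoop.mapreduce.JobContext` returns the candidates in resolution order. More than one hit means the winner depends on ordering.
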
