Hi,
I am using newAPIHadoopRDD to load RDD from hbase (using pyspark running as
yarn-client) - pretty much the standard case demonstrated in the
hbase_inputformat.py from examples... the thing is the when trying the very
same code on spark 1.2 I am getting the error bellow which based on similar
cases on another forums suggest incompatibility between MR1 and MR2.
why would this now start happening? is that due to some changes in resolving
the classpath which now picks up MR2 jars first while before it was MR1?
is there any workaround for this?
thanks,Antony.
the error:
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.:
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.JobContext, but class was expected at
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98) at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) at
scala.Option.getOrElse(Option.scala:120) at
org.apache.spark.rdd.RDD.partitions(RDD.scala:203) at
org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) at
scala.Option.getOrElse(Option.scala:120) at
org.apache.spark.rdd.RDD.partitions(RDD.scala:203) at
org.apache.spark.rdd.RDD.take(RDD.scala:1060) at
org.apache.spark.rdd.RDD.first(RDD.scala:1093) at
org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202) at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500) at
org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at
py4j.Gateway.invoke(Gateway.java:259) at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at
py4j.commands.CallCommand.execute(CallCommand.java:79) at
py4j.GatewayConnection.run(GatewayConnection.java:207) at
java.lang.Thread.run(Thread.java:745)