Dmitry Goldenberg created SPARK-7154:
----------------------------------------

             Summary: Spark distro appears to be pulling in incorrect protobuf classes
                 Key: SPARK-7154
                 URL: https://issues.apache.org/jira/browse/SPARK-7154
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 1.3.1
            Reporter: Dmitry Goldenberg


If you download a Spark distribution from https://spark.apache.org/downloads.html 
(for example, I chose 
http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz), 
you may run into incompatibilities with other libraries due to incorrect 
protobuf classes.

I'm seeing such a case in my Spark Streaming job, which attempts to use Apache 
Phoenix to update records in HBase. The job is built with a protobuf 2.5.0 
dependency. However, at runtime Spark's classes take precedence in class 
loading, and that causes exceptions such as the following:

java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1620)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1577)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1007)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
        at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:350)
        at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:311)
        at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:307)
        at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:333)
        at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:237)
        at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:231)
        at org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:207)
        at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:248)
        at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:503)
        at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:494)
        at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:295)
        at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:288)
        at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
        at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:287)
        at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:219)
        at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:174)
        at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:179)
        at com.kona.core.upload.persistence.hdfshbase.HUploadWorkqueueHelper.updateUploadWorkqueueEntry(HUploadWorkqueueHelper.java:139)
        at com.kona.core.upload.persistence.hdfshbase.HdfsHbaseUploadPersistenceProvider.updateUploadWorkqueueEntry(HdfsHbaseUploadPersistenceProvider.java:144)
        at com.kona.pipeline.sparkplug.error.UploadEntryErrorHandlerImpl.onError(UploadEntryErrorHandlerImpl.java:62)
        at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processError(KonaPipelineImpl.java:305)
        at com.kona.pipeline.sparkplug.pipeline.KonaPipelineImpl.processPipelineDocument(KonaPipelineImpl.java:208)
        at com.kona.pipeline.sparkplug.runner.KonaPipelineRunnerImpl.notifyItemReceived(KonaPipelineRunnerImpl.java:79)
        at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:83)
        at com.kona.pipeline.streaming.spark.ProcessPartitionFunction.call(ProcessPartitionFunction.java:25)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:198)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1265)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$7.call(ConnectionQueryServicesImpl.java:1258)
        at org.apache.hadoop.hbase.client.HTable$17.call(HTable.java:1608)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
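
To confirm which jar the conflicting classes are actually served from at 
runtime, a small diagnostic along the following lines can be run inside the 
job. This is a minimal sketch (the WhichJar class name is mine, not part of 
the job); Class.forName also loads package-private classes such as 
LiteralByteString, and HBaseZeroCopyByteString requires hbase-protocol on the 
classpath:

    import java.security.CodeSource;

    public class WhichJar {
        public static void main(String[] args) throws ClassNotFoundException {
            // Print the jar each class was loaded from, to see whether the
            // Spark assembly or the application jar wins the classloading race.
            for (String name : new String[] {
                    "com.google.protobuf.ByteString",
                    "com.google.protobuf.LiteralByteString",
                    "com.google.protobuf.HBaseZeroCopyByteString" }) {
                CodeSource src = Class.forName(name)
                        .getProtectionDomain().getCodeSource();
                System.out.println(name + " -> "
                        + (src != null ? src.getLocation() : "<bootstrap>"));
            }
        }
    }

As a workaround (not a fix), the experimental spark.driver.userClassPathFirst 
and spark.executor.userClassPathFirst settings in Spark 1.3 should let the 
application's protobuf classes take precedence over the assembly's.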

If you compare the protobuf classes inside the Spark assembly jar against the 
classes in the stock protobuf 2.5.0 jar (e.g. with the cmp command), the 
following do not match:

BoundedByteString$1.class
BoundedByteString$BoundedByteIterator.class
BoundedByteString.class
ByteString$1.class
ByteString$ByteIterator.class
ByteString$CodedBuilder.class
ByteString$Output.class
ByteString.class
CodedInputStream.class
CodedOutputStream$OutOfSpaceException.class
CodedOutputStream.class
LiteralByteString$1.class
LiteralByteString$LiteralByteIterator.class
LiteralByteString.class

All of these are classes that HBaseZeroCopyByteString depends on. 
HBaseZeroCopyByteString lives in the com.google.protobuf package and extends 
the package-private LiteralByteString, so loading it against mismatched copies 
of these classes explains the java.lang.IllegalAccessError.
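
For reference, here is a programmatic version of that comparison (a sketch 
only; the jar file names below are placeholders for wherever the two jars 
live locally):

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    public class JarClassDiff {
        public static void main(String[] args) throws Exception {
            // Byte-compare one class entry between the two jars.
            String entry = "com/google/protobuf/ByteString.class";
            byte[] a = readEntry("spark-assembly-1.3.1-hadoop2.4.0.jar", entry);
            byte[] b = readEntry("protobuf-java-2.5.0.jar", entry);
            System.out.println(entry + (Arrays.equals(a, b) ? ": matches" : ": DIFFERS"));
        }

        static byte[] readEntry(String jarPath, String entryName) throws Exception {
            try (JarFile jar = new JarFile(jarPath)) {
                JarEntry e = jar.getJarEntry(entryName);
                if (e == null)
                    throw new IllegalArgumentException(entryName + " not in " + jarPath);
                try (InputStream in = jar.getInputStream(e)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
                    return out.toByteArray();
                }
            }
        }
    }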

What's not yet clear to me is how they can be wrong if the Spark pom specifies 
2.5.0:

    <profile>
      <id>hadoop-2.4</id>
      <properties>
        <hadoop.version>2.4.0</hadoop.version>
        <protobuf.version>2.5.0</protobuf.version>
        <jets3t.version>0.9.3</jets3t.version>
        <hbase.version>0.98.7-hadoop2</hbase.version>
        <commons.math3.version>3.1.1</commons.math3.version>
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
        <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
      </properties>
    </profile>

This looks correct and in theory should override the 
<protobuf.version>2.4.1</protobuf.version> specified higher up in the parent 
pom (https://github.com/apache/spark/blob/master/pom.xml).
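
One way to check what the build actually resolves (assuming the hadoop-2.4 
profile is the one activated) is to filter the dependency tree down to 
protobuf:

    mvn -Phadoop-2.4 dependency:tree -Dincludes=com.google.protobuf

If a transitive dependency still drags in 2.4.1, or if the assembly bundles a 
copy of the classes from somewhere other than the resolved protobuf-java 
artifact, the property override alone would not catch it.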


