Spark 1.1.0 and HBase: Snappy UnsatisfiedLinkError

Pietro Gentile Tue, 25 Nov 2014 22:09:43 -0800

Hi everyone,

I deployed Spark 1.1.0 and I m trying to use it with spark-job-server 0.4.0 
(https://github.com/ooyala/spark-jobserver).
I previously used Spark 1.0.2 and had no problems with it. I want to use the 
newer version of Spark (and Spark SQL) to create the SchemaRDD programmatically.


The CLASSPATH variable was properly setted because the following code works 
perfectly (from https://spark.apache.org/docs/1.1.0/sql-programming-guide.html 
<https://spark.apache.org/docs/1.1.0/sql-programming-guide.html> but with input 
form base table).
 
But when I try to put this in the override def runJob(sc:SparkContext, 
jobConfig: Config): Any = ??? method, this not work. The exception is:

java.lang.UnsatisfiedLinkError: 
org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
        at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:320)
        at 
org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
        at 
org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
        at 
org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
        at 
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
        at 
org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
        at 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
        at 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
        at 
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829)
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:772)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:771)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:771)
        at 
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This exception occurs at the line  "val peopleRows = new NewHadoopRDD” when try 
to read rows from HBase (0.98). I execute this code in both in Scala and Java. 

Any ideas?? From what could it depend?



CODE

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Create an RDD
val people = sc.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Import Spark SQL data types and Row.
import org.apache.spark.sql._

// Generate the schema based on the string of schema
val schema =
  StructType(
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, 
true)))

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, “people")

val peopleRows = new NewHadoopRDD(sc,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])


// Convert records of the RDD (people) to Rows.
val rowRDD = peopleRows.map // create Rows (name,age)

// Apply the schema to the RDD.
val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)

// Register the SchemaRDD as a table.
peopleSchemaRDD.registerTempTable("people")

// SQL statements can be run by using the sql methods provided by sqlContext.
val results = sqlContext.sql("SELECT name FROM people")

// The results of SQL queries are SchemaRDDs and support all the normal RDD 
operations.
// The columns of a row in the result can be accessed by ordinal.
results.map(t => "Name: " + t(0)).collect().foreach(println)

best Regards,
Pietro.

Spark 1.1.0 and HBase: Snappy UnsatisfiedLinkError

Reply via email to