It seems that you need to use a Phoenix build for CDH, since CDH has made some changes to the HBase API. One of the recent threads on this list has several links on how to build it.
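A small diagnostic sketch that can help confirm that kind of mismatch: print which jar the HTableDescriptor class from the stack trace is actually loaded from, running it on the same classpath as the failing job:

    // Prints the jar that HTableDescriptor is loaded from at runtime.
    // A NoSuchMethodError like the one below usually means the stock Phoenix
    // build and the HBase classes that win on the classpath (CDH's here)
    // disagree about this API.
    println(classOf[org.apache.hadoop.hbase.HTableDescriptor]
      .getProtectionDomain.getCodeSource.getLocation)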
Thanks,
Sergey

On Wed, Oct 26, 2016 at 2:31 AM, min zou <zoumin1...@gmail.com> wrote:

> Hi Sergey, I used the advice you gave me, and then I got this error:
>
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.setValue(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/hadoop/hbase/HTableDescriptor;
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.generateTableDescriptor(ConnectionQueryServicesImpl.java:756)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1020)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1396)
>     at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2302)
>     at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:922)
>     at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:194)
>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:343)
>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:329)
>     at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1421)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2353)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2300)
>     at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2300)
>     at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:231)
>     at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
>     at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
>     at java.sql.DriverManager.getConnection(DriverManager.java:664)
>     at java.sql.DriverManager.getConnection(DriverManager.java:208)
>     at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
>     at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:114)
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:81)
>     at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.phoenix.spark.PhoenixRDD.getPartitions(PhoenixRDD.scala:52)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1940)
>     at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:912)
>     at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>     at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
>     at com.linkstec.bigdata.main.PhoenixTest$.main(PhoenixTest.scala:48)
>     at com.linkstec.bigdata.main.PhoenixTest.main(PhoenixTest.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> thanks
>
>
> 2016-10-26 15:09 GMT+08:00 Sergey Soldatov <sergeysolda...@gmail.com>:
>
>> (1) You only need the client jar (phoenix-xxxx-client.jar).
>> (2) Set spark.executor.extraClassPath in spark-defaults.conf to that client jar.
>>
>> Hope that helps.
>>
>> Thanks,
>> Sergey
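For example, that spark-defaults.conf entry could look like the lines below, reusing the jar path from the original message. The jar has to be readable at that path on every node; setting the driver-side classpath as well is usually needed, although it is not mentioned above:

    # spark-defaults.conf -- example entries; the path must exist on every node
    spark.executor.extraClassPath  /root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-4.8.0-HBase-1.2-client.jar
    spark.driver.extraClassPath    /root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-4.8.0-HBase-1.2-client.jar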
>> On Tue, Oct 25, 2016 at 9:31 PM, min zou <zoumin1...@gmail.com> wrote:
>>
>>> Dear all, I use Spark to do data analysis and then save the result to Phoenix.
>>> When I run the application from IntelliJ IDEA in local mode it runs fine, but
>>> when I run it on my cluster with spark-submit:
>>>
>>>     spark-submit --class com.bigdata.main.RealTimeMain --master yarn \
>>>       --driver-memory 2G --executor-memory 2G --num-executors 5 \
>>>       /home/zt/rt-analyze-1.0-SNAPSHOT.jar
>>>
>>> I get an error: Caused by: java.lang.ClassNotFoundException: Class
>>> org.apache.phoenix.mapreduce.PhoenixOutputFormat not found.
>>>
>>> Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.phoenix.mapreduce.PhoenixOutputFormat not found
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
>>>     at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:232)
>>>     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:971)
>>>     at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:903)
>>>     at org.apache.phoenix.spark.ProductRDDFunctions.saveToPhoenix(ProductRDDFunctions.scala:51)
>>>     at com.mypackage.save(DAOImpl.scala:41)
>>>     at com.mypackage.ProtoStreamingJob.execute(ProtoStreamingJob.scala:58)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at com.mypackage.SparkApplication.sparkRun(SparkApplication.scala:95)
>>>     at com.mypackage.SparkApplication$delayedInit$body.apply(SparkApplication.scala:112)
>>>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>>     at scala.App$class.main(App.scala:71)
>>>     at com.mypackage.SparkApplication.main(SparkApplication.scala:15)
>>>     at com.mypackage.ProtoStreamingJobRunner.main(ProtoStreamingJob.scala)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.ClassNotFoundException: Class org.apache.phoenix.mapreduce.PhoenixOutputFormat not found
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
>>>     ... 30 more
>>>
>>> Then I tried spark-submit with --jars:
>>>
>>>     spark-submit --class com.bigdata.main.RealTimeMain --master yarn \
>>>       --jars /root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-spark-4.8.0-HBase-1.2.jar,/root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-4.8.0-HBase-1.2-client.jar,/root/apache-phoenix-4.8.0-HBase-1.2-bin/phoenix-core-4.8.0-HBase-1.2.jar \
>>>       --driver-memory 2G --executor-memory 2G --num-executors 5 \
>>>       /home/zm/rt-analyze-1.0-SNAPSHOT.jar
>>>
>>> and I get the same error. My cluster is CDH 5.7, Phoenix 4.8.0, HBase 1.2,
>>> Spark 1.6. How can I solve this problem? Please help me. Thanks.
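For reference, a minimal sketch of the save path that fails above, using the phoenix-spark RDD API that appears in the stack trace (the table name, columns, and ZooKeeper quorum below are made-up examples):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.phoenix.spark._  // adds saveToPhoenix to RDDs of tuples/case classes

    object PhoenixSaveSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("phoenix-save-sketch"))

        // Hypothetical target table, e.g.:
        //   CREATE TABLE OUTPUT_TEST_TABLE (ID BIGINT NOT NULL PRIMARY KEY, COL1 VARCHAR, COL2 INTEGER);
        val rows = sc.parallelize(Seq((1L, "a", 1), (2L, "b", 2)))

        // Writes through PhoenixOutputFormat, which is why that class must be
        // resolvable on the driver and executor classpaths.
        rows.saveToPhoenix(
          "OUTPUT_TEST_TABLE",
          Seq("ID", "COL1", "COL2"),
          zkUrl = Some("zk-host:2181")  // ZooKeeper quorum of the HBase cluster (example value)
        )

        sc.stop()
      }
    }

If the Phoenix client jar is visible on both the driver and executor classpaths (for example via the spark-defaults.conf entries sketched earlier in this thread), PhoenixOutputFormat should resolve and a job like this should go through with the same spark-submit command.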