Adding back user@

I am not familiar with the NotSerializableException. Can you show the full stack trace?
See SPARK-1297 for changes you need to make so that Spark works with hbase 0.98

Cheers

On Wed, Sep 3, 2014 at 2:33 PM, Kevin Peng <kpe...@gmail.com> wrote:

> Ted,
>
> The hbase-site.xml is in the classpath (had worse issues before... until I
> figured that it wasn't in the path).
>
> I get the following error in the spark-shell:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task
> not serializable: java.io.NotSerializableException:
> org.apache.spark.streaming.StreamingContext
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.sc
>   ...
>
> I also double checked the hbase table, just in case, and nothing new is
> written in there.
>
> I am using hbase version: 0.98.1-cdh5.1.0 the default one with the
> CDH5.1.0 distro.
>
> Thank you for the help.
>
>
> On Wed, Sep 3, 2014 at 2:09 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Is hbase-site.xml in the classpath ?
>> Do you observe any exception from the code below or in region server log ?
>>
>> Which hbase release are you using ?
>>
>>
>> On Wed, Sep 3, 2014 at 2:05 PM, kpeng1 <kpe...@gmail.com> wrote:
>>
>>> I have been trying to understand how spark streaming and hbase connect,
>>> but have not been successful. What I am trying to do is given a spark
>>> stream, process that stream and store the results in an hbase table.
>>> So far this is what I have:
>>>
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.streaming.{Seconds, StreamingContext}
>>> import org.apache.spark.streaming.StreamingContext._
>>> import org.apache.spark.storage.StorageLevel
>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>> import org.apache.hadoop.hbase.client.{HBaseAdmin, HTable, Put, Get}
>>> import org.apache.hadoop.hbase.util.Bytes
>>>
>>> def blah(row: Array[String]) {
>>>   val hConf = new HBaseConfiguration()
>>>   val hTable = new HTable(hConf, "table")
>>>   val thePut = new Put(Bytes.toBytes(row(0)))
>>>   thePut.add(Bytes.toBytes("cf"), Bytes.toBytes(row(0)), Bytes.toBytes(row(0)))
>>>   hTable.put(thePut)
>>> }
>>>
>>> val ssc = new StreamingContext(sc, Seconds(1))
>>> val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
>>> val words = lines.map(_.split(","))
>>> val store = words.foreachRDD(rdd => rdd.foreach(blah))
>>> ssc.start()
>>>
>>> I am currently running the above code in spark-shell. I am not sure what
>>> I am doing wrong.
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-into-HBase-tp13378.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>
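For reference, the NotSerializableException quoted above usually means the closure handed to foreachRDD drags driver-side state (here the StreamingContext) along when it is serialized for the executors. Below is a minimal sketch of one common workaround, not necessarily the fix the thread settled on: create the HBase objects inside foreachPartition so the serialized closure carries no driver-side connection or context, and open one HTable per partition instead of one per record. It reuses the names from the quoted code (the words DStream, table "table", column family "cf"), swaps the deprecated HBaseConfiguration constructor for HBaseConfiguration.create(), and assumes hbase-site.xml is on the executor classpath; in the spark-shell the REPL's wrapper objects can still capture ssc, so moving this logic into a compiled application or a separate object may also be needed.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

words.foreachRDD { rdd =>
  rdd.foreachPartition { rows =>
    // Created on the executor, inside the partition closure; picks up
    // hbase-site.xml from the classpath. Nothing from the driver (ssc,
    // SparkContext, etc.) is referenced here, so nothing non-serializable
    // needs to be shipped.
    val hConf = HBaseConfiguration.create()
    val hTable = new HTable(hConf, "table")   // one connection per partition, not per row
    rows.foreach { row =>
      val thePut = new Put(Bytes.toBytes(row(0)))
      thePut.add(Bytes.toBytes("cf"), Bytes.toBytes(row(0)), Bytes.toBytes(row(0)))
      hTable.put(thePut)
    }
    hTable.close()
  }
}

Opening the connection per partition also amortizes the HBase connection setup that the original per-record blah(row) helper paid on every incoming line.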