Hi,

I am trying to run some operations on an HBase table that is being
populated by Spark Streaming.

This is plain Spark on HBase, as opposed to Spark on Hive with a view on
HBase, etc. I also have a Phoenix view on this HBase table.

This is the sample code:

scala> val tableName = "marketDataHbase"
scala> val conf = HBaseConfiguration.create()
conf: org.apache.hadoop.conf.Configuration = Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
hbase-default.xml, hbase-site.xml
scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
scala> // create rdd
scala> val hBaseRDD = sc.newAPIHadoopRDD(conf,
     |   classOf[TableInputFormat],
     |   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
     |   classOf[org.apache.hadoop.hbase.client.Result])
hBaseRDD:
org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
newAPIHadoopRDD at <console>:64
scala> hBaseRDD.count
res11: Long = 22272

scala> // transform (ImmutableBytesWritable, Result) tuples into an RDD of Results
scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result]
= MapPartitionsRDD[8] at map at <console>:41

scala> // transform into an RDD of (RowKey, ColumnValue) pairs; the time part is stripped from the RowKey

scala> val keyValueRDD = resultRDD.map(result =>
(Bytes.toString(result.getRow()).split(" ")(0),
Bytes.toString(result.value)))
keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
MapPartitionsRDD[9] at map at <console>:43

scala> keyValueRDD.take(2).foreach(kv => println(kv))
(000055e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
(000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)

OK, above I am only getting the row key (the UUID) and the last
attribute (price).
However, besides the row key I have 3 columns in the HBase table!

scan 'marketDataHbase', "LIMIT" => 1
ROW                                    COLUMN+CELL
 000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:price, timestamp=1476133232864, value=43.89760813529593664528
 000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:ticker, timestamp=1476133232864, value=S08
 000055e2-63f1-4def-b625-e73f0ac36271  column=price_info:timecreated, timestamp=1476133232864, value=2016-10-10T17:12:22
1 row(s) in 0.0100 seconds
So how can I get the other columns?
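
One thing that may work, as a minimal untested sketch: Result.value only
returns the value of the first column in the Result, whereas
Result.getValue(family, qualifier) looks up one specific cell. The sketch
assumes the resultRDD from the session above and the price_info family and
qualifiers shown in the scan output; the helper name col and the tuple
layout are made up for illustration.

```scala
import org.apache.hadoop.hbase.util.Bytes

// Assumes resultRDD: RDD[Result] from the spark-shell session above.
val rowsRDD = resultRDD.map { result =>
  // Column family as it appears in the scan output.
  val family = Bytes.toBytes("price_info")

  // Hypothetical helper: read one qualifier as a String.
  // Result.getValue returns null when the cell is absent, hence the Option.
  def col(qualifier: String): Option[String] =
    Option(result.getValue(family, Bytes.toBytes(qualifier)))
      .map(b => Bytes.toString(b))

  // Same row key handling as in keyValueRDD above.
  val rowKey = Bytes.toString(result.getRow()).split(" ")(0)

  (rowKey, col("ticker"), col("timecreated"), col("price"))
}

rowsRDD.take(2).foreach(println)
```

Each element would then be a (rowKey, ticker, timecreated, price) tuple,
with None wherever a row lacks that column.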

Thanks


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
