https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExample.scala
On Sat, Jan 28, 2017 at 9:17 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Adding to @Ted: check the bulk put example,
> https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
>
> On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you looked at the hbase-spark module (currently in the master branch)?
>>
>> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
>> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
>> for examples.
>>
>> There may be other options.
>>
>> FYI
>>
>> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi <jeffsar...@hotmail.com> wrote:
>>
>>> Hi,
>>> I'm seeking pointers/guidance on how to insert billions of records,
>>> which we already have in Avro files in Hadoop, into HBase.
>>>
>>> I read some articles online, and one of them recommended using the
>>> HFile format. I took a cursory look at its documentation. Given its
>>> complexity, I think that may be the last resort we want to pursue,
>>> unless there is a library out there that easily helps us write our
>>> files into that format; I didn't see any.
>>> Assuming that the HBase native client may be our best bet, is there
>>> any advice around pre-partitioning our records, or similar techniques
>>> we could use?
>>>
>>> thanks
>>>
>>> Jeff
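
On the pre-partitioning question at the end of the thread: whichever write path is used (bulk put or HFile bulk load), pre-splitting the table and salting row keys avoids region hot-spotting during a massive initial load. Below is a minimal, hedged sketch of the key-space math only; `SplitPoints`, `hexSplits`, and `saltedKey` are hypothetical helper names, not part of any HBase API, and they assume a two-hex-character salt prefix on every row key:

```scala
object SplitPoints {
  // Generate (numRegions - 1) split keys that evenly divide a
  // two-hex-character key space (0x00..0xff). These strings would be
  // passed as split points when creating the table, so each region
  // owns an equal slice of the salt prefixes.
  def hexSplits(numRegions: Int): Seq[String] = {
    require(numRegions > 1, "need at least two regions")
    val space = 256 // two hex chars cover 0x00..0xff
    (1 until numRegions).map(i => f"${i * space / numRegions}%02x")
  }

  // Prepend a stable two-hex-char salt derived from the key itself, so
  // the same logical key always lands in the same region, while the
  // overall write load spreads evenly across all pre-split regions.
  def saltedKey(key: String): String = {
    val salt = math.abs(key.hashCode % 256)
    f"$salt%02x-$key"
  }
}
```

For example, `SplitPoints.hexSplits(4)` yields the split points `"40"`, `"80"`, `"c0"`, giving four equally sized salt ranges. The trade-off of salting is that range scans over the original key order must fan out across all salt prefixes, so it fits write-heavy bulk ingest better than scan-heavy workloads.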