https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExample.scala
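The bulkPut pattern from that example looks roughly like the sketch below. This is illustrative only: the signature shown follows the hbase-spark module in the HBase master branch (the tmalaska repo's older bulkPut takes a String table name and an autoFlush flag), and the table name "t", family "cf", and qualifier "q" are placeholders.

```scala
// Sketch only -- assumes a running HBase cluster, a pre-created table "t"
// with column family "cf", and the hbase-spark HBaseContext API.
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.{SparkConf, SparkContext}

object BulkPutSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bulk-put-sketch"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // (rowKey, value) pairs stand in for records decoded from Avro files.
    val rdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))

    // bulkPut builds a Put per record on the executors and streams them
    // to the region servers.
    hbaseContext.bulkPut[(String, String)](
      rdd,
      TableName.valueOf("t"),
      (r: (String, String)) => {
        val put = new Put(Bytes.toBytes(r._1))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(r._2))
        put
      })
    sc.stop()
  }
}
```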

On Sat, Jan 28, 2017 at 9:17 AM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Adding to @Ted Check Bulk Put Example - https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
>
> On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you looked at hbase-spark module (currently in master branch) ?
>>
>> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
>> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
>> for examples.
>>
>> There may be other options.
>>
>> FYI
>>
>> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi <jeffsar...@hotmail.com>
>> wrote:
>>
>> > Hi
>> > I'm seeking some pointers/guidance on what we could do to insert
>> > billions of records that we already have in Avro files in Hadoop into
>> > HBase.
>> >
>> > I read some articles online, and one of them recommended using the
>> > HFile format. I took a cursory look at the documentation for that.
>> > Given its complexity, I think that may be the last resort we want to
>> > pursue, unless some library is out there that easily helps us write
>> > our files into that format. I didn't see any.
>> > Assuming that the HBase native client may be our best bet, is there
>> > any advice around pre-partitioning our records, or other techniques we
>> > could use?
>> > thanks
>> >
>> > Jeff
>> >
>>
>
>
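On the pre-partitioning question quoted above, one common approach is to pre-split the table at creation time so billions of puts don't all land on a single initial region. A minimal sketch follows, assuming salted or hashed row keys that distribute uniformly over the first key byte; the table and family names are illustrative.

```scala
// Sketch only -- pre-splitting a table so writes spread across regions
// from the start. Assumes a reachable HBase cluster.
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}

object PreSplitSketch {
  def main(args: Array[String]): Unit = {
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val admin = conn.getAdmin

    val desc = new HTableDescriptor(TableName.valueOf("t"))
    desc.addFamily(new HColumnDescriptor("cf"))

    // 15 evenly spaced split points over the first row-key byte,
    // yielding 16 regions; with hashed/salted keys the write load
    // spreads across all of them immediately.
    val splits: Array[Array[Byte]] =
      (1 until 16).map(i => Array((i * 16).toByte)).toArray

    admin.createTable(desc, splits)
    conn.close()
  }
}
```

Pre-splitting avoids the sequential region-split storms that otherwise occur while the table grows under heavy write load.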