On Tue, Jan 13, 2015 at 1:29 AM, [email protected] <[email protected]> wrote:

> As far as I know, bulk loading into Phoenix or HBase may be affected by
> several conditions, like whether the WAL is enabled or the number of split
> regions.

Bulk loading in HBase does not go through the WAL; it uses
HFileOutputFormat to write HFiles directly. Region splits will have some
impact on a bulk load, but not in the same way as they do on online writes.

I agree with James -- it seems your host is either very underpowered or your
underlying cluster installation is not configured correctly. Please
consider profiling the individual steps in isolation so as to better
identify the bottleneck.

> *From:* Ciureanu, Constantin (GfK) <[email protected]>
> *Date:* 2015-01-13 17:12
> *To:* [email protected]
> *Subject:* MapReduce bulk load into Phoenix table
>
> Hello all,
>
> Due to the slow speed of Phoenix JDBC (a single machine does roughly
> 1,000-1,500 rows/sec), I am also reading up on loading data into Phoenix
> via MapReduce.
>
> So far I have understood that the Key + List<[Key,Value]> to be inserted
> into the HBase table is obtained via a “dummy” Phoenix connection – those
> rows are then written into HFiles, and after the MR job finishes the
> HFiles are bulk loaded into HBase in the usual way.
>
> My question: is there any better / faster approach? I assume this cannot
> reach the maximum speed for loading data into a Phoenix / HBase table.
>
> I would also like to find better / newer sample code than this one:
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.phoenix/phoenix/4.0.0-incubating/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java#CsvToKeyValueMapper.loadPreUpsertProcessor%28org.apache.hadoop.conf.Configuration%29
>
> Thank you,
> Constantin
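To make the “dummy” connection pattern concrete, here is a minimal sketch of the idea, assuming the Phoenix 4.x client API. It is not the actual CsvToKeyValueMapper code; EXAMPLE_TABLE, its two columns, and the localhost JDBC URL are placeholders. The UPSERT is never committed, so its KeyValues stay on the client and are exactly what an MR job would hand to HFileOutputFormat:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.util.Iterator;
  import java.util.List;

  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.util.Pair;
  import org.apache.phoenix.util.PhoenixRuntime;

  public class UpsertToKeyValuesSketch {

      public static void main(String[] args) throws Exception {
          // "Dummy" connection: auto-commit is off by default, so nothing
          // is actually sent to the region servers.
          Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");

          // EXAMPLE_TABLE and its columns are placeholders, not a real schema.
          PreparedStatement stmt =
              conn.prepareStatement("UPSERT INTO EXAMPLE_TABLE (ID, NAME) VALUES (?, ?)");
          stmt.setLong(1, 1L);
          stmt.setString(2, "row-one");
          stmt.execute();

          // Pull the KeyValues Phoenix produced for the uncommitted UPSERT.
          // In a mapper these would be written to the context, and
          // HFileOutputFormat would turn them into HFiles.
          Iterator<Pair<byte[], List<KeyValue>>> it =
              PhoenixRuntime.getUncommittedDataIterator(conn);
          while (it.hasNext()) {
              Pair<byte[], List<KeyValue>> tableAndKvs = it.next();
              for (KeyValue kv : tableAndKvs.getSecond()) {
                  System.out.println(kv); // one KeyValue per cell of the upserted row
              }
          }

          // Discard the client-side mutation state instead of committing it.
          conn.rollback();
          conn.close();
      }
  }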

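On the question of newer sample code: more recent Phoenix releases ship a ready-made CSV bulk load job, org.apache.phoenix.mapreduce.CsvBulkLoadTool, which wraps this same mapper together with the HFile write and the final bulk load step. If I remember correctly it is invoked roughly like this (table name, input path, and jar version below are placeholders):

  hadoop jar phoenix-<version>-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      --table EXAMPLE_TABLE \
      --input /data/example.csv

That is usually a better starting point than copying the mapper code directly.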