Thank you, Vaclav,

I just started writing some code today :) for an MR job that will load data 
into HBase + Phoenix. 
Previously I wrote an application to load data using Phoenix JDBC (slow), but 
I also have experience with HBase, so I can understand and write code to load 
data directly there.

If I go that route, I'm also worried about:
- maintaining any existing Phoenix indexes - perhaps this still works if the 
same coprocessors trigger at insert time, but I don't know how that works 
behind the scenes.
- having a Phoenix view over the HBase table would "solve" the above problem 
(so there's no index at all), but it would create a lot of other problems: my 
table has a limited number of common columns and the rest differ too much from 
row to row - in total I have hundreds of possible columns (see the sketch just 
after this list).
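
Just to illustrate what I mean by the view approach - a minimal sketch (the 
table, column family and column names are invented, only the few common 
columns are declared, and I'd create the view through the Phoenix JDBC 
driver):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateViewSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper quorum; phoenix-client jar must be on the classpath.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Only the handful of common columns can be declared up front;
            // the hundreds of row-specific columns would have to be added as
            // dynamic columns at query time, which is exactly my problem.
            stmt.execute(
                "CREATE VIEW \"MY_TABLE\" (" +
                "  pk VARCHAR PRIMARY KEY," +
                "  \"cf\".\"common_col1\" VARCHAR," +
                "  \"cf\".\"common_col2\" UNSIGNED_LONG)");
        }
    }
}

With hundreds of rarely shared columns, that DDL would either explode or stay 
incomplete, which is why I'd rather avoid the view if I can.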

So - to make things faster for me - is there any good piece of code I can find 
on the internet showing how to map my data types to Phoenix data types and 
then use the results in a regular HBase bulk load?
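
To make the question concrete, this is the kind of mapper I have in mind - a 
minimal sketch assuming a simple "id,name,age" CSV and a single column family 
"cf" (the class and column names are invented; I'm also assuming the Phoenix 
4.x PDataType classes such as org.apache.phoenix.schema.types.PInteger, whose 
package may differ between versions), with the output meant to go through 
HFileOutputFormat2 and then LoadIncrementalHFiles:

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.phoenix.schema.types.PInteger;
import org.apache.phoenix.schema.types.PVarchar;

/**
 * Parses "id,name,age" CSV lines and emits KeyValues whose bytes are encoded
 * with Phoenix's own serializers, so a Phoenix view/table can read them back.
 * Intended to feed HFileOutputFormat2 + LoadIncrementalHFiles (bulk load).
 */
public class CsvToPhoenixKeyValueMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

    private static final byte[] CF = Bytes.toBytes("cf");
    private static final byte[] NAME_QUAL = Bytes.toBytes("name");
    private static final byte[] AGE_QUAL = Bytes.toBytes("age");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");

        // Row key and VARCHAR column: Phoenix VARCHAR is plain UTF-8 bytes.
        byte[] rowKey = PVarchar.INSTANCE.toBytes(fields[0]);
        byte[] name = PVarchar.INSTANCE.toBytes(fields[1]);
        // INTEGER must go through Phoenix's encoder (it is not Bytes.toBytes(int)).
        byte[] age = PInteger.INSTANCE.toBytes(Integer.valueOf(fields[2]));

        ImmutableBytesWritable outKey = new ImmutableBytesWritable(rowKey);
        context.write(outKey, new KeyValue(rowKey, CF, NAME_QUAL, name));
        context.write(outKey, new KeyValue(rowKey, CF, AGE_QUAL, age));
    }
}

I realize this ignores how Phoenix builds composite row keys, salting and 
index maintenance - which is exactly the part I'm unsure about.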

Regards,
  Constantin

-----Original Message-----
From: Vaclav Loffelmann [mailto:[email protected]] 
Sent: Tuesday, January 13, 2015 10:30 AM
To: [email protected]
Subject: Re: MapReduce bulk load into Phoenix table


Hi,
our daily usage is to import raw data directly into HBase, but mapped to Phoenix 
data types. For querying, we use a Phoenix view on top of that HBase table.

Then you should hit the bottleneck of HBase itself; it should be 10 to 30+ 
times faster than your current solution, depending on the hardware of course.

I'd prefer this solution for stream writes.

Vaclav

On 01/13/2015 10:12 AM, Ciureanu, Constantin (GfK) wrote:
> Hello all,
> 
> (Due to the slow speed of Phoenix JDBC – single machine ~ 1000-1500 
> rows/sec) I am also researching how to load data into Phoenix via 
> MapReduce.
> 
> So far I understood that the Key + List<KeyValue> to be inserted 
> into the HBase table is obtained via a “dummy” Phoenix connection – those 
> rows are then written into HFiles (and after the MR job finishes, the 
> HFiles are bulk loaded normally into HBase).
> 
> My question: Is there any better / faster approach? I assume this 
> cannot reach the maximum speed for loading data into a Phoenix / HBase 
> table.
> 
> Also, I would like to find a better / newer code sample than this 
> one: 
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.phoenix/phoenix/4.0.0-incubating/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java#CsvToKeyValueMapper.loadPreUpsertProcessor%28org.apache.hadoop.conf.Configuration%29
>
>  Thank you, Constantin
> 
