>> Though these classes are currently within the Phoenix-Pig module, most of the code can be reused for an MR job

+1. These are Hadoop-specific implementations and not tied to Pig at all. They
could definitely be moved to the hadoop module if they need to be reused by an
MR job (which they should).
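For what it's worth, a rough sketch of how an MR job might consume that
InputFormat once it is reusable outside the Pig loader. The configuration
property names, record types and mapper below are illustrative assumptions,
not the actual Phoenix API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical driver: reuses the InputFormat that backs the Pig loader.
// Property keys and key/value types are assumptions for illustration only.
public class PhoenixReadDriver {

    // Assumes the InputFormat hands each Phoenix row to the mapper as a Writable.
    public static class RowCountMapper
            extends Mapper<NullWritable, Writable, NullWritable, NullWritable> {
        @Override
        protected void map(NullWritable key, Writable row, Context context) {
            context.getCounter("phoenix", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Made-up property keys; the real ones would live in the (moved) hadoop module.
        conf.set("phoenix.zookeeper.quorum", args[0]);
        conf.set("phoenix.select.query", "SELECT * FROM MY_TABLE");

        Job job = Job.getInstance(conf, "phoenix-mr-read");
        job.setJarByClass(PhoenixReadDriver.class);
        job.setInputFormatClass(org.apache.phoenix.pig.hadoop.PhoenixInputFormat.class);
        job.setMapperClass(RowCountMapper.class);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0); // map-only pass over the table
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The write side could then simply be Phoenix UPSERTs over JDBC from the mapper,
or a matching OutputFormat if one gets extracted along with the reader.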


On Fri, Apr 4, 2014 at 8:37 PM, Ravi Kiran <[email protected]> wrote:

> Hi,
>    I will definitely drop a mail to you once the code is available, which
> definitely isn't far away.
>
> Regards
> Ravi
>
>
> On Sat, Apr 5, 2014 at 3:02 AM, Localhost shell <[email protected]> wrote:
>
>> Hey Ravi,
>>
>> Do you have any rough idea of when PhoenixPigLoader will be available for
>> use?
>>
>>
>>
>>
>> On Fri, Apr 4, 2014 at 1:52 PM, Andrew <[email protected]> wrote:
>>
>>> Hi Ravi,
>>>
>>> That's helpful, thank you.  Are these in the Github repo yet, so I can
>>> have a look to get an idea?  (I don't see anything in
>>> phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)
>>>
>>> Andrew.
>>>
>>>
>>> On 04/04/2014 15:54, Ravi Kiran wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>>    As part of a custom Pig Loader, we are coming up with a
>>>> PhoenixInputFormat and a PhoenixRecordReader. Though these classes are
>>>> currently within the Phoenix-Pig module, most of the code can be reused for
>>>> an MR job.
>>>>
>>>> Regards
>>>> Ravi
>>>>
>>>>
>>>>
>>>> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:
>>>>
>>>>     you can create a custom function (for example
>>>>     http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)
>>>>
>>>>
>>>>     On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
>>>>
>>>>         I am considering using Phoenix, but I know that I will want to
>>>>         transform my data via MapReduce, e.g. UPSERT some core data, then
>>>>         go back over the data set and "fill in" some additional columns
>>>>         (appropriately stored in additional column groups).
>>>>
>>>>         I think all I need to do is implement an InputFormat that takes a
>>>>         table name (or more generally /select * from table where .../).
>>>>         But in order to define splits, I need to somehow discover key
>>>>         ranges so that I can issue a series of contiguous range scans.
>>>>
>>>>         Can you suggest how I might go about this in a general way... if I
>>>>         get this right then I'll contribute the code. Else I will need to
>>>>         use external knowledge of my specific table data to partition the
>>>>         task. If Phoenix had a LIMIT with a SKIP option plus a table
>>>>         ROWCOUNT, then that would also achieve the goal. Or is there some
>>>>         way to implement the InputFormat via a native HBase API call
>>>>         perhaps?
>>>>
>>>>         Andrew.
>>>>
>>>>         (MongoDB's InputFormat implementation calls an internal function
>>>>         on the server to do this:
>>>>         https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
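Regarding the split question in Andrew's original mail quoted above: the usual
trick, and what HBase's own TableInputFormat does, is to derive one split per
region from the table's region start/end keys rather than guessing key ranges
or needing LIMIT/SKIP. A rough sketch against the plain HBase client API
(HTable.getStartEndKeys); this is not Phoenix's actual implementation:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Pair;

// Sketch: derive one contiguous key range per region. An InputFormat could
// turn each range into a split, and its RecordReader could then scan (or
// issue a Phoenix range query over) just that slice of the table.
public class RegionBoundarySplitter {

    /** Returns one (startKey, endKey) pair per region of the given table. */
    public static List<Pair<byte[], byte[]>> keyRanges(Configuration conf, String tableName)
            throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(conf), tableName);
        try {
            Pair<byte[][], byte[][]> bounds = table.getStartEndKeys();
            List<Pair<byte[], byte[]>> ranges = new ArrayList<Pair<byte[], byte[]>>();
            for (int i = 0; i < bounds.getFirst().length; i++) {
                // Empty start/end keys mark the first and last regions.
                ranges.add(new Pair<byte[], byte[]>(bounds.getFirst()[i], bounds.getSecond()[i]));
            }
            return ranges;
        } finally {
            table.close();
        }
    }
}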
