Hi Ravi,

That's helpful, thank you. Are these in the GitHub repo yet, so I can take a look to get an idea? (I don't see anything in phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)

Andrew.

On 04/04/2014 15:54, Ravi Kiran wrote:
Hi Andrew,

As part of a custom Pig Loader, we are coming up with a PhoenixInputFormat and PhoenixRecordReader. Though these classes are currently within the phoenix-pig module, most of the code can be reused for an MR job.

Regards
Ravi



On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:

    you can create a custom function (for example
    http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)


    On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:

        I am considering using Phoenix, but I know that I will want to
        transform my data via MapReduce, e.g. UPSERT some core data,
        then go back over the data set and "fill in" some additional
        columns (appropriately stored in additional column families).

        I think all I need to do is implement an InputFormat that takes
        a table name (or, more generally, /select * from table
        where .../).  But in order to define splits, I need to somehow
        discover key ranges so that I can issue a series of contiguous
        range scans.
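The split computation described here, carving a keyspace into contiguous ranges, can be sketched in plain Java. This is a hypothetical illustration only: it uses no Phoenix or HBase API, and the class name KeyRangeSplitter is invented. It partitions a one-byte key-prefix space into n contiguous half-open ranges, the kind of work an InputFormat.getSplits() would do:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: divide the unsigned single-byte key-prefix space [0, 256)
 * into numSplits contiguous [start, end) ranges, as a hypothetical
 * getSplits() implementation might before issuing range scans.
 */
public class KeyRangeSplitter {

    /** A half-open key-prefix range; end == 256 means "to end of table". */
    public static final class Range {
        public final int start; // inclusive prefix
        public final int end;   // exclusive prefix
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    /** numSplits must be between 1 and 256. */
    public static List<Range> split(int numSplits) {
        List<Range> ranges = new ArrayList<>();
        int step = 256 / numSplits;
        int start = 0;
        for (int i = 0; i < numSplits; i++) {
            // The last split absorbs any remainder, so the ranges
            // stay contiguous and cover the whole keyspace.
            int end = (i == numSplits - 1) ? 256 : start + step;
            ranges.add(new Range(start, end));
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : split(4)) {
            System.out.println("[" + r.start + ", " + r.end + ")");
        }
    }
}
```

A real implementation would of course work over full row-key byte arrays rather than a one-byte prefix, but the contiguity invariant is the same.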

        Can you suggest how I might go about this in a general way?  If
        I get this right, I'll contribute the code; otherwise I will
        need to use external knowledge of my specific table data to
        partition the task.  If Phoenix had a LIMIT with a SKIP option
        plus a table ROWCOUNT, that would also achieve the goal.  Or is
        there some way to implement the InputFormat via a native HBase
        API call, perhaps?

        Andrew.

        (MongoDB's InputFormat implementation calls an internal
        function on the server to do this:
        https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
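On the native-API route raised in the quoted question: HBase's HTable.getStartEndKeys() returns parallel arrays of per-region start and end keys, and a getSplits() implementation can merge adjacent regions into at most N contiguous scan ranges. The sketch below is plain Java with the boundary arrays hard-coded in place of the HBase call; RegionGrouper and the sample keys are illustrative, not real Phoenix code:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: group contiguous HBase regions into at most maxSplits map
 * splits.  The starts/ends arrays stand in for the parallel arrays
 * returned by HBase's HTable.getStartEndKeys(); an empty string plays
 * the role of HBase's empty byte[] meaning "open-ended boundary".
 */
public class RegionGrouper {

    /** Each returned pair is one contiguous [startKey, endKey) scan range. */
    public static List<String[]> group(String[] starts, String[] ends,
                                       int maxSplits) {
        int n = starts.length;
        int perSplit = (int) Math.ceil((double) n / maxSplits);
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < n; i += perSplit) {
            int last = Math.min(i + perSplit, n) - 1;
            // Adjacent regions are contiguous, so a run of them
            // collapses into a single contiguous scan range.
            splits.add(new String[] { starts[i], ends[last] });
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hard-coded stand-in for getStartEndKeys(): five regions.
        String[] starts = { "", "d", "h", "m", "r" };
        String[] ends   = { "d", "h", "m", "r", "" };
        for (String[] s : group(starts, ends, 2)) {
            System.out.println("scan ['" + s[0] + "', '" + s[1] + "')");
        }
    }
}
```

Region boundaries are a natural split source because each split then maps to a range scan that one region server can answer locally; a production version would also record each region's location for split placement.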



