Hi Ravi,
That's helpful, thank you. Are these in the GitHub repo yet, so I can have a look to get an idea? (I don't see anything in phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop.)
Andrew.
On 04/04/2014 15:54, Ravi Kiran wrote:
Hi Andrew,
As part of a custom Pig loader, we are developing a PhoenixInputFormat and a PhoenixRecordReader. Though these classes currently live in the Phoenix-Pig module, most of the code can be reused for an MR job.
Regards
Ravi
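[Editor's note: a minimal sketch of what reusing that code from a plain MapReduce driver might look like. PhoenixInputFormat is the in-progress class Ravi mentions; the PhoenixRecord value type and the commented-out setInput() helper are assumptions for illustration, not a published Phoenix API.]

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PhoenixScanDriver {

        // PhoenixRecord is an assumed value type emitted by PhoenixRecordReader.
        public static class EchoMapper
                extends Mapper<NullWritable, PhoenixRecord, Text, NullWritable> {
            @Override
            protected void map(NullWritable key, PhoenixRecord value, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new Text(value.toString()), NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed helper: tell the input format which table/query to scan.
            // PhoenixInputFormat.setInput(conf, "SELECT * FROM MY_TABLE");
            Job job = Job.getInstance(conf, "phoenix-scan");
            job.setJarByClass(PhoenixScanDriver.class);
            job.setInputFormatClass(PhoenixInputFormat.class); // the in-progress class
            job.setMapperClass(EchoMapper.class);
            job.setNumReduceTasks(0); // map-only scan
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }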
On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:
You can create a custom function (for example: http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)
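[Editor's note: for reference, a rough sketch of the pattern that post walks through: extend ScalarFunction, declare the SQL signature with the BuiltInFunction/Argument annotations, and register the class in Phoenix's ExpressionType enum. The REVERSE example below is illustrative only, and the exact annotation and PDataType signatures vary by Phoenix version.]

    import java.util.List;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.phoenix.expression.Expression;
    import org.apache.phoenix.expression.function.ScalarFunction;
    import org.apache.phoenix.parse.FunctionParseNode.Argument;
    import org.apache.phoenix.parse.FunctionParseNode.BuiltInFunction;
    import org.apache.phoenix.schema.PDataType;
    import org.apache.phoenix.schema.tuple.Tuple;

    // Illustrative only; the class must also be added to the ExpressionType
    // enum so the parser can resolve it, per the blog post.
    @BuiltInFunction(name = ReverseFunction.NAME,
            args = {@Argument(allowedTypes = {PDataType.VARCHAR})})
    public class ReverseFunction extends ScalarFunction {
        public static final String NAME = "REVERSE";

        public ReverseFunction() {
        }

        public ReverseFunction(List<Expression> children) {
            super(children);
        }

        @Override
        public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
            Expression arg = getChildren().get(0);
            if (!arg.evaluate(tuple, ptr)) {
                return false; // argument not yet available
            }
            // Assumes ascending sort order for simplicity.
            String value = (String) PDataType.VARCHAR.toObject(
                    ptr.get(), ptr.getOffset(), ptr.getLength());
            if (value == null) {
                return true; // null in, null out (ptr is already empty)
            }
            String reversed = new StringBuilder(value).reverse().toString();
            ptr.set(PDataType.VARCHAR.toBytes(reversed));
            return true;
        }

        @Override
        public PDataType getDataType() {
            return PDataType.VARCHAR;
        }

        @Override
        public String getName() {
            return NAME;
        }
    }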
On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
I am considering using Phoenix, but I know that I will want to transform my data via MapReduce, e.g. UPSERT some core data, then go back over the data set and "fill in" some additional columns (appropriately stored in additional column families).
I think all I need to do is write an InputFormat implementation that takes a table name (or, more generally, /select * from table where .../). But in order to define splits, I need to somehow discover key ranges so that I can issue a series of contiguous range scans.
Can you suggest how I might go about this in a general way? If I get this right, then I'll contribute the code. Otherwise, I will need to use external knowledge of my specific table data to partition the task. If Phoenix had a LIMIT with a SKIP option, plus a table ROWCOUNT, then that would also achieve the goal. Or is there some way to implement the InputFormat via a native HBase API call, perhaps?
Andrew.
(MongoDB's InputFormat implementation calls an internal function on the server to do this: https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
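[Editor's note: on Andrew's closing question, a native HBase call does exist for this: an HTable can report its region boundaries, which is how HBase's own TableInputFormat builds its splits. A minimal sketch, assuming 0.94-era client APIs (HTable is deprecated in later versions):]

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionBoundarySplitter {

        /** Returns one (startKey, endKey) pair per region of the given table. */
        public static List<Pair<byte[], byte[]>> keyRanges(String tableName)
                throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, tableName);
            try {
                // First element: region start keys; second: matching end keys.
                Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
                List<Pair<byte[], byte[]>> ranges =
                        new ArrayList<Pair<byte[], byte[]>>(keys.getFirst().length);
                for (int i = 0; i < keys.getFirst().length; i++) {
                    ranges.add(new Pair<byte[], byte[]>(
                            keys.getFirst()[i], keys.getSecond()[i]));
                }
                return ranges;
            } finally {
                table.close();
            }
        }
    }

[Each (start, end) pair can then back one InputSplit, i.e. one contiguous range scan, which is presumably the same split scheme the Pig loader's PhoenixInputFormat would use.]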