You can create a custom function (see, for example, http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).
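A minimal sketch of what such a function can look like, following the ScalarFunction API that post describes. Package and type names have moved around as Phoenix went to Apache, so treat this as an outline for the 3.x/4.0 line rather than drop-in code; the REVERSE example itself is hypothetical:

import java.sql.SQLException;
import java.util.List;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.expression.function.ScalarFunction;
import org.apache.phoenix.parse.FunctionParseNode.Argument;
import org.apache.phoenix.parse.FunctionParseNode.BuiltInFunction;
import org.apache.phoenix.schema.PDataType;
import org.apache.phoenix.schema.tuple.Tuple;

// Registers REVERSE(VARCHAR) with the Phoenix parser.
@BuiltInFunction(name = ReverseFunction.NAME,
    args = { @Argument(allowedTypes = { PDataType.VARCHAR }) })
public class ReverseFunction extends ScalarFunction {
    public static final String NAME = "REVERSE";

    public ReverseFunction() {
    }

    public ReverseFunction(List<Expression> children) throws SQLException {
        super(children);
    }

    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
        // Evaluate the single argument; returning false tells the
        // framework the value is not yet available.
        Expression arg = getChildren().get(0);
        if (!arg.evaluate(tuple, ptr)) {
            return false;
        }
        String value = (String) PDataType.VARCHAR.toObject(
            ptr.get(), ptr.getOffset(), ptr.getLength());
        if (value == null) {
            return true; // null in, null out
        }
        ptr.set(PDataType.VARCHAR.toBytes(
            new StringBuilder(value).reverse().toString()));
        return true;
    }

    @Override
    public PDataType getDataType() {
        return PDataType.VARCHAR;
    }

    @Override
    public String getName() {
        return NAME;
    }
}

Note the class also needs to be registered in Phoenix's ExpressionType enum so it can be serialized over to the server side; the post has the details.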
On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:

> I am considering using Phoenix, but I know that I will want to transform
> my data via MapReduce, e.g. UPSERT some core data, then go back over the
> data set and "fill in" some additional columns (appropriately stored in
> additional column groups).
>
> I think all I need to do is implement an InputFormat implementation that
> takes a table name (or more generally /select * from table where .../).
> But in order to define splits, I need to somehow discover key ranges so
> that I can issue a series of contiguous range scans.
>
> Can you suggest how I might go about this in a general way... if I get
> this right then I'll contribute the code. Else I will need to use
> external knowledge of my specific table data to partition the task. If
> Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT, then that
> would also achieve the goal. Or is there some way to implement the
> InputFormat via a native HBase API call perhaps?
>
> Andrew.
>
> (MongoDB's InputFormat implementation calls an internal function on the
> server to do this:
> https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
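Regarding the split discovery part of the question: the plain HBase client exposes region boundaries, which is what HBase's own TableInputFormat uses to compute its splits, so you can derive one contiguous range scan per region without any Phoenix-specific support. A rough sketch (class and table names here are placeholders, not an existing API):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class RegionKeyRanges {

    /** One contiguous row-key range, i.e. one split / one range scan. */
    public static class KeyRange {
        public final byte[] start; // inclusive; empty = table start
        public final byte[] stop;  // exclusive; empty = table end
        public KeyRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    public static List<KeyRange> getRanges(Configuration conf, String table)
            throws IOException {
        HTable htable = new HTable(conf, table);
        try {
            // Region boundaries partition the row-key space into
            // contiguous, non-overlapping ranges, one per region.
            Pair<byte[][], byte[][]> keys = htable.getStartEndKeys();
            List<KeyRange> ranges = new ArrayList<KeyRange>();
            for (int i = 0; i < keys.getFirst().length; i++) {
                ranges.add(new KeyRange(keys.getFirst()[i], keys.getSecond()[i]));
            }
            return ranges;
        } finally {
            htable.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (KeyRange r : getRanges(HBaseConfiguration.create(), "MY_TABLE")) {
            System.out.println(Bytes.toStringBinary(r.start) + " .. "
                + Bytes.toStringBinary(r.stop));
        }
    }
}

In getSplits() each KeyRange would become one InputSplit; the record reader can then either run a raw HBase Scan with setStartRow/setStopRow, or, if the row key maps cleanly onto the Phoenix primary key, a Phoenix SELECT with range predicates on the leading PK columns.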
