Re: Simple Pig query returns inaccurate result size for HBase tables of 1.8m+ rows

2011-01-24 Thread Mr. Lukas
Hi Dmitriy, sorry for the late reply; I was out of the office. Dropping the -caster and -caching options (i.e., using only the -loadKey option) does not change anything, except that some FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings are issued. On Fri, Jan 21, 2011 at 1:42 AM, Dmitriy Ryaboy

Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Lyubimov
Hi, so it seems to be more efficient when the data being stored to HBase is partitioned by region and ordered by HBase key. I see that Pig 0.8 (PIG-282) added a custom partitioner for GROUP, but I am not sure whether order is enforced there. Is there a way to run a single MR job that orders and partitions the data as described above?
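For readers unfamiliar with the idea, "partition by region" amounts to a range lookup: each row key goes to the region whose start key is the largest one not greater than the key. The sketch below shows only that lookup. `RegionPartitioner`, `partitionFor`, and `bytes` are hypothetical names, not a Pig or HBase API; it assumes the table's region start keys are already known and that the first region starts at the empty key, as in HBase. Keys are compared as unsigned bytes, which matches HBase's row-key ordering. A class actually used with Pig's PARTITION BY would have to extend Hadoop's `Partitioner`; that wiring is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper: maps an HBase row key to the index of the region that holds it. */
public class RegionPartitioner {
    // Lexicographic comparison of bytes treated as unsigned (HBase row-key order).
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start keys, sorted ascending; assumed to include the empty key for region 0.
    private final byte[][] startKeys;

    public RegionPartitioner(byte[][] startKeys) {
        this.startKeys = startKeys.clone();
        Arrays.sort(this.startKeys, UNSIGNED);
    }

    /** Binary search for the last region whose start key is <= rowKey. */
    public int partitionFor(byte[] rowKey) {
        int lo = 0, hi = startKeys.length - 1, result = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (UNSIGNED.compare(startKeys[mid], rowKey) <= 0) {
                result = mid;   // candidate region; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionPartitioner p = new RegionPartitioner(new byte[][] { bytes(""), bytes("g"), bytes("p") });
        System.out.println(p.partitionFor(bytes("apple"))); // 0
        System.out.println(p.partitionFor(bytes("g")));     // 1
        System.out.println(p.partitionFor(bytes("zebra"))); // 2
    }
}
```

Sorting the rows by key inside each resulting partition would then give each reducer a key-ordered stream for its region.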

Re: Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Ryaboy
Pushing this logic into the StoreFunc would force an MR boundary before the store (unless the StoreFunc passed, I suppose), which can make things overly complex. I think that for the purposes of bulk-loading into HBase, a better approach might be to use the native MapReduce functionality and feed

Re: Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Lyubimov
Thanks. So I take it there's no way in Pig to specify a custom partitioner and the ordering in one MR step? I don't think prebuilding HFiles is the best strategy in my case, since my job is incremental (i.e., I am not replacing 100% of the data). However, it is big enough that I don't want to create

Re: Custom partitioning and order for optimum hbase store

2011-01-24 Thread Alan Gates
Do you want to order the groups, or just order within the groups? If you want to order within the groups, you can do that in Pig in a single job. Alan. On Jan 24, 2011, at 1:20 PM, Dmitriy Lyubimov wrote: Thanks. So I take there's no way in Pig to specify a custom partitioner and the ordering in
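Alan's distinction can be made concrete with an in-memory analogue: grouping the rows and then sorting inside each group is a different (and cheaper) guarantee than ordering the groups themselves. In Pig Latin the within-group case is typically written as a nested ORDER inside a FOREACH over the grouped relation. The Java below is only an illustrative sketch of those semantics, not how Pig executes it; `WithinGroupOrder` and `groupAndSortWithin` are hypothetical names, and the sample row keys are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class WithinGroupOrder {
    /**
     * Groups rows by groupKey and sorts the rows inside each group.
     * The groups themselves are kept in a HashMap, so their order is unspecified:
     * this orders within the groups, not the groups.
     */
    public static <K, V> Map<K, List<V>> groupAndSortWithin(
            List<V> rows, Function<V, K> groupKey, Comparator<? super V> order) {
        Map<K, List<V>> groups = new HashMap<>();
        for (V row : rows) {
            groups.computeIfAbsent(groupKey.apply(row), k -> new ArrayList<>()).add(row);
        }
        for (List<V> group : groups.values()) {
            group.sort(order);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> rowKeys = List.of("r2-b", "r1-c", "r2-a", "r1-a");
        // Hypothetical grouping: the first two characters stand in for a region id.
        Map<String, List<String>> out =
                groupAndSortWithin(rowKeys, k -> k.substring(0, 2), Comparator.naturalOrder());
        System.out.println(out.get("r1")); // [r1-a, r1-c]
        System.out.println(out.get("r2")); // [r2-a, r2-b]
    }
}
```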

Re: Using HBaseStorage with Pig 0.8 and HBase 0.89

2011-01-24 Thread Dmitriy Ryaboy
Jacob, are you sure you don't have Pig 0.7 or earlier jars kicking around? I mean... public class HBaseStorage extends LoadFunc implements StoreFuncInterface, LoadPushDown { ... On Mon, Jan 24, 2011 at 4:20 PM, jacob jacob.a.perk...@gmail.com wrote: I'm having problems getting HBaseStorage to
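One way to check for stale jars is to ask the JVM where the class was actually loaded from and what it extends. The sketch below is a generic diagnostic, not a Pig tool; `WhichJar` and `locationOf` are hypothetical names. It assumes you run it with the same classpath Pig uses and that the class of interest is `org.apache.pig.backend.hadoop.hbase.HBaseStorage` (pass another name as the first argument if needed). The class is loaded without being initialized, so its static setup does not run.

```java
import java.security.CodeSource;
import java.util.Arrays;

public class WhichJar {
    /** Returns the jar or directory a class came from, or a note if it has no code source. */
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(no code source: likely a bootstrap/JDK class)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "org.apache.pig.backend.hadoop.hbase.HBaseStorage";
        // initialize = false: load the class without running its static initializers.
        Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
        System.out.println("loaded from: " + locationOf(cls));
        System.out.println("superclass:  " + cls.getSuperclass());
        System.out.println("interfaces:  " + Arrays.toString(cls.getInterfaces()));
    }
}
```

If the printed location points at an older Pig jar, or the superclass and interfaces differ from the declaration quoted above, an older build is shadowing the 0.8 one.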

Re: Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Lyubimov
Thank you, Alan. Let me consider this for a moment. -d On Mon, Jan 24, 2011 at 2:26 PM, Alan Gates ga...@yahoo-inc.com wrote: Since Pig uses the partitioner to provide a total order (by which I mean an order across part files), we don't allow users to override the partitioner in that case.
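Alan's point is that a total order across part files depends on a range partitioner: if every key in partition i sorts before every key in partition i+1, sorting within each partition and concatenating the part files in order yields a globally sorted result. A custom partitioner (for example, a hash-style one) breaks that property, which is why it cannot be swapped in there. The toy demonstration below assumes integer keys and made-up split points (30 and 70); `TotalOrderDemo` and its methods are hypothetical and only model the reasoning, not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToIntFunction;

public class TotalOrderDemo {
    /** Buckets keys with the partitioner, sorts each bucket, then concatenates buckets 0..n-1 like part files. */
    static List<Integer> partitionSortConcat(List<Integer> keys, int n, ToIntFunction<Integer> partitioner) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (Integer k : keys) {
            parts.get(partitioner.applyAsInt(k)).add(k);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            Collections.sort(part); // order within each "part file"
            out.addAll(part);
        }
        return out;
    }

    static boolean isSorted(List<Integer> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(42, 7, 99, 15, 63, 3, 88, 20);
        // Range partitioner with split points 30 and 70: [..30), [30..70), [70..).
        ToIntFunction<Integer> range = k -> k < 30 ? 0 : (k < 70 ? 1 : 2);
        // Hash-style partitioner: ignores key order.
        ToIntFunction<Integer> hash = k -> Math.floorMod(k, 3);
        System.out.println(isSorted(partitionSortConcat(keys, 3, range))); // true
        System.out.println(isSorted(partitionSortConcat(keys, 3, hash)));  // false
    }
}
```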