Is this a better way to limit than using Pig's LIMIT operator (fields = LIMIT fields 5;), since the filtering is already done while loading?
Thanks,

On Thu, Mar 14, 2013 at 9:50 PM, Dmitriy Ryaboy <[email protected]> wrote:

> To explain what's going on: -limit for HBaseStorage limits the number of
> rows returned from *each region* in the hbase table. It's an optimization
> -- there is no way for the LIMIT operator to be pushed down to the loader,
> so you can do it explicitly if you know you only need a few rows and don't
> want to pull the rest from HBase just to drop them on the floor once
> they've been extracted and sent to your mappers.
>
> On Wed, Mar 13, 2013 at 9:17 AM, kiran chitturi <[email protected]> wrote:
>
> > Thank you. This cleared my doubt.
> >
> > On Wed, Mar 13, 2013 at 11:37 AM, Bill Graham <[email protected]> wrote:
> >
> > > The -limit passed to HBaseStorage is the limit per mapper reading from
> > > HBase. If you want to limit overall records, also use LIMIT:
> > >
> > > fields = LIMIT fields 5;
> > >
> > > On Wed, Mar 13, 2013 at 7:48 AM, kiran chitturi <[email protected]> wrote:
> > >
> > > > Hi!
> > > >
> > > > I am using Pig 0.10.0 with HBase in distributed mode to read the
> > > > records, and I have used the command below:
> > > >
> > > > fields = load 'hbase://documents' using
> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j',
> > > > '-loadKey true -limit 5') as (rowkey, fields:map[]);
> > > >
> > > > I want Pig to limit the records to only 5, but the result is quite
> > > > different. Please see the logs below.
> > > >
> > > > Input(s):
> > > > Successfully read 250 records (16520 bytes) from: "hbase://documents"
> > > >
> > > > Output(s):
> > > > Successfully stored 250 records (19051 bytes) in:
> > > > "hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"
> > > >
> > > > Counters:
> > > > Total records written : 250
> > > > Total bytes written : 19051
> > > > Spillable Memory Manager spill count : 0
> > > > Total bags proactively spilled: 0
> > > > Total records proactively spilled: 0
> > > > Job DAG:
> > > > job_201303121846_0056
> > > >
> > > > 2013-03-13 14:43:10,186 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
> > > > 2013-03-13 14:43:10,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> > > > 2013-03-13 14:43:10,210 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 51
> > > > 2013-03-13 14:43:10,211 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 51
> > > >
> > > > Am I using the 'limit' keyword the wrong way?
> > > >
> > > > Please let me know your suggestions.
> > > >
> > > > Thanks,
> > > > --
> > > > Kiran Chitturi
> > > > <http://www.linkedin.com/in/kiranchitturi>
> > >
> > > --
> > > *Note that I'm no longer using my Yahoo! email address. Please email me
> > > at [email protected] going forward.*
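Putting the two suggestions from this thread together, a minimal sketch (reusing the 'documents' table and 'field:fields_j' column from the original script) would look like this: the loader's -limit caps rows fetched from each HBase region as an optimization, while the LIMIT operator enforces the overall cap of 5 records.

```pig
-- -limit 5 is a per-region fetch cap (loader-side optimization only);
-- the LIMIT operator below is what guarantees at most 5 records overall.
fields = LOAD 'hbase://documents'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
             'field:fields_j', '-loadKey true -limit 5')
         AS (rowkey, fields:map[]);
fields = LIMIT fields 5;
DUMP fields;
```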
