Hi Eduardo, there is no 0.9.1. Do you mean you built it from the 0.9 branch? Could you try trunk?
On Tue, Sep 6, 2011 at 9:50 AM, Eduardo Afonso Ferreira <eafon...@yahoo.com> wrote:

> Hi there,
>
> We hit a possible issue with Pig (version 0.9.1) and HBaseStorage where we
> try to LOAD multiple sets of data and UNION them. Here's a simple example
> that shows the problem:
>
> HBase Data (use hbase shell to create table and add rows):
>
> create 'test', {NAME => 'data', VERSIONS => 1}
>
> put 'test', '11111', 'data:value', '1'
> put 'test', '11112', 'data:value', '2'
> put 'test', '11113', 'data:value', '3'
> put 'test', '22221', 'data:value', '4'
> put 'test', '22222', 'data:value', '5'
> put 'test', '22223', 'data:value', '6'
>
> Pig Statements (create file test.pig):
>
> load1 = LOAD 'hbase://test' USING
>     org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 11110 -lte 22220')
>     AS (key:chararray, map:map[]);
> load2 = LOAD 'hbase://test' USING
>     org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 22220 -lte 33330')
>     AS (key:chararray, map:map[]);
> result = UNION load1, load2;
> dump result;
>
> Run Script:
> pig -x local test.pig
>
> Result:
> (11111,[value#1])
> (11112,[value#2])
> (11113,[value#3])
> (11111,[value#1])
> (11112,[value#2])
> (11113,[value#3])
>
> The result should be the following:
> (11111,[value#1])
> (11112,[value#2])
> (11113,[value#3])
> (22221,[value#4])
> (22222,[value#5])
> (22223,[value#6])
>
> If we dump load1 or load2 we see the results we expect, but when the UNION
> is performed, it does not put the expected data together.
>
> Is this a known issue with Pig/HBaseStorage or are we not using them as we
> should?
> If it's a usage problem, what would be the proper way of loading multiple
> sets of data and union them?
>
> Thanks in advance.
> Eduardo.
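
For reference, the behavior the report expects can be modeled outside Pig. The following is only a minimal Python sketch, not Pig or HBase code: it assumes row keys compare as strings in lexicographic order and that the -gte/-lte bounds are inclusive, and the helper names `scan_range` and `union` are hypothetical, introduced purely for illustration.

```python
# Rows from the report: row key -> value stored in column data:value.
ROWS = {
    "11111": "1",
    "11112": "2",
    "11113": "3",
    "22221": "4",
    "22222": "5",
    "22223": "6",
}


def scan_range(rows, gte, lte):
    """Return (key, value) pairs with gte <= key <= lte, in key order.

    Assumption: keys are compared lexicographically as strings.
    """
    return [(key, rows[key]) for key in sorted(rows) if gte <= key <= lte]


def union(*relations):
    """Concatenate relations without de-duplication (a bag union)."""
    combined = []
    for relation in relations:
        combined.extend(relation)
    return combined


# Two independent range scans over the same table, then a union of both.
load1 = scan_range(ROWS, "11110", "22220")
load2 = scan_range(ROWS, "22220", "33330")
result = union(load1, load2)

for key, value in result:
    print(f"({key},[value#{value}])")
```

Under these assumptions each scan yields three distinct rows, so the union holds all six keys. The output in the report instead contains the first range twice, which suggests the second LOAD is not scanning its own key range.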