Hi there,

We have hit a possible issue with Pig (version 0.9.1) and HBaseStorage when we 
LOAD multiple sets of data from the same table and UNION them. Here is a simple 
example that shows the problem:

HBase Data (use hbase shell to create table and add rows):

create 'test', {NAME => 'data', VERSIONS => 1}

put 'test', '11111', 'data:value', '1'
put 'test', '11112', 'data:value', '2'
put 'test', '11113', 'data:value', '3'
put 'test', '22221', 'data:value', '4'
put 'test', '22222', 'data:value', '5'
put 'test', '22223', 'data:value', '6'

Pig Statements (create file test.pig):

load1 = LOAD 'hbase://test'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'data:*', '-loadKey -gte 11110 -lte 22220')
    AS (key:chararray, map:map[]);
load2 = LOAD 'hbase://test'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'data:*', '-loadKey -gte 22220 -lte 33330')
    AS (key:chararray, map:map[]);
result = UNION load1, load2;
dump result;


Run Script:
pig -x local test.pig


Result:
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])



The result should be the following:
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])
(22221,[value#4])
(22222,[value#5])
(22223,[value#6])

If we dump load1 or load2 on its own, we see the results we expect. When the 
UNION is performed, however, the output contains the rows from load1 twice and 
none of the rows from load2.

Is this a known issue with Pig/HBaseStorage, or are we using them incorrectly? 
If it is a usage problem, what is the proper way to load multiple sets of data 
and union them?

Thanks in advance.
Eduardo.
