I've reported the issue here: https://issues.apache.org/jira/browse/PIG-2193

Still investigating, but seems so far that the FILTER clause makes the HBase loader loose all fields that are not explicitly used in the script

I striped down the request to:

start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray, imei:chararray, start:long); end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray);
sessions = JOIN start_sessions BY sid, end_sessions BY sid;
sessions = FILTER sessions BY end > start AND end - start < 86400000L;
dump sessions;

and in the result, the fields "infoid", "imei" and "locid" are empty, whereas the fields "sid", "start", "stop" are present.

(00000A2A33254B8FAE1E9AEAB2428EBE,,,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,) (00001DCECDC842C0A745C151B9EC295F,,,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,) (00001F8F2B3148D393963928188C72B6,,,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,)
...

When using the HDFS loader, everything works correctly:

(00000A2A33254B8FAE1E9AEAB2428EBE,b87ac86bcf1d4cb44202aa826554a7b2,4e77d62e1839a470ec8386d42b85a076,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,) (00001DCECDC842C0A745C151B9EC295F,4a4bb0fff26e368c8209f1e480fdf70b,db3924d2e4b88bd103fa19aaa30a9af4,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,) (00001F8F2B3148D393963928188C72B6,5d7e58f68366b55d55862815f863a996,79e1ba90aa555e3e1041df4be657a11d,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,)
...





Reply via email to