Bah, I forgot to paste the pig script like an idiot:

table1 = LOAD 'hbase://table1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    '', '-loadKey -noWAL=true -minTimestamp=1451624400000 -maxTimestamp=1454302800000')
    AS (uid:chararray);

table2 = LOAD 'hbase://table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    '', '-loadKey -noWAL=true -regex=\\\\|ago=156\\\\| -minTimestamp=1451624400000 -maxTimestamp=1454302800000')
    AS (uid:chararray);

user_segment_with_event = JOIN table1 BY uid, table2 BY uid USING 'merge';
-- fails with TableSplitComparable cannot be cast to TableSplit
-- http://ip-10-0-1-180.ec2.internal:19888/jobhistory/logs/ip-10-0-1-14.ec2.internal:45454/container_e10_1457365475473_0248_01_000029/attempt_1457365475473_0248_r_000000_0/hadoop/syslog/?start=0

ones = FOREACH user_segment_with_event GENERATE (int) 1 AS one:int;
c = GROUP ones ALL;
c = FOREACH c GENERATE COUNT(ones);
dump c;
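For what it's worth, the "limit instead of count" variant I mention below does work against these same two tables. It looks roughly like this (the relation name "limited" and the exact limit value are arbitrary placeholders, not what I actually ran):

limited = LIMIT user_segment_with_event 5;  -- sketch of the working variant; name and value are placeholders
dump limited;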
William Watson
Lead Software Engineer

On Thu, Mar 10, 2016 at 11:11 AM, Billy Watson <williamrwat...@gmail.com> wrote:

> Thanks to a bug fix put in by a colleague of mine, merge joins work for tables loaded into pig via HBaseStorage. In our test environment and in the test environment for pig itself, I'm able to get all sorts of fairly complex data merging without issue.
>
> However, when I use that same code on larger data sets in a production environment, the merge join fails. If I run it on the same exact tables on the same cluster after trimming the data down to just a few rows, the merge join works fine.
>
> Here is the most basic version of the pig script I've been able to get. I've been taking out pieces and parts trying to narrow it down, but it still fails:
>
>
> If I change the count portion to a limit 5 or something, I'm able to dump the relation.
>
> The merge join finishes all of its mappers, but when it gets to the reduce step and starts doing a sort (don't ask me why it's even doing a sort on pre-sorted data), it throws the following error:
>
> 2016-03-09 19:36:01,738 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:160)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.hbase.TableSplitComparable cannot be cast to org.apache.hadoop.hbase.mapreduce.TableSplit
>     at org.apache.pig.backend.hadoop.hbase.TableSplitComparable.compareTo(TableSplitComparable.java:26)
>     at org.apache.pig.data.DataType.compare(DataType.java:566)
>     at org.apache.pig.data.DataType.compare(DataType.java:464)
>     at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareDatum(BinInterSedes.java:1106)
>     at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:1082)
>     at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:787)
>     at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:728)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTupleSortComparator.compare(PigTupleSortComparator.java:100)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:587)
>     at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:128)
>     at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:55)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:678)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:596)
>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:131)
>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:115)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.finalMerge(MergeManagerImpl.java:722)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.close(MergeManagerImpl.java:370)
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:158)
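A note on the above: I still don't understand why there is a reduce-side sort at all on data that should already be sorted for the merge join. One next step I can think of is dumping the plan for the final relation to see where that sort actually comes from, i.e. adding something like this to the script (using the same relation name "c" as above):

explain c;  -- prints the logical/physical/MapReduce plans for c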
>
> If I switch the order of the two relations in the merge join, I get a different error which appears more promising, but I still don't know what to do about it:
>
> 2016-03-09 19:55:24,789 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: c: Local Rearrange[tuple]{chararray}(false) - scope-334 Operator Key: scope-334): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [c[62,4]]
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:291)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:279)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [c[62,4]]
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>     ... 12 more
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.impl.builtin.DefaultIndexableLoader.seekNear(DefaultIndexableLoader.java:190)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:542)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextTuple(POMergeJoin.java:299)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNextTuple(POPreCombinerLocalRearrange.java:126)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
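To be concrete, "switching the order" above just means reversing which relation is on the left of the merge join, i.e. (same relation names as the script at the top of this mail):

user_segment_with_event = JOIN table2 BY uid, table1 BY uid USING 'merge';  -- order-swapped variant

As far as I understand Pig's merge join, that also swaps which relation is streamed (the left side) and which is index-seeked on the right via DefaultIndexableLoader, which would explain why the failure moves from the shuffle into seekNear.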
>
> Again, I've tried replicating the exact scenario (and more complicated ones) in local environments and I can't get it to fail. I think it's related to yarn/mapreduce, but I can't figure out why that would matter or what it's really doing.
>
> I'm trying to set up the e2e (end to end) tests in the pig repo, but I'm not having any luck there, either. If I can't get a test failure, I'm afraid I'm not going to be able to fix the bug or issue.
>
> Can anyone help point me in the right direction as far as next debugging steps or what might be wrong?
>
>
> William Watson
> Lead Software Engineer
>