Daniel Dai created PIG-4910:
-------------------------------

             Summary: Assert wrongly pushed up in optimizer
                 Key: PIG-4910
                 URL: https://issues.apache.org/jira/browse/PIG-4910
             Project: Pig
          Issue Type: Bug
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.17.0

The following script fails:
{code}
TEST_DATA = LOAD 'input' USING PigStorage() AS (c1:int);
GR = FOREACH (GROUP TEST_DATA BY c1) GENERATE group as c1, COUNT_STAR(TEST_DATA) as count, TEST_DATA;
ROWS_WITH_C1_EQUALS_ZERO = FILTER GR BY count > 1L;
ROWS_WITH_C1_EQUALS_ZERO_FLATTENED = FOREACH ROWS_WITH_C1_EQUALS_ZERO GENERATE FLATTEN($0);
-- Assert shouldn't fail as it should be applied after group by but because assert is getting pushed to mapper, it is failing.
ASSERT ROWS_WITH_C1_EQUALS_ZERO_FLATTENED BY c1 == 0, 'Should have never seen this message, assert has a bug.';
DUMP ROWS_WITH_C1_EQUALS_ZERO_FLATTENED;
{code}

input:
0
0
1

The reason is that the assert is pushed up before the FILTER; in the optimized plan it ends up below the LOCogroup, directly above the load:
{code}
ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOStore Schema: c1#14:int)
|
|---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOForEach Schema: c1#14:int)
    |   |
    |   (Name: LOGenerate[true] Schema: c1#14:int)ColumnPrune:InputUids=[14]ColumnPrune:OutputUids=[14]
    |   |   |
    |   |   c1:(Name: Project Type: int Uid: 14 Input: 0 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: c1#14:int)
    |
    |---ROWS_WITH_C1_EQUALS_ZERO: (Name: LOFilter Schema: c1#14:int,count#31:long,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#30:tuple(c1#14:int)})
        |   |
        |   (Name: GreaterThan Type: boolean Uid: 33)
        |   |
        |   |---count:(Name: Project Type: long Uid: 31 Input: 0 Column: 1)
        |   |
        |   |---(Name: Constant Type: long Uid: 32)
        |
        |---GR: (Name: LOForEach Schema: c1#14:int,count#31:long,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#30:tuple(c1#14:int)})
            |   |
            |   (Name: LOGenerate[false,false,false] Schema: c1#14:int,count#31:long,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#30:tuple(c1#14:int)})ColumnPrune:InputUids=[29, 14]ColumnPrune:OutputUids=[14, 31]
            |   |   |
            |   |   group:(Name: Project Type: int Uid: 14 Input: 0 Column: (*))
            |   |   |
            |   |   (Name: UserFunc(org.apache.pig.builtin.COUNT_STAR) Type: long Uid: 31)
            |   |   |
            |   |   |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED:(Name: Project Type: bag Uid: 29 Input: 1 Column: (*))
            |   |   |
            |   |   ROWS_WITH_C1_EQUALS_ZERO_FLATTENED:(Name: Project Type: bag Uid: 29 Input: 2 Column: (*))
            |   |
            |   |---(Name: LOInnerLoad[0] Schema: group#14:int)
            |   |
            |   |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOInnerLoad[1] Schema: c1#14:int)
            |   |
            |   |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOInnerLoad[1] Schema: c1#14:int)
            |
            |---1-3: (Name: LOCogroup Schema: group#14:int,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#44:tuple(c1#14:int)})
                |   |
                |   c1:(Name: Project Type: int Uid: 14 Input: 0 Column: 0)
                |
                |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOFilter Schema: c1#14:int)
                    |   |
                    |   (Name: UserFunc(org.apache.pig.builtin.Assert) Type: boolean Uid: 40)
                    |   |
                    |   |---(Name: BinCond Type: boolean Uid: 38)
                    |   |   |
                    |   |   |---(Name: Equal Type: boolean Uid: 35)
                    |   |   |   |
                    |   |   |   |---c1:(Name: Project Type: int Uid: 14 Input: 0 Column: 0)
                    |   |   |   |
                    |   |   |   |---(Name: Constant Type: int Uid: 34)
                    |   |   |
                    |   |   |---(Name: Constant Type: boolean Uid: 36)
                    |   |   |
                    |   |   |---(Name: Constant Type: boolean Uid: 37)
                    |   |
                    |   |---(Name: Constant Type: chararray Uid: 39)
                    |
                    |---TEST_DATA: (Name: LOForEach Schema: c1#14:int)
                        |   |
                        |   (Name: LOGenerate[false] Schema: c1#14:int)ColumnPrune:InputUids=[14]ColumnPrune:OutputUids=[14]
                        |   |   |
                        |   |   (Name: Cast Type: int Uid: 14)
                        |   |   |
                        |   |   |---c1:(Name: Project Type: bytearray Uid: 14 Input: 0 Column: (*))
                        |   |
                        |   |---(Name: LOInnerLoad[0] Schema: c1#14:bytearray)
                        |
                        |---TEST_DATA: (Name: LOLoad Schema: c1#14:bytearray)RequiredFields:null
{code}

The script runs fine with PushUpFilter turned off ("-t PushUpFilter").
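
To make the ordering problem concrete, here is a minimal Python sketch that mimics the two evaluation orders on the input 0, 0, 1. It is only an illustration of the semantics described in this report, not Pig code: the helper names (group_filter_flatten, assert_all, intended_plan, pushed_up_plan) are hypothetical, and the assert is modeled simply as "raise if any row violates the predicate".

```python
from collections import defaultdict

MESSAGE = "Should have never seen this message, assert has a bug."


def group_filter_flatten(rows):
    """Model GROUP BY c1, FILTER BY COUNT_STAR > 1, then FLATTEN of the group key."""
    groups = defaultdict(list)
    for c1 in rows:
        groups[c1].append(c1)
    # Keep only keys whose bag holds more than one row; emit one row per key.
    return [key for key, bag in groups.items() if len(bag) > 1]


def assert_all(rows, predicate, message):
    """Model ASSERT: fail on the first violating row, otherwise pass rows through."""
    for row in rows:
        if not predicate(row):
            raise AssertionError(message)
    return list(rows)


def intended_plan(rows):
    """Assert evaluated after the group/filter/flatten, as the script is written."""
    return assert_all(group_filter_flatten(rows), lambda c1: c1 == 0, MESSAGE)


def pushed_up_plan(rows):
    """Assert evaluated on the raw loaded rows, as in the optimized plan."""
    return group_filter_flatten(assert_all(rows, lambda c1: c1 == 0, MESSAGE))


if __name__ == "__main__":
    data = [0, 0, 1]
    print(intended_plan(data))  # [0]
    try:
        pushed_up_plan(data)
    except AssertionError as err:
        print("pushed-up assert failed:", err)
```

With the assert after the filter, only the group for c1 = 0 (count 2) survives, so the assert holds. With the assert applied to the loaded rows, the row with c1 = 1 is still present and trips it, which matches the failure reported here.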