Alexandre Fouché created HIVE-3919:
--------------------------------------

             Summary: Partition query refused when PPD is disabled (due to 
HIVE-3699)
                 Key: HIVE-3919
                 URL: https://issues.apache.org/jira/browse/HIVE-3919
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
         Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch

            Reporter: Alexandre Fouché


The query below was working perfectly when hive.optimize.ppd=true. But i had to 
disable this property because my results are impacted by unresolved HIVE-3699.
But now that hive.optimize.ppd=false, the query from table1 partitionned by 
field 'day' is refused, even if the query has the partition predicate.

>From HIVE-830 it says that the partition pruner depends on it, so that may 
>explain why my queries are refused when Predictive Push Down is disabled.

The query is accepted and run if i set hive.mapred.mode=nonstrict. But from 
Hive-server2 logs and input path lists and also processing, all dataset 
regardless of partitions is deserialized !

    set hive.optimize.ppd = true;
    set hive.mapred.mode=strict;
    FROM table1
    INSERT OVERWRITE TABLE general1
    SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT' 
AND (day>=20121015 and day<=20130103) GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
    -> Parsed fine, but wrong results when i add multiple 'insert overwrite'

    set hive.optimize.ppd = false;
    set hive.mapred.mode=strict;
    FROM table1
    INSERT OVERWRITE TABLE general1
    SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT' 
AND (day>=20121015 and day<=20130103) GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
    Error: Error while processing statement: FAILED: SemanticException [Error 
10041]: No partition predicate found for Alias "table1" Table "table1" 
(state=42000,code=10041)
    Error: Error while processing statement: FAILED: SemanticException [Error 
10041]: No partition predicate found for Alias "table1" Table "table1" 
(state=42000,code=10041)

The table 'table1' was created like this, it is partitionned by 'day':

    CREATE TABLE table1
    PARTITIONED BY (day BIGINT)
    ROW FORMAT
    SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES ('avro.schema.literal'='{
      "type": "record",
      "name": "Log",
      "fields" : [
        {"name": "time_stamp", "type": ["null", "int"], "default": null},
        (...)
      ]
    }')
    ;


    set hive.optimize.ppd = false;
    set hive.mapred.mode=strict;
    FROM table1
    INSERT OVERWRITE TABLE general1
    SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT' 
AND day=20121015 GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
    -> FAILS TOO

    set hive.optimize.ppd = false;
    INSERT OVERWRITE TABLE general1
    SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) FROM table1 WHERE 
qs_cs_s_cat='PRINT' AND day=20121015 GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
    -> FAILS TOO

    set hive.optimize.ppd = false;
    set hive.mapred.mode=nonstrict;
    FROM table1
    INSERT OVERWRITE TABLE general1
    SELECT qs_cs_s_kid, day, qs_cs_s_aid, COUNT(*) WHERE qs_cs_s_cat='PRINT' 
AND day=20121015 GROUP BY qs_cs_s_kid, day, qs_cs_s_aid;
    -> WORKS, but from Hive-server2 logs and input path lists and also 
processing, all dataset regardless of partitions is deserialized !


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to