Hi guys,
I've banged my head on this one all day and i need to surrender.  I have a
query that hangs (never returns). However, when i turn on logging to DEBUG
level it works.  I'm stumped.   I include here the query, the different
query plans (with the only thing different being the log level) and a
traceback on the query that hangs.

* the query

#!/bin/bash

sql="
explain SELECT *
FROM   raw_logs.fact_www_access_datekey_by_hour
WHERE  date_key = 20160828      -- partition key
and lower(http_string) = 'foo'  -- always evaluates to false
limit 1                         -- if i remove limit 1 it works.  I expect
an empty result set.
;
"

#hive -hiveconf hive.root.logger=ERROR,console -e "$sql" >stdout.bad
2>stderr.bad  #hangs
hive -hiveconf hive.root.logger=DEBUG,console -e "$sql" >stdout.good
2>stderr.good #works



* explain plan for the one that hangs (ERROR,console)

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: 1
      Processor Tree:
        TableScan
          alias: fact_www_access_datekey_by_hour
          Statistics: Num rows: 144355924 Data size: 158755813572 Basic
stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (lower(http_string) = 'foo') (type: boolean)
            Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: payload_year (type: int), payload_month (type:
int), payload_day (type: int), payload_time (type: string),
payload_gmt_offset (type: string), perf_time_micros (type: int),
http_string (type: string), http_type (type: string), http_version (type:
string), http_rc (type: int), http_size (type: int), referrer (type:
string), user_agent (type: string), bot_flag (type: string), ip_list (type:
array<string>), scalar_tags (type: map<string,string>), url_components
(type: array<string>), user_id (type: string), lt_session_id (type:
string), session_id (type: string), log_name (type: string), log_lineno
(type: int), log_line (type: string), web_server (type: string),
location_info (type: map<string,string>), dp_main_url_type (type: string),
dp_inventory_type (type: string), dp_main_url (type:
map<string,array<string>>), dp_tag_url (type: map<string,array<string>>),
dp_referrer_url (type: map<string,array<string>>), hour (type: int),
20160828 (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24,
_col25, _col26, _col27, _col28, _col29, _col30, _col31
              Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
              Limit
                Number of rows: 1
                Statistics: Num rows: 1 Data size: 1099 Basic stats:
COMPLETE Column stats: NONE
                ListSink

* query plan for the one that works (DEBUG,console)

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: fact_www_access_datekey_by_hour
            Statistics: Num rows: 144355924 Data size: 158755813572 Basic
stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (lower(http_string) = 'foo') (type: boolean)
              Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: payload_year (type: int), payload_month (type:
int), payload_day (type: int), payload_time (type: string),
payload_gmt_offset (type: string), perf_time_micros (type: int),
http_string (type: string), http_type (type: string), http_version (type:
string), http_rc (type: int), http_size (type: int), referrer (type:
string), user_agent (type: string), bot_flag (type: string), ip_list (type:
array<string>), scalar_tags (type: map<string,string>), url_components
(type: array<string>), user_id (type: string), lt_session_id (type:
string), session_id (type: string), log_name (type: string), log_lineno
(type: int), log_line (type: string), web_server (type: string),
location_info (type: map<string,string>), dp_main_url_type (type: string),
dp_inventory_type (type: string), dp_main_url (type:
map<string,array<string>>), dp_tag_url (type: map<string,array<string>>),
dp_referrer_url (type: map<string,array<string>>), hour (type: int),
20160828 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
_col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23,
_col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31
                Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 1
                  Statistics: Num rows: 1 Data size: 1099 Basic stats:
COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: true
                    Statistics: Num rows: 1 Data size: 1099 Basic stats:
COMPLETE Column stats: NONE
                    table:
                        input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 1
      Processor Tree:
        ListSink




* traceback when i press ctl-c on the client

^CFailed with exception java.io.IOException:java.io.IOException: Filesystem
closed
16/08/30 20:16:09 [62cf87a8-2b39-47d2-b234-53631b60d84e main]: ERROR
CliDriver: - Failed with exception java.io.IOException:java.io.IOException:
Filesystem closed
java.io.IOException: java.io.IOException: Filesystem closed
        at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:517)
        at
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
        at
org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:144)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1885)
        at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
        at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:742)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:794)
        at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:833)
        at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
        at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at
org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2101)
        at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2508)
        at
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
        at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484)
        ... 16 more


on the one that hangs the job is never submitted to YARN.  On the one that
succeeds the job is submitted.  Ultimately though why does changing the log
level influence the query plan?

any insights greatly appreciated.

Thanks,
Stephen.

Reply via email to