Hi, Hive users:
Our Hadoop vendor's distribution currently ships with Hive 0.12. I know it is an old
version, but an upgrade is still a long way off.
Right now we are facing an issue in Hive 0.12.
We have an ETL step implemented in Hive, and given the data volume in this
step, we know that MAPJOIN is the right way to go: one side of the join
is very small, while the other side is much larger.
So below is the query example:

set hive.exec.compress.output=true;
set parquet.compression=snappy;
set mapred.reduce.tasks=1;
set mapred.reduce.child.java.opts=-Xms1560m -Xmx4096m;
set mapred.task.timeout=7200000;
set mapred.map.tasks.speculative.execution=false;
set hive.ignore.mapjoin.hint=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

insert overwrite table a partition(dt='${hiveconf:run_date}', source='ip')
select /*+ MAPJOIN(trial_event) */ xxxx
The above query normally finishes in around 10 minutes in our daily run, which
we are very happy with. But sometimes the query hangs for hours in the ETL,
until we manually kill it.
I enabled debug logging in Hive and found the following messages:
2016-03-11 09:11:52 Starting to launch local task to process map join; maximum memory = 536870912
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/03/11 09:11:55 DEBUG ipc.Client: IPC Client (-1284813870) connection to namenode/10.20.95.130:9000 from etl: closed
16/03/11 09:11:55 DEBUG ipc.Client: IPC Client (-1284813870) connection to namenode/10.20.95.130:9000 from etl: stopped, remaining connections 0
After that, there is no more log output for hours.
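My working theory (an assumption on my part, not something I have verified) is that the "maximum memory = 536870912" line is the max heap of the JVM that Hive spawns locally to build the map-join hash table, i.e. 512 MB. If that is right, the local task may be starving for memory while loading the small table, and the client-side Hadoop heap setting it inherits would be one knob worth trying:

```shell
# Assumption: 536870912 bytes is the local-task JVM's max heap:
#   512 * 1024 * 1024 = 536870912
# The local map-join task is launched from the client, so raising the
# client-side Hadoop heap (value below is a guess, not tuned) might help:
export HADOOP_HEAPSIZE=2048   # MB for JVMs that Hive launches locally
```

There is also hive.mapjoin.localtask.max.memory.usage (fraction of heap the local task may use before aborting), which might at least make it fail fast instead of hanging, though I have not confirmed the hang is memory-related.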
If we don't use MAPJOIN, we don't face this issue, but the query takes 2.5
hours.
When this happens, I can see that the NameNode is working fine; I can run all
kinds of HDFS operations without any issue while the query is hanging. What
does the "IPC Client ... stopped, remaining connections 0" message mean? Given
that we cannot upgrade our Hive version for now, is there any workaround?
Thanks
Yong