Hello, In my 5-node Hadoop 2.7.3 cluster on AWS EC2 instances, everything was running smoothly until I submitted one query. I tried to create an ORC table using the query below:
create table dummy_orc stored as orc tblproperties ("orc.compress"="Lz4") as select * from dummy;

The job said it would run 76 mappers and 0 reducers, and then it started. After some 10-12 minutes, when the map phase reached 100%, the job aborted without producing any output. Since the number of records was large, I did not mind the long time it took initially. But then all my DataNode and NodeManager daemons died. The hdfs dfsadmin -report command showed 0 cluster capacity, 0 live datanodes, etc.

I restarted the cluster completely: NameNode, ResourceManager, DataNodes, NodeManagers, ZKFC services, QuorumPeerMain, everything. After that, the cluster capacity, etc. is reported correctly. I am able to run normal non-MapReduce queries like select *. But MapReduce jobs are not starting. Spark jobs are not running now either; they are stuck in the ACCEPTED state like the MR jobs.

The MR job for select count(1) from dummy is stuck at:

Query ID = hadoopuser_20170728093320_b1875223-801e-466b-997f-4b58f0e90041
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1501233326257_0003, Tracking URL = http://dev-bigdatamaster1:8088/proxy/application_1501233326257_0003/
Kill Command = /home/hadoopuser/hadoop//bin/hadoop job -kill job_1501233326257_0003

Which log would give me a better picture to resolve this error? And what went wrong?