Hi Nishant!

You should be able to look at the datanode and nodemanager log files to
find out why they died after you ran the 76 mappers. It is extremely
unusual for a job to kill nodemanagers (I haven't heard of a verified case
in 4-5 years) unless your cluster is configured poorly. Which
container-executor do you use? Which user is running the nodemanager and
datanode process? Which user does a MapTask run as?
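For example, running something along these lines on one of the worker nodes
would show which users own the daemons and which executor is configured
(paths assume the usual $HADOOP_HOME layout; adjust for your install):

  # Which users own the YARN/HDFS daemon processes on this node
  ps -ef | egrep 'NodeManager|DataNode'

  # Which container-executor is configured (assuming the default conf dir)
  grep -A1 'yarn.nodemanager.container-executor.class' \
      $HADOOP_HOME/etc/hadoop/yarn-site.xml

  # Daemon logs to inspect for the crash; exact file names vary by user/host
  ls -lrt $HADOOP_HOME/logs/ | tail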

Are you sure the cluster is fine? How many resources do you see available
in the ResourceManager? Are you submitting the application to a queue with
enough resources?
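Besides the ResourceManager UI on port 8088, a few CLI checks along these
lines can confirm what YARN thinks it has available (the commands are
standard in Hadoop 2.7, though output formats vary):

  # Live nodes and the resources the RM sees on each
  yarn node -list -all

  # Applications sitting in the ACCEPTED state waiting for containers
  yarn application -list -appStates ACCEPTED

  # Queue capacities and current usage
  mapred queue -list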

Ravi

On Fri, Jul 28, 2017 at 5:19 AM, Nishant Verma <nishant.verma0...@gmail.com>
wrote:

> Hello,
>
> In my 5-node Hadoop 2.7.3 AWS EC2 cluster, things were running smoothly
> before I submitted one query. I tried to create an ORC table using the
> query below:
>
> create table dummy_orc stored as orc tblproperties ("orc.compress"="Lz4")
> as select * from dummy;
>
> The job said it would run 76 mappers and 0 reducers, and the job started.
> After some 10-12 minutes, when the map % reached 100%, the job aborted and
> did not give any output. Since the number of records was large, I did not
> mind the long time it took initially. But then all my datanode daemons and
> nodemanager daemons died. The hdfs dfsadmin -report command gave 0 cluster
> capacity, 0 live datanodes, etc.
>
> I restarted the cluster completely. Restarted namenode, resource manager,
> datanode, nodemanager, zkfc services, quorumPeerMain, everything. After
> that, the cluster capacity, etc. is reported fine. I am able to fire
> normal non-mapreduce queries like select *.
>
> But MapReduce is not starting. Spark jobs are also not running; they are
> stuck in the ACCEPTED state like the MR jobs.
>
> The MR job for select count(1) from dummy is stuck at:
>
> Query ID = hadoopuser_20170728093320_b1875223-801e-466b-997f-4b58f0e90041
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Starting Job = job_1501233326257_0003, Tracking URL =
> http://dev-bigdatamaster1:8088/proxy/application_1501233326257_0003/
> Kill Command = /home/hadoopuser/hadoop//bin/hadoop job  -kill
> job_1501233326257_0003
>
> Which log would give me a better picture to resolve this error? And what
> went wrong?
>
