Hello, I have a Hadoop cluster that was upgraded to Hadoop 2.x, and everything works fine with it (it runs M/R jobs and can perform actions on HDFS).
When I run a Pig script, either from the Pig Grunt shell or via pig -x mapreduce -f test.pig, it connects to the Hadoop cluster in either case, starts the M/R job, and the M/R job completes fine. However, the shell hangs and never returns:

2014-03-13 00:17:04,341 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: https://apollo-jt.vip.org.com:50030/proxy/application_1394582929977_7433/
2014-03-13 00:17:04,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1394582929977_7433
2014-03-13 00:17:04,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2014-03-13 00:17:04,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[4,4],C[6,4],B[5,4] C: C[6,4],B[5,4] R: C[6,4]
2014-03-13 00:17:04,365 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-03-13 00:17:36,232 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: <AM_Host_Name>/<AM_Host_IP>:47718. Already tried 0 time(s); maxRetries=45
2014-03-13 00:17:36,232 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: <AM_Host_Name>/<AM_Host_IP>:47718. Already tried 1 time(s); maxRetries=45
2014-03-13 00:17:36,232 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: <AM_Host_Name>/<AM_Host_IP>:47718. Already tried 2 time(s); maxRetries=45
..
...

On analyzing this, I found that <AM_Host_Name> matches the host of the Application Master of the M/R job.

Questions:

1) Does the client machine attempt to connect to the Application Master in order to get the status of the M/R job?

2) If #1 is true, and since this is a secure Hadoop 2.x cluster, does that mean the firewall must be open on that port between the client and the Application Master (which could be any node in the cluster)?
3) I assumed #1 and #2 are true, and so I had the firewall opened for port 47718 between the client and all nodes in the Hadoop cluster (since any node can host the Application Master). To my surprise, however, I found that the port changed from 47718 on subsequent jobs. Is there a setting, or a defined group of port numbers, used for the client-to-AM communication that reports status? If so, where can I find this list?

4) How do I get the Grunt shell back and see the status/progress of the job from the client machine?

-- Deepak
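In case it helps anyone answering: my understanding (please correct me if wrong) is that the MR Application Master binds its client status service to an ephemeral port by default, which would explain the changing port. It looks like the range can be pinned down with yarn.app.mapreduce.am.job.client.port-range in mapred-site.xml. A sketch of what I am considering trying; the 50100-50200 range here is just an example value, not a recommendation:

```xml
<!-- mapred-site.xml on the client / gateway machine -->
<!-- Restrict the port range the MR Application Master uses for its
     client (job status) RPC service, so the firewall can be opened
     for a fixed range instead of all ephemeral ports.
     NOTE: 50100-50200 is an example range I picked, not a default. -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```

If that property does what I think it does, opening the firewall between the client and all cluster nodes for just that range should let the Pig client reach the AM for progress reporting.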