Re: tracking remote reads in datanode logs

2015-02-25 Thread Igor Bogomolov
Thanks a lot!

Igor

On Tue, Feb 24, 2015 at 11:46 PM, Drake민영근 drake@nexr.com wrote:

 Hi, Igor

 The AM logs are stored in HDFS if you set the log aggregation property
 (yarn.log-aggregation-enable in yarn-site.xml). Otherwise, they are in the
 container log directory on the local disk of the node that ran the AM. See this:
 http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
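
 If log aggregation is enabled, you can also fetch the AM log with the
 yarn CLI instead of hunting for files (a sketch; the application id here
 is only a placeholder, take yours from the RM web UI):

   # prints all aggregated container logs for the job, AM included
   yarn logs -applicationId application_1424622614381_0001

 Without aggregation, look under the directory configured by
 yarn.nodemanager.log-dirs on the node that ran the AM container.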

 Thanks

 On Wednesday, February 25, 2015, Igor Bogomolov igor.bogomo...@gmail.com wrote:

 Hi Drake,

 Thanks for the pointer. The AM log indeed has information about remote map
 tasks. But I'd like more low-level details, like on which node each map
 task was scheduled and how many bytes were read. That should be exactly
 what is in the datanode log, and I saw it there for another job. But after
 I reinstalled the cluster it's not there anymore :(

 Could you please tell me the path where the AM log is located (the one you
 copied the lines from)? I found it in the web interface but not as a file
 on disk, and there is nothing in /var/log/hadoop-*

 Thanks,
 Igor

 On Tue, Feb 24, 2015 at 1:51 AM, Drake민영근 drake@nexr.com wrote:

 I found this in the mapreduce am log.

 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
 Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
 HostLocal:0 RackLocal:0
 ..
 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
 Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
 HostLocal:3 RackLocal:2
 ..

 The first line shows 5 scheduled map tasks, and the second shows
 HostLocal:3 and RackLocal:2. I think the RackLocal:2 ones are the remote
 map tasks you mentioned before.
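
 If you just want those counters, grepping the AM's syslog works; the path
 below is only a placeholder for wherever your AM container log ended up:

   # last scheduling summary printed by the allocator
   grep 'After Scheduling' /path/to/am-container/syslog | tail -1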


 Drake 민영근 Ph.D
 kt NexR

 On Tue, Feb 24, 2015 at 9:45 AM, Drake민영근 drake@nexr.com wrote:

 Hi, Igor

 Did you look at the MapReduce application master log? I think the local
 and rack-local map task counts are logged there.

 Good luck.

 Drake 민영근 Ph.D
 kt NexR

 On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov 
 igor.bogomo...@gmail.com wrote:

 Hi all,

 In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want to
 know how many remote map tasks (ones that read their input data from
 remote nodes) there are in a MapReduce job. For this purpose I took the
 logs of each datanode and looked for lines with op: HDFS_READ and a cliID
 field that contains a map task id.
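
 (Concretely, on each node I ran something along these lines; the log path
 is an assumption, adjust it to wherever your datanode logs live:)

   grep 'op: HDFS_READ' /var/log/hadoop-hdfs/*.log* | grep 'cliID: .*attempt_'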

 Surprisingly, 4 of the datanode logs do not contain any lines with op:
 HDFS_READ. The remaining one has many lines with op: HDFS_READ, but all
 the cliID values look like DFSClient_NONMAPREDUCE_* and do not contain
 any map task id.

 I could conclude there are no remote map tasks, but that does not look
 correct. Also, even local reads are not logged (because there is no line
 where the cliID field contains a map task id). Could anyone please explain
 what's wrong? Why is this logging not working? (I use default settings.)
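
 One thing I still need to check is whether the reinstall changed the
 datanode's log4j settings: the op: HDFS_READ lines come from the
 clientTrace logger, so if it was turned down they disappear. For example
 (the config path is an assumption):

   # the clientTrace logger should not be set to WARN or OFF
   grep clienttrace /etc/hadoop/conf/log4j.properties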

 Chris,

 I found HADOOP-3062 (https://issues.apache.org/jira/browse/HADOOP-3062),
 which you implemented. I thought you might have an explanation.

 Best,
 Igor

 --
 Drake 민영근 Ph.D
 kt NexR




Re: java.net.UnknownHostException on one node only

2015-02-25 Thread tesm...@gmail.com
Thanks Varun,

Where shall I check to resolve it?


Regards,
Tariq

On Mon, Feb 23, 2015 at 4:07 AM, Varun Kumar varun@gmail.com wrote:

 Hi Tariq,

 This looks like a DNS configuration issue.
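
 A few places to look, taking the hostname 101-master10 from your stack
 trace (run these on the node where the container launch fails):

   getent hosts 101-master10   # resolves via /etc/hosts and DNS together
   nslookup 101-master10       # resolves via DNS only
   cat /etc/hosts              # is 101-master10 listed on every node?
   hostname -f                 # does each node report the name you expect?

 Every node in the cluster needs to be able to resolve every other node's
 hostname, either through DNS or through identical /etc/hosts entries.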


 On Sun, Feb 22, 2015 at 3:51 PM, tesm...@gmail.com tesm...@gmail.com
 wrote:

 I am getting a java.net.UnknownHostException continuously on one node
 during Hadoop MapReduce execution.

 That node is accessible via SSH, and it shows up in the yarn node -list
 and hdfs dfsadmin -report queries.

 Below is the log from the execution:

 15/02/22 20:17:42 INFO mapreduce.Job: Task Id :
 attempt_1424622614381_0008_m_43_0, Status : FAILED
 Container launch failed for container_1424622614381_0008_01_16 :
 java.lang.IllegalArgumentException: java.net.UnknownHostException:
 101-master10
 at
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
 at
 org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
 at
 org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.net.UnknownHostException: 101-master10
 ... 12 more



 15/02/22 20:17:44 INFO

 Regards,
 Tariq




 --
 Regards,
 Varun Kumar.P



HDFS data after nodes become unavailable?

2015-02-25 Thread tesm...@gmail.com
Dear all,

I have transferred the data from local storage to HDFS in my 10-node
Hadoop cluster. The replication factor is 3.

Some nodes, say 3, become unavailable after some time. I can't use those
nodes for computation or for storage of data.

What will happen to the data stored on HDFS of those nodes?

Do I need to remove all the data from HDFS and copy it again?

Regards,


NodeManager was not connected to its ResourceManager

2015-02-25 Thread Manoj Venkatesh
Dear Hadoop experts,

Firstly, thank you all for answering my previous question(s).

Now, I have a Hadoop cluster of 8 nodes, and we use YARN




Re: video stream as input to sequence files

2015-02-25 Thread daemeon reiydelle
Can you explain your use case?



“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming ‘Wow! What a Ride!’” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 4:01 PM, tesm...@gmail.com tesm...@gmail.com
wrote:

 Hi,

 How can I use my video data files as input for a sequence file, or load
 them into HDFS directly?


 Regards,
 Tariq



Re: Hadoop Handy Commands

2015-02-25 Thread Jagat Singh
This should be good start

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html
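
A few that come up all the time (these are standard Hadoop 2 commands; see
the manual above and the matching HDFS/YARN command pages for details):

  hdfs dfs -ls /                  # browse the filesystem
  hdfs dfs -put file.txt /dir/    # copy local data into HDFS
  hdfs dfsadmin -report           # datanode status and capacity
  hadoop fsck /                   # filesystem health check
  yarn node -list                 # NodeManager status
  yarn application -list          # running YARN applications
  mapred job -list                # running MapReduce jobs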


On Thu, Feb 26, 2015 at 9:19 AM, Krish Donald gotomyp...@gmail.com wrote:

 Hi,

 Does anybody have a list of Hadoop-related commands that comes in really
 handy when needed?

 If yes, please share.

 Thanks
 Krish



video stream as input to sequence files

2015-02-25 Thread tesm...@gmail.com
Hi,

How can I use my video data files as input for a sequence file, or load
them into HDFS directly?


Regards,
Tariq


Re: HDFS data after nodes become unavailable?

2015-02-25 Thread Rajesh Kartha
Do you know why the 3 nodes are down? With replication, the copies of the
data that were hosted on those failed nodes will not be available. However,
the data will still be served by the hosts holding the other 2 copies, so I
don't think you need to copy the data again.

The exception is if, for some reason, all 3 copies of some data ended up on
those nodes, in which case that data will not be available.

Maybe you could do a 'hadoop fsck /' to confirm that HDFS is healthy.
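
For example (these are standard fsck options):

  hadoop fsck / -files -blocks -locations

lists every file with its blocks and the datanodes holding each replica, and
the summary at the end reports missing and under-replicated blocks.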

-Rajesh

On Wed, Feb 25, 2015 at 9:21 AM, tesm...@gmail.com tesm...@gmail.com
wrote:

 Dear all,

 I have transferred the data from local storage to HDFS in my 10-node
 Hadoop cluster. The replication factor is 3.

 Some nodes, say 3, become unavailable after some time. I can't use those
 nodes for computation or for storage of data.

 What will happen to the data stored on HDFS of those nodes?

 Do I need to remove all the data from HDFS and copy it again?

 Regards,




Re: video stream as input to sequence files

2015-02-25 Thread tesm...@gmail.com
Dear Daemeon,

Thanks for your reply. Here is my flow.

I am processing video frames using MapReduce. Presently, I convert the
video files to individual frames, make a sequence file out of them, and
transfer the sequence file to HDFS.

This flow is not optimized, and I need to optimize it.
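
Roughly, the current steps look like this (the paths and the ffmpeg
invocation are only an illustration of what I described):

  # 1. split each video into individual frames
  ffmpeg -i input.mp4 frames/frame_%06d.jpg

  # 2. pack the frames into a sequence file with a small
  #    SequenceFile.Writer program (not shown here)

  # 3. copy the result into HDFS
  hdfs dfs -put frames.seq /user/tariq/input/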

On Thu, Feb 26, 2015 at 3:00 AM, daemeon reiydelle daeme...@gmail.com
wrote:

 Can you explain your use case?



 “Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming ‘Wow! What a Ride!’” - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 4:01 PM, tesm...@gmail.com tesm...@gmail.com
 wrote:

 Hi,

 How can I use my video data files as input for a sequence file, or load
 them into HDFS directly?


 Regards,
 Tariq