Re: tracking remote reads in datanode logs
Thanks a lot!

Igor

On Tue, Feb 24, 2015 at 11:46 PM, Drake민영근 drake@nexr.com wrote:

Hi, Igor

The AM logs are stored in HDFS if you set the log aggregation property. Otherwise, they are in the container log directory. See this: http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/

Thanks

On Wednesday, February 25, 2015, Igor Bogomolov igor.bogomo...@gmail.com wrote:

Hi Drake,

Thanks for the pointer. The AM log indeed has information about remote map tasks, but I'd like more low-level detail, such as on which node each map task was scheduled and how many bytes were read. That should be exactly what appears in the datanode log, and I saw it for another job. But after I reinstalled the cluster it's not there anymore :(

Could you please tell me the path where the AM log is located (from which you copied the lines)? I found it in the web interface but not as a file on disk, and there is nothing in /var/log/hadoop-*

Thanks, Igor

On Tue, Feb 24, 2015 at 1:51 AM, Drake민영근 drake@nexr.com wrote:

I found this in the MapReduce AM log:

2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
..
2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 HostLocal:3 RackLocal:2
..

The first line says there are 5 map tasks, and the second says HostLocal:3 and RackLocal:2. I think the 2 rack-local tasks are the remote map tasks you mentioned before.

Drake 민영근 Ph.D
kt NexR

On Tue, Feb 24, 2015 at 9:45 AM, Drake민영근 drake@nexr.com wrote:

Hi, Igor

Did you look at the MapReduce application master log? I think the local and rack-local map tasks are logged in the MapReduce AM log. Good luck.
Drake 민영근 Ph.D
kt NexR

On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov igor.bogomo...@gmail.com wrote:

Hi all,

In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want to know how many remote map tasks (ones that read input data from remote nodes) there are in a MapReduce job. For this purpose I took the logs of each datanode and looked for lines with op: HDFS_READ and a cliID field that contains a map task id.

Surprisingly, 4 of the datanode logs do not contain any lines with op: HDFS_READ. The remaining one has many lines with op: HDFS_READ, but all the cliID values look like DFSClient_NONMAPREDUCE_* and do not contain any map task id. I would conclude there are no remote map tasks, but that does not look correct. Also, even local reads are not logged (because there is no line where the cliID field contains a map task id).

Could anyone please explain what's wrong? Why is logging not working? (I use the default settings.)

Chris, I found HADOOP-3062 https://issues.apache.org/jira/browse/HADOOP-3062 that you implemented. Thought you might have an explanation.

Best, Igor

--
Drake 민영근 Ph.D
kt NexR
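As a quick sanity check of the approach Igor describes, a small script can scan datanode log lines for op: HDFS_READ and split them into map-task reads versus other clients by their cliID. This is only a sketch: the exact clienttrace line layout is assumed from typical DataNode output, and the sample lines below are hypothetical.

```python
import re

# Assumed shape of a datanode clienttrace fragment:
#   ... op: HDFS_READ, cliID: DFSClient_attempt_<jobid>_m_<task>_<n>, ...
CLI_ID = re.compile(r"op:\s*HDFS_READ.*?cliID:\s*(\S+)")

def classify_reads(log_lines):
    """Count HDFS_READ lines whose client ID looks like a map-task attempt
    versus other (e.g. DFSClient_NONMAPREDUCE_*) clients."""
    counts = {"map_task": 0, "other": 0}
    for line in log_lines:
        m = CLI_ID.search(line)
        if not m:
            continue
        cli_id = m.group(1).rstrip(",")
        if "attempt_" in cli_id and "_m_" in cli_id:
            counts["map_task"] += 1
        else:
            counts["other"] += 1
    return counts
```

Run against all five datanode logs, a nonzero "map_task" count on a node other than the one hosting the block would indicate a remote read; Igor's observation is that both counts are unexpectedly zero for map tasks.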
Re: java.net.UnknownHostException on one node only
Thanks Varun,

Where shall I check to resolve it?

Regards, Tariq

On Mon, Feb 23, 2015 at 4:07 AM, Varun Kumar varun@gmail.com wrote:

Hi Tariq,

This looks like a DNS configuration issue.

On Sun, Feb 22, 2015 at 3:51 PM, tesm...@gmail.com tesm...@gmail.com wrote:

I am getting java.net.UnknownHostException continuously on one node during Hadoop MapReduce execution. That node is accessible via SSH, and it is shown by both yarn node -list and hdfs dfsadmin -report. Below is the log from the execution:

15/02/22 20:17:42 INFO mapreduce.Job: Task Id : attempt_1424622614381_0008_m_43_0, Status : FAILED
Container launch failed for container_1424622614381_0008_01_16 : java.lang.IllegalArgumentException: java.net.UnknownHostException: 101-master10
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
    at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
    at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: 101-master10
    ... 12 more
15/02/22 20:17:44 INFO

Regards, Tariq

--
Regards,
Varun Kumar.P
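To narrow down which host in the cluster fails to resolve, a tiny resolution check like the one below can be run from every node against every hostname in the slaves file. This is a hedged sketch: it mirrors the lookup that java.net.InetAddress performs, and "101-master10" is the name taken from the log above.

```python
import socket

def check_host(hostname):
    """Return the resolved IP address for hostname, or None if name
    resolution fails (the Python analogue of java.net.UnknownHostException)."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

# e.g. on the failing node:
#   check_host("101-master10")  -> None means DNS/hosts is misconfigured there
```

If the lookup fails only on one node, the fix is usually in that node's /etc/hosts (or its resolver configuration), since each node performs its own resolution of the container hosts.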
HDFS data after nodes become unavailable?
Dear all,

I have transferred data from local storage to HDFS in my 10-node Hadoop cluster. The replication factor is 3. Some nodes, say 3, become unavailable after some time; I can't use those nodes for computation or for storing data. What will happen to the data stored on HDFS on those nodes? Do I need to remove all the data from HDFS and copy it again?

Regards,
NodeManager was not connected to its ResourceManager
Dear Hadoop experts,

Firstly, thank you all for answering my previous question(s). Now, I have a Hadoop cluster of 8 nodes, and we use YARN
Re: video stream as input to sequence files
Can you explain your use case?

*"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming 'Wow! What a Ride!'" - Hunter Thompson*

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 4:01 PM, tesm...@gmail.com tesm...@gmail.com wrote:

Hi,

How can I use my video data files as input to a sequence file, or load them into HDFS directly?

Regards, Tariq
Re: Hadoop Handy Commands
This should be a good start: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html

On Thu, Feb 26, 2015 at 9:19 AM, Krish Donald gotomyp...@gmail.com wrote:

Hi,

Does anybody have a list of Hadoop-related commands that comes in really handy when needed? If yes, please share.

Thanks, Krish
video stream as input to sequence files
Hi,

How can I use my video data files as input to a sequence file, or load them into HDFS directly?

Regards, Tariq
Re: HDFS data after nodes become unavailable?
Do you know why the 3 nodes are down?

With replication, the copies of data that were hosted on those failed nodes will not be available. However, the data will still be served by the hosts holding the other 2 copies, so I don't think you need to copy the data again, unless for some reason all 3 copies of some data ended up on those nodes, in which case that data will not be available.

Maybe you could run 'hadoop fsck /' to confirm that HDFS is healthy.

-Rajesh

On Wed, Feb 25, 2015 at 9:21 AM, tesm...@gmail.com tesm...@gmail.com wrote:

Dear all,

I have transferred data from local storage to HDFS in my 10-node Hadoop cluster. The replication factor is 3. Some nodes, say 3, become unavailable after some time; I can't use those nodes for computation or for storing data. What will happen to the data stored on HDFS of those nodes? Do I need to remove all the data from HDFS and copy it again?

Regards,
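Rajesh's point ("unless all 3 copies ended up on those nodes") can be sketched concretely: given a map of blocks to the nodes holding their replicas, a block is unavailable only when every one of its replicas sits on a failed node. This is an illustrative model only, with hypothetical block and node names, not output of any Hadoop tool.

```python
def unavailable_blocks(block_replicas, failed_nodes):
    """Given block id -> set of nodes holding a replica, return the blocks
    whose every replica is on a failed node (i.e. the truly lost blocks)."""
    failed = set(failed_nodes)
    return {blk for blk, nodes in block_replicas.items() if set(nodes) <= failed}
```

With a replication factor of 3 across 10 nodes, losing 3 nodes only loses a block if all 3 of its replicas happened to land on exactly those 3 nodes, which is what 'hadoop fsck /' would report as missing blocks.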
Re: video stream as input to sequence files
Dear Daemeon,

Thanks for your reply. Here is my flow: I am processing video frames using MapReduce. Presently, I convert the video files to individual frames, make a sequence file out of them, and transfer the sequence file to HDFS. This flow is not optimized and I need to optimize it.

On Thu, Feb 26, 2015 at 3:00 AM, daemeon reiydelle daeme...@gmail.com wrote:

Can you explain your use case?

On Wed, Feb 25, 2015 at 4:01 PM, tesm...@gmail.com tesm...@gmail.com wrote:

Hi,

How can I use my video data files as input to a sequence file, or load them into HDFS directly?

Regards, Tariq
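The flow Tariq describes, keying each extracted frame and packing frames into one large container so HDFS sees a few big files instead of millions of small ones, can be sketched as below. To stay self-contained this uses a simplified length-prefixed record layout, not the actual on-disk Hadoop SequenceFile format (which a SequenceFile.Writer in Java would produce); the frame ids and bytes are made up for illustration.

```python
import struct

def pack_frames(frames):
    """Pack (frame_id, frame_bytes) pairs into one length-prefixed byte
    stream; a simplified stand-in for writing key/value records to a
    SequenceFile before uploading the single blob to HDFS."""
    out = bytearray()
    for frame_id, data in frames:
        key = frame_id.encode("utf-8")
        out += struct.pack(">I", len(key)) + key      # key length + key
        out += struct.pack(">I", len(data)) + data    # value length + value
    return bytes(out)

def unpack_frames(blob):
    """Inverse of pack_frames: yield (frame_id, frame_bytes) records."""
    off = 0
    while off < len(blob):
        (klen,) = struct.unpack_from(">I", blob, off); off += 4
        key = blob[off:off + klen].decode("utf-8"); off += klen
        (vlen,) = struct.unpack_from(">I", blob, off); off += 4
        yield key, blob[off:off + vlen]; off += vlen
```

Keying frames as "video-name/frame-number" keeps them sortable and lets each map task receive a contiguous run of frames, which is the main reason the sequence-file packaging helps over storing individual frame files.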