Hi,

Please check the background in the trailing mails for more info.
When I'm executing a distcp command to copy from the local file system to HDFS, I'm getting the following errors:

15/06/01 13:53:37 INFO tools.DistCp: DistCp job-id: job_1431689151537_0003
15/06/01 13:53:37 INFO mapreduce.Job: Running job: job_1431689151537_0003
15/06/01 13:53:44 INFO mapreduce.Job: Job job_1431689151537_0003 running in uber mode : false
15/06/01 13:53:44 INFO mapreduce.Job: map 0% reduce 0%
15/06/01 13:53:47 INFO mapreduce.Job: Task Id : attempt_1431689151537_0003_m_000000_1000, Status : FAILED
java.io.FileNotFoundException: File /opt/dev/sdb/hadoop/yarn/local/filecache does not exist
15/06/01 13:53:51 INFO mapreduce.Job: Task Id : attempt_1431689151537_0003_m_000000_1001, Status : FAILED
java.io.FileNotFoundException: File /opt/dev/sdd/hadoop/yarn/local/filecache does not exist
15/06/01 13:53:55 INFO mapreduce.Job: Task Id : attempt_1431689151537_0003_m_000000_1002, Status : FAILED
java.io.FileNotFoundException: File /opt/dev/sdh/hadoop/yarn/local/filecache does not exist
15/06/01 13:54:02 INFO mapreduce.Job: map 100% reduce 0%
15/06/01 13:54:02 INFO mapreduce.Job: Job job_1431689151537_0003 completed successfully
15/06/01 13:54:02 INFO mapreduce.Job: Counters: 34
	File System Counters
		FILE: Number of bytes read=41194
		FILE: Number of bytes written=118592
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=372
		HDFS: Number of bytes written=41194
		HDFS: Number of read operations=15
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Failed map tasks=3
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=10090
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=10090
		Total vcore-seconds taken by all map tasks=10090
		Total megabyte-seconds taken by all map tasks=10332160
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=114
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=31
		CPU time spent (ms)=760
		Physical memory (bytes) snapshot=223268864
		Virtual memory (bytes) snapshot=2503770112
		Total committed heap usage (bytes)=1171259392
	File Input Format Counters
		Bytes Read=258
	File Output Format Counters
		Bytes Written=0
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=41194
		BYTESEXPECTED=41194
		COPY=1

Regards,
Omkar Joshi

From: Joshi Omkar
Sent: den 15 maj 2015 13:37
To: user@ambari.apache.org
Subject: FW: Change in the DataNode directories !!!

The background is in the trailing mail. I went to Services -> HDFS -> Configs -> DataNode and replaced the earlier values with the 7 disk paths:

/opt/dev/sdb/hadoop/hdfs/data
/opt/dev/sdc/hadoop/hdfs/data

and so on. As expected, I got an alert for the HDFS service/NameNode host:

"This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold. The threshold values are in blocks."

I have the below queries:

1. How can I ensure that only the 7 disks will now be used for data storage (and NO other path like /, /var etc.)? Are any more 'clean-ups' or config changes required? Is there some doc available (I couldn't find one)?
2. After I start loading data, how can I verify point 1?

Regards,
Omkar Joshi

From: Joshi Omkar
Sent: den 8 maj 2015 15:08
To: user@ambari.apache.org
Subject: Change in the DataNode directories !!!

Hi,

I have installed HDP 2.2 with Ambari 2.0. I'm facing a big issue. I had kept the DataNode directories as:

/nsr/hadoop/hdfs/data,/opt/hadoop/hdfs/data,/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data,/var/hadoop/hdfs/data

Now I want several disks to be mounted on each node, and I want ALL THE DATA BLOCKS TO BE STORED ON THESE DISKS. There is not much data on HDFS, and it can be deleted. Is it possible now? Will only changing the DataNode directories help, or do any more paths need to be changed?
Regards,
Omkar Joshi
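For query 1 in the May 15 mail (ensuring only the 7 disks are used), the list of DataNode directories lives in the `dfs.datanode.data.dir` property of hdfs-site.xml, which is exactly what the Ambari DataNode config screen edits. A minimal shell sketch of the check, assuming the /opt/dev/sd* layout described above; the sample value here is illustrative, and on a live DataNode the real value would come from `hdfs getconf -confKey dfs.datanode.data.dir`:

```shell
# Hypothetical check: split dfs.datanode.data.dir on commas and flag any
# entry that is not under /opt/dev, i.e. not on one of the dedicated disks.
# Sample value for illustration only -- on a node, fetch it with:
#   hdfs getconf -confKey dfs.datanode.data.dir
data_dirs="/opt/dev/sdb/hadoop/hdfs/data,/opt/dev/sdc/hadoop/hdfs/data,/var/hadoop/hdfs/data"

echo "$data_dirs" | tr ',' '\n' | while read -r d; do
  case "$d" in
    /opt/dev/*) echo "OK  $d" ;;
    *)          echo "BAD $d (not on a dedicated disk)" ;;
  esac
done
```

Any entry flagged BAD would still receive block replicas; removing it from the property (and restarting the DataNodes via Ambari) is what stops new writes to it.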
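For query 2 (verifying placement after data is loaded), two checks are commonly combined: `hdfs fsck / -files -blocks -locations` to see where replicas actually live, plus per-directory disk usage on each DataNode. A minimal sketch of the local usage check, with the paths taken from the mails above; the helper function is generic and the invocation at the end is a hypothetical example:

```shell
# After loading data, the /opt/dev/* data directories should grow, while old
# paths such as /var/hadoop/hdfs/data should stay empty or absent. Cluster-wide,
# 'hdfs dfsadmin -report' shows per-DataNode capacity and
# 'hdfs fsck / -blocks -locations' lists where replicas are stored.
check_usage() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      printf '%s KB\t%s\n' "$(du -sk "$d" | cut -f1)" "$d"
    else
      printf 'missing\t%s\n' "$d"
    fi
  done
}

# Hypothetical invocation on one DataNode (prints 'missing' for absent paths):
check_usage /opt/dev/sdb/hadoop/hdfs/data /var/hadoop/hdfs/data
```

Seeing the old directories report "missing" (or stay at zero) while the new disks grow is the practical confirmation that only the 7 disks hold block data.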