Hi,

Please check the background in the trailing mails for more info.
When I'm executing a distcp command to copy from the local file system to HDFS, I'm getting the following errors:

15/06/01 13:53:37 INFO tools.DistCp: DistCp job-id: job_1431689151537_0003
15/06/01 13:53:37 INFO mapreduce.Job: Running job: job_1431689151537_0003
15/06/01 13:53:44 INFO mapreduce.Job: Job job_1431689151537_0003 running in uber mode : false
15/06/01 13:53:44 INFO mapreduce.Job: map 0% reduce 0%
15/06/01 13:53:47 INFO mapreduce.Job: Task Id : attempt_1431689151537_0003_m_000000_1000, Status : FAILED
java.io.FileNotFoundException: File /opt/dev/sdb/hadoop/yarn/local/filecache does not exist
15/06/01 13:53:51 INFO mapreduce.Job: Task Id : attempt_1431689151537_0003_m_000000_1001, Status : FAILED
java.io.FileNotFoundException: File /opt/dev/sdd/hadoop/yarn/local/filecache does not exist
15/06/01 13:53:55 INFO mapreduce.Job: Task Id : attempt_1431689151537_0003_m_000000_1002, Status : FAILED
java.io.FileNotFoundException: File /opt/dev/sdh/hadoop/yarn/local/filecache does not exist
15/06/01 13:54:02 INFO mapreduce.Job: map 100% reduce 0%
15/06/01 13:54:02 INFO mapreduce.Job: Job job_1431689151537_0003 completed successfully
15/06/01 13:54:02 INFO mapreduce.Job: Counters: 34
	File System Counters
		FILE: Number of bytes read=41194
		FILE: Number of bytes written=118592
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=372
		HDFS: Number of bytes written=41194
		HDFS: Number of read operations=15
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Failed map tasks=3
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=10090
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=10090
		Total vcore-seconds taken by all map tasks=10090
		Total megabyte-seconds taken by all map tasks=10332160
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=114
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=31
		CPU time spent (ms)=760
		Physical memory (bytes) snapshot=223268864
		Virtual memory (bytes) snapshot=2503770112
		Total committed heap usage (bytes)=1171259392
	File Input Format Counters
		Bytes Read=258
	File Output Format Counters
		Bytes Written=0
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=41194
		BYTESEXPECTED=41194
		COPY=1

Regards,
Omkar Joshi

From: Joshi Omkar
Sent: den 15 maj 2015 13:37
To: user@ambari.apache.org
Subject: FW: Change in the DataNode directories !!!

The background is in the trailing mail. I went to Services -> HDFS -> Configs -> DataNode and replaced the earlier values with the 7 disk paths:

/opt/dev/sdb/hadoop/hdfs/data
/opt/dev/sdc/hadoop/hdfs/data

and so on. As expected, I got an alert for the HDFS service/NameNode host:

"This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold. The threshold values are in blocks."

I have the below queries:

1. How can I ensure that only the 7 disks will now be used for data storage (and NO other path like /, /var etc.)? Are any more 'clean-ups' or config changes required? Is there some doc available (I couldn't find one)?
2. After I start loading data, how can I verify point 1?

Regards,
Omkar Joshi

From: Joshi Omkar
Sent: den 8 maj 2015 15:08
To: user@ambari.apache.org
Subject: Change in the DataNode directories !!!

Hi,

I have installed HDP 2.2 with Ambari 2.0. I'm facing a big issue. I had kept the DataNode directories as:

/nsr/hadoop/hdfs/data,/opt/hadoop/hdfs/data,/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data,/var/hadoop/hdfs/data

Now I want several disks to be mounted on each node, and I want ALL THE DATA BLOCKS TO BE STORED ON THESE DISKS. There is not much data on HDFS, and it can be deleted. Is it possible now? Will only changing the DataNode directories help, or do any more paths need to be changed?
Regards,
Omkar Joshi
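For query 1 in the May 15 mail (ensuring only the 7 disks are used), the list of DataNode directories lives in the `dfs.datanode.data.dir` property of hdfs-site.xml, which is exactly what the Ambari DataNode config screen edits. A minimal shell sketch of the check, assuming the /opt/dev/sd* layout described above; the sample value here is illustrative, and on a live DataNode the real value would come from `hdfs getconf -confKey dfs.datanode.data.dir`:

```shell
# Hypothetical check: split dfs.datanode.data.dir on commas and flag any
# entry that is not under /opt/dev, i.e. not on one of the dedicated disks.
# Sample value for illustration only -- on a node, fetch it with:
#   hdfs getconf -confKey dfs.datanode.data.dir
data_dirs="/opt/dev/sdb/hadoop/hdfs/data,/opt/dev/sdc/hadoop/hdfs/data,/var/hadoop/hdfs/data"

echo "$data_dirs" | tr ',' '\n' | while read -r d; do
  case "$d" in
    /opt/dev/*) echo "OK  $d" ;;
    *)          echo "BAD $d (not on a dedicated disk)" ;;
  esac
done
```

Any entry flagged BAD would still receive block replicas; removing it from the property (and restarting the DataNodes via Ambari) is what stops new writes to it.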
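For query 2 (verifying placement after data is loaded), two checks are commonly combined: `hdfs fsck / -files -blocks -locations` to see where replicas actually live, plus per-directory disk usage on each DataNode. A minimal sketch of the local usage check, with the paths taken from the mails above; the helper function is generic and the invocation at the end is a hypothetical example:

```shell
# After loading data, the /opt/dev/* data directories should grow, while old
# paths such as /var/hadoop/hdfs/data should stay empty or absent. Cluster-wide,
# 'hdfs dfsadmin -report' shows per-DataNode capacity and
# 'hdfs fsck / -blocks -locations' lists where replicas are stored.
check_usage() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      printf '%s KB\t%s\n' "$(du -sk "$d" | cut -f1)" "$d"
    else
      printf 'missing\t%s\n' "$d"
    fi
  done
}

# Hypothetical invocation on one DataNode (prints 'missing' for absent paths):
check_usage /opt/dev/sdb/hadoop/hdfs/data /var/hadoop/hdfs/data
```

Seeing the old directories report "missing" (or stay at zero) while the new disks grow is the practical confirmation that only the 7 disks hold block data.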