How many times did you test it? We need to rule out aberrations.
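
If each configuration was only timed once, it may be worth repeating the run a few times per setup so a single slow S3 fetch doesn't skew the comparison. A rough sketch of what I mean (the run count and the per-run target directory are placeholders, not from your commands):

    # time several identical distcp runs into separate target dirs
    for i in 1 2 3 4 5; do
      HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce \
        time hadoop distcp -overwrite \
        "s3n://xxx:xxx@s3n.hadoop.cwsdev/*" hdfs:///tmp/something/run$i
    done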
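
Also, on the yarn.nodemanager.resource.memory-mb point Harsh raises below: on the 1.7 GB node that setting would go into yarn-site.xml, roughly like this (the 1024 MB value is only an assumed fit for that instance size, not something from this thread; tune as needed):

    <configuration>
      <!-- Cap what this NodeManager advertises to the ResourceManager so
           container grants cannot outgrow the ~1.7 GB of physical memory. -->
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
      </property>
    </configuration>

Left at the 8 GB default, the ResourceManager can grant this node far more container memory than it physically has, which is the swapping scenario described below.
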
On Oct 29, 2012, at 11:30 AM, Harsh J <ha...@cloudera.com> wrote:

> On your second low-memory NM instance, did you ensure to lower the
> yarn.nodemanager.resource.memory-mb property specifically, to avoid
> swapping due to excessive resource grants? The default offered is 8 GB
> (much more than the 1.7 GB you have).
>
> On Mon, Oct 29, 2012 at 8:42 PM, Alexandre Fouche
> <alexandre.fou...@cleverscale.com> wrote:
>> Hi,
>>
>> Can someone give some insight into why a "distcp" of 600 files of a few
>> hundred bytes each from s3n:// to local HDFS takes 46 s on a single
>> yarn-nodemanager EC2 instance with 16 GB of memory (which, by the way, I
>> already find absurdly long), but 3 min 30 s after adding a second
>> yarn-nodemanager (a small instance with 1.7 GB of memory)?
>> I would have expected it to be a bit faster, not 5x longer!
>>
>> I see the same issue when I stop the small instance's nodemanager and
>> restart it so that it joins the processing after the job was already
>> submitted with only the big nodemanager instance running.
>>
>> I am using Cloudera's latest YARN + HDFS on Amazon (a rebranded CentOS 6).
>>
>> #Staging 14:58:04 root@datanode2:hadoop-yarn: rpm -qa |grep hadoop
>> hadoop-hdfs-datanode-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-mapreduce-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-0.20-mapreduce-0.20.2+1261-1.cdh4.1.1.p0.4.el6.x86_64
>> hadoop-yarn-nodemanager-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-mapreduce-historyserver-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-hdfs-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-client-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> hadoop-yarn-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>
>>
>> #Staging 14:39:51 root@resourcemanager:hadoop-yarn:
>> HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce time hadoop distcp -overwrite
>> s3n://xxx:xxx@s3n.hadoop.cwsdev/* hdfs:///tmp/something/a
>>
>> 12/10/29 14:40:12 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null,
>> sourcePaths=[s3n://xxx:xxx@s3n.hadoop.cwsdev/*],
>> targetPath=hdfs:/tmp/something/a}
>> 12/10/29 14:40:18 WARN conf.Configuration: io.sort.mb is deprecated.
>> Instead, use mapreduce.task.io.sort.mb
>> 12/10/29 14:40:18 WARN conf.Configuration: io.sort.factor is deprecated.
>> Instead, use mapreduce.task.io.sort.factor
>> 12/10/29 14:40:19 INFO mapreduce.JobSubmitter: number of splits:15
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.jar is deprecated.
>> Instead, use mapreduce.job.jar
>> 12/10/29 14:40:19 WARN conf.Configuration:
>> mapred.map.tasks.speculative.execution is deprecated. Instead, use
>> mapreduce.map.speculative
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.reduce.tasks is
>> deprecated. Instead, use mapreduce.job.reduces
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.value.class
>> is deprecated. Instead, use mapreduce.map.output.value.class
>> 12/10/29 14:40:19 WARN conf.Configuration: mapreduce.map.class is
>> deprecated. Instead, use mapreduce.job.map.class
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.job.name is
>> deprecated. Instead, use mapreduce.job.name
>> 12/10/29 14:40:19 WARN conf.Configuration: mapreduce.inputformat.class
>> is deprecated. Instead, use mapreduce.job.inputformat.class
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.output.dir is
>> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>> 12/10/29 14:40:19 WARN conf.Configuration: mapreduce.outputformat.class
>> is deprecated. Instead, use mapreduce.job.outputformat.class
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.map.tasks is
>> deprecated. Instead, use mapreduce.job.maps
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.key.class is
>> deprecated. Instead, use mapreduce.map.output.key.class
>> 12/10/29 14:40:19 WARN conf.Configuration: mapred.working.dir is
>> deprecated. Instead, use mapreduce.job.working.dir
>> 12/10/29 14:40:20 INFO mapred.ResourceMgrDelegate: Submitted application
>> application_1351504801306_0015 to ResourceManager at
>> resourcemanager.cwsdev.cleverscale.com/10.60.106.130:8032
>> 12/10/29 14:40:20 INFO mapreduce.Job: The url to track the job:
>> http://ip-10-60-106-130.ec2.internal:8088/proxy/application_1351504801306_0015/
>> 12/10/29 14:40:20 INFO tools.DistCp: DistCp job-id: job_1351504801306_0015
>> 12/10/29 14:40:20 INFO mapreduce.Job: Running job: job_1351504801306_0015
>> 12/10/29 14:40:27 INFO mapreduce.Job: Job job_1351504801306_0015 running
>> in uber mode : false
>> 12/10/29 14:40:27 INFO mapreduce.Job: map 0% reduce 0%
>> 12/10/29 14:40:42 INFO mapreduce.Job: map 6% reduce 0%
>> 12/10/29 14:40:43 INFO mapreduce.Job: map 33% reduce 0%
>> 12/10/29 14:40:44 INFO mapreduce.Job: map 40% reduce 0%
>> 12/10/29 14:40:48 INFO mapreduce.Job: map 46% reduce 0%
>> 12/10/29 14:43:04 INFO mapreduce.Job: map 56% reduce 0%
>> 12/10/29 14:43:05 INFO mapreduce.Job: map 58% reduce 0%
>> 12/10/29 14:43:08 INFO mapreduce.Job: map 62% reduce 0%
>> 12/10/29 14:43:09 INFO mapreduce.Job: map 68% reduce 0%
>> 12/10/29 14:43:15 INFO mapreduce.Job: map 75% reduce 0%
>> 12/10/29 14:43:16 INFO mapreduce.Job: map 82% reduce 0%
>> 12/10/29 14:43:25 INFO mapreduce.Job: map 85% reduce 0%
>> 12/10/29 14:43:26 INFO mapreduce.Job: map 87% reduce 0%
>> 12/10/29 14:43:29 INFO mapreduce.Job: map 90% reduce 0%
>> 12/10/29 14:43:35 INFO mapreduce.Job: map 93% reduce 0%
>> 12/10/29 14:43:37 INFO mapreduce.Job: map 96% reduce 0%
>> 12/10/29 14:43:40 INFO mapreduce.Job: map 100% reduce 0%
>> 12/10/29 14:43:40 INFO mapreduce.Job: Job job_1351504801306_0015
>> completed successfully
>> 12/10/29 14:43:40 INFO mapreduce.Job: Counters: 35
>>   File System Counters
>>     FILE: Number of bytes read=1800
>>     FILE: Number of bytes written=1050895
>>     FILE: Number of read operations=0
>>     FILE: Number of large read operations=0
>>     FILE: Number of write operations=0
>>     HDFS: Number of bytes read=22157
>>     HDFS: Number of bytes written=101379
>>     HDFS: Number of read operations=519
>>     HDFS: Number of large read operations=0
>>     HDFS: Number of write operations=201
>>     S3N: Number of bytes read=101379
>>     S3N: Number of bytes written=0
>>     S3N: Number of read operations=0
>>     S3N: Number of large read operations=0
>>     S3N: Number of write operations=0
>>   Job Counters
>>     Launched map tasks=15
>>     Other local map tasks=15
>>     Total time spent by all maps in occupied slots (ms)=12531208
>>     Total time spent by all reduces in occupied slots (ms)=0
>>   Map-Reduce Framework
>>     Map input records=57
>>     Map output records=0
>>     Input split bytes=2010
>>     Spilled Records=0
>>     Failed Shuffles=0
>>     Merged Map outputs=0
>>     GC time elapsed (ms)=42324
>>     CPU time spent (ms)=54890
>>     Physical memory (bytes) snapshot=2923872256
>>     Virtual memory (bytes) snapshot=12526301184
>>     Total committed heap usage (bytes)=1618280448
>>   File Input Format Counters
>>     Bytes Read=20147
>>   File Output Format Counters
>>     Bytes Written=0
>>   org.apache.hadoop.tools.mapred.CopyMapper$Counter
>>     BYTESCOPIED=101379
>>     BYTESEXPECTED=101379
>>     COPY=57
>>
>> 6.90user 0.59system 3:29.17elapsed 3%CPU (0avgtext+0avgdata
>> 819392maxresident)k
>> 0inputs+344outputs (0major+62847minor)pagefaults 0swaps
>>
>>
>>
>> --
>> Alexandre Fouche
>
>
>
> --
> Harsh J
>