Thanks Mahesh.

So far I have not been able to run the whole job within a limited time period, so I am looking for optimizations and better resource utilization. Maybe I can try tweaking the input split size to see if it helps.
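As a rough sketch of the split-size arithmetic behind that idea: the snippet below estimates how many input splits (and therefore map tasks) a single file would produce. It assumes the job reads its input through a FileInputFormat-based input format, that the file is exactly 3 GB (the thread only says "~3 GB"), and that a 32 MB maximum split size is tried; the 32 MB value and the class name SplitEstimate are purely illustrative. The formula and the 1.1 slop factor follow my understanding of how Hadoop's FileInputFormat computes splits, so treat the numbers as estimates rather than what the cluster will report.

```java
// Estimates the number of input splits for one file, mirroring the
// split-size rule used by Hadoop's FileInputFormat (as I understand it):
//   splitSize = max(minSize, min(maxSize, blockSize))
// This is plain Java with no Hadoop dependency; it only does the arithmetic.
class SplitEstimate {
    // A trailing chunk of up to 10% over the split size is folded into
    // the last split instead of becoming its own split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static long countSplits(long fileSize, long splitSize) {
        long splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // final (possibly short or slightly oversized) split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long fileSize = 3L * 1024 * mb; // assumption: exactly 3 GB
        long blockSize = 128 * mb;      // block size mentioned in the thread

        // Defaults: min split size 1 byte, max split size effectively unlimited.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("default splits: " + countSplits(fileSize, defaultSplit)); // 24

        // Hypothetical tweak: cap the split size at 32 MB.
        long smallerSplit = computeSplitSize(blockSize, 1L, 32 * mb);
        System.out.println("32 MB splits:   " + countSplits(fileSize, smallerSplit)); // 96
    }
}
```

In a real job the cap would be set through the `mapreduce.input.fileinputformat.split.maxsize` property or `FileInputFormat.setMaxInputSplitSize(job, size)` in the driver. Smaller splits mean more map tasks to spread across nodes, but also more container start-up overhead per byte processed, which is the trade-off described in the quoted reply.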
Thanks for your help, it explains the behaviour.

--
Madhav Sharan

On Tue, Aug 9, 2016 at 1:28 PM, Mahesh Balija <balijamahesh....@gmail.com> wrote:

> Hi Madhav,
>
> The behaviour sounds normal to me.
> If the block size is 128 MB, there could be ~24 mappers (i.e.,
> containers used).
> You cannot use the entire cluster, as the blocks may reside only on the
> nodes being used.
>
> You should not try to use the entire cluster's resources, for the
> following reason:
>
> The time required to initialize a container versus the time required to
> process its share of the data should be balanced to maximize container
> utilization. That is why a block size of 128 MB was chosen; in many
> cases the InputSplit size is increased to optimize container
> utilization, depending on the workload.
>
> Best,
> Mahesh.B.
>
> On Tue, Aug 9, 2016 at 12:19 AM, Madhav Sharan <msha...@usc.edu> wrote:
>
>> Hi Hadoop users,
>>
>> I am running an M/R job with an input file of 23 million records. I can
>> see that not all of our nodes are being used.
>>
>> What can I change to utilize all nodes?
>>
>> Containers  Mem Used  Mem Avail  Vcores used  Vcores avail
>> 8           11.25 GB  0 B        8            0
>> 0           0 B       11.25 GB   0            8
>> 0           0 B       11.25 GB   0            8
>> 8           11.25 GB  0 B        8            0
>> 8           11.25 GB  0 B        8            0
>> 7           11.25 GB  0 B        7            1
>> 5           7.03 GB   4.22 GB    5            3
>> 0           0 B       11.25 GB   0            8
>> 0           0 B       11.25 GB   0            8
>>
>> My command looks like this:
>>
>> hadoop jar target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar \
>>   gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation \
>>   /user/pts/output/MeanChiSquareAndSimilarityInput \
>>   /user/pts/output/MeanChiSquaredCalcOutput
>>
>> The directory /user/pts/output/MeanChiSquareAndSimilarityInput has an
>> input file of 23 million records. File size is ~3 GB.
>>
>> Code - https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135
>>
>> --
>> Madhav Sharan