Historically, many applications/frameworks wanted to take advantage of just the 
resource management and failure-handling capabilities of Hadoop (via 
JobTracker/TaskTracker), but were forced to use MapReduce even though they 
didn't need it. Obvious examples are graph processing (Giraph), BSP (Hama), 
Storm/S4, and even a simple tool like DistCp.

There are issues even with map-only jobs:
 - You have to fake key-value processing, periodic pings, and key-value outputs
 - You are limited to the map-slot capacity of the cluster
 - The number of tasks is static, so you cannot grow or shrink your job
 - You are forced to sort data all the time (though this has changed recently)
 - You are tied to faking things like OutputCommitter even if you don't need it

That's just for starters. I can definitely think harder and list more ;)

YARN lets you move ahead without those limitations.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/


On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote:

> Hi all,
> 
> I was going through the motivation behind YARN. Splitting the responsibility 
> of the JT is the major concern. Ultimately the base (YARN) was built in a 
> generic way for building other distributed applications too.
> 
> I am not able to think of any other parallel processing use case that would 
> be useful to build on top of YARN. I thought of a lot of use cases that would 
> be beneficial when run in parallel, but again, we can do those using map-only 
> jobs in MR.
> 
> Can someone tell me a scenario where an application can utilize YARN 
> features or can be built on top of YARN, and at the same time cannot be 
> done efficiently using MRv2 jobs.
> 
> thanks,
> Rahul
> 
> 
