RE: persistent services in Hadoop

John Lilley Fri, 27 Jun 2014 08:15:13 -0700

Thanks Arun!
I do think we are on the bleeding edge of YARN, because everyone else in our 
application space generates MapReduce (Pig, Hive), or they have overlaid their 
legacy server-grid on Hadoop.
I will explore both resources you mentioned to see where the development 
community is headed.
Cheers,
john

From: Arun Murthy [mailto:a...@hortonworks.com]
Sent: Wednesday, June 25, 2014 11:50 PM
To: user@hadoop.apache.org
Subject: Re: persistent services in Hadoop

John,

 We are excited to see ISVs like you get value from YARN, and appreciate the 
patience you've already shown in the past to work through the teething issues 
of YARN & hadoop-2.x.

 W.r.t. long-running services, the most straightforward option is to go through 
Apache Slider (http://slider.incubator.apache.org/). Slider has already made 
good progress in supporting various long-running services such as Apache HBase, 
Apache Accumulo & Apache Storm. I'm sure the Slider community would welcome 
your use cases and suggestions, particularly as they are gearing up to support 
more applications on top of it, and would love your feedback.

 Furthermore, there is work going on in YARN itself to better support your use 
case: https://issues.apache.org/jira/browse/YARN-896.
 Again, your feedback there is very, very welcome.

 Also, you might be interested in 
https://issues.apache.org/jira/browse/YARN-1530 which provides a generic 
framework for collecting application metrics for YARN applications.

 Hope that helps.

thanks,
Arun

On Wed, Jun 25, 2014 at 1:48 PM, John Lilley <john.lil...@redpoint.net> wrote:
We are an ISV that currently ships a data-quality/integration suite running as 
a native YARN application.  We are finding several use cases that would benefit 
from being able to manage a per-node persistent service.  MapReduce has its 
“shuffle auxiliary service”, but it isn’t straightforward to add auxiliary 
services because they cannot be loaded from HDFS, so we’d have to manage the 
distribution of JARs across nodes (please tell me if I’m wrong here…).  Given 
that, is there a preferred method for managing persistent services on a Hadoop 
cluster?  We could have an AM that requests a set of YARN tasks, waits until 
YARN gives it a task on each node, and restarts any failed tasks, but that 
doesn’t really fit the AM/container structure very well.  I’ve also read about 
Slider, which looks interesting.  Other ideas?
--john
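
For readers following the thread, the "one task per node, restart on failure"
idea described above can be sketched as a small bookkeeping loop. This is only
an illustration under stated assumptions: it does not call YARN, and the class
and method names (ServicePlacement, nodes_to_request, on_allocated,
on_completed) are hypothetical. In a real AM the requests would presumably be
node-specific container requests with locality relaxation turned off, and the
two callbacks would be driven by the AM's allocation and completion events.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ServicePlacement:
    """Bookkeeping for keeping exactly one service container on each wanted node.

    Hypothetical helper: it only tracks state and never talks to YARN.
    """
    wanted_nodes: Set[str]
    running: Dict[str, str] = field(default_factory=dict)  # node -> container id
    pending: Set[str] = field(default_factory=set)         # nodes with an open request

    def nodes_to_request(self) -> List[str]:
        """Return nodes that need a new container request and mark them pending."""
        missing = sorted(self.wanted_nodes - set(self.running) - self.pending)
        self.pending.update(missing)
        return missing

    def on_allocated(self, node: str, container_id: str) -> bool:
        """Record an allocation.

        Returns False when the container is surplus (unknown node, or the node
        already runs the service), so the caller can release it.
        """
        if node not in self.wanted_nodes or node in self.running:
            return False
        self.pending.discard(node)
        self.running[node] = container_id
        return True

    def on_completed(self, container_id: str) -> None:
        """Forget a finished or failed container so its node is requested again."""
        for node, cid in list(self.running.items()):
            if cid == container_id:
                del self.running[node]


if __name__ == "__main__":
    placement = ServicePlacement(wanted_nodes={"node1", "node2"})
    print(placement.nodes_to_request())   # both nodes need a container
    placement.on_allocated("node1", "container_01")
    placement.on_completed("container_01")  # simulated failure on node1
    print(placement.nodes_to_request())   # node1 is requested again
```

The sketch also shows why this sits awkwardly with the AM/container model: the
AM has to stay alive indefinitely just to run this reconcile loop, and it can
only ask for a node, not guarantee that a container is granted there.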

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

