Hangjun,
  Wouldn't running Kafka on YARN be a big architectural change from where
it is now?  From what I have seen, in a typical setup you want machines
optimized for Kafka, not just Kafka running on top of HDFS.
-Steve


On Tue, May 20, 2014 at 8:37 PM, Hangjun Ye <yehang...@gmail.com> wrote:

> Thanks Jun and Francois.
>
> We used Kafka 0.8.0 previously. We hit some weird errors when expanding
> the cluster, and the expansion couldn't finish.
> Now we use 0.8.1.1; I will try cluster expansion again sometime.
>
> I read the discussion on that JIRA issue and I agree with the points
> raised there.
> HDFS has also improved a lot since then and many issues have been resolved
> (e.g. SPOF).
>
> We have a team that builds and provides the storage/computing platform for
> our company, and we have already provided a Hadoop cluster.
> If Kafka had an option to store data on HDFS, we would just need to allocate
> some space quota for it on our cluster (and increase it on demand), and it
> might reduce our operational cost a lot.
>
> Another (and maybe more aggressive) thought is about deployment. Jun
> has a good point: "HDFS only provides data redundancy, but not
> computational redundancy". If Kafka could be deployed on YARN, it could
> offload some computational resource management to YARN and we wouldn't have
> to allocate machines physically. Kafka would still need to take care of load
> balancing and partition assignment among brokers by itself.
> Many computational frameworks like Spark/Samza have such an option, and it's
> a big attraction for us.
>
> Best,
> Hangjun
>
>
> 2014-05-20 21:00 GMT+08:00 François Langelier <f.langel...@gmail.com>:
>
> > Take a look at Camus <https://github.com/linkedin/camus/>
> >
> >
> >
> > François Langelier
> > Software Engineering Student - École de Technologie Supérieure
> > <http://www.etsmtl.ca/>
> > Captain, Club Capra <http://capra.etsmtl.ca/>
> > VP-Communication - CS Games <http://csgames.org> 2014
> > Jeux de Génie <http://www.jdgets.com/> 2011 to 2014
> > Treasurer, Fraternité du Piranha <http://fraternitedupiranha.com/>
> > 2012-2014
> > Organizing Committee, Olympiades ÉTS 2012
> > Compétition Québécoise d'Ingénierie 2012 - Senior Competition
> >
> >
> > On 19 May 2014 05:28, Hangjun Ye <yehang...@gmail.com> wrote:
> >
> > > Hi there,
> > >
> > > I recently started to use Kafka for our data analysis pipeline and it
> > > works very well.
> > >
> > > One problem for us so far is expanding our cluster when we need more
> > > storage space.
> > > Kafka provides some scripts to help with this, but the process wasn't
> > > smooth.
> > >
> > > To make it work perfectly, it seems Kafka needs to do some of the jobs
> > > that a distributed file system already does.
> > > So I'm just wondering if there are any thoughts on making Kafka work on
> > > top of HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as
> > > one option?
> > >
> > > The pros might be that HDFS already handles storage management
> > > (replication, corrupted disk/machine, migration, load balancing, etc.)
> > > very well, which frees Kafka and its users from that burden; the con
> > > might be performance degradation.
> > > As Kafka does very well on performance, even with some degree of
> > > degradation it would possibly still be competitive for most situations.
> > >
> > > Best,
> > > --
> > > Hangjun Ye
> > >
> >
>
>
>
> --
> Hangjun Ye
>
