Thanks Jun and Francois.

We used Kafka 0.8.0 previously. We hit some odd errors when expanding the
cluster, and the expansion could not be completed.
We now use 0.8.1.1, and I will try cluster expansion again sometime.

I read the discussion on that JIRA issue and I agree with the points raised
there.
HDFS has also improved a lot since then, and many issues have been resolved
(e.g. SPOF).

We have a team that builds and provides the storage/computing platform for
our company, and we already provide a Hadoop cluster.
If Kafka had an option to store data on HDFS, we would just need to allocate
some space quota for it on our cluster (and increase it on demand), which
might reduce our operational cost a lot.

Another (and maybe more aggressive) thought is about deployment. Jun
has a good point: "HDFS only provides data redundancy, but not
computational redundancy". If Kafka could be deployed on YARN, it could
offload some computational resource management to YARN and we wouldn't have
to allocate physical machines ourselves. Kafka would still need to take care
of load balancing and partition assignment among brokers by itself.
Many computational frameworks like Spark and Samza offer such an option, and
it's a big attraction for us.

Best,
Hangjun


2014-05-20 21:00 GMT+08:00 François Langelier <f.langel...@gmail.com>:

> Take a look at Camus <https://github.com/linkedin/camus/>
>
>
>
> François Langelier
> Software Engineering Student - École de Technologie
> Supérieure <http://www.etsmtl.ca/>
> Captain, Club Capra <http://capra.etsmtl.ca/>
> VP-Communication - CS Games <http://csgames.org> 2014
> Jeux de Génie <http://www.jdgets.com/> 2011 to 2014
> Treasurer, Fraternité du Piranha <http://fraternitedupiranha.com/>
> 2012-2014
> Organizing Committee, Olympiades ÉTS 2012
> Compétition Québécoise d'Ingénierie 2012 - Senior Competition
>
>
> On 19 May 2014 05:28, Hangjun Ye <yehang...@gmail.com> wrote:
>
> > Hi there,
> >
> > I recently started to use Kafka for our data analysis pipeline and it
> > works very well.
> >
> > One problem for us so far is expanding our cluster when we need more
> > storage space.
> > Kafka provides some scripts to help with this, but the process wasn't
> > smooth.
> >
> > To make it work perfectly, it seems Kafka needs to do some jobs that a
> > distributed file system has already done.
> > So I am just wondering: are there any thoughts on making Kafka work on top
> > of HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as one
> > option?
> >
> > The pros might be that HDFS already handles storage management
> > (replication, corrupted disk/machine, migration, load balancing, etc.) very
> > well, which frees Kafka and its users from that burden; the cons might
> > be performance degradation.
> > As Kafka does very well on performance, even with some degree of
> > degradation it would possibly still be competitive in most situations.
> >
> > Best,
> > --
> > Hangjun Ye
> >
>



-- 
Hangjun Ye
