Hi Guys!

Thanks for accepting my request. We're using Flume to ingest massive amounts
of data from a Kafka source, and we're not sure how to configure a
Flume cluster with HA. Here is a brief summary:

1 - We use Kafka to hold intermediate data about our users' activity.
2 - We use Flume to ingest all that data and write it to Avro files in HDFS.
3 - We want high availability, that is, not a single agent but a
cluster of agents.
4 - The thing is that we cannot have duplicates in the target files. If we
start several agents consuming from the same topic, each one of them
could potentially receive the same events, which violates the previous
constraint.
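For context, the single-agent setup described in steps 1 and 2 looks roughly like the sketch below. This is a hypothetical config, not our exact one; the agent name `a1`, the topic `user-activity`, the group id `flume-ha-group`, and the HDFS path are all placeholders. Our understanding is that if every agent in the cluster shares the same `kafka.consumer.group.id`, Kafka should balance partitions across them rather than deliver each event to every agent, but we're not certain this is the intended way to get HA without duplicates:

```properties
# Hypothetical single-agent Flume config (Kafka source -> HDFS sink).
a1.sources = kafkaSrc
a1.channels = memCh
a1.sinks = hdfsSink

# Kafka source: all agents would share the same consumer group id,
# so Kafka assigns each partition to only one agent at a time.
a1.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafkaSrc.kafka.bootstrap.servers = broker1:9092,broker2:9092
a1.sources.kafkaSrc.kafka.topics = user-activity
a1.sources.kafkaSrc.kafka.consumer.group.id = flume-ha-group
a1.sources.kafkaSrc.channels = memCh

a1.channels.memCh.type = memory
a1.channels.memCh.capacity = 10000

# HDFS sink writing Avro container files.
a1.sinks.hdfsSink.type = hdfs
a1.sinks.hdfsSink.channel = memCh
a1.sinks.hdfsSink.hdfs.path = /data/user-activity/%Y-%m-%d
a1.sinks.hdfsSink.hdfs.fileType = DataStream
a1.sinks.hdfsSink.serializer = avro_event
```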

Is there a way to configure multiple sources so that Kafka sees them as a
single consumer?

Thanks in advance,
-carlos.
