Hi

I am trying to set up Flume high availability.
rsyslog forwards the same feed to two different servers, s1 and s2.
On both servers, Flume agents are configured to listen for the feed from rsyslog.
Both agents write the feed to HDFS.
What I end up with in HDFS is different files with duplicated content.
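
Roughly, each agent is configured like this (ports, paths and names below are placeholders, not my exact config):

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# syslog TCP source listening for the feed from rsyslog
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# HDFS sink writing the feed into a dated directory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/syslog/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1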

Is there a best-practice architecture for using Flume in situations like this? The point of forwarding syslog to two servers is that when one server is down, at least one agent can still transport events to HDFS; what I am trying to avoid is the duplication that comes with it.

At the moment my plan is to clean out the duplicates after some time, before Hive uses the directory.
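
For example, something like this before the directory is exposed to the real table (table names below are placeholders):

-- staging table over the raw, possibly duplicated feed from both agents
CREATE TABLE IF NOT EXISTS syslog_clean LIKE syslog_raw;

-- keep one copy of each event before queries run against the clean table
INSERT OVERWRITE TABLE syslog_clean
SELECT DISTINCT * FROM syslog_raw;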

--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
