You cannot have multiple processes writing concurrently to the same HDFS file. What you can do is use a topology where many agents forward to a single agent that writes to HDFS, but then you need a channel that allows the lone HDFS writer to lag behind without slowing down the sources. A Kafka channel might be a good choice; a rough configuration sketch is below.
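For example, the consolidator agent could expose an Avro source that the per-node agents forward to, buffer events through a Kafka channel, and write with a single HDFS sink. This is only a sketch, assuming Flume 1.6-style Kafka channel properties; the agent name ("collector"), port, broker list, ZooKeeper quorum, topic and HDFS path are placeholders you would replace with your own values:

    # --- consolidator agent ("collector" is a placeholder name) ---
    collector.sources  = avroIn
    collector.channels = kafkaCh
    collector.sinks    = hdfsOut

    # Avro source receiving events forwarded by the per-node agents
    collector.sources.avroIn.type     = avro
    collector.sources.avroIn.bind     = 0.0.0.0
    collector.sources.avroIn.port     = 4545
    collector.sources.avroIn.channels = kafkaCh

    # Kafka channel buffers events so the single HDFS writer can lag behind
    collector.channels.kafkaCh.type             = org.apache.flume.channel.kafka.KafkaChannel
    collector.channels.kafkaCh.brokerList       = kafka1:9092,kafka2:9092
    collector.channels.kafkaCh.topic            = flume-logs
    collector.channels.kafkaCh.zookeeperConnect = zk1:2181

    # Single HDFS sink, so all events land under one consolidated path
    collector.sinks.hdfsOut.type              = hdfs
    collector.sinks.hdfsOut.channel           = kafkaCh
    collector.sinks.hdfsOut.hdfs.path         = /flume/events/consolidated
    collector.sinks.hdfsOut.hdfs.fileType     = DataStream
    collector.sinks.hdfsOut.hdfs.rollInterval = 300
    collector.sinks.hdfsOut.hdfs.rollSize     = 0
    collector.sinks.hdfsOut.hdfs.rollCount    = 0

    # --- per-node agents: replace the direct HDFS sink with an Avro sink pointing at the consolidator ---
    node.sinks.avroFwd.type     = avro
    node.sinks.avroFwd.hostname = collector.example.com
    node.sinks.avroFwd.port     = 4545

The property names above follow the Flume 1.6 Kafka channel (brokerList / topic / zookeeperConnect); newer releases use kafka.bootstrap.servers and kafka.topic instead, so check the user guide for your version.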
Regards,
Gonzalo

On 26 November 2015 at 11:57, yogendra reddy <[email protected]> wrote:

> Hi All,
>
> Here's my current flume setup for a hadoop cluster to collect service logs
>
> - Run flume agent in each of the nodes
> - Configure flume sink to write to hdfs and the files end up this way
>
> ..flume/events/node0logfile
> ..flume/events/node1logfile
>
> ..flume/events/nodeNlogfile
>
> But I want to be able to write all the logs from multiple agents to a
> single file in hdfs. How can I achieve this and what would the topology
> look like?
> Can this be done via a collector? If yes, where can I run the collector and
> how will this scale for a 1000+ node cluster?
>
> Thanks,
> Yogendra
