Thanks for the clarification.

On Fri, Nov 27, 2015 at 2:15 PM, Gonzalo Herreros <[email protected]> wrote:

> Yes, the best way to consolidate multiple sources is to use avro sinks
> that forward to the agent that writes to HDFS (which exposes an avro
> source to listen to the other avro sinks).
>
> On 27 November 2015 at 08:28, zaenal rifai <[email protected]> wrote:
>
>> Sorry, I meant avro sink.
>>
>> On 27 November 2015 at 14:52, Gonzalo Herreros <[email protected]> wrote:
>>
>>> Hi Zaenal,
>>>
>>> There is no "avro channel"; by default Flume will write avro to any of
>>> the channels.
>>> The point is that a memory channel, or even a file channel, will very
>>> quickly fill up, because a single sink cannot keep up with the many
>>> sources.
>>>
>>> Regards,
>>> Gonzalo
>>>
>>> On 27 November 2015 at 03:43, zaenal rifai <[email protected]> wrote:
>>>
>>>> Why not use an avro channel, Gonzalo?
>>>>
>>>> On 26 November 2015 at 20:12, Gonzalo Herreros <[email protected]> wrote:
>>>>
>>>>> You cannot have multiple processes writing concurrently to the same
>>>>> HDFS file.
>>>>> What you can do is have a topology where many agents forward to an
>>>>> agent that writes to HDFS, but you need a channel that allows the
>>>>> single HDFS writer to lag behind without slowing the sources.
>>>>> A Kafka channel might be a good choice.
>>>>>
>>>>> Regards,
>>>>> Gonzalo
>>>>>
>>>>> On 26 November 2015 at 11:57, yogendra reddy <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Here's my current Flume setup for a Hadoop cluster to collect
>>>>>> service logs:
>>>>>>
>>>>>> - Run a Flume agent on each of the nodes
>>>>>> - Configure the Flume sink to write to HDFS, so the files end up
>>>>>> this way:
>>>>>>
>>>>>> ..flume/events/node0logfile
>>>>>> ..flume/events/node1logfile
>>>>>> ..flume/events/nodeNlogfile
>>>>>>
>>>>>> But I want to be able to write all the logs from multiple agents to
>>>>>> a single file in HDFS. How can I achieve this, and what would the
>>>>>> topology look like? Can this be done via a collector?
>>>>>> If yes, where can I run the collector, and how will this scale for
>>>>>> a 1000+ node cluster?
>>>>>>
>>>>>> Thanks,
>>>>>> Yogendra
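The tiered topology Gonzalo describes (many leaf agents with avro sinks forwarding to one collector agent that buffers in a Kafka channel and writes to HDFS) could be sketched as two Flume properties files along these lines. This is a minimal sketch, not a tested configuration: the hostnames, port, log path, Kafka broker, and topic are placeholders, and the exact Kafka-channel property names vary between Flume releases (e.g. `brokerList`/`zookeeperConnect` in 1.6 vs. `kafka.bootstrap.servers` later), so check the user guide for your version.

```properties
# --- leaf agent (one per node): tail a service log, forward over avro ---
leaf.sources  = logSrc
leaf.channels = memCh
leaf.sinks    = avroOut

# placeholder source; point it at the actual service log
leaf.sources.logSrc.type = exec
leaf.sources.logSrc.command = tail -F /var/log/service.log
leaf.sources.logSrc.channels = memCh

leaf.channels.memCh.type = memory
leaf.channels.memCh.capacity = 10000

# forward events to the collector's avro source
# (hostname/port are placeholders)
leaf.sinks.avroOut.type = avro
leaf.sinks.avroOut.hostname = collector.example.com
leaf.sinks.avroOut.port = 4545
leaf.sinks.avroOut.channel = memCh

# --- collector agent: avro source -> Kafka channel -> single HDFS sink ---
collector.sources  = avroIn
collector.channels = kafkaCh
collector.sinks    = hdfsOut

# avro source listening for the leaf agents' avro sinks
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = kafkaCh

# Kafka channel lets the single HDFS writer lag behind without
# back-pressuring the sources; broker/topic names are placeholders
collector.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
collector.channels.kafkaCh.brokerList = kafka1:9092
collector.channels.kafkaCh.topic = flume-collector-channel

collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.channel = kafkaCh
collector.sinks.hdfsOut.hdfs.path = /flume/events
collector.sinks.hdfsOut.hdfs.fileType = DataStream
collector.sinks.hdfsOut.hdfs.rollInterval = 300
```

Note that even with this topology the collector writes one file *per roll interval*, not one file forever: HDFS files must be closed before they are readable, so "a single file" in practice means a single directory of rolled files from one writer. For 1000+ nodes you would likely run several collectors behind the leaf agents (avro sinks support failover/load-balancing sink groups) rather than a single one.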
