Got it, Ashish! Thanks for your support and suggestions.

Cheers,
JuanFra

On 05/09/2014 13:28, "Ashish" <[email protected]> wrote:
>
> On Fri, Sep 5, 2014 at 4:01 PM, JuanFra Rodriguez Cardoso <
> [email protected]> wrote:
>
>> Thanks, both of you!
>>
>> @Ashish, Javi's thoughts are right. My use case is focused on sources
>> for consuming SNMP traps. I came here from the already-open discussion
>> [1], hoping someone else was facing this problem.
>>
>> Your solution based on an async SNMP walker would help us scale to
>> thousands of agents, but it would reduce every scenario to the same
>> process:
>>
>> 1. Code a custom collector (async or not) that sends data to a Flume
>> spool dir.
>> 2. The agent's sources would consume data from that dir.
>>
> You might not need to code a custom collector; NMS systems do that
> already. So if you have an NMS system in place, maybe it can do this
> polling for you and dump records, which Flume can then pick up.
>
> If this is not an option, you need to write a custom collector,
> standalone or within a Flume Source.
>
> I went through the SNMP Source and have a suggestion: if PDU decoding
> can be avoided, it would save a lot of CPU at the collection tier. No
> action is being taken in the Source, so the raw PDU can be offloaded to
> the channel. I wrote an SNMP ping long back. The problem statement was
> similar: poll SNMP agents for specific OIDs. I didn't use the SNMP lib
> directly; I just used it to encode and decode packets, and managed the
> network layer myself.
>
>> Don't you think it would be more suitable to include an option in
>> flume.conf (path/to/list-of-thousands-sources), as Javi commented
>> above? This way, the agent's configuration would be easier to manage.
>>
> I think I agreed to this option :)
>
>> [1] https://issues.apache.org/jira/browse/FLUME-2039
>>
>> Regards,
>> ---
>> JuanFra Rodriguez Cardoso
>>
>> 2014-09-05 10:20 GMT+02:00 Ashish <[email protected]>:
>>
>>> Javi,
>>>
>>> I have an NMS background, so I understand your concern.
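[Editor's note] The collector pattern discussed above (poll agents in parallel, offload raw results without decoding, and hand files to Flume) could be sketched as below. This is a minimal, hypothetical sketch: `poll_agent` is a stand-in for a real SNMP GET (e.g. via a library such as pysnmp); only the parallel-polling and spool-file handling are shown.

```python
# Sketch: parallel SNMP poller writing batch files into a Flume spool dir.
# poll_agent() is a placeholder for a real SNMP GET; it returns a dummy
# value so the file-handling logic is runnable on its own.
import concurrent.futures
import os
import tempfile

SPOOL_DIR = tempfile.mkdtemp()  # in production: the dir watched by Spool Dir Source

def poll_agent(host, oid):
    """Placeholder for a real SNMP GET; returns (host, oid, value)."""
    return host, oid, "dummy-value"

def write_batch(results):
    # Spool Dir Source expects immutable files: write under a temporary
    # name, then atomically rename into place once the file is complete.
    fd, tmp_path = tempfile.mkstemp(dir=SPOOL_DIR, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for host, oid, value in results:
            f.write(f"{host}\t{oid}\t{value}\n")
    final_path = tmp_path[:-4]  # drop the ".tmp" suffix
    os.rename(tmp_path, final_path)
    return final_path

def poll_all(hosts, oid, workers=32):
    # Poll all hosts in parallel; results come back in input order.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda h: poll_agent(h, oid), hosts))
    return write_batch(results)

if __name__ == "__main__":
    print(poll_all(["10.0.0.%d" % i for i in range(1, 6)], "1.3.6.1.2.1.1.3.0"))
```

An async walker (e.g. asyncio-based) would follow the same shape; the key point from the discussion is that the poller never decodes the responses, it only moves them into files.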
>>>
>>> Answers inline.
>>>
>>> On Fri, Sep 5, 2014 at 12:44 PM, Javi Roman <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> The scenario Juanfra is describing is related to the question I
>>>> asked a few days ago [1].
>>>>
>>> I am not sure about this; Juanfra is a better person to comment.
>>>
>>>> You cannot install Flume agents on the SNMP managed devices, and you
>>>> cannot modify any software on the SNMP managed device to use the
>>>> Flume Client SDK (if I understand your idea correctly, Ashish). There
>>>> are two ways to collect SNMP data from SNMP devices using Flume
>>>> (IMHO):
>>>>
>>> Agreed.
>>>
>>>> 1. Create a custom application that launches the SNMP queries to the
>>>> thousands of devices and logs the answers to a file. In this case
>>>> Flume can tail this file with the "exec source" core plugin.
>>>>
>>> IMHO, this would be the preferred way for me. Create a simple SNMP
>>> walker that polls nodes in parallel and writes responses to files.
>>> Use Flume's Spool Dir Source, and the rest of the flow remains the
>>> same. I would avoid decoding events unless they need to be
>>> interpreted in the Flume chain.
>>>
>>>> 2. Use a flume-snmp-source plugin (similar to [2]); in other words,
>>>> shift the SNMP query custom application into a specialized Flume
>>>> plugin.
>>>>
>>> Possible; it's like running solution #1 inside Flume.
>>> For both, you would need to track which agents have been sent
>>> requests. An async SNMP walker would help you scale to thousands of
>>> agents.
>>>
>>>> Juanfra is talking about a scenario like the second point. In that
>>>> case you have to handle a huge Flume configuration file, with an
>>>> entry for each managed device to query. For this situation I guess
>>>> there are two possible solutions:
>>>>
>>>> 1.
>>>> The flume-snmp-source plugin can use a file with a list of hosts to
>>>> query as a parameter:
>>>>
>>>> agent.sources.source1.host = /path/to/list-of-host-file
>>>>
>>>> However, I guess this breaks the philosophy, or the simplicity, of
>>>> other core Flume plugins.
>>>>
>>>> 2. Create a little program to fill the Flume configuration file from
>>>> a template, or something similar.
>>>>
>>> I would go with #1; it keeps the Flume config file simple. We still
>>> need to distribute the host-list file, but on a small scale.
>>>
>>>> Any other ideas? I guess this is a good discussion about a real-world
>>>> use case.
>>>>
>>>> [1] http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
>>>> [2] https://github.com/javiroman/flume-snmp-source
>>>>
>>>> On Fri, Sep 5, 2014 at 4:56 AM, Ashish <[email protected]> wrote:
>>>> >
>>>> > Have a look at the Flume Client SDK. One simple way would be to use
>>>> > Flume client implementations to send Events to Flume Sources; this
>>>> > would significantly reduce the number of Sources you need to manage.
>>>> >
>>>> > HTH!
>>>> >
>>>> > On Thu, Sep 4, 2014 at 9:40 PM, JuanFra Rodriguez Cardoso <
>>>> > [email protected]> wrote:
>>>> >>
>>>> >> Thanks, Andrew, for your quick response.
>>>> >>
>>>> >> My sources (server PDUs) can't put events into an aggregation
>>>> >> point. For this reason I'm following a PollingSource schema, where
>>>> >> my agent needs to be configured with thousands of sources. Any
>>>> >> clues for use cases where data is injected via a polling process?
>>>> >>
>>>> >> Regards!
>>>> >> ---
>>>> >> JuanFra Rodriguez Cardoso
>>>> >>
>>>> >> 2014-09-04 17:41 GMT+02:00 Andrew Ehrlich <[email protected]>:
>>>> >>>
>>>> >>> One way to avoid managing so many sources would be to have an
>>>> >>> aggregation point between the data generators and the Flume
>>>> >>> sources. For example, maybe you could have the data generators
>>>> >>> put events into a message queue (or queues), then have Flume
>>>> >>> consume from there?
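[Editor's note] Solution #1 above, a single source pointed at a host-list file, might look something like the following flume.conf fragment. Note this is purely illustrative: the `hostsFile` property and the `org.example.flume.source.SNMPSource` class are hypothetical names for the option being proposed in this thread; they do not exist in stock Flume.

```properties
# Hypothetical config for the proposed host-list option.
agent.sources = snmp1
agent.channels = ch1
agent.sinks = hdfs1

# One source covers all devices listed in the file, instead of one
# source stanza per device.
agent.sources.snmp1.type = org.example.flume.source.SNMPSource
agent.sources.snmp1.hostsFile = /path/to/list-of-host-file
agent.sources.snmp1.channels = ch1

agent.channels.ch1.type = memory
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.channel = ch1
agent.sinks.hdfs1.hdfs.path = /flume/snmp/events
```

With this shape, adding or removing a device means editing the host-list file, not the Flume configuration itself.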
>>>> >>>
>>>> >>> Andrew
>>>> >>>
>>>> >>> ---- On Thu, 04 Sep 2014 08:29:04 -0700, JuanFra Rodriguez Cardoso <
>>>> >>> [email protected]> wrote ----
>>>> >>>
>>>> >>> Hi all:
>>>> >>>
>>>> >>> Considering an environment with thousands of sources, what are
>>>> >>> the best practices for managing the agent configuration
>>>> >>> (flume.conf)? Is it recommended to create a multi-layer topology
>>>> >>> where each agent takes control of a subset of sources?
>>>> >>>
>>>> >>> In that case, a config management server (such as Puppet) would
>>>> >>> be responsible for editing flume.conf with 'agent.sources'
>>>> >>> parameters from source1 to source3000 (assuming we have 3000
>>>> >>> source machines).
>>>> >>>
>>>> >>> Are my thoughts aligned with such large-scale data ingest
>>>> >>> scenarios?
>>>> >>>
>>>> >>> Thanks a lot!
>>>> >>> ---
>>>> >>> JuanFra
>>>> >
>>>> > --
>>>> > thanks
>>>> > ashish
>>>> >
>>>> > Blog: http://www.ashishpaliwal.com/blog
>>>> > My Photo Galleries: http://www.pbase.com/ashishpaliwal
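[Editor's note] The "little program to fill the flume configuration file from a template" mentioned above (solution #2), whether run by hand or by a config management tool such as Puppet, could be sketched as below. The source type and property names are illustrative, not real Flume ones.

```python
# Sketch: generate per-host source stanzas for flume.conf from a host
# list, instead of hand-maintaining thousands of entries.

def render_flume_conf(agent, hosts):
    """Render source entries (source1..sourceN) for the given hosts."""
    names = ["source%d" % (i + 1) for i in range(len(hosts))]
    lines = ["%s.sources = %s" % (agent, " ".join(names))]
    for name, host in zip(names, hosts):
        lines.append("%s.sources.%s.type = org.example.flume.source.SNMPSource" % (agent, name))
        lines.append("%s.sources.%s.host = %s" % (agent, name, host))
        lines.append("%s.sources.%s.channels = ch1" % (agent, name))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # For 3000 devices, hosts would be read from a file or an inventory.
    print(render_flume_conf("agent", ["10.0.0.1", "10.0.0.2"]))
```

The trade-off raised in the thread still applies: this keeps stock Flume sources unchanged, but the generated flume.conf grows with the device count, whereas the host-list option (#1) keeps flume.conf constant-size.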
