Javi, I have an NMS background, so I understand your concern.
Answers inline.

On Fri, Sep 5, 2014 at 12:44 PM, Javi Roman <[email protected]> wrote:

> Hello,
>
> The scenario Juanfra is describing is related to the question I asked a
> few days ago [1].

I am not sure about this; Juanfra is a better person to comment.

> You cannot install Flume agents on the SNMP managed devices, and you
> cannot modify any software on an SNMP managed device to use the Flume
> client SDK (if I understand your idea correctly, Ashish). There are two
> ways to collect SNMP data from SNMP devices using Flume (IMHO):

Agreed.

> 1. Create a custom application which launches the SNMP queries to the
> thousands of devices and logs the answers into a file. In this case
> Flume can tail this file with the "exec source" core plugin.

IMHO, this would be the preferred way for me. Create a simple SNMP walker
which polls the nodes in parallel and writes the responses to a file. Use
Flume's Spooling Directory Source, and the rest of the flow remains the
same. I would avoid decoding events unless they need to be interpreted in
the Flume chain.

> 2. Use a flume-snmp-source plugin (similar to [2]); in other words,
> shift the SNMP query custom application into a specialized Flume plugin.

Possible; it's like running solution #1 inside Flume. In both cases you
would need to keep track of which agents have already been sent requests.
An async SNMP walker would help you scale to thousands of agents.

> Juanfra is talking about a scenario like the second point. In that case
> you have to handle a huge Flume configuration file, with an entry for
> each managed device to query. For this situation I guess there are two
> possible solutions:
>
> 1. The flume-snmp-source plugin can take a file with a list of hosts to
> query as a parameter:
>
>     agent.sources.source1.host = /path/to/list-of-host-file
>
> However, I guess this breaks the philosophy or the simplicity of the
> other core Flume plugins.
>
> 2. Create a little program to fill the Flume configuration file from a
> template, or something similar.

I would go with #1; it keeps the Flume config file simple. We still need to
distribute the host file, but on a much smaller scale.

> Any other ideas? I guess this is a good discussion about a real-world
> use case.
>
> [1] http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
> [2] https://github.com/javiroman/flume-snmp-source
>
> On Fri, Sep 5, 2014 at 4:56 AM, Ashish <[email protected]> wrote:
> >
> > Have a look at the Flume Client SDK. One simple way would be to use the
> > Flume client implementations to send events to Flume sources; this
> > would significantly reduce the number of sources you need to manage.
> >
> > HTH!
> >
> > On Thu, Sep 4, 2014 at 9:40 PM, JuanFra Rodriguez Cardoso
> > <[email protected]> wrote:
> >>
> >> Thanks, Andrew, for your quick response.
> >>
> >> My sources (server PUD) can't put events into an aggregation point.
> >> For this reason I'm following a PollingSource schema, where my agent
> >> needs to be configured with thousands of sources. Any clues for use
> >> cases where data is ingested through a polling process?
> >>
> >> Regards!
> >> ---
> >> JuanFra Rodriguez Cardoso
> >>
> >> 2014-09-04 17:41 GMT+02:00 Andrew Ehrlich <[email protected]>:
> >>>
> >>> One way to avoid managing so many sources would be to have an
> >>> aggregation point between the data generators and the Flume sources.
> >>> For example, maybe you could have the data generators put events
> >>> into a message queue (or queues), then have Flume consume from
> >>> there?
> >>>
> >>> Andrew
> >>>
> >>> ---- On Thu, 04 Sep 2014 08:29:04 -0700 JuanFra Rodriguez Cardoso
> >>> <[email protected]> wrote ----
> >>>
> >>> Hi all:
> >>>
> >>> Considering an environment with thousands of sources, what are the
> >>> best practices for managing the agent configuration (flume.conf)?
> >>> Is it recommended to create a multi-layer topology where each agent
> >>> takes control of a subset of sources?
> >>> In that case, a conf mgmt server (such as Puppet) would be
> >>> responsible for editing flume.conf with parameters 'agent.sources'
> >>> from source1 to source3000 (assuming we have 3000 source machines).
> >>>
> >>> Are my thoughts aligned with such large-scale data ingest scenarios?
> >>>
> >>> Thanks a lot!
> >>> ---
> >>> JuanFra
> >>>
> >>
> >
> > --
> > thanks
> > ashish
> >
> > Blog: http://www.ashishpaliwal.com/blog
> > My Photo Galleries: http://www.pbase.com/ashishpaliwal

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
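The Flume side of that setup would be a standard Spooling Directory Source; something like the following (agent, source, and channel names are illustrative):

```properties
agent.sources = snmp
agent.channels = mem

agent.sources.snmp.type = spooldir
agent.sources.snmp.spoolDir = /var/flume/snmp-spool
agent.sources.snmp.channels = mem

agent.channels.mem.type = memory
```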
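P.S. To make the "parallel SNMP walker feeding a Spooling Directory Source" idea concrete, here is a minimal sketch. `snmp_walk` is a placeholder stub, not a real SNMP call; a real walker would swap in an SNMP library (e.g. pysnmp), and the file names and directories are only illustrative:

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def snmp_walk(host, oid):
    """Placeholder: a real implementation would issue an SNMP GET/WALK here."""
    return f"{host} {oid} = <value>"


def poll_all(hosts, oid, spool_dir, max_workers=50):
    """Poll every device in parallel and drop one finished file into spool_dir."""
    # Query all devices concurrently; this is what lets one walker
    # scale to thousands of managed devices.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        lines = list(pool.map(lambda h: snmp_walk(h, oid), hosts))

    # Write outside the spool directory first, then move the finished file
    # in, because the Spooling Directory Source expects files to be
    # complete and immutable once they appear.
    fd, tmp_path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    final_path = os.path.join(spool_dir, f"snmp-{int(time.time() * 1000)}.log")
    shutil.move(tmp_path, final_path)
    return final_path
```

Each poll cycle produces one immutable file, which keeps the Flume side stateless: Flume renames or deletes files as it finishes them, so the walker never needs to coordinate with it.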
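And for the "little program to fill the Flume configuration file" idea, a toy generator. Note the source `type` string and property names below are made up for illustration; they would have to match whatever the real flume-snmp-source plugin expects:

```python
def render_flume_conf(agent, hosts, channel="mem"):
    """Return flume.conf text with one SNMP source entry per managed device.

    The type string and property names are hypothetical placeholders.
    """
    names = [f"source{i + 1}" for i in range(len(hosts))]
    lines = [f"{agent}.sources = {' '.join(names)}"]
    for name, host in zip(names, hosts):
        lines += [
            f"{agent}.sources.{name}.type = com.example.flume.SNMPQuerySource",
            f"{agent}.sources.{name}.host = {host}",
            f"{agent}.sources.{name}.channels = {channel}",
        ]
    return "\n".join(lines) + "\n"
```

A config-management tool such as Puppet could run this against the host inventory and push the result out, so the 3000-source flume.conf is generated rather than hand-maintained.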
