Javi, I have an NMS background, so I understand your concern.
Answers inline.

On Fri, Sep 5, 2014 at 12:44 PM, Javi Roman <[email protected]> wrote:

> Hello,
>
> The scenario Juanfra is describing is related to the question I asked a
> few days ago [1].

I am not sure about this; Juanfra is a better person to comment.

> You cannot install Flume agents on the SNMP managed devices, and you
> cannot modify any software on an SNMP managed device to use the Flume
> client SDK (if I understand your idea correctly, Ashish). There are two
> ways to collect SNMP data from SNMP devices using Flume (IMHO):

Agreed.

> 1. Create a custom application which launches the SNMP queries to the
> thousands of devices and logs the answers into a file. In this case
> Flume can tail this file with the "exec source" core plugin.

IMHO, this would be the preferred way for me. Create a simple SNMP walker
which polls the nodes in parallel and writes the responses to a file. Use
Flume's Spooling Directory Source, and the rest of the flow remains the
same. I would avoid decoding events unless they need to be interpreted in
the Flume chain.

> 2. Use a flume-snmp-source plugin (similar to [2]); in other words,
> shift the SNMP query custom application into a specialized Flume plugin.

Possible; it's like running solution #1 inside Flume. In both cases you
would need to keep track of which agents have already been sent requests.
An async SNMP walker would help you scale to thousands of agents.

> Juanfra is talking about a scenario like the second point. In that case
> you have to handle a huge Flume configuration file, with an entry for
> each managed device to query. For this situation I guess there are two
> possible solutions:
>
> 1. The flume-snmp-source plugin can take a file with a list of hosts to
> query as a parameter:
>
>     agent.sources.source1.host = /path/to/list-of-host-file
>
> However, I guess this breaks the philosophy or the simplicity of the
> other core Flume plugins.
>
> 2. Create a little program to fill the Flume configuration file from a
> template, or something similar.

I would go with #1; it keeps the Flume config file simple. We still need to
distribute the host file, but on a much smaller scale.

> Any other ideas? I guess this is a good discussion about a real-world
> use case.
>
> [1] http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
> [2] https://github.com/javiroman/flume-snmp-source
>
> On Fri, Sep 5, 2014 at 4:56 AM, Ashish <[email protected]> wrote:
> >
> > Have a look at the Flume Client SDK. One simple way would be to use the
> > Flume client implementations to send events to Flume sources; this
> > would significantly reduce the number of sources you need to manage.
> >
> > HTH!
> >
> > On Thu, Sep 4, 2014 at 9:40 PM, JuanFra Rodriguez Cardoso
> > <[email protected]> wrote:
> >>
> >> Thanks, Andrew, for your quick response.
> >>
> >> My sources (server PUD) can't put events into an aggregation point.
> >> For this reason I'm following a PollingSource schema, where my agent
> >> needs to be configured with thousands of sources. Any clues for use
> >> cases where data is ingested through a polling process?
> >>
> >> Regards!
> >> ---
> >> JuanFra Rodriguez Cardoso
> >>
> >> 2014-09-04 17:41 GMT+02:00 Andrew Ehrlich <[email protected]>:
> >>>
> >>> One way to avoid managing so many sources would be to have an
> >>> aggregation point between the data generators and the Flume sources.
> >>> For example, maybe you could have the data generators put events
> >>> into a message queue (or queues), then have Flume consume from
> >>> there?
> >>>
> >>> Andrew
> >>>
> >>> ---- On Thu, 04 Sep 2014 08:29:04 -0700 JuanFra Rodriguez Cardoso
> >>> <[email protected]> wrote ----
> >>>
> >>> Hi all:
> >>>
> >>> Considering an environment with thousands of sources, what are the
> >>> best practices for managing the agent configuration (flume.conf)?
> >>> Is it recommended to create a multi-layer topology where each agent
> >>> takes control of a subset of sources?
> >>> In that case, a conf mgmt server (such as Puppet) would be
> >>> responsible for editing flume.conf with parameters 'agent.sources'
> >>> from source1 to source3000 (assuming we have 3000 source machines).
> >>>
> >>> Are my thoughts aligned with such large-scale data ingest scenarios?
> >>>
> >>> Thanks a lot!
> >>> ---
> >>> JuanFra
> >>>
> >>
> >
> > --
> > thanks
> > ashish
> >
> > Blog: http://www.ashishpaliwal.com/blog
> > My Photo Galleries: http://www.pbase.com/ashishpaliwal

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
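The Flume side of that setup would be a standard Spooling Directory Source; something like the following (agent, source, and channel names are illustrative):

```properties
agent.sources = snmp
agent.channels = mem

agent.sources.snmp.type = spooldir
agent.sources.snmp.spoolDir = /var/flume/snmp-spool
agent.sources.snmp.channels = mem

agent.channels.mem.type = memory
```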
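P.S. To make the "parallel SNMP walker feeding a Spooling Directory Source" idea concrete, here is a minimal sketch. `snmp_walk` is a placeholder stub, not a real SNMP call; a real walker would swap in an SNMP library (e.g. pysnmp), and the file names and directories are only illustrative:

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def snmp_walk(host, oid):
    """Placeholder: a real implementation would issue an SNMP GET/WALK here."""
    return f"{host} {oid} = <value>"


def poll_all(hosts, oid, spool_dir, max_workers=50):
    """Poll every device in parallel and drop one finished file into spool_dir."""
    # Query all devices concurrently; this is what lets one walker
    # scale to thousands of managed devices.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        lines = list(pool.map(lambda h: snmp_walk(h, oid), hosts))

    # Write outside the spool directory first, then move the finished file
    # in, because the Spooling Directory Source expects files to be
    # complete and immutable once they appear.
    fd, tmp_path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    final_path = os.path.join(spool_dir, f"snmp-{int(time.time() * 1000)}.log")
    shutil.move(tmp_path, final_path)
    return final_path
```

Each poll cycle produces one immutable file, which keeps the Flume side stateless: Flume renames or deletes files as it finishes them, so the walker never needs to coordinate with it.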
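And for the "little program to fill the Flume configuration file" idea, a toy generator. Note the source `type` string and property names below are made up for illustration; they would have to match whatever the real flume-snmp-source plugin expects:

```python
def render_flume_conf(agent, hosts, channel="mem"):
    """Return flume.conf text with one SNMP source entry per managed device.

    The type string and property names are hypothetical placeholders.
    """
    names = [f"source{i + 1}" for i in range(len(hosts))]
    lines = [f"{agent}.sources = {' '.join(names)}"]
    for name, host in zip(names, hosts):
        lines += [
            f"{agent}.sources.{name}.type = com.example.flume.SNMPQuerySource",
            f"{agent}.sources.{name}.host = {host}",
            f"{agent}.sources.{name}.channels = {channel}",
        ]
    return "\n".join(lines) + "\n"
```

A config-management tool such as Puppet could run this against the host inventory and push the result out, so the 3000-source flume.conf is generated rather than hand-maintained.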
