Hi, ... because a Kafka channel would lead me to the same problem, no?

I have 200 nodes, each one with a Flume-ng agent aboard. I cannot lose a
single event. With a memory/file channel, in case Kafka is
down/broken/buggy, I could still take care of events (Spillable memory,
File roll, other sinks...). In case of a Kafka channel (another, separate
Kafka cluster) I would rely exclusively on that Kafka cluster, which was
my initial non-ideal situation, having it as a sink.

Thanks
Simone

On Mon, Oct 19, 2015 at 11:28 AM, Gonzalo Herreros <[email protected]> wrote:

> Why don't you use a Kafka channel?
> It would be simpler and it would meet your initial requirement of having
> channel fault tolerance.
>
> Regards,
> Gonzalo
>
> On 19 October 2015 at 10:23, Simone Roselli <[email protected]>
> wrote:
>
>> However,
>>
>> since the arrival order on Kafka (main sink) is not a particular
>> problem for me, my current solution would be:
>>
>> * memory channel
>> * sinkgroup with 2 sinks:
>> ** Kafka
>> ** File_roll (write events to the '/data/x' directory, in case Kafka
>> is down)
>> * periodically check for files in '/data/x' and, if present, re-push
>> them to Kafka
>>
>> I still don't know whether it is possible to re-push File-roll files to
>> Kafka using bin/flume-ng; a sketch of what I have in mind is below.
>>
>> Any hints would be appreciated.
>>
>> Many thanks
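>>
>> (Rough sketch of the re-push part: a second, small agent whose spooldir
>> source reads back the rolled files and whose sink is Kafka. All names,
>> paths and brokers below are invented, and I haven't tested this:)
>>
>> repush.sources = src1
>> repush.channels = mem1
>> repush.sinks = kafka1
>>
>> # Read back the files written by file_roll. Move them here only once
>> # they are closed: a spooldir source requires immutable files.
>> repush.sources.src1.type = spooldir
>> repush.sources.src1.spoolDir = /data/x/ready
>> repush.sources.src1.deletePolicy = immediate
>> repush.sources.src1.channels = mem1
>>
>> repush.channels.mem1.type = memory
>> repush.channels.mem1.capacity = 10000
>>
>> # Flume 1.6 Kafka sink
>> repush.sinks.kafka1.type = org.apache.flume.sink.kafka.KafkaSink
>> repush.sinks.kafka1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
>> repush.sinks.kafka1.topic = events
>> repush.sinks.kafka1.channel = mem1
>>
>> It would be launched with something like:
>> bin/flume-ng agent --conf conf --conf-file repush.conf --name repush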
>>
>> On Fri, Oct 16, 2015 at 4:32 PM, Simone Roselli <
>> [email protected]> wrote:
>>
>>> Hi Phil,
>>>
>>> thanks for your reply.
>>>
>>> Yes, the file-channel configuration is consuming up to 80/90% CPU.
>>>
>>> My settings:
>>>
>>> # Channel configuration
>>> agent1.channels.ch1.type = file
>>> agent1.channels.ch1.checkpointDir = /opt/flume-ng/chekpoint
>>> agent1.channels.ch1.dataDirs = /opt/flume-ng/data
>>> agent1.channels.ch1.capacity = 1000000
>>> agent1.channels.ch1.transactionCapacity = 10000
>>>
>>> # flume-env.sh
>>> export JAVA_OPTS="-Xms512m -Xmx2048m"
>>>
>>> # top
>>> 22079 flume-ng 20 0 6924752 785536 17132 S 83.7% 2.4 3:53.19 java
>>>
>>> Do you have any tuning for the GC?
>>>
>>> Thanks
>>> Simone
>>>
>>> On Thu, Oct 15, 2015 at 7:59 PM, Phil Scala <[email protected]>
>>> wrote:
>>>
>>>> Hi Simone
>>>>
>>>> I wonder why you're seeing 90% CPU use when you use a file channel; I
>>>> would expect high disk I/O instead. As a counterpoint, I have a
>>>> single server with 4 spooldir sources, each going to a separate file
>>>> channel, also on an SSD-based server. I do not see any notable CPU or
>>>> disk I/O utilization. I am pushing about 10 million events per day
>>>> across all 4 sources, and it has been running reliably for 2 years
>>>> now.
>>>>
>>>> I would always use a file channel: any memory channel runs the risk
>>>> of data loss if the node fails. I would be as worried about the local
>>>> node failing as about the 3-node Kafka cluster, which would have to
>>>> lose 2 nodes before losing quorum.
>>>>
>>>> Not sure what your data source is, but if you can add more Flume
>>>> nodes, that would of course help.
>>>>
>>>> Have you given it ample heap space? Maybe GCs are causing the high
>>>> CPU.
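>>>>
>>>> (A cheap experiment, with no claim it fits your workload: pin the
>>>> heap so it never resizes, try CMS, and turn on GC logging so you can
>>>> see whether collections really eat the CPU. The sizes are made up:)
>>>>
>>>> # flume-env.sh
>>>> export JAVA_OPTS="-Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC \
>>>>   -XX:+CMSParallelRemarkEnabled -verbose:gc -XX:+PrintGCDetails"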
>>>>
>>>> Phil
>>>>
>>>> *From:* Simone Roselli [mailto:[email protected]]
>>>> *Sent:* Friday, October 09, 2015 12:33 AM
>>>> *To:* [email protected]
>>>> *Subject:* Flume-ng 1.6 reliable setup
>>>>
>>>> Hi,
>>>>
>>>> I'm currently planning to migrate from Flume 0.9 to Flume-ng 1.6, but
>>>> I'm having trouble finding a reliable setup for the latter.
>>>>
>>>> My sink is a 3-node Kafka cluster. I must avoid *losing events in
>>>> case the main sink is down*, broken or unreachable for a while.
>>>>
>>>> In Flume 0.9, I use a memory channel with the *store on failure*
>>>> feature, which starts writing events to the local disk in case the
>>>> target sink is not available.
>>>>
>>>> In Flume-ng 1.6 the same behaviour would be accomplished by setting
>>>> up a *Spillable memory channel*, but the problem with this solution
>>>> is stated at the end of the channel's description: "*This channel is
>>>> currently experimental and not recommended for use in production.*"
>>>>
>>>> In Flume-ng 1.6 it's also possible to set up a pool of *Failover
>>>> sinks*, so I was thinking of hypothetically configuring a *File Roll*
>>>> sink as the secondary, in case the primary is down (rough sketch
>>>> below). However, once the primary sink came back online, the data
>>>> placed on the secondary sink (local disk) would not be automatically
>>>> pushed to the primary one.
>>>>
>>>> Another option would be setting up a *file channel*: write each event
>>>> to disk, then sink it. Leaving aside that I don't love the idea of
>>>> continuously writing/deleting every single event on an SSD, this
>>>> setup takes 90% of the CPU. The exact same configuration with a
>>>> memory channel takes 3%.
>>>>
>>>> Other solutions to evaluate?
>>>>
>>>> Simone
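>>>>
>>>> (The failover pool sketch, purely hypothetical; sink names, the topic
>>>> and the directory are invented, and both sinks drain the same
>>>> channel:)
>>>>
>>>> agent1.sinks = kafka1 roll1
>>>> agent1.sinkgroups = g1
>>>>
>>>> # Failover processor: Kafka wins while healthy, File Roll takes over
>>>> agent1.sinkgroups.g1.sinks = kafka1 roll1
>>>> agent1.sinkgroups.g1.processor.type = failover
>>>> agent1.sinkgroups.g1.processor.priority.kafka1 = 10
>>>> agent1.sinkgroups.g1.processor.priority.roll1 = 5
>>>> agent1.sinkgroups.g1.processor.maxpenalty = 10000
>>>>
>>>> # Primary sink: Kafka (Flume 1.6 syntax)
>>>> agent1.sinks.kafka1.type = org.apache.flume.sink.kafka.KafkaSink
>>>> agent1.sinks.kafka1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
>>>> agent1.sinks.kafka1.topic = events
>>>> agent1.sinks.kafka1.channel = ch1
>>>>
>>>> # Secondary sink: dump events to local disk while Kafka is down
>>>> agent1.sinks.roll1.type = file_roll
>>>> agent1.sinks.roll1.sink.directory = /data/x
>>>> agent1.sinks.roll1.sink.rollInterval = 300
>>>> agent1.sinks.roll1.channel = ch1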
