I see. Maybe you need more Kafka nodes and fewer Flume agents (I have the same number of each).

All the solutions you mention will not survive a disk crash. I would rather rely on Kafka to guarantee no message loss.

Gonzalo
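For reference, a minimal sketch of the Kafka channel being advocated here (Gonzalo first suggests it further down the thread), assuming Flume 1.6's bundled org.apache.flume.channel.kafka.KafkaChannel; the agent name, broker list, ZooKeeper quorum, and topic are illustrative:

  # Kafka channel sketch (Flume 1.6): events are staged in a Kafka topic,
  # so they survive an agent host/disk failure as long as the topic is
  # replicated across brokers.
  agent1.channels.kc1.type = org.apache.flume.channel.kafka.KafkaChannel
  agent1.channels.kc1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
  agent1.channels.kc1.zookeeperConnect = zk1:2181,zk2:2181,zk3:2181
  agent1.channels.kc1.topic = flume-channel
  # Keep events in Flume's Avro wrapper so headers are preserved
  agent1.channels.kc1.parseAsFlumeEvent = true

With the channel topic replicated across brokers (e.g. replication factor 3), staged events survive the loss of a single broker or of the local agent's disk, which is the property the file-based options discussed below cannot offer.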
On 19 October 2015 at 13:39, Simone Roselli <[email protected]> wrote:

Hi,

... because a Kafka channel would lead me to the same problem, no?

I have 200 nodes, each one with a Flume-ng agent aboard. I cannot lose a single event.

With a memory/file channel, in case Kafka is down/broken/buggy, I could still take care of events (Spillable memory, File roll, other sinks...). With a Kafka channel (another, separate Kafka cluster) I would rely exclusively on that Kafka cluster, which was my initial, non-ideal situation when having it as a sink.

Thanks
Simone

On Mon, Oct 19, 2015 at 11:28 AM, Gonzalo Herreros <[email protected]> wrote:

Why don't you use a Kafka channel?
It would be simpler, and it would meet your initial requirement of having a fault-tolerant channel.

Regards,
Gonzalo

On 19 October 2015 at 10:23, Simone Roselli <[email protected]> wrote:

However, since the arrival order on Kafka (the main sink) is not a particular problem for me, my current solution would be:

* memory channel
* sink group with 2 sinks:
** Kafka
** File roll (write events to the '/data/x' directory in case Kafka is down)
* periodically check for files in '/data/x' and, if any are present, re-push them to Kafka (see the sketch at the end of this thread)

I still don't know whether it is possible to re-push file_roll files to Kafka using bin/flume-ng.

Any hints would be appreciated.

Many thanks

On Fri, Oct 16, 2015 at 4:32 PM, Simone Roselli <[email protected]> wrote:

Hi Phil,

thanks for your reply.

Yes, the file-channel configuration is consuming up to 80-90% CPU.

My settings:

  # Channel configuration
  agent1.channels.ch1.type = file
  agent1.channels.ch1.checkpointDir = /opt/flume-ng/chekpoint
  agent1.channels.ch1.dataDirs = /opt/flume-ng/data
  agent1.channels.ch1.capacity = 1000000
  agent1.channels.ch1.transactionCapacity = 10000

  # flume-env.sh
  export JAVA_OPTS="-Xms512m -Xmx2048m"

  # top
  22079 flume-ng  20  0 6924752 785536 17132 S 83.7 2.4 3:53.19 java

Do you have any tuning for the GC?

Thanks
Simone

On Thu, Oct 15, 2015 at 7:59 PM, Phil Scala <[email protected]> wrote:

Hi Simone,

I wonder why you're seeing 90% CPU use when you use a file channel; I would expect high disk I/O instead. As a counterpoint: on a single server I have 4 spooldir sources, each going to a separate file channel, also on an SSD-based server, and I do not see any notable CPU or even disk I/O utilization. I am pushing about 10 million events per day across all 4 sources, and it has been running reliably for 2 years now.

I would always use a file channel; any memory channel runs the risk of data loss if the node were to fail. I would be at least as worried about the local node failing, given that a 3-node Kafka cluster would have to lose 2 nodes before losing quorum.

Not sure what your data source is, but if you can add more Flume nodes, that would of course help.

Have you given it ample heap space? Maybe GCs are causing the high CPU?

Phil
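A sketch of one way to follow up on the GC question: fix the heap size so it never resizes, and enable GC logging to confirm whether collections really are behind the CPU burn. The heap size, collector flags, and log path below are illustrative assumptions, not tested recommendations:

  # flume-env.sh (sketch): fixed-size heap, CMS collector, GC logging
  export JAVA_OPTS="-Xms2048m -Xmx2048m \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/flume-ng/gc.log"

If the GC log stays quiet while CPU stays high, the time is more likely going into the file channel's per-commit fsyncs, which small transaction batches make worse.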
From: Simone Roselli [mailto:[email protected]]
Sent: Friday, October 09, 2015 12:33 AM
To: [email protected]
Subject: Flume-ng 1.6 reliable setup

Hi,

I'm currently planning to migrate from Flume 0.9 to Flume-ng 1.6, but I'm having trouble finding a reliable setup for it.

My sink is a 3-node Kafka cluster. I must avoid losing events in case the main sink is down, broken, or unreachable for a while.

In Flume 0.9, I use a memory channel with the "store on failure" feature, which starts writing events to the local disk in case the target sink is not available.

In Flume-ng 1.6, the same behaviour would be accomplished by setting up a Spillable Memory Channel, but the problem with this solution is stated at the end of the channel's description: "This channel is currently experimental and not recommended for use in production."

In Flume-ng 1.6 it's also possible to set up a pool of failover sinks, so I was thinking of configuring a File Roll sink as the secondary in case the primary is down. However, once the primary sink comes back online, the data placed on the secondary sink (the local disk) won't automatically be pushed to the primary one.

Another option would be setting up a file channel: write each event to disk, then sink it. Leaving aside that I don't love the idea of continuously writing and deleting every single event on an SSD, this setup takes 90% CPU. The exact same configuration with a memory channel takes 3%.

Other solutions to evaluate?

Simone
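For reference, the failover pool described above (Kafka as primary, file_roll writing to '/data/x' as secondary) might be wired up roughly as follows in Flume 1.6; the sink names, priorities, topic, and roll interval are illustrative:

  # Failover sink group sketch: events go to Kafka while it is healthy,
  # and spill to local files under /data/x when the Kafka sink fails.
  agent1.sinkgroups = g1
  agent1.sinkgroups.g1.sinks = kafkaSink fileSink
  agent1.sinkgroups.g1.processor.type = failover
  agent1.sinkgroups.g1.processor.priority.kafkaSink = 10
  agent1.sinkgroups.g1.processor.priority.fileSink = 5
  agent1.sinkgroups.g1.processor.maxpenalty = 10000

  agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
  agent1.sinks.kafkaSink.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
  agent1.sinks.kafkaSink.topic = events
  # Wait for all in-sync replicas before acknowledging
  agent1.sinks.kafkaSink.requiredAcks = -1
  agent1.sinks.kafkaSink.channel = ch1

  agent1.sinks.fileSink.type = file_roll
  agent1.sinks.fileSink.sink.directory = /data/x
  agent1.sinks.fileSink.sink.rollInterval = 300
  agent1.sinks.fileSink.channel = ch1

The trade-off raised elsewhere in the thread still applies: with a memory channel feeding the group, events in flight are lost if the agent host itself dies.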

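Finally, a sketch of the re-push step left open above: a second, short-lived agent that drains completed file_roll output back into Kafka through a spooldir source. Everything here (agent name, spool path, topic) is hypothetical, and it assumes completed rolls are first moved out of file_roll's active directory, since the spooldir source requires files to be immutable once they appear:

  # repush.properties (sketch): drain files from /data/x-spool back into Kafka.
  # /data/x-spool is a hypothetical staging directory; move closed rolls here.
  repush.sources = spool
  repush.channels = mem
  repush.sinks = kafka

  repush.sources.spool.type = spooldir
  repush.sources.spool.spoolDir = /data/x-spool
  # Delete each file once its events have been committed to the channel
  repush.sources.spool.deletePolicy = immediate
  repush.sources.spool.channels = mem

  repush.channels.mem.type = memory
  repush.channels.mem.capacity = 100000

  repush.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
  repush.sinks.kafka.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
  repush.sinks.kafka.topic = events
  repush.sinks.kafka.channel = mem

It could then be launched on demand with bin/flume-ng, e.g.:

  bin/flume-ng agent --conf conf --conf-file repush.properties --name repush

file_roll writes one event body per line, which matches the spooldir source's default line deserializer, though any event headers are lost in the round trip.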