I see. Maybe you need more Kafka nodes and fewer Flume agents (I have the same number of each).

All the solutions you mention will not survive a disk crash. I would rather rely on Kafka to guarantee no message loss.

Gonzalo
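For reference, a minimal sketch of the Kafka channel being advocated here (Gonzalo first suggests it further down the thread), assuming Flume 1.6's bundled org.apache.flume.channel.kafka.KafkaChannel; the agent name, broker list, ZooKeeper quorum, and topic are illustrative:

  # Kafka channel sketch (Flume 1.6): events are staged in a Kafka topic,
  # so they survive an agent host/disk failure as long as the topic is
  # replicated across brokers.
  agent1.channels.kc1.type = org.apache.flume.channel.kafka.KafkaChannel
  agent1.channels.kc1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
  agent1.channels.kc1.zookeeperConnect = zk1:2181,zk2:2181,zk3:2181
  agent1.channels.kc1.topic = flume-channel
  # Keep events in Flume's Avro wrapper so headers are preserved
  agent1.channels.kc1.parseAsFlumeEvent = true

With the channel topic replicated across brokers (e.g. replication factor 3), staged events survive the loss of a single broker or of the local agent's disk, which is the property the file-based options discussed below cannot offer.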
On 19 October 2015 at 13:39, Simone Roselli <[email protected]> wrote:

Hi,

... because a Kafka channel would lead me to the same problem, no?

I have 200 nodes, each one with a Flume-ng agent aboard. I cannot lose a single event.

With a memory/file channel, in case Kafka is down/broken/buggy, I could still take care of events (Spillable memory, File roll, other sinks...). With a Kafka channel (another, separate Kafka cluster) I would rely exclusively on that Kafka cluster, which was my initial, non-ideal situation when having it as a sink.

Thanks
Simone

On Mon, Oct 19, 2015 at 11:28 AM, Gonzalo Herreros <[email protected]> wrote:

Why don't you use a Kafka channel?
It would be simpler, and it would meet your initial requirement of having a fault-tolerant channel.

Regards,
Gonzalo

On 19 October 2015 at 10:23, Simone Roselli <[email protected]> wrote:

However, since the arrival order on Kafka (the main sink) is not a particular problem for me, my current solution would be:

* memory channel
* sink group with 2 sinks:
** Kafka
** File roll (write events to the '/data/x' directory in case Kafka is down)
* periodically check for files in '/data/x' and, if any are present, re-push them to Kafka (see the sketch at the end of this thread)

I still don't know whether it is possible to re-push file_roll files to Kafka using bin/flume-ng.

Any hints would be appreciated.

Many thanks

On Fri, Oct 16, 2015 at 4:32 PM, Simone Roselli <[email protected]> wrote:

Hi Phil,

thanks for your reply.

Yes, the file-channel configuration is consuming up to 80-90% CPU.

My settings:

  # Channel configuration
  agent1.channels.ch1.type = file
  agent1.channels.ch1.checkpointDir = /opt/flume-ng/chekpoint
  agent1.channels.ch1.dataDirs = /opt/flume-ng/data
  agent1.channels.ch1.capacity = 1000000
  agent1.channels.ch1.transactionCapacity = 10000

  # flume-env.sh
  export JAVA_OPTS="-Xms512m -Xmx2048m"

  # top
  22079 flume-ng  20  0 6924752 785536 17132 S 83.7 2.4 3:53.19 java

Do you have any tuning for the GC?

Thanks
Simone

On Thu, Oct 15, 2015 at 7:59 PM, Phil Scala <[email protected]> wrote:

Hi Simone,

I wonder why you're seeing 90% CPU use when you use a file channel; I would expect high disk I/O instead. As a counterpoint: on a single server I have 4 spooldir sources, each going to a separate file channel, also on an SSD-based server, and I do not see any notable CPU or even disk I/O utilization. I am pushing about 10 million events per day across all 4 sources, and it has been running reliably for 2 years now.

I would always use a file channel; any memory channel runs the risk of data loss if the node were to fail. I would be at least as worried about the local node failing, given that a 3-node Kafka cluster would have to lose 2 nodes before losing quorum.

Not sure what your data source is, but if you can add more Flume nodes, that would of course help.

Have you given it ample heap space? Maybe GCs are causing the high CPU?

Phil
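A sketch of one way to follow up on the GC question: fix the heap size so it never resizes, and enable GC logging to confirm whether collections really are behind the CPU burn. The heap size, collector flags, and log path below are illustrative assumptions, not tested recommendations:

  # flume-env.sh (sketch): fixed-size heap, CMS collector, GC logging
  export JAVA_OPTS="-Xms2048m -Xmx2048m \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/flume-ng/gc.log"

If the GC log stays quiet while CPU stays high, the time is more likely going into the file channel's per-commit fsyncs, which small transaction batches make worse.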
From: Simone Roselli [mailto:[email protected]]
Sent: Friday, October 09, 2015 12:33 AM
To: [email protected]
Subject: Flume-ng 1.6 reliable setup

Hi,

I'm currently planning to migrate from Flume 0.9 to Flume-ng 1.6, but I'm having trouble finding a reliable setup for it.

My sink is a 3-node Kafka cluster. I must avoid losing events in case the main sink is down, broken, or unreachable for a while.

In Flume 0.9, I use a memory channel with the "store on failure" feature, which starts writing events to the local disk in case the target sink is not available.

In Flume-ng 1.6, the same behaviour would be accomplished by setting up a Spillable Memory Channel, but the problem with this solution is stated at the end of the channel's description: "This channel is currently experimental and not recommended for use in production."

In Flume-ng 1.6 it's also possible to set up a pool of failover sinks, so I was thinking of configuring a File Roll sink as the secondary in case the primary is down. However, once the primary sink comes back online, the data placed on the secondary sink (the local disk) won't automatically be pushed to the primary one.

Another option would be setting up a file channel: write each event to disk, then sink it. Leaving aside that I don't love the idea of continuously writing and deleting every single event on an SSD, this setup takes 90% CPU. The exact same configuration with a memory channel takes 3%.

Other solutions to evaluate?

Simone
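For reference, the failover pool described above (Kafka as primary, file_roll writing to '/data/x' as secondary) might be wired up roughly as follows in Flume 1.6; the sink names, priorities, topic, and roll interval are illustrative:

  # Failover sink group sketch: events go to Kafka while it is healthy,
  # and spill to local files under /data/x when the Kafka sink fails.
  agent1.sinkgroups = g1
  agent1.sinkgroups.g1.sinks = kafkaSink fileSink
  agent1.sinkgroups.g1.processor.type = failover
  agent1.sinkgroups.g1.processor.priority.kafkaSink = 10
  agent1.sinkgroups.g1.processor.priority.fileSink = 5
  agent1.sinkgroups.g1.processor.maxpenalty = 10000

  agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
  agent1.sinks.kafkaSink.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
  agent1.sinks.kafkaSink.topic = events
  # Wait for all in-sync replicas before acknowledging
  agent1.sinks.kafkaSink.requiredAcks = -1
  agent1.sinks.kafkaSink.channel = ch1

  agent1.sinks.fileSink.type = file_roll
  agent1.sinks.fileSink.sink.directory = /data/x
  agent1.sinks.fileSink.sink.rollInterval = 300
  agent1.sinks.fileSink.channel = ch1

The trade-off raised elsewhere in the thread still applies: with a memory channel feeding the group, events in flight are lost if the agent host itself dies.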

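Finally, a sketch of the re-push step left open above: a second, short-lived agent that drains completed file_roll output back into Kafka through a spooldir source. Everything here (agent name, spool path, topic) is hypothetical, and it assumes completed rolls are first moved out of file_roll's active directory, since the spooldir source requires files to be immutable once they appear:

  # repush.properties (sketch): drain files from /data/x-spool back into Kafka.
  # /data/x-spool is a hypothetical staging directory; move closed rolls here.
  repush.sources = spool
  repush.channels = mem
  repush.sinks = kafka

  repush.sources.spool.type = spooldir
  repush.sources.spool.spoolDir = /data/x-spool
  # Delete each file once its events have been committed to the channel
  repush.sources.spool.deletePolicy = immediate
  repush.sources.spool.channels = mem

  repush.channels.mem.type = memory
  repush.channels.mem.capacity = 100000

  repush.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
  repush.sinks.kafka.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
  repush.sinks.kafka.topic = events
  repush.sinks.kafka.channel = mem

It could then be launched on demand with bin/flume-ng, e.g.:

  bin/flume-ng agent --conf conf --conf-file repush.properties --name repush

file_roll writes one event body per line, which matches the spooldir source's default line deserializer, though any event headers are lost in the round trip.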