Hello Roshan,

Thanks for the response, but I am now confused. If I have 120 files, do I need to configure 120 sinks/sources/channels separately? Or have I missed something in the docs? Maybe I should use a fan-out flow? But then again I would have to set 120 parameters.
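For illustration, one possible alternative is a single HDFS sink whose output path is driven by an event header, so one sink can serve many input files. A minimal flume.conf sketch (the agent/source/channel/sink names and the "logfile" header are made up for the example):

    # One agent, one source, one channel, one HDFS sink.
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = hdfs1

    # Avro source receiving events from the RPC client app.
    agent1.sources.src1.type = avro
    agent1.sources.src1.bind = 0.0.0.0
    agent1.sources.src1.port = 41414
    agent1.sources.src1.channels = ch1

    # Memory channel, as in the test setup described below.
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 100000
    agent1.channels.ch1.transactionCapacity = 1000

    # %{logfile} expands to the value of the "logfile" event header, so
    # events from different input files land in different HDFS directories
    # without needing one sink per file.
    agent1.sinks.hdfs1.type = hdfs
    agent1.sinks.hdfs1.channel = ch1
    agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/%{logfile}
    agent1.sinks.hdfs1.hdfs.rollInterval = 60

The client would have to set the "logfile" header on each event (see the client sketch at the end of this thread). Whether one sink keeps up at this volume is a separate question; the point is only that per-file sinks are not the only way to keep writers from clobbering each other.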
Best regards.

On Nov 5, 2013 8:47 PM, "Roshan Naik" <[email protected]> wrote:

> Yes, to avoid them clobbering each other's writes.
>
> On Tue, Nov 5, 2013 at 4:34 AM, Bojan Kostić <[email protected]> wrote:
>
>> Sorry for the late response, but I lost this email somehow.
>>
>> Thanks for the read; it is a nice start even if it is old.
>> And the numbers are really promising.
>>
>> I'm testing the memory channel. There are about 20 data sources (log
>> servers) with 60 different files each.
>>
>> My RPC client app is basic, like the one in the examples, but it has
>> load balancing across the two Flume agents that write the data to HDFS.
>>
>> I think I read somewhere that you should have one sink per file. Is
>> that true?
>>
>> Best regards, and sorry again for the late response.
>>
>> On Oct 22, 2013 8:50 AM, "Juhani Connolly" <[email protected]> wrote:
>>
>>> Hi Bojan,
>>>
>>> This is pretty old, but Mike did some testing on performance about a
>>> year and a half ago:
>>>
>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Syslog+Performance+Test+2012-04-30
>>>
>>> He was getting a max of 70k events/sec on a single machine.
>>>
>>> The thing is, this is the result of a huge number of variables:
>>> - Parallelization of flows allows better parallel processing.
>>> - Use of the memory channel as opposed to a slower persistent channel.
>>> - Possibly the source. I have no idea how you wrote your app.
>>> - Batching of events is important. Also, are all events written to one
>>> file, or are they split over many? Every file is processed separately.
>>> - Network congestion and your HDFS setup.
>>>
>>> Reaching 100k events per second is definitely possible. The resources
>>> you need for it will vary significantly depending on your setup. The
>>> more HA-type features you use, the slower delivery is likely to become.
>>> On the flip side, allowing fairly lax conditions with a small potential
>>> for data loss (on a crash, for example, memory channel contents are
>>> gone) will allow close to 100k even on a single machine.
>>>
>>> On 10/14/2013 09:00 PM, Bojan Kostić wrote:
>>>
>>>> Hi, this is my first post here, but I have been playing with Flume
>>>> for some time now.
>>>> My question is: how well does Flume scale? Can Flume ingest 100k+
>>>> events per second? Has anyone tried something like this?
>>>>
>>>> I created a simple test and the results are really slow.
>>>> I wrote a simple app, using the Flume SDK, with an RPC client with
>>>> fallback, which reads a dummy log file.
>>>> In the end I have two Flume agents which write to HDFS, with
>>>> rollInterval = 60, and in HDFS I get files of ~12 MB.
>>>>
>>>> Do I need to use some complex 3-tier topology?
>>>> How many Flume agents should write to HDFS?
>>>>
>>>> Best regards.
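For reference, the "RPC client with load balancing" described in this thread can be written with the Flume SDK's load-balancing client. A minimal sketch in Java (the host names, port, and the "logfile" header are assumptions; batching is included because, as noted above, it matters a lot for throughput):

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class LoadBalancedSender {
        public static void main(String[] args) throws Exception {
            // Load-balance over the two agents that write to HDFS
            // (hypothetical host names).
            Properties props = new Properties();
            props.put("client.type", "default_loadbalance");
            props.put("hosts", "h1 h2");
            props.put("hosts.h1", "agent1.example.com:41414");
            props.put("hosts.h2", "agent2.example.com:41414");
            props.put("batch-size", "100");

            RpcClient client = RpcClientFactory.getInstance(props);
            try {
                List<Event> batch = new ArrayList<Event>();
                for (int i = 0; i < 100; i++) {
                    Event e = EventBuilder.withBody(
                            ("dummy log line " + i).getBytes(StandardCharsets.UTF_8));
                    // Tag each event with its source file so a header-driven
                    // HDFS path can split the output per file.
                    e.getHeaders().put("logfile", "server01/access.log");
                    batch.add(e);
                }
                // One RPC round trip for the whole batch instead of 100
                // single appends.
                client.appendBatch(batch);
            } finally {
                client.close();
            }
        }
    }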
