It was late when I wrote the last mail, and my explanation was not clear. I will illustrate: 20 servers, each with 60 different log files. I was thinking that I could have this kind of structure on HDFS:

/logs/server0/logstat0.log
/logs/server0/logstat1.log
. . .
/logs/server20/logstat0.log
. . .

But from your info I see that I can't do that. I could try to add a server id column to every file and then aggregate the files from all servers into one file per log name:

/logs/logstat0.log
/logs/logstat1.log
. . .

But again I would need 60 sinks.
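To make that concrete, a rough sketch of what that aggregated layout seems to require: one channel and one HDFS sink per log name, fanned out by a multiplexing selector on a "logname" header that my client would have to set. Only two of the 60 flows are shown, and all component names are made up:

    # sketch: one channel + one HDFS sink per log name (2 of 60 shown)
    agent.sources = src
    agent.channels = ch_logstat0 ch_logstat1
    agent.sinks = sink_logstat0 sink_logstat1

    agent.sources.src.type = avro
    agent.sources.src.bind = 0.0.0.0
    agent.sources.src.port = 4141
    agent.sources.src.channels = ch_logstat0 ch_logstat1
    # route each event on a "logname" header set by the RPC client
    agent.sources.src.selector.type = multiplexing
    agent.sources.src.selector.header = logname
    agent.sources.src.selector.mapping.logstat0 = ch_logstat0
    agent.sources.src.selector.mapping.logstat1 = ch_logstat1

    agent.channels.ch_logstat0.type = memory
    agent.channels.ch_logstat1.type = memory

    agent.sinks.sink_logstat0.type = hdfs
    agent.sinks.sink_logstat0.channel = ch_logstat0
    agent.sinks.sink_logstat0.hdfs.path = /logs
    agent.sinks.sink_logstat0.hdfs.filePrefix = logstat0

    agent.sinks.sink_logstat1.type = hdfs
    agent.sinks.sink_logstat1.channel = ch_logstat1
    agent.sinks.sink_logstat1.hdfs.path = /logs
    agent.sinks.sink_logstat1.hdfs.filePrefix = logstat1

(If the HDFS sink's escape sequences accept a custom header here, e.g. hdfs.path = /logs/%{logname} on a single sink, that might avoid the 60-sink explosion entirely, but I have not tested that.)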
On Nov 6, 2013 2:02 AM, "Roshan Naik" <[email protected]> wrote:

> I assume you mean you have 120 source files to be streamed into HDFS.
> There is not a 1-1 correspondence between source files and destination
> HDFS files. If they are on the same host, you can have them all picked up
> through one source, one channel and one HDFS sink... winding up in a
> single HDFS file.
>
> In case you have a config with multiple HDFS sinks (part of a single agent
> or spanning multiple agents), you want to ensure each HDFS sink writes to
> a separate file in HDFS.
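For reference, a minimal sketch of the single source/channel/sink layout described above; the spooling directory source and all names are assumptions on my part, not from Roshan's mail:

    # sketch: many local files -> one source -> one channel -> one HDFS sink
    agent.sources = src
    agent.channels = ch
    agent.sinks = snk

    # a spooling directory source picks up every file dropped into one dir
    agent.sources.src.type = spooldir
    agent.sources.src.spoolDir = /var/log/flume-spool
    agent.sources.src.channels = ch

    agent.channels.ch.type = memory
    agent.channels.ch.capacity = 100000

    # all events wind up in a single series of HDFS files
    agent.sinks.snk.type = hdfs
    agent.sinks.snk.channel = ch
    agent.sinks.snk.hdfs.path = /logs
    agent.sinks.snk.hdfs.fileType = DataStream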
> On Tue, Nov 5, 2013 at 4:41 PM, Bojan Kostić <[email protected]> wrote:
>
>> Hello Roshan,
>>
>> Thanks for the response.
>> But I am now confused. If I have 120 files, do I need to configure 120
>> sinks/sources/channels separately? Or have I missed something in the
>> docs? Maybe I should use a fan-out flow? But then again I must set 120
>> params.
>>
>> Best regards.
>> On Nov 5, 2013 8:47 PM, "Roshan Naik" <[email protected]> wrote:
>>
>>> Yes, to avoid them clobbering each other's writes.
>>>
>>> On Tue, Nov 5, 2013 at 4:34 AM, Bojan Kostić <[email protected]> wrote:
>>>
>>>> Sorry for the late response, but I lost this email somehow.
>>>>
>>>> Thanks for the read; it is a nice start even if it is old.
>>>> And the numbers are really promising.
>>>>
>>>> I'm testing the memory channel; there are about 20 data sources (log
>>>> servers) with 60 different files each.
>>>>
>>>> My RPC client app is basic, like in the examples, but it has load
>>>> balancing across two Flume agents which are writing data to HDFS.
>>>>
>>>> I think I read somewhere that you should have one sink per file. Is
>>>> that true?
>>>>
>>>> Best regards, and sorry again for the late response.
>>>> On Oct 22, 2013 8:50 AM, "Juhani Connolly" <[email protected]> wrote:
>>>>
>>>>> Hi Bojan,
>>>>>
>>>>> This is pretty old, but Mike did some testing on performance about a
>>>>> year and a half ago:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Syslog+Performance+Test+2012-04-30
>>>>>
>>>>> He was getting a max of 70k events/sec on a single machine.
>>>>>
>>>>> The thing is, this is the result of a huge number of variables:
>>>>> - Parallelization of flows allows better parallel processing
>>>>> - Use of the memory channel as opposed to a slower persistent channel
>>>>> - Possibly the source. I have no idea how you wrote your app
>>>>> - Batching of events is important. Also, are all events written to
>>>>> one file? Or are they split over many? Every file is separately
>>>>> processed.
>>>>> - Network congestion, your HDFS setup
>>>>>
>>>>> Reaching 100k events per second is definitely possible. The resources
>>>>> you need for it will vary significantly depending on how your setup
>>>>> is. The more HA-type features you use, the slower delivery is likely
>>>>> to become. On the flip side, allowing fairly lax conditions that have
>>>>> a small potential for data loss (on a crash, for example, memory
>>>>> channel contents are gone) will allow for close to 100k even on a
>>>>> single machine.
>>>>>
>>>>> On 10/14/2013 09:00 PM, Bojan Kostić wrote:
>>>>>
>>>>>> Hi, this is my first post here, but I have been playing with Flume
>>>>>> for some time now.
>>>>>> My question is: how well does Flume scale?
>>>>>> Can Flume ingest 100k+ events per second? Has anyone tried something
>>>>>> like this?
>>>>>>
>>>>>> I created a simple test and the results are really slow.
>>>>>> I wrote a simple app, using the Flume SDK, with an RPC client with
>>>>>> failover, which reads a dummy log file.
>>>>>> In the end I have two Flume agents which are writing to HDFS with
>>>>>> rollInterval = 60, and in HDFS I get files of ~12MB.
>>>>>>
>>>>>> Do I need to use some complex topology with 3 tiers?
>>>>>> How many Flume agents should write to HDFS?
>>>>>>
>>>>>> Best regards.
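Since the original test above only set rollInterval, here is a sketch of the other throughput knobs Juhani mentions (batching, channel capacity, roll policy). The values are placeholders, not recommendations:

    # sketch: throughput-related settings on the HDFS-writing agent
    agent.channels.ch.type = memory
    agent.channels.ch.capacity = 100000
    # events moved per channel transaction
    agent.channels.ch.transactionCapacity = 1000

    agent.sinks.snk.type = hdfs
    agent.sinks.snk.channel = ch
    agent.sinks.snk.hdfs.path = /logs
    # events written to HDFS per batch before a flush
    agent.sinks.snk.hdfs.batchSize = 1000
    # roll every 60 s (as in the test above) or at ~128 MB,
    # whichever comes first; 0 disables a trigger
    agent.sinks.snk.hdfs.rollInterval = 60
    agent.sinks.snk.hdfs.rollSize = 134217728
    agent.sinks.snk.hdfs.rollCount = 0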
