Hello,

It looks like you have acks turned on in the config you posted for your netcat source. You might want to try turning them off:
agent1.sources.netcatSource.ack-every-event = false

We've gotten up to around 1400 events per second on a single netcat source feeding 2 HDFS sinks without any issues (using a memory channel). This is on a live network, so we've never tested above that; it's the maximum throughput of the events we're storing.

Best,
Ed

On Fri, Mar 28, 2014 at 5:58 AM, Asim Zafir <[email protected]> wrote:

> How much data are we ingesting on a per-minute or per-second basis?
> How many sources are we talking about here?
> What kind of channel are you using currently, and what is the memory/storage footprint on the source as well as the sink?
> Is the traffic uniformly distributed? If not, what is the peak data throughput you expect from a given source?
>
> On Thu, Mar 27, 2014 at 11:07 AM, Andrew Ehrlich <[email protected]> wrote:
>
>> What about having more than one Flume agent?
>>
>> You could have two agents that read the small messages and sink to HDFS, or two agents that read the messages, serialize them, and send them to a third agent which sinks them into HDFS.
>>
>> On Thu, Mar 27, 2014 at 9:43 AM, Chris Schneider <[email protected]> wrote:
>>
>>> I have a fair bit of data continually being created in the form of smallish messages (a few hundred bytes), which needs to enter Flume and eventually sink into HDFS.
>>>
>>> I need to be sure that the data lands in persistent storage and won't be lost, but otherwise throughput isn't important. It just needs to be fast enough not to back up.
>>>
>>> I'm running into a bottleneck in the initial ingestion of data.
>>>
>>> I've tried the netcat source and the thrift source, but both have capped out at a thousand or so records per second.
>>>
>>> Batching up the thrift API items into sets of 10 and using appendBatch is a pretty large speedup, but still not enough.
>>>
>>> Here's a gist of my Ruby test script, some example runs, and my config:
>>>
>>> https://gist.github.com/cschneid/9792305
>>>
>>> 1. Are there any obvious performance changes I can make to speed up ingestion?
>>> 2. How fast can Flume reasonably go? Should I switch my source to something faster? What?
>>> 3. Is there a better tool for this kind of task (rapid, safe ingestion of small messages)?
>>>
>>> Thanks!
>>> Chris
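For reference, here is a minimal sketch of the kind of single-agent layout Ed describes: one netcat source with per-event acks disabled, a memory channel, and two HDFS sinks draining it in parallel. The names, bind address, port, HDFS path, and capacity/batch numbers below are placeholders for illustration, not values taken from his setup:

agent1.sources = netcatSource
agent1.channels = memChannel
agent1.sinks = hdfsSink1 hdfsSink2

# netcat source with per-event acks disabled
agent1.sources.netcatSource.type = netcat
agent1.sources.netcatSource.bind = 0.0.0.0
agent1.sources.netcatSource.port = 44444
agent1.sources.netcatSource.ack-every-event = false
agent1.sources.netcatSource.channels = memChannel

# memory channel: fast, but events are lost if the agent dies
agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 100000
agent1.channels.memChannel.transactionCapacity = 1000

# two HDFS sinks draining the same channel in parallel
agent1.sinks.hdfsSink1.type = hdfs
agent1.sinks.hdfsSink1.channel = memChannel
agent1.sinks.hdfsSink1.hdfs.path = hdfs:///flume/events/%Y/%m/%d
agent1.sinks.hdfsSink1.hdfs.fileType = DataStream
agent1.sinks.hdfsSink1.hdfs.batchSize = 1000
agent1.sinks.hdfsSink1.hdfs.useLocalTimeStamp = true

agent1.sinks.hdfsSink2.type = hdfs
agent1.sinks.hdfsSink2.channel = memChannel
agent1.sinks.hdfsSink2.hdfs.path = hdfs:///flume/events/%Y/%m/%d
agent1.sinks.hdfsSink2.hdfs.fileType = DataStream
agent1.sinks.hdfsSink2.hdfs.batchSize = 1000
agent1.sinks.hdfsSink2.hdfs.useLocalTimeStamp = true

Since Chris's original requirement is that data lands in persistent storage and isn't lost, changing the channel type from memory to file would trade some of this speed for durability across agent restarts.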
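And a rough sketch of the tiered arrangement Andrew suggests: one or more ingest agents accept the messages and forward them over Avro to a collector agent that writes to HDFS. Again, the hostnames, ports, and paths are placeholders; in practice you would run two or more copies of the ingest-tier agent:

# collector agent: receives events over Avro, writes them to HDFS
collector.sources = avroIn
collector.channels = ch1
collector.sinks = hdfsSink

collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = ch1

collector.channels.ch1.type = memory

collector.sinks.hdfsSink.type = hdfs
collector.sinks.hdfsSink.channel = ch1
collector.sinks.hdfsSink.hdfs.path = hdfs:///flume/events

# ingest agent (run several): reads messages, forwards them over Avro
ingest1.sources = netcatSource
ingest1.channels = ch1
ingest1.sinks = avroOut

ingest1.sources.netcatSource.type = netcat
ingest1.sources.netcatSource.bind = 0.0.0.0
ingest1.sources.netcatSource.port = 44444
ingest1.sources.netcatSource.channels = ch1

ingest1.channels.ch1.type = memory

ingest1.sinks.avroOut.type = avro
ingest1.sinks.avroOut.hostname = collector-host
ingest1.sinks.avroOut.port = 4545
ingest1.sinks.avroOut.channel = ch1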
