Re: flume and hadoop append

2014-04-10 Thread Brock Noland
On Wed, Apr 9, 2014 at 12:54 PM, Pritchard, Charles X. -ND charles.x.pritchard@disney.com wrote: On Apr 9, 2014, at 8:06 AM, Brock Noland br...@cloudera.com wrote: Hi Charles, Exploring the idea of using append instead of creating new files with HDFS every few minutes. ... it's

Re: flume and hadoop append

2014-04-09 Thread Brock Noland
Hi Charles, Exploring the idea of using append instead of creating new files with HDFS every few minutes. I wonder if this is doable by setting rollCount to 0 and then using rollInterval (or alternatively rollSize)? There's certainly a history of append with HDFS, mainly, earlier versions

Re: JMS source and full file channel

2014-03-17 Thread Brock Noland
When the channel is full, the JMSSource returns BACKOFF to PollableSourceRunner which implements backoff logic which can be seen here: https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/PollableSourceRunner.java#L139 On Mon, Mar 17, 2014 at 7:38 PM,
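As a rough illustration of the backoff behaviour described in this message, here is a minimal Python sketch of a polling loop that sleeps longer after each consecutive BACKOFF status, up to a cap. It is not the PollableSourceRunner code; the function names and the 1000 ms / 5000 ms values are assumptions chosen for the example.

```python
import time

# Assumed values for illustration only; check the linked source for the real ones.
BACKOFF_SLEEP_INCREMENT_MS = 1000
MAX_BACKOFF_SLEEP_MS = 5000


def backoff_delay_ms(consecutive_backoffs,
                     increment_ms=BACKOFF_SLEEP_INCREMENT_MS,
                     max_ms=MAX_BACKOFF_SLEEP_MS):
    """Delay grows linearly with consecutive backoffs and is capped at max_ms."""
    return min(consecutive_backoffs * increment_ms, max_ms)


def run_polling_loop(process, max_iterations, sleep=time.sleep):
    """Call process() repeatedly; it returns "READY" or "BACKOFF".

    Returns the list of delays (in ms) that were applied, so the behaviour
    can be inspected without waiting for real sleeps.
    """
    consecutive = 0
    delays = []
    for _ in range(max_iterations):
        if process() == "BACKOFF":
            consecutive += 1
            delay = backoff_delay_ms(consecutive)
            delays.append(delay)
            sleep(delay / 1000.0)
        else:
            # Any successful poll resets the backoff counter.
            consecutive = 0
    return delays
```

With a full channel the source would keep reporting BACKOFF, so the runner in this sketch polls less and less often until the channel drains and a READY status resets the counter.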

Re: JMS Source on 1.3.0?

2014-03-13 Thread Brock Noland
I think it should work to run the 1.4 JMS source in 1.3. Out of curiosity, which vendor isn't shipping 1.4? Brock On Thu, Mar 13, 2014 at 4:43 PM, Christopher Shannon cshannon...@gmail.com wrote: I know that the JMS source is new in version 1.4, but has anyone tried running this particular

Re: File Channel Capacity vs transactionCapacity

2014-03-05 Thread Brock Noland
transactionCapacity is a limit on events consumed or produced. We have removed it in trunk. On Wed, Mar 5, 2014 at 12:47 PM, Jimmy jimmyj...@gmail.com wrote: perhaps strange question, but looking at the documentation for flume 1.4 under FILE CHANNEL transactionCapacity 1000 The maximum size

Re: Flume Start and Stop Script

2014-02-14 Thread Brock Noland
at 10:25 PM, Brock Noland br...@cloudera.com wrote: Yes. Note that CDH3 is deprecated as CDH4 has recently had its 5th maintenance release (CDH 4.5). On Sun, Feb 9, 2014 at 9:12 PM, Upender Nimbekar nimbekar.upen...@gmail.com wrote: Does Cloudera CDH3U6 (I know it's old) come up with Start

Re: Ordering of messages in flume-ng

2014-02-12 Thread Brock Noland
Hi, In the case of no failures with a single source, single channel and single sink you will see ordering. However, I believe when there is a failure the file channel will change ordering on rollback. If strict ordering is required it's advisable to assign sequence numbers upstream and then re-order
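The sequence-number approach mentioned above could be sketched downstream roughly as follows. This is a minimal Python illustration, not anything Flume ships; it assumes each event arrives as a (sequence number, body) pair stamped upstream, and the function name is hypothetical.

```python
import heapq


def reorder_by_sequence(events, first_seq=0):
    """Yield event bodies in sequence order.

    `events` is an iterable of (seq, body) pairs that may arrive out of
    order or be redelivered. Events are buffered until the next expected
    sequence number shows up; anything already emitted is dropped.
    """
    pending = []          # min-heap keyed on sequence number
    next_seq = first_seq
    for seq, body in events:
        if seq < next_seq:
            continue      # already emitted: treat as a duplicate redelivery
        heapq.heappush(pending, (seq, body))
        while pending and pending[0][0] <= next_seq:
            s, b = heapq.heappop(pending)
            if s == next_seq:
                yield b
                next_seq += 1
            # s < next_seq here means a buffered duplicate; drop it
```

Note that the buffer grows for as long as a gap stays open, so a real consumer would probably want a bound or a timeout on how long it waits for a missing sequence number.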

Re: Flume Start and Stop Script

2014-02-08 Thread Brock Noland
Flume itself just provides a script to start flume in the foreground. Apache BigTop provides init scripts for flume. On Feb 8, 2014 12:22 PM, Upender Nimbekar nimbekar.upen...@gmail.com wrote: Does anyone know how to stop and start flume using one script in clustered environment. It looks there

RE: checkpoint lifecycle

2014-01-30 Thread Brock Noland
How large is your heap? You will likely want two data directories per disk. Also with a channel that large I strongly recommend using backup checkpoints. Additionally https://issues.apache.org/jira/browse/FLUME-2155 will be very useful to you as well. On Jan 30, 2014 4:21 AM, Umesh Telang

Re: Writing custom source

2014-01-30 Thread Brock Noland
I am guessing you want to write a Spooling Directory Source deserializer. http://flume.apache.org/FlumeUserGuide.html#event-deserializers On Thu, Jan 30, 2014 at 3:58 AM, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi, I want to write my own custom source to handle

Re: flume use quite a lot of direct momery and blast

2014-01-22 Thread Brock Noland
on ubuntu and what confused me is not about the virtual address space but NIO Direct Memory increasing boundlessly. It will finally cause the machine to report swap is low. On Wed, Jan 22, 2014 at 1:08 AM, Brock Noland br...@cloudera.com wrote: Looks like it's arena allocation. Basically it's

Re: flume use quite a lot of direct momery and blast

2014-01-21 Thread Brock Noland
Looks like it's arena allocation. Basically it's nothing to worry about since virtual address space on 64bit machines is not in short supply. You could limit it with a parameter like so:

Re: flume agent not starting

2014-01-15 Thread Brock Noland
I'd check the hbase sink and serializer config. It's too quiet when it's misconfigured. On Wed, Jan 15, 2014 at 6:35 AM, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi, This is my configuration file for pushing data using flume to hbase host1.sources = src1

Re: issues with configuration updates and clean shutdowns

2014-01-15 Thread Brock Noland
Could you share some more information? For example: * What version you are using * Configuration file * Log On Wed, Jan 15, 2014 at 8:24 AM, Josh Marcus jmar...@meetup.com wrote: Hey folks, I didn't get any response at all to this, so let me ask more generally: does automatic reload upon

Re: Better sink/source/interceptor lists?

2014-01-08 Thread Brock Noland
+1 On Wed, Jan 8, 2014 at 12:56 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Do others think it would be valuable to Flume to: 1) more clearly list all available sources, sinks, interceptors Compare: http://flume.apache.org/FlumeUserGuide.html to:

Re: Handling malformed data when using custom AvroEventSerializer and HDFS Sink

2014-01-02 Thread Brock Noland
On Thu, Jan 2, 2014 at 10:25 AM, Brock Noland br...@cloudera.com wrote: On Tue, Dec 31, 2013 at 8:34 PM, ed edor...@gmail.com wrote: Hello, We are using Flume v1.4 to load JSON formatted log data into HDFS as Avro. Our flume setup looks like this: NXLog == (FlumeHTTPSource - HDFSSink w

Re: Handling malformed data when using custom AvroEventSerializer and HDFS Sink

2014-01-02 Thread Brock Noland
: 412-256-8556 | www.rdx.com On Thu, Jan 2, 2014 at 12:27 PM, Brock Noland br...@cloudera.com wrote: Jimmy, great to hear that method is working for you! Devin, regarding the morphlines question. Since ML can have arbitrary java plugins it *can* do just about anything. I generally

Re: Event breaking in flume

2013-12-30 Thread Brock Noland
Yes, it is possible to handle multi-line events, and handling stack traces is very commonplace. However, using exec source is going to be limiting. The correct solution is: 1) Use spooling directory source 2) Write a little deserializer to handle your format. Another solution is: 1) replace

Re: Flume uses high Virtual memory

2013-12-30 Thread Brock Noland
Hi, The error: *java.lang.OutOfMemoryError: unable to create new native thread* doesn't have anything to do with virtual address space in a 64bit process. It's highly likely that your nproc setting is too low. I would increase this. For example Cloudera Manager sets this to 32K which is a much
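To see what nproc limit a process actually runs with, something like the following Python sketch can help. It uses the standard `resource` module (Unix only) and reports the limits of the Python process itself, so it would need to be run as the same user and from the same kind of shell or init script that starts the agent; the helper names are made up for the example.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) limit on processes/threads for this user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print("nproc soft limit:", describe(soft))
    print("nproc hard limit:", describe(hard))
```

On Linux, threads count against this limit, which is why a low soft limit can surface as "unable to create new native thread" even when plenty of memory is free.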

Re: hbase as sink

2013-12-26 Thread Brock Noland
Please share your sink configuration. On Thu, Dec 26, 2013 at 4:05 AM, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi, I want to parse a log file to get the highlighted text. Regular expression used to parse (\.\w+)(?=Exception) But table in hbase shows no

Re: File Channel Best Practice

2013-12-18 Thread Brock Noland
Hi Devin, Please find my response below. On Wed, Dec 18, 2013 at 12:24 PM, Devin Suiter RDX dsui...@rdx.com wrote: So, if I understand your position on sizing the source properly, you are saying that the fsync operation is the costly part - it locks the device it is flushing to until the

Re: File Channel Best Practice

2013-12-18 Thread Brock Noland
, 2013 at 2:23 PM, Brock Noland br...@cloudera.com wrote: Hi Devin, Please find my response below. On Wed, Dec 18, 2013 at 12:24 PM, Devin Suiter RDX dsui...@rdx.com wrote: So, if I understand your position on sizing the source properly, you are saying that the fsync operation is the costly

Re: RELP support?

2013-12-17 Thread Brock Noland
I assume you can then configure rsyslog to use RELP when sending to Flume? If so I think it'd be great. On Dec 17, 2013 12:07 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I was looking at Rsyslog and RELP the other day and wondering whether a RELP Source would make sense for

Re: file channel read performance impacted by write rate

2013-12-17 Thread Brock Noland
(4), see agent2.conf. The results are the same. Thanks for looking into this, Jan On 11/14/2013 05:08 PM, Brock Noland wrote: On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien ja...@ngdata.com mailto:ja...@ngdata.com wrote: On 11/13/2013 03:04 PM, Brock Noland wrote

Re: File Channel Read Performance

2013-12-17 Thread Brock Noland
Same questions: Can you take and share 8-10 thread dumps while the sink is taking events slowly? Can you share your machine configuration? On Dec 17, 2013 7:57 AM, David Sinclair dsincl...@chariotsolutions.com wrote: Hi, I am using a File Channel connected to an AMQP Source and an HDFS Sink.

Re: file channel read performance impacted by write rate

2013-12-17 Thread Brock Noland
again. thanks for looking into this. On Tue, Dec 17, 2013 at 8:54 PM, Brock Noland br...@cloudera.com wrote: Can you take and share 8-10 thread dumps while the sink is taking events slowly? Can you share your machine and file channel configuration? On Dec 17, 2013 6:28 AM, Shangan Chen

Re: File Channel Best Practice

2013-12-17 Thread Brock Noland
Hi, I'd also add the biggest issue I see with the file channel is batch size at the source. Long story short is that file channel was written to guarantee no data loss. In order to do that when a transaction is committed we need to perform a fsync on the disk the transaction was written to.

Re: Flume uses high Virtual memory

2013-12-14 Thread Brock Noland
Additionally I'd note that worrying about virtual memory on 64 bit machines is probably not worth your time. The newer versions of malloc() do arena allocation and reserve virtual memory for each thread. This does not, however, actually consume memory. On Sat, Dec 14, 2013 at 10:49 AM, Matt Wise

Re: issue about flume monitor

2013-11-28 Thread Brock Noland
service On Tue, Nov 26, 2013 at 1:17 PM, ch huang justlo...@gmail.com wrote: yes,that's what i wanted On Tue, Nov 26, 2013 at 11:50 AM, Brock Noland br...@cloudera.com wrote: It sounds like you want to run both the http and ganglia monitoring at the same time. Is that correct? On Monday

Re: java.lang.OutOfMemoryError: unable to create new native thread

2013-11-27 Thread Brock Noland
Sounds like the nproc ulimit... On Tue, Nov 26, 2013 at 8:50 PM, Jeff Lord jl...@cloudera.com wrote: Can you provide the logfile and config? On Tue, Nov 26, 2013 at 12:20 PM, Cochran, David david.coch...@bsee.gov wrote: I've got a pretty good sized box collecting logs for a number of

Re: The size of data folder continue grow

2013-11-26 Thread Brock Noland
On Tue, Nov 26, 2013 at 7:41 AM, GuoWei wei@wbkit.com wrote: Dear all, I use flume and custom base sink to put data to HBase. In flume, I use file channel. the file channel data put in the following folder. razor.channels.c_error.dataDirs = /var/lib/flume-ng/data/error But I see the

Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It

2013-11-25 Thread Brock Noland
Lower the maxFileSize. On Mon, Nov 25, 2013 at 2:41 PM, Ritesh Adval riteshad...@gaikai.com wrote: Hi, We are running two flume 1.4 agents each with 2 file channel on a VM of size 15GB. Is VM recommended to run flume or do we need bare metal boxes? Every week or so we are running into

Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It

2013-11-25 Thread Brock Noland
maxFileSize. Do we know how many max log files it will keep in flume 1.4 ? Ritesh On Mon, Nov 25, 2013 at 12:50 PM, Brock Noland br...@cloudera.com wrote: Lower the maxFileSize. On Mon, Nov 25, 2013 at 2:41 PM, Ritesh Adval riteshad...@gaikai.com wrote: Hi, We are running two

Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It

2013-11-25 Thread Brock Noland
the channel capacity, currently it is set to 1M Ritesh On Mon, Nov 25, 2013 at 1:00 PM, Brock Noland br...@cloudera.com wrote: It will keep any tx log that has a corresponding event in the channel + 2 per data directory. On Mon, Nov 25, 2013 at 2:55 PM, Ritesh Adval riteshad...@gaikai.com

Re: issue about flume monitor

2013-11-25 Thread Brock Noland
It sounds like you want to run both the http and ganglia monitoring at the same time. Is that correct? On Monday, November 25, 2013, ch huang wrote: hi,maillist: i want to write a nagios plugin for flume monitor and alert,the script need get input from the command curl

Re: file channel read performance impacted by write rate

2013-11-14 Thread Brock Noland
On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien ja...@ngdata.com wrote: On 11/13/2013 03:04 PM, Brock Noland wrote: The file channel uses a WAL which sits on disk. Each time an event is committed an fsync is called to ensure that data is durable. Without this fsync there is no durability

Re: Unable to deliver event. Exception follows. java.lang.NullPointerException

2013-10-30 Thread Brock Noland
wrote: Hi Brock, Yes, I think morphline interceptor should be something I am looking for. I am studying it now. Thank you, George On Tue, Oct 29, 2013 at 12:56 PM, Brock Noland br...@cloudera.com wrote: In a very simple demo you could use the static interceptor: http

Re: Spooling Directory Source Stuck in Exception [Serializer has been closed]

2013-10-30 Thread Brock Noland
Under certain circumstances mv will actually be a copy + delete. In that case the file size will change during the copy phase. I'd recommend copying it in with an extension which is ignored via excludePattern and then renaming it. On Wed, Oct 30, 2013 at 12:49 PM, Snehal Nagmote
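The copy-then-rename advice in this message might look like the Python sketch below. It assumes the spooling directory source has been configured to ignore files ending in `.tmp`; that suffix and the function name are hypothetical choices for the example, not Flume defaults.

```python
import os
import shutil


def deliver_to_spool_dir(src_path, spool_dir, tmp_suffix=".tmp"):
    """Copy a file into the spool directory without exposing a partial file.

    The copy is written under a temporary name that the source is assumed
    to ignore, then renamed into place. The rename happens inside the spool
    directory, so it stays on one filesystem and does not turn into a
    copy + delete.
    """
    final_path = os.path.join(spool_dir, os.path.basename(src_path))
    tmp_path = final_path + tmp_suffix
    shutil.copyfile(src_path, tmp_path)   # slow part: file size changes here
    os.rename(tmp_path, final_path)       # fast part: appears fully written
    return final_path
```

The point of the two steps is that the source only ever sees the final name once the contents are complete, so the file no longer changes size while it is being read.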

Re: [ANNOUNCE] New Flume Committer - Roshan Naik

2013-09-25 Thread Brock Noland
Congratulations! On Wednesday, September 25, 2013, Juhani Connolly wrote: Congratulations Roshan, nice job! On 09/25/2013 02:52 PM, Arvind Prabhakar wrote: Congratulations Roshan! Regards, Arvind Prabhakar On Tue, Sep 24, 2013 at 4:05 PM, Mike Percy

Re: 4 times disk consumption?

2013-09-10 Thread Brock Noland
reach the minimumRequiredSpace), together they use more than the default 2G of this parameter. On Tue, Sep 10, 2013 at 8:23 AM, Brock Noland br...@cloudera.com wrote: If you are concerned about disk space consumption you should lower the max log size on the file channel. The exact parameter

Re: 4 times disk consumption?

2013-09-09 Thread Brock Noland
If you are concerned about disk space consumption you should lower the max log size on the file channel. The exact parameter is in the docs. On Tue, Sep 10, 2013 at 12:17 AM, Anat Rozenzon a...@viber.com wrote: After leaving flume to run in this state (sink is not sending the events), the disk

Re: sleep() in script doesn't work when called by exec Source

2013-08-19 Thread Brock Noland
In your case I would look at the spooling directory source. On Sun, Aug 18, 2013 at 9:29 PM, Wang, Yongkun | Yongkun | BDD yongkun.w...@mail.rakuten.com wrote: Hi, I am testing with apache-flume-1.4.0-bin. I made a naive python script for exec source to do throttling by calling sleep()

Re: Flume is replaying log for hours now

2013-08-08 Thread Brock Noland
use-fast-replay would help but you'd need 4-5GB of heap per channel. With heaps that large you should be using dual checkpointing to avoid this. Here is the thread doing the replay: lifecycleSupervisor-1-0 prio=10 tid=0x7f040472c800 nid=0x1332b runnable [0x7f03f84ce000]

Re: Flume is replaying log for hours now

2013-08-08 Thread Brock Noland
collector.channels.mc1.transactionCapacity=1 collector.channels.mc1.use-fast-replay=true On Thu, Aug 8, 2013 at 3:19 PM, Brock Noland br...@cloudera.com wrote: use-fast-replay would help but you'd need 4-5GB of heap per channel. With heaps that large you should be using dual checkpointing

Re: Flume is replaying log for hours now

2013-08-08 Thread Brock Noland
it is. Should I just set: useDualCheckpoints=true backupCheckpointDir=/some/dir On Thu, Aug 8, 2013 at 4:05 PM, Brock Noland br...@cloudera.com wrote: Note the dual and backup checkpoint configs here: http://flume.apache.org/FlumeUserGuide.html#file-channel On Thu, Aug 8, 2013 at 7:37

Re: Flume service stopped automatically

2013-06-01 Thread Brock Noland
Weird... Could it have been the Linux OOM killer? You'd see something in /var/log/messages if that was the case. On Sat, Jun 1, 2013 at 2:47 AM, Lenin Raj emaille...@gmail.com wrote: Hello, I have a flume service which pulls twitter data and sinks to HDFS. I started it last night at 8 PM. It

Re: NullPointerException : (Channel closed) on writeCheckpoint

2013-05-13 Thread Brock Noland
I'd be highly surprised if not setting batchSize caused the NPE. Here is a good tuning article: https://blogs.apache.org/flume/entry/flume_performance_tuning_part_1 On Mon, May 13, 2013 at 10:58 AM, Jay Vyas jayunit...@gmail.com wrote: Alas - i cannot reproduce this bug anymore., maybe as a

Re: Lightweight workaround for FLUME-1137, Hadoop classes required for File channel

2013-05-02 Thread Brock Noland
You only need the hadoop-core jar. Thanks for pointing out 1137. Work is progressing on this issue in FLUME-1285 (https://issues.apache.org/jira/browse/FLUME-1285). On Thu, May 2, 2013 at 8:15 AM, brian.h...@bt.com wrote: Hi I'm running up against FLUME-1137 (

Re: High CPU usage when using file channel.

2013-04-29 Thread Brock Noland
Weird. Do you feel comfortable using a java profiler? If not could you take 4-5 thread dumps while that channel is using that much CPU. -- Brock Noland Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, April 29, 2013 at 2:48 AM, Pranav Sharma wrote: Hi, I see a very

Re: HBase Sink Reliability

2013-04-22 Thread Brock Noland
Also, the below doesn't make sense unless you are generating the config yourself. The file channel doesn't do the kind of path manipulation that the hdfs sink does... collector.channels.fileChannel.checkpointDir=~/.flume/file-channel/checkpoint_%{data_type}

Re: Log Events get Lost - flume 1.3

2013-04-16 Thread Brock Noland
Hi, There are two issues with your configuration: 1) A batch size of 1 with the file channel is an anti-pattern. This will result in extremely poor performance because the file channel will have to do an fsync() (expensive disk operation required to ensure no data loss) for each event. Your batch size
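To make the cost of a batch size of 1 concrete, here is a small Python sketch that appends events to a log file and calls fsync once per batch, counting the fsyncs. It only mimics the general idea of committing a transaction durably; it is not how the file channel is implemented, and the function name is invented for the example.

```python
import os


def write_with_fsync_per_batch(path, events, batch_size):
    """Append events (bytes) to `path`, syncing to disk once per batch.

    Returns the number of fsync calls made, which is the expensive part.
    """
    fsyncs = 0
    with open(path, "ab") as log:
        for start in range(0, len(events), batch_size):
            for event in events[start:start + batch_size]:
                log.write(event + b"\n")
            log.flush()               # push Python's buffer to the OS
            os.fsync(log.fileno())    # force the OS to write to the device
            fsyncs += 1
    return fsyncs
```

For 1000 events, a batch size of 1 means 1000 fsync calls while a batch size of 100 means 10, for the same data on disk. The trade-off is that a larger batch delays the durability guarantee until the whole batch is committed.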

Re: FileChannel on Windows

2013-04-10 Thread Brock Noland
I think we just need to copy 4-5 classes over from hadoop. Someone uploaded a patch I just haven't had time to work with it. On Wed, Apr 10, 2013 at 3:41 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: Yes. We will need to remove the format upgrade code and the old format code from

Re: : write-timeout value tuning

2013-04-08 Thread Brock Noland
, Brock Noland br...@cloudera.com wrote: There is no harm in setting write-timeout to something like 30 seconds. In fact it probably makes sense to increase the default to 30 seconds. On Mon, Apr 8, 2013 at 1:38 PM, Madhu Gmail madhu.munag...@gmail.com wrote: Hello, I am getting

Re: FileChannel error

2013-03-29 Thread Brock Noland
How large is /local/flume/file-channel/flume-log-sink-dev/data/log-884? Would you be willing to share the file with me (off list) so I could take a look at the corruption? Brock On Fri, Mar 29, 2013 at 1:02 PM, Andrew Jones andrew+fl...@andrew-jones.com wrote: Hi, I restarted my flume

Re: Channel not starting.

2013-03-21 Thread Brock Noland
Strange... Can you share 1) JVM 2) OS and 3) Flume config file? On Thu, Mar 21, 2013 at 7:54 AM, JR mailj...@gmail.com wrote: Hello, Am a newbie, trying to set up avro source, channel memory and hdfs sink. Am encountering the following error: Not sure what to do. *Is there a way

Re: process failed - java.lang.OutOfMemoryError

2013-03-02 Thread Brock Noland
Try turning on HeapDumpOnOutOfMemoryError so we can peek at the heap dump. -- Brock Noland Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, March 1, 2013 at 5:57 PM, Denis Lowe wrote: process failed - java.lang.OutOfMemoryError We observed the following error: 01 Mar

Re: compression over-the-wire with 1.3.1 ?

2013-02-25 Thread Brock Noland
move a tremendous amount of data between clusters and use compression and batching. https://issues.cloudera.org/browse/FLUME-559 looks like I will need to spend time working with both versions. Jim From: Brock Noland br...@cloudera.com Reply-To: user@flume.apache.org Date: Sat, 23

Re: compression over-the-wire with 1.3.1 ?

2013-02-23 Thread Brock Noland
Hi, The spool dir source does not have built in functionality to read compressed files. One could be built, but I think it would either require a subclass of https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/serialization/EventDeserializer.java or changes to

Re: Flume performance tuning

2013-02-21 Thread Brock Noland
Can you share more about your tests? How many events you sent? How those times were calculated, etc. -- Brock Noland Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, February 20, 2013 at 9:42 PM, Ryo.Hongo wrote: Hi. I did benchmark of the flume. 3〜8ms in one

Re: Help with spooling directory source

2013-02-19 Thread Brock Noland
Hi, The spooling directory source expects immutable, uniquely named files as described here: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source As such you should log to a separate directory and then on roll move the file (uniquely named) into the spooling dir source. Brock On
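One way to follow this advice is sketched below in Python: when the logger rolls a file, move it into the spooling directory under a name made unique with a millisecond timestamp. The naming scheme and function name are assumptions for illustration, and the sketch assumes the log directory and the spool directory are on the same filesystem so the rename is a single step.

```python
import os
import time


def move_rolled_file(rolled_path, spool_dir, clock=time.time):
    """Move a finished (rolled) log file into the spool directory.

    The target name is the original file name plus a millisecond
    timestamp, so repeated rolls of the same log never reuse a name.
    """
    stamp = int(clock() * 1000)
    name = "{}.{}".format(os.path.basename(rolled_path), stamp)
    target = os.path.join(spool_dir, name)
    if os.path.exists(target):
        # Refuse to overwrite: the source expects files never to change.
        raise FileExistsError(target)
    os.rename(rolled_path, target)
    return target
```

Only files that are no longer being written should be moved this way; the active log file stays in its own directory until the next roll.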

Re: build flume trunk failed

2013-02-18 Thread Brock Noland
Hi, Try doing a mvn install -DskipTests before executing your commands. On Mon, Feb 18, 2013 at 4:14 AM, 周梦想 abloz...@gmail.com wrote: hello, my env: [zhouhh@Hadoop48 flume]$ cat /etc/redhat-release CentOS release 5.9 (Final) [zhouhh@Hadoop48 flume]$ uname -a Linux Hadoop48

Re: Flume-NG : HBase sink : Could not retrieve login configuration: java.lang.SecurityException:

2013-02-15 Thread Brock Noland
Hi, Check to make sure you spelled your principal correctly and that your key tab has the correct permissions. On Thursday, February 14, 2013, Madhusudhan Reddy Munagala wrote: Hello, I am trying to configure the FlumeNG with HBase. We have Zoo keeper service for the Hbase. The Flume

Re: Flume NG and zookeeper

2013-02-07 Thread Brock Noland
NG has no zookeeper integration. What are you trying to do? There is one open JIRA on this https://issues.apache.org/jira/browse/FLUME-1491 On Thu, Feb 7, 2013 at 8:36 PM, 吳瑞琳 rlwu...@gmail.com wrote: Hi all, I am trying to integrate Flume NG and zookeeper. However, I did not find any

Re: Security between Avro-source and Avro-sink

2013-02-04 Thread Brock Noland
See this discussion: http://mail-archives.apache.org/mod_mbox/flume-user/201301.mbox/%3C7475E65732997042ABDCE90B7B4EB286091644%40OZWEX0201N2.msad.ms.com%3E On Mon, Feb 4, 2013 at 10:28 AM, Rahul Ravindran rahu...@yahoo.com wrote: Re..sending. -- *From:* Rahul

Re: FileChannel error on Flume 1.3.1

2013-02-02 Thread Brock Noland
Hi, That isn't a good error message. Could you share your entire log file? Post it on a JIRA or pastebin. Did you delete the checkpoint before starting the channel after changing capacity? Capacity is fixed for the file channel as such the checkpoint must be deleted to change the capacity.

Re: HDFS Test Failure

2013-01-27 Thread Brock Noland
takes a long time but succeeds. TestHBaseSink, however, fails after a while when it times out. How can I get this to work without running in 'sudo' mode, and why might the TestHBaseSink be hanging for just me? - Connor On Sat, Jan 19, 2013 at 3:06 PM, Brock Noland br...@cloudera.com wrote

Re: log4jappender hang's

2013-01-25 Thread Brock Noland
Also, Mike Percy has a working example here: https://github.com/mpercy/flume-log4j-example On Fri, Jan 25, 2013 at 11:10 AM, Hari Shreedharan hshreedha...@cloudera.com wrote: Have you set up the appender correctly? The log4j appender class is

Re: HDFS Test Failure

2013-01-19 Thread Brock Noland
I think there is/was a bug in HDFS which caused a NPE due to umask. My guess is it's 0002 whereas it needs to be 0022. On Sat, Jan 19, 2013 at 2:56 PM, Connor Woodson cwoodson@gmail.com wrote: Running mvn test on the latest Flume code, I get a test failure in

Re: Log4J Appender in Flume

2013-01-19 Thread Brock Noland
Hi, Do I understand this correctly, you are going to use the flume log4j appender to collect flume logs? If so, I don't see how you'd avoid the feedback loop. Brock On Fri, Jan 18, 2013 at 11:13 AM, Connor Woodson cwoodson@gmail.com wrote: I just ran into an unfortunate configuration

Re: Uncaught Exception When Using Spooling Directory Source

2013-01-17 Thread Brock Noland
Hi, Would you mind turning logging to debug and then posting your full log/config? Brock On Thu, Jan 17, 2013 at 8:24 PM, Henry Ma henry.ma.1...@gmail.com wrote: Hi, When using Spooling Directory Source in Flume NG 1.3.1, this exception happens: 13/01/18 11:37:09 ERROR

Re: Uncaught Exception When Using Spooling Directory Source

2013-01-17 Thread Brock Noland
, Brock Noland br...@cloudera.com wrote: Hi, Would you mind turning logging to debug and then posting your full log/config? Brock On Thu, Jan 17, 2013 at 8:24 PM, Henry Ma henry.ma.1...@gmail.com wrote: Hi, When using Spooling Directory Source in Flume NG 1.3.1, this exception happens

Re: Need for UDP / Multicast Source

2013-01-16 Thread Brock Noland
Hi, I would use memory channel for now as opposed to file channel. For file channel to keep up with that you'd need multiple disks. Also your checkpoint period is super-low which will cause lots of checkpoints and slow things down. However, I think the biggest issue is probably batch size. With

Re: Need for UDP / Multicast Source

2013-01-16 Thread Brock Noland
. I'm still not doing so great though. I'm getting about 300 Mb per minute in my HDFS files. I should be getting about 300G. That's better than before though. I've got 10% of the data this time, rather than 0.14% :) On Jan 16, 2013, at 4:36 PM, Brock Noland br...@cloudera.com wrote

Re: Question about gzip compression when using Flume Ng

2013-01-14 Thread Brock Noland
/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored 100 This should be about 50,000 events for the 5 min window!! Sagar On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland

Re: flume-ng data recovery

2013-01-14 Thread Brock Noland
Roy, You deleted the checkpoint files AND the inflights and you got the same error? Brock On Mon, Jan 14, 2013 at 4:10 PM, Camp, Roy rc...@ebay.com wrote: I somehow got one of my instances into a bad state where I continue to get the following error. I have two data log files with about 2GB

Re: HDFSsink failover error

2013-01-14 Thread Brock Noland
This is https://issues.apache.org/jira/browse/FLUME-1779 On Mon, Jan 14, 2013 at 4:25 PM, Connor Woodson cwoodson@gmail.com wrote: Oh alright, found it. What is happening is that the HDFS sink does not throw an exception for this write error, but instead returns a Status.BACKOFF, and as

Re: Question about gzip compression when using Flume Ng

2013-01-14 Thread Brock Noland
[Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947324 Sagar On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland br...@cloudera.com wrote: Hmm, could you try an updated version of Hadoop

Re: flume-ng data recovery

2013-01-14 Thread Brock Noland
Hi, OK. I would increase the capacity of the channel to say 200 with the original unmodified files. I would also upgrade to the latest 1.3.1 since there are many file channel fixes in 1.3.0 and 1.3.1. On Mon, Jan 14, 2013 at 5:33 PM, Camp, Roy rc...@ebay.com wrote: When I deleted both,

Re: New blog post on Flume performance tuning

2013-01-11 Thread Brock Noland
Nice post! On Fri, Jan 11, 2013 at 12:13 PM, Mike Percy mpe...@apache.org wrote: Hi folks, I just posted to the Apache blog on how to do performance tuning with Flume. I plan on following it up with a post about using the Flume monitoring capabilities while tuning. Feedback is welcome.

Re: weird hbase sink behavior

2013-01-08 Thread Brock Noland
payloadColumn or incrementColumn must be specified. This should give you a better message and the open JIRA is https://issues.apache.org/jira/browse/FLUME-1757 Brock On Tue, Jan 8, 2013 at 10:19 PM, Xu (Simon) Chen xche...@gmail.com wrote: Hi folks, I am using flume to sink into hbase. I had

Re: keep event header when dump into hbase sink?

2013-01-07 Thread Brock Noland
I would subclass HbaseEventSerializer for this purpose. On Mon, Jan 7, 2013 at 10:33 AM, Xu (Simon) Chen xche...@gmail.com wrote: Hi all, Which hbase sink keeps the event header, and how to configure it to do so? The SimpleHbaseEventSerializer would certainly discard the header:

Re: post-processing

2012-12-21 Thread Brock Noland
I wouldn't modify the files while flume is also modifying them. It might work but also might be a complete mess. If you need to modify the events before being written interceptors are the correct solution. After the file is done from a flume perspective, modify all you wish! On Fri, Dec 21, 2012

Re: Configuring failover sync in Flume-ng

2012-12-19 Thread Brock Noland
Just as a side note: File Channel has received significantly more testing than JDBC Channel. On Wed, Dec 19, 2012 at 8:05 AM, Alexander Alten-Lorenz wget.n...@gmail.com wrote: Yes, why not luring the wiki :) https://cwiki.apache.org/confluence/display/FLUME/Home Stable:

Re: Flume 1.3.0 - NFS + File Channel Performance

2012-12-19 Thread Brock Noland
whether the patch in FLUME-1794 fixes this. Thanks, Rudolf -Original Message- From: Brock Noland [mailto:br...@cloudera.com] Sent: Tuesday, December 18, 2012 10:09 PM To: user@flume.apache.org Subject: Re: Flume 1.3.0 - NFS + File Channel Performance Hi, If you do have a chance

Re: Flume 1.3 package

2012-12-19 Thread Brock Noland
Hi, Yes, Flume 1.3 will be included in CDH 4.2. Brock On Wed, Dec 19, 2012 at 11:34 AM, Rahul Ravindran rahu...@yahoo.com wrote: Hi, Is Flume 1.3 part of CDH4? Is Flume 1.3 part of any debian repo for installation? I have the link for http://flume.apache.org/download.html which gives me

Re: Question about the Setting up Eclipse wiki page

2012-12-18 Thread Brock Noland
Hi, I use the method mentioned on that page: $ mkdir apache-flume $ cd apache-flume $ git clone https://git-wip-us.apache.org/repos/asf/flume.git $ cd flume $ mvn install -DskipTests $ mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs Add $HOME/.m2/repository to my classpath in

Re: Flume 1.3.0 - NFS + File Channel Performance

2012-12-18 Thread Brock Noland
Hi, Hmm, yes in general performance is not going to be great over NFS, but there haven't been any FC changes that stick out here. Could you take 10 thread dumps of the agent running the file channel and 10 thread dumps of the agent sending data to the agent with the file channel? (You can

Re: Flume 1.3.0 - NFS + File Channel Performance

2012-12-18 Thread Brock Noland
We'd need those thread dumps to help confirm but I bet that FLUME-1609 results in a NFS call on each operation on the channel. If that is true, that would explain why it works well on local disk. Brock On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland br...@cloudera.com wrote: Hi, Hmm, yes

Re: Flume 1.3.0 - NFS + File Channel Performance

2012-12-18 Thread Brock Noland
. -- Hari Shreedharan On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote: We'd need those thread dumps to help confirm but I bet that FLUME-1609 results in a NFS call on each operation on the channel. If that is true, that would explain why it works well on local disk. Brock On Tue

Re: Flume 1.3.0 - NFS + File Channel Performance

2012-12-18 Thread Brock Noland
Hi, If you do have a chance, it would be great to hear if the patch attached to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes the performance problem. Brock On Tue, Dec 18, 2012 at 11:25 AM, Brock Noland br...@cloudera.com wrote: Yeah I think we should do that check

Re: Flume-ng Avro RPC and Python

2012-12-18 Thread Brock Noland
Hi, This is because Flume uses the NettyTransceiver and Python Avro only supports HTTPTransceiver. This is not using Avro, but you should be able to send JSON events to the HTTPSource (http://flume.apache.org/FlumeUserGuide.html#http-source). Brock On Tue, Dec 18, 2012 at 3:13 PM, John Michaels

Re: Flume to stream logs live

2012-12-14 Thread Brock Noland
Hi, FWIW, if I was sending log data from Windows I would write a little Windows Log Agent and send the data to the HTTP Source. Brock On Fri, Dec 14, 2012 at 8:47 AM, Kartashov, Andy andy.kartas...@mpac.ca wrote: Flummers, Loved working with Flume 1.2 – very easy and simple configuration, it

Re: Flume/HDFS Encoding

2012-12-14 Thread Brock Noland
½[DATA_HERE]×ùÎ0ÆÜ9Ig::¬ ;0 :[DATA_HERE] I was hoping to avoid the and spaces (I'm assuming they're characters that are encoded such that -cat won't show them). Any thoughts? Thanks again, Chris -Original Message- From: Brock Noland [mailto:br...@cloudera.com] Sent: Friday

Re: Recommendation of parameters for better performance with File Channel

2012-12-12 Thread Brock Noland
Hi, Why not try increasing the batch size on the source and sink to 10,000? Brock On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani jagadish.bih...@pubmatic.com wrote: I am using latest release of flume. (Flume 1.3.0) and hadoop 1.0.3. On 12/12/2012 03:35 PM, Jagadish Bihani wrote: Hi

Re: Getting weird error - 2012-12-10 01:01:42,190 INFO com.cloudera.flume.watchdog.Watchdog: Subprocess exited with value 1

2012-12-10 Thread Brock Noland
Hi, Just to echo Nitin, Flume 1.X is what the community is moving forward with so you will likely find more support from the community on that version. I'd highly recommend upgrading. Cheers, Brock On Mon, Dec 10, 2012 at 3:13 AM, Nitin Pawar nitinpawar...@gmail.com wrote: I just hope you are

Re: AVRO_EVENT problem

2012-12-06 Thread Brock Noland
Hi, Hopefully someone will be able to answer the AVRO issue, in order to help them, what version of Flume are you running? Brock On Thu, Dec 6, 2012 at 8:59 AM, DeCarlo, Thom tdeca...@mitre.org wrote: Hi, I'm just getting started with flume, so I apologize if this is an already known

Re: Host Interceptor

2012-12-06 Thread Brock Noland
I should probably move my changes to my own local class, but this works for now. -- Thom DeCarlo On Tue, 04 Dec 2012 17:15:06 GMT, Brock Noland br...@cloudera.com wrote: Hi, The host interceptor adds the host to the headers, not the body. You'd need to create a new one to add

Re: AVRO_EVENT problem

2012-12-06 Thread Brock Noland
built as 1.4-SNAPSHOT. -- Thom DeCarlo -Original Message- From: Brock Noland [mailto:br...@cloudera.com] Sent: Thursday, December 06, 2012 10:06 AM To: user@flume.apache.org Subject: Re: AVRO_EVENT problem Hi, Hopefully someone will be able to answer the AVRO issue, in order

Re: AVRO_EVENT problem

2012-12-06 Thread Brock Noland
in the JDBC Channel section. -- Thom DeCarlo -Original Message- From: Brock Noland [mailto:br...@cloudera.com] Sent: Thursday, December 06, 2012 1:57 PM To: user@flume.apache.org Subject: Re: AVRO_EVENT problem It seems to me like the object you are trying to write doesn't match

Re: AVRO_EVENT problem

2012-12-06 Thread Brock Noland
OK, I don't really understand how Avro is working here, but I think you should try FileChannel or maybe MemoryChannel for simplicity to see if that works. IE, I think the problem is JDBCChannel. Can you let me know how it turns out? On Thu, Dec 6, 2012 at 1:45 PM, Brock Noland br...@cloudera.com
