I too had the same problem (in flume 1.4).
We had checked that the input data is actually UTF-8.
When we set the input charset to 'unicode' it worked.
By "worked" I mean it didn't give this exception.
But at the destination the data was garbage for us.
Is it a known thing or are we missing anything?
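(For reference, a minimal sketch of where the spooling directory source's charset is set; the agent and component names are illustrative, not taken from the thread. Note that in Java the charset name 'unicode' typically resolves to UTF-16, which would explain garbled output downstream if the files are really UTF-8.)
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /var/spool/flume/in
agent1.sources.spoolSrc.inputCharset = UTF-8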
Hi
I am using the spooling directory source with Apache Flume 1.4.0, and the problem is
that the same configuration works on some machines and doesn't work on others.
The configuration used to work with Flume 1.3.1 (only the properties
related to the deserializer have changed).
Configuration:
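(The actual configuration is cut off in the archive. Purely as an illustration, the deserializer-related properties of a 1.4 spooling directory source look like the following, with hypothetical agent/source names.)
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /var/spool/flume/in
agent1.sources.spoolSrc.deserializer = LINE
agent1.sources.spoolSrc.deserializer.maxLineLength = 2048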
Hi
Based on observations on our production Flume setup:
we have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink.
(We have a replicating setup and two different
file channels for the two sinks.)
Configuration :
Flume version:1.3.1
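(The rest of the configuration is truncated in the archive. As a hedged sketch only, with all names hypothetical, a replicating setup with two file channels feeding a file roll sink and an HDFS sink would look roughly like this.)
agent1.sources.src.channels = fileCh1 fileCh2
agent1.sources.src.selector.type = replicating
agent1.sinks.rollSink.type = file_roll
agent1.sinks.rollSink.channel = fileCh1
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = fileCh2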
Flume
zer to
modify the event.
-Jeff
On Tue, Apr 16, 2013 at 11:12 PM, Jagadish Bihani
wrote:
Hi
If anybody has any inputs on this that will surely help.
Regards,
Jagadish
On 04/16/2013 12:06 PM, Jagadish Bihani wrote:
Hi
We have a use case in which:
1. A spooling directory source reads data.
2. It needs to write events into multiple channels. It should apply an
interceptor only when putting events into one channel, and should put
the event as-is when putting into the other channel (a small config sketch follows after this message).
Possible approach we have thought of:
1. Create 2
than what is written in point 1).
Regards,
Jagadish
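(For the replicating part of such a setup, a minimal sketch with hypothetical names is below. Note that interceptors configured on a source run before the channel selector, so they affect every channel the source feeds; per-channel behaviour typically needs a custom selector/interceptor or a second hop.)
agent1.sources.spoolSrc.channels = plainCh interceptedCh
agent1.sources.spoolSrc.selector.type = replicating
agent1.sources.spoolSrc.interceptors = i1
agent1.sources.spoolSrc.interceptors.i1.type = timestamp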
On 02/01/2013 06:45 PM, Alexander Alten-Lorenz wrote:
Ah, I missed your response. Inline
On Jan 30, 2013, at 3:43 PM, Jagadish Bihani
wrote:
Regards,
Jagadish
On 01/30/2013 08:13 PM, Jagadish Bihani wrote:
Hi
Thanks Alexander for the reply.
I have added my thoughts in line.
On 01/30/2013 11:56 AM, Alexander Alten-Lorenz wrote:
Hi,
If the agents (Tier 1) have access to HDFS, each single client can
put data into HDFS. But this doesn't
then its data won't reach HDFS.
Similarly, in the 2-tier scenario: if a node from the 1st tier goes down then
its data
won't reach HDFS.
Could you please elaborate if I am missing something?
Cheers,
Alex
On Jan 30, 2013, at 7:05 AM, Jagadish Bihani
wrote:
Hi
In our scenario there are around 30 machines from which we want to put
data into HDFS.
The approach we thought of initially was:
1. First tier: an agent which collects data from the source and then passes it to
an Avro sink.
2. Second tier: let's call those agents 'collectors'; they collect data
fr
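(The message is cut off above. As a rough sketch of the two-tier wiring described, with hostnames and component names as placeholders: the first-tier agents point an Avro sink at a collector, and the collector exposes an Avro source.)
# tier-1 agent
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.channel = fileCh
agent1.sinks.avroSink.hostname = collector-host
agent1.sinks.avroSink.port = 4545
# tier-2 collector
collector.sources.avroSrc.type = avro
collector.sources.avroSrc.channels = fileCh
collector.sources.avroSrc.bind = 0.0.0.0
collector.sources.avroSrc.port = 4545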
Hi
I was thinking of 2 possible approaches for this:
Approach 1: deduplication at the destination, using a spooling dir source
- file channel - HDFS sink combination:
===
-- After the HDFS sink has written to the HDFS directory, we can run
7:36 AM, Brock Noland wrote:
Hi,
Why not try increasing the batch size on the source and sink to 10,000?
Brock
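(As an illustration of that suggestion; the agent and component names are made up, and batchSize is the relevant property on the standard sources and on the HDFS sink.)
agent1.sources.src.batchSize = 10000
agent1.sinks.hdfsSink.hdfs.batchSize = 10000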
On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani
wrote:
I am using the latest release of Flume (1.3.0) and Hadoop 1.0.3.
On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
Hi
I am able to write at most 1.5 MB/sec of data to HDFS (without compression)
using the File Channel. Are there any recommendations to improve the
performance?
Has anybody achieved around 10 MB/sec with the file channel? If yes, please
share the
configuration (hardware used, RAM allocated and batch
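(For context, these are the file channel knobs usually involved in this kind of tuning; names and paths are illustrative. Putting dataDirs on a different disk from checkpointDir and raising the batch sizes are the usual first steps.)
agent1.channels.fileCh.type = file
agent1.channels.fileCh.checkpointDir = /disk1/flume/checkpoint
agent1.channels.fileCh.dataDirs = /disk2/flume/data
agent1.channels.fileCh.capacity = 1000000
agent1.channels.fileCh.transactionCapacity = 10000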
container. The latter two are splittable
and properly handle several compression codecs, including Snappy,
which is a great way to go if you can do it.
Regards,
Mike
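(A hedged sketch of the sink settings Mike describes, with illustrative names; this assumes the Snappy codec is available on the cluster.)
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.fileType = SequenceFile
agent1.sinks.hdfsSink.hdfs.writeFormat = Writable
agent1.sinks.hdfsSink.hdfs.codeC = snappy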
On Fri, Nov 2, 2012 at 12:50 AM, Jagadish Bihani
wrote:
Hi
Any inputs on this?
It looks like a basic thing which, I guess, must have been handled in Flume.
On 10/30/2012 10:31 PM, Jagadish Bihani wrote:
Text.
A few updates on that:
-- It looks like some header issue.
-- When I copyToLocal the file and then again copy it back to HDFS, the
map-reduce job
/2012 09:15 PM, Brock Noland wrote:
What kind of files is your sink writing out? Text, Sequence, etc?
On Fri, Oct 26, 2012 at 8:02 AM, Jagadish Bihani
wrote:
Same thing happens even for gzip.
Regards,
Jagadish
Does anyone have any inputs about why the below-mentioned behaviour might
have happened?
Hi
I have a very peculiar scenario.
1. My HDFS sink creates a bz2 file. The file is perfectly fine: I can
decompress it and
read it. It has 0.2 million records.
2. Now I give that file to a map-reduce job (hadoop 1.0.3) and
surprisingly it only
reads the first 100 records.
3. I then decompress the sam
On Mon, Oct 22, 2012 at 8:59 AM, Brock Noland wrote:
Which version? 1.2 or trunk?
On Monday, October 22, 2012 at 8:18 AM, Jagadish Bihani wrote:
Hi
This is the simplistic configuration with which I am getting
lower performance.
Even wi
riting in this
case).
Hope this is useful for you.
PS: I heard that the OS has a daemon thread that flushes the page cache to
disk asynchronously with about a second of latency; is that effective for
this amount of data when some loss is tolerable?
-Regards
Denny Ye
2012/10/22 Jagadish Bihani
int iterations = len/PAGESIZE;
int i;
struct timeval t0,t1;
for(i=0;i
16 core processors, 16 GB RAM etc.)
Regards,
Jagadish
On 10/10/2012 11:30 PM, Brock Noland wrote:
Hi,
On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani
wrote:
Hi Brock
I will surely look into 'fsync lies'.
But as per my experiments I think "file channel" is causing t
what
I am referring to.
A spinning disk can do about 100 fsync operations per second (an fsync is done
at the end of every batch). That is how I estimated your event size:
40 KB/second over 100 batches is 40 KB / 100 = ~409 bytes per event.
Once again, if you want increased performance, you should increase the
batch size.
Brock
Hi
Yes. It is around 480 - 500 bytes.
On 10/10/2012 09:24 PM, Brock Noland wrote:
How big are your events? Average about 400 bytes?
Brock
On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
wrote:
Hi
Thanks for the inputs, Brock. After doing several experiments
eventually the problem boiled down
our data is
actually written to the drive. If you search for "fsync lies" you'll
find more information on this.
You probably want to increase the batch size to get better performance.
Brock
On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
wrote:
Hi
My flume setup is:
Source Agent: cat source - File Channel - Avro Sink
Dest Agent: Avro source - File Channel - HDFS Sink.
There is only 1 source agent and 1 destination agent.
I measure throughput as the amount of data written to HDFS per second.
(I have a rolling interval of 30 sec; so if 6
Hi
What is the implication of the property "hdfs.callTimeout"? What adverse
effect may it have if I change it?
I am getting timeout exception as:
Noted checkpoint for file: /home/hadoop/flume_channel/dataDir15/log-21,
id: 21, checkpoint position: 1576210481
12/10/03 23:19:45 INFO file.LogFile
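(For reference, that timeout is configured on the HDFS sink itself, in milliseconds; a hedged example raising it from its default, roughly 10 seconds in Flume 1.x, with an illustrative sink name.)
agent1.sinks.hdfsSink.hdfs.callTimeout = 60000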
en go with 1000, depending on
the application you may want to go lower or higher.
Regards,
Mike
On Wed, Sep 26, 2012 at 8:23 PM, Jagadish Bihani
wrote:
Hi
I had a few doubts about the HDFS sink BucketWriter:
-- How does the HDFS sink's BucketWriter work? What criteria does it use to create
another bucket?
-- Creation of a file in HDFS is a function of how many parameters? Initially
I thought it was a function of only the rolling parameters (interval/size). But
appa
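(As background to the question: the HDFS sink closes and rolls the current file when any of several independent triggers fires. A hedged sketch, with illustrative names, that rolls purely by time would zero out the other triggers.)
agent1.sinks.hdfsSink.hdfs.rollInterval = 300
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0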
But there exists already a different property called batchSize.
-Harish
On Wed, Sep 26, 2012 at 7:30 AM, Brock Noland wrote:
A better name for that property would be batchSize.
Brock
On Wed, Sep 26, 2012 at 5:13 AM, Jagadish Bihani
Hi
What is the significance of this property?
I think because of this property almost 100 files are being created within
a particular rolling interval instead of 1.
If I set it to 1, what performance penalty may it cause?
Regards,
Jagadish
Hi
In my Flume agent with "Avro source - File Channel - HDFS sink" I am
getting the following exception:
java.io.IOException: Failed to obtain lock for writing to the log. Try
increasing the log write timeout value or disabling it by setting it to
0. [channel=fileChannel]
when I set my file r
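(A hedged illustration of the setting the error message refers to; as I recall, the 1.2/1.3-era file channel exposed it as write-timeout. Names are illustrative, and 0 disables the timeout.)
agent1.channels.fileChannel.type = file
agent1.channels.fileChannel.write-timeout = 0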
The default limit of 1024 is too low for nearly every modern
application.
In regards to the rolling, can you paste your config and describe in
more detail the unexpected behavior you are seeing?
Brock
On Tue, Sep 18, 2012 at 7:08 AM, Jagadish Bihani
seconds.
Is it something to do with HDFS batch size?
Regards,
Jagadish
Original Message
Subject: HDFS file rolling behaviour
Date: Thu, 13 Sep 2012 14:26:56 +0530
From: Jagadish Bihani
To: user@flume.apache.org
Hi
I use two flume agents:
1. flume_agent 1, which is a source agent with (exec source - file channel - avro
sink)
2. flume_agent 2, which is a destination agent with (avro source - file channel - HDFS
sink)
I have observed that for the HDFS sink with rolling by file size/number of
events it
creates a lot of simulta
to do that in roughly 4 spots throughout the script. That
cleared it up for me. I'd love to hear of a better way to do that
though :)
Chris
On Mon, Sep 10, 2012 at 9:26 AM, Jagadish Bihani
wrote:
Hi
My flume 1.2.0 setup is working fine on one machine.
But when I ran it on another machine it gave me a syntax error while starting
the agent:
"bin/flume-ng: line 81: syntax error in conditional expression:
unexpected token `('
bin/flume-ng: line 81: syntax error near `^java\.library\.path=(.'
bin
I'm not entirely
sure as I can't dive deeply into the source right now. I wouldn't be
surprised if it's some kind of congestion problem and lack of
logging (or your log levels are just too high; try switching them to
INFO or DEBUG?) that will be resolved once you get the throughput up.
Hi
I encountered a problem in my scenario with the netcat source. The setup is:
Host A: Netcat source - file channel - Avro sink
Host B: Avro source - file channel - HDFS sink
But to simplify it I have created a single agent with a "Netcat Source"
and a "file roll sink".
It is:
Host A: Netcat source - fi
Thanks,
Jagadish
On 08/10/2012 03:30 PM, Jagadish Bihani wrote:
Hi
Thanks all for the inputs. After the initial problem I was able
to start flume except in one scenario in
which I use HDFS as sink.
I have a production machine
't attached logs or error
messages, it's hard to say what happened.
best
- Alex
Jagadish Bihani wrote:
Hi
I have downloaded the tarball of the latest flume-ng (1.2.0).
I have JAVA_HOME properly set.
To begin with I have followed the instructions in
"https://cwiki.apache.org/FLUME/getting-started.html"
as-is. And even for that basic example,
my flume agent gets stuck printing the following output an
Hi
In Flume-NG, is there any way, using exec (tail -F) as the source, to get
only the new lines which are being added to the log file?
(i.e. there is a growing log file and we want to transfer all the logs
using Flume
without duplication of logs.)
I understand if something fails and as tail does
to check configuration property io.compression.codecs for
inclusion of org.apache.hadoop.io.compress.BZip2Codec.
Jarcec
On Jul 25, 2012, at 12:20 PM, Jagadish Bihani wrote:
Hi
I have downloaded and deployed the latest Flume code from the repository
"https://svn.apache.org/repos/asf/flume/trunk/"
In the conf file I am using the following properties for the HDFS sink:
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path=
agent.sinks.hdfsSink.hdfs.fileType =Compres
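(The configuration above is cut off in the archive. Purely as an illustration, and not the poster's actual config, a compressed-output HDFS sink of that era is typically configured along these lines, with a hypothetical path.)
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.codeC = bzip2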
Hi
We want to deploy flume-ng in the production environment in our
organization.
Here is the scenario for which I am not able to find the answer:
1. We receive logs using a 'tail -f' source.
2. Now the agent process gets killed.
3. We restart it.
4. How would the restarted agent