I want to compress the log files using LZO. Do I only need to change
the Flume configuration file, for example:
<property>
  <name>flume.collector.dfs.compress.codec</name>
  <value>LzopCodec</value>
</property>
or do I need anything else? If you have any ideas or experience with this, please
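Assuming the hadoop-lzo jar and its native libraries are installed on the
collector nodes, the codec typically also has to be registered with Hadoop in
core-site.xml; a sketch:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>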
Tony,
On Wed, Jun 6, 2012 at 3:11 AM, Tony Dean tony.d...@sas.com wrote:
dfs.umaskmode = umask (I believe this should be used in lieu of dfs.umask) –
it appears to set the permissions for files created in hadoop fs (minus
execute permission).
Why was dfs.umask deprecated? What's
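For reference, a minimal hdfs-site.xml sketch of the newer property (the octal
value is just an example; 022 yields 644-style permissions on plain files):

<property>
  <name>dfs.umaskmode</name>
  <value>022</value>
</property>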
Check HADOOP_DATANODE_OPTS in hadoop-env.sh. It should have something like:
HADOOP_DATANODE_OPTS=-Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS
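In hadoop-env.sh that typically takes the export form below (the -Xmx value is
an example; tune it to the node's memory, and restart the DataNode afterwards):

export HADOOP_DATANODE_OPTS="-Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS $HADOOP_DATANODE_OPTS"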
From: Tony Dean tony.d...@sas.com
To: core-u...@hadoop.apache.org core-u...@hadoop.apache.org
Sent: Sunday, June
what is dfs.https.address set to?
- Original Message -
From: ramon@accenture.com ramon@accenture.com
To: core-u...@hadoop.apache.org
Cc:
Sent: Monday, June 4, 2012 4:07 AM
Subject: SecondaryNameNode not connecting to NameNode :
PriviledgedActionException
Hello. I'm
Now it is pointing correctly, Rajive. That was the problem. Thx for your help.
From: rajive [rajiv...@yahoo.com]
Sent: Wednesday, June 6, 2012 14:01
To: common-user@hadoop.apache.org; Pin, Ramón
Subject: Re: SecondaryNameNode not connecting to
We have a continuous flow of data into the sequence file. I am wondering what
would be the ideal file size before the file gets rolled over. I know too many
small files are not good, but could someone tell me what the ideal size would
be such that it doesn't overload the NameNode?
It does not matter much what the file size is, because the file is
split into blocks, which are what the NN tracks.
For larger deployments you can go with a large block size like 256MB
or even 512MB. Generally, the bigger the file the better; split
calculation is very input-format dependent, however.
The block size and file roll size values depend on a few items here (a
rolling-writer sketch follows this list):
- Rate at which the data is getting written.
- Frequency of your processing layer that is expected to run over
these files (sync() can help here though).
- The way by which you'll be processing these (MR/etc.).
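A minimal Java sketch of size-based rolling against the Hadoop 1.x
SequenceFile API; the path, key/value types, and the 256MB threshold are
assumptions for illustration, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class RollingSeqFileWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long rollSize = 256L * 1024 * 1024;          // roll at ~one 256MB block
    int part = 0;
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/logs/part-" + part),
        LongWritable.class, Text.class);
    for (long i = 0; i < 10000000L; i++) {       // stand-in for the real feed
      writer.append(new LongWritable(i), new Text("record-" + i));
      if (writer.getLength() >= rollSize) {      // bytes written so far
        writer.close();                          // roll to the next file
        part++;
        writer = SequenceFile.createWriter(fs, conf,
            new Path("/logs/part-" + part), LongWritable.class, Text.class);
      }
    }
    writer.close();
  }
}

Calling writer.syncFs() periodically (where the API offers it) is what the
sync() remark above refers to: readers can see data before the file rolls.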
Too many small
Thanks Harsh!
And is this the right source code for the shuffling that is done in the reduce
task?
http://search-hadoop.com/c/Hadoop:/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java%7C%7Cshuffle+sort
Sean,
Yes, that's the one for the shuffles that happen on the reduce side (pull
model); you can drill down from that class onwards to see how the
fetchers operate, etc.
On Wed, Jun 6, 2012 at 9:54 PM, Barry, Sean F sean.f.ba...@intel.com wrote:
Thanks Harsh!
And is this the right source code for
If I type 'http://localhost:50070' or 'http://localhost:9000' to see the
nodes, my browser shows me nothing; I think it can't connect to the server. I
tested my Hadoop with this command:
hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
but that didn't work either, and it tries to
So I'm assuming that there is a push side also? Is it part of the map output?
-sb
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, June 06, 2012 9:33 AM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle/sort
Sean,
Yes, that's the one for the shuffles
No (sorry if I confused you). The outputs are pulled from the TaskTrackers'
HTTP server, which accesses the local (mapred.local.dir) file outputs
from the maps and serves them to the requester (the reduce process). There is
no 'push' in MR in this phase.
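A rough sketch of what that pull looks like in Hadoop 1.x, where the
TaskTracker serves map outputs over HTTP (port 50060, /mapOutput servlet);
the host and IDs below are made up, and a real fetcher learns them from
TaskCompletionEvents:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class MapOutputFetchSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical URL; one request per completed map output.
    URL url = new URL("http://tt-host:50060/mapOutput"
        + "?job=job_201206060001_0001"
        + "&map=attempt_201206060001_0001_m_000003_0"
        + "&reduce=2");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    InputStream in = conn.getInputStream();
    byte[] buf = new byte[64 * 1024];
    int n;
    while ((n = in.read(buf)) != -1) {
      // A real fetcher spills this IFile-format segment to memory or disk
      // and hands it to the merge; this sketch just drains the stream.
    }
    in.close();
    conn.disconnect();
  }
}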
On Wed, Jun 6, 2012 at 10:06 PM, Barry, Sean F
Babak,
Probably, your namenode is not up. Check the namenode logs first.
Also, please specify the version and the mode in which you are running Hadoop.
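Two quick checks (assuming a default tarball layout; the log path varies by
install):

jps                                                  # NameNode should be listed
tail -n 100 /var/log/hadoop/hadoop-*-namenode-*.log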
On Wed, Jun 6, 2012 at 9:35 AM, Babak Bastan babak...@gmail.com wrote:
If I type 'http://localhost:50070' or 'http://localhost:9000' to see
There are many more factors to consider than just the size of the file. How
long can you wait before you *have to* process the data? 5 minutes? 5 hours? 5
days? If you want good timeliness, you need to roll over faster. The
longer you wait:
1. the lesser the load on the NN.
2. but the poorer the timeliness of your data.
By default the logs are in /var/log/hadoop or /var/logs/hadoop.
Which mode are you running? Standalone? Pseudo-distributed? Distributed?
Best Regards,
Anil
On Jun 6, 2012, at 9:45 AM, Babak Bastan babak...@gmail.com wrote:
Thank you for your answer. Where is the log of the namenode? How can I control
On Wed, Jun 6, 2012 at 9:48 AM, M. C. Srivas mcsri...@gmail.com wrote:
There are many more factors to consider than just the size of the file. How
long can you wait before you *have to* process the data? 5 minutes? 5 hours? 5
days? If you want good timeliness, you need to roll over faster. The
longer
On a similar note, are there any standalone Java apps that you know of that
implement MapReduce with shuffle/sort without using a distributed system?
Maybe just for benchmark purposes.
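Hadoop itself ships one: the LocalJobRunner runs the whole job, including the
sort, in a single JVM with no daemons. A sketch against the Hadoop 1.x API
(paths and job name are made up; note the 1.x local runner only supports a
single reducer):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalRunnerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");   // use the LocalJobRunner
    conf.set("fs.default.name", "file:///");   // local FS, no HDFS needed
    Job job = new Job(conf, "local-shuffle-test");
    job.setJarByClass(LocalRunnerSketch.class);
    // No mapper/reducer set: the identity Mapper/Reducer run by default,
    // so with TextInputFormat the job emits (LongWritable, Text) pairs.
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("in"));
    FileOutputFormat.setOutputPath(job, new Path("out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}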
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, June 06, 2012
Hi
I am using version 1.0.1, and the so-called reduce-hang problem had to do with
my screw-up in the cluster configuration, which I have since fixed, or so I
think. However, this raised some other questions, hence this email.
- I have a bunch of MR jobs that run daily, and I noticed that one of them
Hi,
Within my MapReduce program, I am using an external Java library to help
parse my raw files. When I submit my MapReduce program, I get errors
because the external class being referenced is not found. Later, I
explicitly specified the external jar being referenced with the
Hello Karan,
Did you read this article?
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
You can put external jars in a lib/ directory inside your job jar while
packaging it, or you can use
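For example, with the -libjars generic option (jar and class names here are
made up; this requires the driver to go through ToolRunner/GenericOptionsParser
so the option is actually parsed):

hadoop jar myjob.jar com.example.MyDriver \
  -libjars /path/to/external-parser.jar \
  in out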