flume+lzo

2012-06-06 Thread yingnan.ma
I want to compress the log file using LZO. Do I only need to change the Flume configuration file, for example: <property><name>flume.collector.dfs.compress.codec</name><value>LzopCodec</value></property> or do I need anything else? If you have any idea or experience with this, please
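
For reference, this is roughly how that setting would be laid out in the Flume (0.9.x-era) site configuration file; the property name and value are the ones quoted above, everything else is a minimal sketch. Note that the Lzop codec usually also requires the hadoop-lzo jar and native LZO libraries on the collector, which the configuration alone does not provide.

    <!-- flume-site.xml (sketch) -->
    <property>
      <name>flume.collector.dfs.compress.codec</name>
      <value>LzopCodec</value>
    </property>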

Re: hadoop file permission 1.0.3 (security)

2012-06-06 Thread Harsh J
Tony, On Wed, Jun 6, 2012 at 3:11 AM, Tony Dean tony.d...@sas.com wrote: dfs.umaskmode = umask (I believe this should be used in lieu of dfs.umask) – it appears to set the permissions for files created in hadoop fs (minus execute permission). Why was dfs.umask deprecated? What's
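
For context, a minimal sketch of how the newer property discussed here would be set in hdfs-site.xml on a 1.x cluster; the 022 value is only illustrative, not from the thread.

    <!-- hdfs-site.xml (sketch) -->
    <property>
      <name>dfs.umaskmode</name>
      <value>022</value>  <!-- files come out as 644 and directories as 755 under this umask -->
    </property>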

Re: datanode security (v 1.0.3)

2012-06-06 Thread Rajiv Chittajallu
Check HADOOP_DATANODE_OPTS in hadoop-env.sh. It should have something like HADOOP_DATANODE_OPTS=-Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS From: Tony Dean tony.d...@sas.com To: core-u...@hadoop.apache.org core-u...@hadoop.apache.org Sent: Sunday, June
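
Laid out as it would appear in conf/hadoop-env.sh (a sketch based only on the line quoted above; a secure DataNode setup usually carries additional options beyond these):

    # conf/hadoop-env.sh (sketch)
    export HADOOP_DATANODE_OPTS="-Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS"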

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-06 Thread rajive
What is dfs.https.address set to? - Original Message - From: ramon@accenture.com ramon@accenture.com To: core-u...@hadoop.apache.org Cc: Sent: Monday, June 4, 2012 4:07 AM Subject: SecondaryNameNode not connecting to NameNode : PriviledgedActionException Hello. I'm
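
A hedged sketch of what the reply is asking about: the property lives in hdfs-site.xml, and the SecondaryNameNode uses it to reach the NameNode's HTTPS endpoint when security is enabled. The hostname below is hypothetical; 50470 is the conventional default port in the 1.x line.

    <!-- hdfs-site.xml (sketch; hostname is hypothetical) -->
    <property>
      <name>dfs.https.address</name>
      <value>namenode.example.com:50470</value>
    </property>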

RE: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-06 Thread ramon.pin
Now it is pointing correctly, Rajive. That was the problem. Thx for your help. From: rajive [rajiv...@yahoo.com] Sent: Wednesday, June 06, 2012 14:01 To: common-user@hadoop.apache.org; Pin, Ramón Subject: Re: SecondaryNameNode not connecting to

Ideal file size

2012-06-06 Thread Mohit Anchlia
We have a continuous flow of data into a sequence file. I am wondering what the ideal file size would be before the file gets rolled over. I know too many small files are not good, but could someone tell me what the ideal size would be such that it doesn't overload the NameNode.

Re: Ideal file size

2012-06-06 Thread Edward Capriolo
It does not matter what the file size is, because the file is split into blocks, which is what the NN tracks. For larger deployments you can go with a large block size like 256MB or even 512MB. Generally the bigger the file the better; split calculation is very input-format dependent, however.
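
As a concrete illustration of the block-size suggestion above, a sketch of how a 256MB default block size would be set in hdfs-site.xml on a 1.x cluster (the property is named dfs.blocksize in later releases); the value is just the 256MB figure from the reply expressed in bytes.

    <!-- hdfs-site.xml (sketch) -->
    <property>
      <name>dfs.block.size</name>
      <value>268435456</value>  <!-- 256 * 1024 * 1024 bytes -->
    </property>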

Re: Ideal file size

2012-06-06 Thread Harsh J
The block size and file roll size values depend on a few items here: - Rate at which the data is getting written. - Frequency of your processing layer that is expected to run over these files (sync() can help here though). - The way by which you'll be processing these (MR/etc.). Too many small

RE: Shuffle/sort

2012-06-06 Thread Barry, Sean F
Thanks Harsh! And is this the right source code for the shuffling that is done in the reduce task? http://search-hadoop.com/c/Hadoop:/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java%7C%7Cshuffle+sort

Re: Shuffle/sort

2012-06-06 Thread Harsh J
Sean, Yes, that's the one for the shuffles that happen on the reduce side (pull model); you can drill down from that class onwards to see how the fetchers operate, etc. On Wed, Jun 6, 2012 at 9:54 PM, Barry, Sean F sean.f.ba...@intel.com wrote: Thanks Harsh! And is this the right source code for

unable to check nodes on hadoop

2012-06-06 Thread Babak Bastan
If I type 'http://localhost:50070' or 'http://localhost:9000' to see the nodes, my browser shows me nothing; I think it can't connect to the server. I tested my hadoop with this command: hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 but that didn't work either, and it tries to

RE: Shuffle/sort

2012-06-06 Thread Barry, Sean F
So I'm assuming that there is a push side also? Is it part of the map output? -sb -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Wednesday, June 06, 2012 9:33 AM To: common-user@hadoop.apache.org Subject: Re: Shuffle/sort Sean, Yes thats the one for the shuffles

Re: Shuffle/sort

2012-06-06 Thread Harsh J
No (sorry if I confused you), the outputs are pulled from the TaskTrackers' HTTP server, which accesses the local (mapred.local.dir) file outputs from the maps and serves them to the requester (the reduce process). There is no 'push' in MR in this phase. On Wed, Jun 6, 2012 at 10:06 PM, Barry, Sean F

Re: unable to check nodes on hadoop

2012-06-06 Thread anil gupta
Babak, Probably your namenode is not up. Check the namenode logs first. Also, please specify the version and the mode in which you are running hadoop. On Wed, Jun 6, 2012 at 9:35 AM, Babak Bastan babak...@gmail.com wrote: If I type 'http://localhost:50070' or 'http://localhost:9000' to see

Re: Ideal file size

2012-06-06 Thread M. C. Srivas
There are more factors to consider than just the size of the file. How long can you wait before you *have to* process the data? 5 minutes? 5 hours? 5 days? If you want good timeliness, you need to roll over faster. The longer you wait: 1. the lower the load on the NN, 2. but the poorer the

Re: unable to check nodes on hadoop

2012-06-06 Thread Anil Gupta
By default the logs are in /var/log/hadoop or /var/logs/hadoop. Which mode are you running? Standalone? Pseudo-distributed? Distributed? Best Regards, Anil On Jun 6, 2012, at 9:45 AM, Babak Bastan babak...@gmail.com wrote: Thank you for your answer, where is the log of the namenode? How can I control
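
Not part of the thread, but a hedged sketch of how one might check whether the NameNode is actually running and inspect its log on a plain tarball install (log locations vary with the packaging; the reply above gives /var/log/hadoop for packaged installs):

    # Sketch: is a NameNode JVM running, and what does its log say?
    jps | grep NameNode
    tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log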

Re: Ideal file size

2012-06-06 Thread Mohit Anchlia
On Wed, Jun 6, 2012 at 9:48 AM, M. C. Srivas mcsri...@gmail.com wrote: There are more factors to consider than just the size of the file. How long can you wait before you *have to* process the data? 5 minutes? 5 hours? 5 days? If you want good timeliness, you need to roll over faster. The longer

RE: Shuffle/sort

2012-06-06 Thread Barry, Sean F
On a similar note, have there been any standalone Java apps that you know of that implement MapReduce with shuffle/sort without using a distributed system? Maybe just for benchmark purposes. -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Wednesday, June 06, 2012
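
Not an answer from the thread, but worth noting: Hadoop itself can run a complete MapReduce job, including the map-side sort and reduce-side merge, in a single JVM via its local job runner. A hedged sketch using the 1.x property names (the jar and class names below are hypothetical, and the driver is assumed to use ToolRunner so the -D options are picked up):

    # Sketch: run a job entirely in one JVM with the local job runner
    hadoop jar my-job.jar com.example.MyJob \
        -D mapred.job.tracker=local \
        -D fs.default.name=file:/// \
        input/ output/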

Reduce task does not time out if one the mapper hosts is not reachable

2012-06-06 Thread Giridharan Anantharaman
Hi, I am using version 1.0.1 and the so-called reduce hang problem had to do with my screw-up in the cluster configuration, which I have since fixed, or so I think. However, this raised some other questions, hence this email. - I have a bunch of MR jobs that run daily and I noticed that one of them

MapReduce - Libjars

2012-06-06 Thread karanveer.singh
Hi, Within my MapReduce programs I am using an external Java library to help parse my raw files. When I submit my MapReduce program, I get errors because the external class being referenced is not found. Later, I explicitly specified the external jar being referenced with the

Re: MapReduce - Libjars

2012-06-06 Thread Jagat Singh
Hello Karan, Did you read this article? http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ You can place external jars in a lib directory inside your job jar when packaging it, or you can use
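
For illustration, a hedged sketch of the -libjars approach the reply is leading up to (jar, class, and path names here are hypothetical; the driver must go through ToolRunner/GenericOptionsParser for -libjars to take effect):

    # Sketch: ship an external parser jar with the job via -libjars
    hadoop jar my-job.jar com.example.ParseDriver \
        -libjars /path/to/external-parser.jar \
        /input /output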