Re: Best practices to recover from Corrupt Namenode

2012-01-20 Thread praveenesh kumar
Thanks a lot guys for such an illustrative explanation. I will go through the links you sent and will get back with any doubts I have. Thanks, Praveenesh On Thu, Jan 19, 2012 at 2:17 PM, Sameer Farooqui wrote: > Hey Praveenesh, > > Here's a good article on HDFS by some senior Yahoo!, Facebook, Hor

Regarding Hadoop 1.0.0 release

2012-01-20 Thread renuka
Hadoop 1.0.0 was released in December 2011, and it is in beta. As per the link below, a security feature (strong authentication via the Kerberos authentication protocol) was added in the Hadoop 1.0.0 release. http://www.infoq.com/news/2012/01/apache-had

RE: Issues during setting up hadoop security cluster

2012-01-20 Thread Emma Lin
After removing the upper-case, the problem disappeared. Now the node manager connects to the resource manager successfully. Thank you, Vinod. But now I have another issue connecting to the Name Node from the Data Node. The log on the Name Node is as follows: 2012-01-20 18:17:02,127 WARN ipc.Server (Server.java:

FUSE and Hadoop 0.23

2012-01-20 Thread Dejan Menges
Hi, Does anyone have production-ready experience with the latest HDFS and one of the FUSE-like products? So far I have tried fuse-dfs and hdfs-fuse, but it looks like they have not been ported to newer Hadoop installations, and there are a lot of Java library compatibility issues. What is, from your experience, the best way to

Issues with reduce writing results back to HDFS

2012-01-20 Thread Stu Teasdale
Hi all, I have a fairly simple Hadoop streaming map-reduce task which takes a batch of 60-ish log files (each around 5-10MB gzip compressed) and processes them using a cdh3u2-based cluster. The map stage of the job finishes successfully and reasonably swiftly, but the reduce is taking hours to c

HDFS Log File Sizes

2012-01-20 Thread Eli Finkelshteyn
Hi Folks, I'm currently working on implementing a logging system for a new Hadoop cluster I've set up. The way I've always seen these set up in the past was logging split by day, with individual files sharded off at around 10x the HDFS block size. I haven't had any problems with this methodology

Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Steve Lewis
We have been having problems with mappers timing out after 600 sec when the mapper writes many more records, say thousands, for every input record - even when the code in the mapper is small and fast. I have no idea what could cause the system to be so slow and am reluctant to raise the 600 sec l

Re: Issues during setting up hadoop security cluster

2012-01-20 Thread Vinod Kumar Vavilapalli
You are on the right path for sure. Where are you updating the JCE policy jar? (I know the RM-NM case is working after this, so just checking.) Maybe the datanodes are not using the same JRE that you updated with the new policy jar? Can you check that? jsvc shouldn't cause any more issues; it sho

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Raj V
Steve, There seems to be something wrong with either networking or storage. Why does it take "hours" to generate a 4GB text file? Raj > > From: Steve Lewis >To: common-user ; Josh Patterson > >Sent: Friday, January 20, 2012 9:16 AM >Subject: Problems with timeo

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Michel Segel
Steve, If you want me to debug your code, I'll be glad to set up a billable contract... ;-) What I am willing to do is help you debug your code... Did you time how long it takes in the Mapper.map() method? The reason I ask is to first confirm that you are failing within a map() met

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Alex Kozlov
Hi Steve, I ran your job on our cluster and it does not time out. I noticed that each mapper runs for a long time: one way to avoid a timeout is to update a user counter. As long as this counter is updated within 10 minutes, the task should not time out (as MR knows that something is being done).

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Vinod Kumar Vavilapalli
Every so often, you should call context.progress() so that the framework knows that this map is doing useful work. That will prevent the framework from killing it after 10 mins. The framework does this automatically every time you call context.write()/context.setStatus(), but if the map is stuck fo
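Vinod's advice above refers to the Java MapReduce API. For a streaming job (like the one Stu describes earlier in this digest), the equivalent is to write reporter lines to stderr, which Hadoop Streaming parses and forwards to the framework. A minimal sketch - the mapper logic, counter group, and counter names here are hypothetical, not taken from the thread:

```python
import sys

def mapper(lines, report_every=1000):
    """Toy streaming mapper that periodically reports progress.

    Hadoop Streaming recognizes stderr lines of the form
    'reporter:counter:<group>,<counter>,<amount>' and
    'reporter:status:<message>'; either one tells the framework the
    task is alive and resets its timeout clock.
    """
    for i, line in enumerate(lines, 1):
        key = line.rstrip("\n")
        print(f"{key}\t1")  # normal map output goes to stdout
        if i % report_every == 0:
            # Bump a user counter so the task is not marked as hung.
            sys.stderr.write(f"reporter:counter:MyJob,records,{report_every}\n")
            sys.stderr.write(f"reporter:status:processed {i} records\n")

if __name__ == "__main__":
    mapper(sys.stdin)  # under Hadoop Streaming, input arrives on stdin
```

Pick report_every so a report fires well inside the 600-second window even on the slowest input split.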

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Steve Lewis
Well - I am running the job over a VPN, so I am not on a fast network to the cluster. The job runs fine for small input files - we did not run into issues until the input file got into the multi-gigabyte range. On Fri, Jan 20, 2012 at 11:29 AM, Raj V wrote: > Steve > > There seems to be something wr

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Steve Lewis
On Fri, Jan 20, 2012 at 12:18 PM, Michel Segel wrote: > Steve, > If you want me to debug your code, I'll be glad to set up a billable > contract... ;-) > > What I am willing to do is to help you to debug your code.. The code seems to work well for small input files and is basically a standard sa

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Steve Lewis
One thing I can say for sure is that generateSubStrings() is not slow - Every input line in my sample is 100 characters and the timing should be very similar from one run to the next. This sample is a simplification of a more complex real problem where we see timeouts when a map generates signifi

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Steve Lewis
Good catch on the Configured - in my tests it extends my subclass of Configured, but I took out any dependencies on my environment. Interesting - I strongly suspect a disk I/O or network problem since my code is very simple and very fast. If you add lines to generateSubStrings to limit String le

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Steve Lewis
Interesting - I strongly suspect a disk IO or network problem since my code is very simple and very fast. If you add lines to generateSubStrings to limit String length to 100 characters (I think it is always that but this makes su public static String[] generateSubStrings(String inp, int minLeng
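Steve's generateSubStrings code is cut off in the archive preview. Purely to illustrate the length-capping idea he describes - this is a hypothetical Python reconstruction, not his Java - a substring generator with a hard cap might look like:

```python
def generate_substrings(inp, min_length=1, max_length=100):
    """Return all substrings of inp whose length is between min_length
    and max_length. The cap bounds the output volume per input line,
    mirroring the 100-character limit discussed in the thread."""
    n = len(inp)
    out = []
    for start in range(n):
        # Stop extending once the substring would exceed max_length.
        for end in range(start + min_length, min(n, start + max_length) + 1):
            out.append(inp[start:end])
    return out
```

Even with the cap, a single 100-character line yields 100 + 99 + ... + 1 = 5,050 substrings, which is exactly the "thousands of records for every input record" blow-up that started this thread.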

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Michael Segel
Steve, Ok, first, your client connection to the cluster is a non-issue. If you go into /etc/Hadoop/conf (that's supposed to be a little h, but my iPhone knows what's best...), look and see what you have set for your bandwidth... I forget which parameter, but there are only a couple that deal with ban

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Paul Ho
I think the balancing bandwidth property you are looking for is in hdfs-site.xml: dfs.balance.bandwidthPerSec = 402653184. Set the value that makes most sense for your NIC. But I thought this is only for balancing. On Jan 20, 2012, at 3:43 PM, Michael Segel wrote: > Ste
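The archive has flattened the snippet Paul quotes; in hdfs-site.xml it would be written as a property element like the sketch below (the 402653184 value, 384 MiB/s, is taken from his message; tune it to your NIC):

```xml
<!-- hdfs-site.xml: caps the bandwidth (bytes per second) each datanode
     may use for balancer traffic; as Paul notes, it applies only to
     balancing, not to normal job I/O. -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>402653184</value>
</property>
```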

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs

2012-01-20 Thread Michael Segel
That's the one ... Sent from my iPhone On Jan 20, 2012, at 6:28 PM, "Paul Ho" wrote: > I think the balancing bandwidth property you are looking for is in > hdfs-site.xml: > > dfs.balance.bandwidthPerSec > 402653184 > > Set the value that makes most sense for your NIC