Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman
Hey Arijit, We use all internal SATA drives in our cluster, which is about 110TB today; if we grow it to our planned 350TB, it will be a healthy mix of worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached vaults, and fibre channel vaults. Brian On Nov 4, 2008, at 4:16

Re: Status FUSE-Support of HDFS

2008-11-04 Thread Robert Krüger
Thanks! This is good news. So it's fast enough for our purposes if it turns out to be the same order of magnitude on our systems. Have you used this with rsync? If so, any known issues with that (reading or writing)? Thanks in advance, Robert Pete Wyckoff wrote: Reads are 20-30% slower

Re: Hadoop hardware specs

2008-11-04 Thread Allen Wittenauer
On 11/4/08 2:16 AM, Arijit Mukherjee [EMAIL PROTECTED] wrote: * 1-5 TB external storage I'm curious to find out what sort of specs people normally use. Is the external storage essential, or will the individual disks on each node be sufficient? Why would you need external storage in

RE: Hadoop hardware specs

2008-11-04 Thread Zhang, Roger
Brian, You seem to have a pretty large cluster. What do you think of the overall performance? Is your implementation on OpenSSH or SSH2? I'm new to this and am trying to set up a 20-node cluster, but our Linux boxes already enforce F-Secure SSH2, which I found HDFS 0.18 does not support right

Problem while starting Hadoop

2008-11-04 Thread srikanth . bondalapati
Hi, I am trying to use Hadoop 0.18.1. After I start Hadoop, I can see the namenode running on the master, but the datanode on the client machine is unable to connect to the namenode. I use 2 machines, with hostnames lca2-s3-pc01 and lca2-s3-pc04 respectively. It shows the following

[ANNOUNCE] Hadoop release 0.18.2 available

2008-11-04 Thread Nigel Daley
Release 0.18.2 fixes many critical bugs in 0.18.1. For Hadoop release details and downloads, visit: http://hadoop.apache.org/core/releases.html Hadoop 0.18.2 Release Notes are at http://hadoop.apache.org/core/docs/r0.18.2/releasenotes.html Thanks to all who contributed to this release! Nigel

Missing blocks from bin/hadoop text but fsck is all right

2008-11-04 Thread Sagar Naik
Hi, We have a strange problem reading some of our files: bin/hadoop dfs -text dir/* gives me missing-block exceptions. 08/11/04 10:45:09 [main] INFO dfs.DFSClient: Could not obtain block blk_6488385702283300787_1247408 from any node: java.io.IOException: No live nodes contain current

Re: SecondaryNameNode on separate machine

2008-11-04 Thread Tomislav Poljak
Konstantin, it works, thanks a lot! Tomislav On Mon, 2008-11-03 at 11:13 -0800, Konstantin Shvachko wrote: You can either do what you just described with dfs.name.dir = dirX or you can start name-node with -importCheckpoint option. This is an automation for copying image files from
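[Editor's note: for readers hitting the same issue, the recovery Konstantin describes could look roughly like the sketch below. This assumes a Hadoop 0.18-era configuration; the property names come from that release, but the directory paths are hypothetical.]

```xml
<!-- hadoop-site.xml on the machine taking over as NameNode (paths are examples) -->
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/dfs/name</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/hadoop/dfs/namesecondary</value>
</property>
```

With fs.checkpoint.dir pointing at the checkpoint image, starting the name-node with `bin/hadoop namenode -importCheckpoint` populates an empty dfs.name.dir from the checkpoint, which is the automated alternative to copying the image files by hand.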

Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
Hi All We're thinking of setting up a Hadoop cluster which will be used to create a prototype system for analyzing telecom data. The wiki page on machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an overview of the node specs and from the Hadoop primer I found the following

HDFS Login Security

2008-11-04 Thread Wasim Bari
Hi, Is there a Java class for logging in to HDFS programmatically with a traditional username/password mechanism, or can we only use the system user or the user who started the NameNode? Thanks, Wasim

_temporary directories not deleted

2008-11-04 Thread Nathan Marz
Hello all, Occasionally when running jobs, Hadoop fails to clean up the _temporary directories it has left behind. This appears to happen only when a task is killed (e.g. a speculatively executed task), and the data that task has output so far is not cleaned up. Is this a known issue in

RE: Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
One correction - the number 5 in the mail below is my estimate of the number of nodes we might need. Could this be too small a cluster? Arijit Dr. Arijit Mukherjee Principal Member of Technical Staff, Level-II Connectiva Systems (I) Pvt. Ltd. J-2, Block GP, Sector V, Salt Lake Kolkata 700 091,

Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman
Hey Roger, SSH is only needed to start and stop daemons - it's not really needed for running Hadoop itself. Currently, we do this through custom site mechanisms, and not through SSH. Brian On Nov 4, 2008, at 10:36 AM, Zhang, Roger wrote: Brian, You seem to have a pretty large cluster.

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Milind Bhandarkar
Tom, Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles Thanks, - milind On 11/2/08 5:57 PM, Tom Wheeler [EMAIL PROTECTED] wrote: The article I've written about Hadoop has just been published: http://www.ociweb.com/jnb/jnbNov2008.html I'd like to again thank

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Tom Wheeler
On Tue, Nov 4, 2008 at 3:46 PM, Milind Bhandarkar [EMAIL PROTECTED] wrote: Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles Great suggestion -- I've just linked it there as you requested. -- Tom Wheeler http://www.tomwheeler.com/

Re: HDFS Login Security

2008-11-04 Thread Alex Loddengaard
Look at the hadoop.job.ugi configuration option. You can manually set a user and the groups that user is a part of. Alex On Tue, Nov 4, 2008 at 1:42 PM, Wasim Bari [EMAIL PROTECTED] wrote: Hi, Do we have any Java class for Login purpose to HDFS programmatically like traditional
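[Editor's note: a minimal sketch of the hadoop.job.ugi option Alex mentions, for a Hadoop 0.18-era hadoop-site.xml. The username and groups below are hypothetical; the value format is comma-separated: user,group1,group2,...]

```xml
<!-- hadoop-site.xml: act as user "alice" in groups "users" and "hadoop" -->
<property>
  <name>hadoop.job.ugi</name>
  <value>alice,users,hadoop</value>
</property>
```

Note that this is not authentication in the password sense: Hadoop of this era simply trusts the client-supplied identity, so it controls which user and groups permission checks run as, nothing more.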

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Ravion
Dear Tom, Here is one more, written by our core data warehouse team. I'd appreciate it if you could add it to the Hadoop article links so that the community benefits further: http://www.javaworld.com/javaworld/jw-09-2008/jw-09-hadoop.html Best, Ravion - Original Message - From: Tom Wheeler

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Tom Wheeler
Done. I also added a link to the article that Amit Kumar Saha wrote just a few weeks ago for linux.com. On Tue, Nov 4, 2008 at 4:37 PM, Ravion [EMAIL PROTECTED] wrote: Dear Tom, Here is one more, written by our core data warehouse team. I'd appreciate it if you could add it to the Hadoop article links,

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Ravion
Great and thank you!! Best Ravion - Original Message - From: Tom Wheeler [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, November 05, 2008 5:47 AM Subject: Re: Seeking Someone to Review Hadoop Article Done. I also added a link to the article that Amit Kumar Saha

too many open files? Isn't 4K enough???

2008-11-04 Thread Yuri Pradkin
Hi, I'm running the current snapshot (-r709609), doing a simple word count using Python over streaming. I have a relatively modest setup of 17 nodes. I'm getting this exception: java.io.FileNotFoundException:

Re: Missing blocks from bin/hadoop text but fsck is all right

2008-11-04 Thread Sagar Naik
Hi, We were hitting file descriptor limits :). Increased the limit and the problem was solved. Thanks Jason -Sagar Sagar Naik wrote: Hi, We have a strange problem reading some of our files: bin/hadoop dfs -text dir/* gives me missing-block exceptions. 08/11/04 10:45:09 [main] INFO dfs.DFSClient:
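[Editor's note: this thread and the "too many open files" thread above both come down to per-process file descriptor limits. A sketch of how one might check and raise them on Linux; the "hadoop" user name in the comments is hypothetical.]

```shell
# Check the open-file limits the Hadoop daemons' user actually gets
ulimit -Sn   # soft limit -- this is what "Too many open files" errors hit
ulimit -Hn   # hard limit

# To raise the limit persistently on Linux, add lines like these to
# /etc/security/limits.conf (the "hadoop" user name is an example):
#   hadoop  soft  nofile  16384
#   hadoop  hard  nofile  16384
# then log the user out and back in before restarting the daemons.
```

Checking with `ulimit -Sn` as the daemon user matters because limits set in limits.conf only apply to new login sessions, so a restarted daemon can silently keep the old limit.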

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Amit k. Saha
On Wed, Nov 5, 2008 at 3:17 AM, Tom Wheeler [EMAIL PROTECTED] wrote: Done. I also added a link to the article that Amit Kumar Saha wrote just a few weeks ago for linux.com. Thank you, Tom :-) -Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype:

Re: _temporary directories not deleted

2008-11-04 Thread Amareshwari Sriramadasu
Nathan Marz wrote: Hello all, Occasionally when running jobs, Hadoop fails to clean up the _temporary directories it has left behind. This appears to happen only when a task is killed (e.g. a speculatively executed task), and the data that task has output so far is not cleaned up. Is this a

Q, Find a counter

2008-11-04 Thread Edward J. Yoon
Hi, I'd like to find the counter for REDUCE_OUTPUT_RECORDS after the job is done. BTW, org.apache.hadoop.mapred.Task.Counter is not visible, and findCounter(String, int, String) is deprecated. What is the best way to do this in code? -- Best regards, Edward J. Yoon @ NHN, corp. [EMAIL PROTECTED] http://blog.udanax.org
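[Editor's note: a hedged sketch against the old org.apache.hadoop.mapred API of that era. Since Task.Counter is not publicly visible, the counter group is looked up by its string name instead of the enum; the group and counter names below are assumed from that release and should be verified against the running version.]

```java
// Sketch only: assumes a Hadoop 0.18-era installation on the classpath.
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ReduceOutputCount {
    public static long reduceOutputRecords(JobConf conf) throws Exception {
        // runJob blocks until the job completes
        RunningJob job = JobClient.runJob(conf);
        Counters counters = job.getCounters();
        // The group name is the enum's fully qualified class name;
        // the counter name is the enum constant's name.
        return counters.getGroup("org.apache.hadoop.mapred.Task$Counter")
                       .getCounter("REDUCE_OUTPUT_RECORDS");
    }
}
```

Looking the group up by string avoids both the invisible enum and the deprecated findCounter(String, int, String) overload.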