RE: Hadoop configuration

2011-12-23 Thread Uma Maheswara Rao G
Hi Humayun, Let's assume you have JT, TT1, TT2, TT3. Now you should configure /etc/hosts like the example below: 10.18.xx.1 JT 10.18.xx.2 TT1 10.18.xx.3 TT2 10.18.xx.4 TT3 Configure the same set on all the machines, so that all task trackers can talk to each other wi
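
The hosts layout described above can be sketched in a few lines of Python. This is only an illustration: the addresses are placeholders from the documentation range 192.0.2.0/24 (the message masks the real ones as 10.18.xx.N), and the function names are hypothetical.

```python
# Placeholder addresses (documentation range); the original message
# masks the real ones as 10.18.xx.N, so these are NOT the poster's values.
NODES = [
    ("192.0.2.1", "JT"),
    ("192.0.2.2", "TT1"),
    ("192.0.2.3", "TT2"),
    ("192.0.2.4", "TT3"),
]


def render_hosts(nodes):
    """Return /etc/hosts lines, one 'address<TAB>hostname' pair per node."""
    return "".join(f"{ip}\t{name}\n" for ip, name in nodes)


def hosts_by_machine(nodes):
    """Map every machine to the same hosts block, as the message advises."""
    block = render_hosts(nodes)
    return {name: block for _, name in nodes}


if __name__ == "__main__":
    print(render_hosts(NODES), end="")
```

The point of hosts_by_machine is that the block is identical everywhere: each node resolves every other node by the same name.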

Re: Problems with Rumen in Hadoop-0.21.0

2011-12-23 Thread FujizawaMiyuki
Finally I've found why we can't use rumen in Hadoop-0.21: I've just seen the logs of hadoop-0.20 and compared them with those of 0.21, and found there is a BIG difference between them in the format. That's why we can't use rumen in 0.21. BTW, the logs' format in 0.23 has also changed a little b

Weird problem in installing hadoop on 2 machines.

2011-12-23 Thread praveenesh kumar
Hello people, So I am trying to install hadoop .20.205 on 2 machines. Individually I am able to run hadoop on each machine. Now when I configure one machine as slave and the other as master, and try to start hadoop, it's not able to even execute hadoop-run commands on the slave machine. I am getting

Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread praveenesh kumar
When installing hadoop on slave machines, do we have to install hadoop at the same location on each machine? Can we have hadoop installed at different locations on different machines in the same cluster? If yes, what do we have to take care of in that case? Thanks, Praveenesh

Re: Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread Michael Segel
Sure, You could do that, but in doing so, you will make your life a living hell. Literally. Think about it... You will have to manually manage each node's config files... So if something goes wrong you will have a hard time diagnosing the issue. Why make life harder? Why not just do the simple t

Re: Hadoop configuration

2011-12-23 Thread Michael Segel
Class project due? Sorry, second set of questions on setting up a 2 node cluster... Sent from my iPhone On Dec 22, 2011, at 3:25 AM, "Humayun kabir" wrote: > someone please help me to configure hadoop such as core-site.xml, > hdfs-site.xml, mapred-site.xml etc. > please provide some example. it

Re: Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread praveenesh kumar
What I mean to say is: does hadoop internally assume that the installations on all nodes need to be in the same location? I had hadoop installed at different locations on 2 different nodes. I configured the hadoop config files to be part of the same cluster. But when I started hadoop on master, I sa

Task process exit with nonzero status of 134

2011-12-23 Thread anthony garnier
Hi folks, I've just done a fresh install of Hadoop. Namenode and datanode are up, Task/job Tracker also up, but when I run the MapReduce wordcount example I get this error on the Task tracker: 2011-12-23 15:11:52,679 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201112231511_0001_m_-165367

Re: Task process exit with nonzero status of 134

2011-12-23 Thread alo alt
Hi, take a look into the logs for the failed attempt at your Tasktracker. Also check the system logs with dmesg or /var/log/kern*. Could be a syskill (segfault). - Alex On Fri, Dec 23, 2011 at 3:32 PM, anthony garnier wrote: > > Hi folks, > > I've just done a fresh install of Hadoop, Namenode a
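
One way to read the status in the subject line: on Linux, an exit status above 128 conventionally means the process was ended by signal number (status - 128). The sketch below assumes the task JVM's status follows that convention, which the thread does not confirm; the function name is hypothetical.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal number implied by a shell-style exit status.

    Statuses of 128 or below are treated as ordinary exit codes (None).
    """
    if status > 128:
        return status - 128
    return None


if __name__ == "__main__":
    signum = signal_from_exit_status(134)
    # signal.Signals maps the number back to its symbolic name.
    print(signum, signal.Signals(signum).name)
```

Under this reading, 134 corresponds to signal 6 (SIGABRT on Linux), whereas a segfault (SIGSEGV, signal 11) would show up as 139. The failed attempt's logs are still the place to confirm what actually aborted.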

cassandra data to hadoop

2011-12-23 Thread ravikumar visweswara
Hello All, I need to dump cassandra data to a hadoop cluster for further analytics. A lot of other relevant data which is not present in cassandra is already available in hdfs for analysis. Both are independent clusters right now. Is there a suggested way to get the data periodically or co

Re: cassandra data to hadoop

2011-12-23 Thread Sanjeev Verma
Hey Ravi: Hadoop newbie here, so pardon me if I am pointing out the obvious - have you taken a look at this link - http://wiki.apache.org/cassandra/HadoopSupport Looks like Cassandra 0.6 onwards supports output to mapreduce. Regards Sanjeev On Fri, 2011-12-23 at 07:13 -0800, ravikumar visweswar

Re: cassandra data to hadoop

2011-12-23 Thread ravikumar visweswara
Thank you for the reference. I have looked at Brisk. In our situation both are disconnected clusters for various reasons and use different distributions (i.e. cloudera). Is there any other/similar way to inject data into HDFS? R On Fri, Dec 23, 2011 at 7:34 AM, Sanjeev Verma wrote: > Hey Ravi: >

1gig or 10gig network for cluster?

2011-12-23 Thread Koert Kuipers
For a hadoop cluster that starts medium size (50 nodes) but could grow to hundreds of nodes, what is the recommended network in the rack? 1gig or 10gig? We have machines with 8 cores, 4 X 1tb drives (could grow to 8 X 1tb drives), 48 GB ram per node. We expect "balanced" usage of the cluster (both stora

RE: Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread Michael Segel
Ok, Here's the thing... 1) When building the cluster, you want to be consistent. 2) Location of $HADOOP_HOME is configurable. So you can place it anywhere. Putting the software in two different locations isn't a good idea because you now have to set it up with a unique configuration per node.
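
The consistency point above can be checked mechanically. The sketch below is an illustration only: it assumes you have already collected each node's $HADOOP_HOME by some means of your own, and the node names, paths, and function name are hypothetical.

```python
from collections import Counter


def inconsistent_nodes(home_by_node):
    """Return the nodes whose HADOOP_HOME differs from the most common value."""
    if not home_by_node:
        return []
    most_common_home, _ = Counter(home_by_node.values()).most_common(1)[0]
    return sorted(
        node for node, home in home_by_node.items() if home != most_common_home
    )


if __name__ == "__main__":
    # Hypothetical inventory: one slave installed somewhere else.
    inventory = {
        "master": "/opt/hadoop",
        "slave1": "/opt/hadoop",
        "slave2": "/home/hadoop/hadoop",
    }
    print(inconsistent_nodes(inventory))
```

An empty result means every node uses the same location, so one configuration can be copied to all of them.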

Re: 1gig or 10gig network for cluster?

2011-12-23 Thread alo alt
Hi, recommended or optimum? 10G is best for optimal rack awareness. If you plan to grow seriously, start with the best you can afford. It depends on your available investment, I think. - Alex On Fri, Dec 23, 2011 at 6:23 PM, Koert Kuipers wrote: > For a hadoop cluster that starts medium size

Re: 1gig or 10gig network for cluster?

2011-12-23 Thread Mads Toftum
On Fri, Dec 23, 2011 at 12:23:59PM -0500, Koert Kuipers wrote: > For a hadoop cluster that starts medium size (50 nodes) but could grow to > hundreds of nodes, what is the recommended network in the rack? 1gig or 10gig > We have machines with 8 cores, 4 X 1tb drive (could grow to 8 X 1tb drive), > 48

Re: 1gig or 10gig network for cluster?

2011-12-23 Thread Joep Rottinghuis
One or two 1gig nics on a 10g backbone sound reasonable with "only" 4 1T drives. 12*2T disks per node are getting more common and do not all have 10gig network cards, even on 600+ node clusters. Cheers, Joep Sent from my iPhone On Dec 23, 2011, at 11:15 AM, Mads Toftum wrote: > On Fri, Dec
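
A back-of-the-envelope comparison of disk and network bandwidth per node may help frame the numbers in this thread. The per-drive throughput below is an assumption for illustration, not a figure from the thread, and the function names are hypothetical.

```python
# ASSUMPTION: sequential throughput of one drive, in MB/s. Not from the thread.
ASSUMED_DRIVE_MB_PER_S = 100


def disk_mb_per_s(drives, mb_per_s_per_drive=ASSUMED_DRIVE_MB_PER_S):
    """Aggregate sequential disk bandwidth of one node, in MB/s."""
    return drives * mb_per_s_per_drive


def nic_mb_per_s(gbit_per_s, nics=1):
    """Aggregate NIC bandwidth in MB/s (1 Gbit/s = 125 MB/s, no overhead)."""
    return nics * gbit_per_s * 1000 / 8


if __name__ == "__main__":
    for drives in (4, 8, 12):
        print(f"{drives} drives: {disk_mb_per_s(drives)} MB/s of disk")
    print(f"1 x 1gig:  {nic_mb_per_s(1)} MB/s")
    print(f"2 x 1gig:  {nic_mb_per_s(1, nics=2)} MB/s")
    print(f"1 x 10gig: {nic_mb_per_s(10)} MB/s")
```

This is an upper bound on how much disk traffic could reach the wire. Hadoop tries to schedule map tasks next to their data, so only part of it actually crosses the network, which is consistent with 1gig NICs still being common on large clusters.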

Re: Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread J. Rottinghuis
Agreed that different locations are not a good idea. However, the question was, can it be done? Yes, with some hacking I suppose. Do I recommend hacking? No. But, if you cannot help yourself, then to have data nodes in a different location per slave: create an hdfs-site.xml per node (enjoy). For the

Another newbie - problem with grep example

2011-12-23 Thread Pat Flaherty
Hi, Installed 0.22.0 on CentOS 5.7. I can start dfs and mapred and see their processes. Ran the first grep example: bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'. It seems the correct jar name is hadoop-mapred-examples-0.22.0.jar - there are no other hadoop*examples*.ja

Newbie - problem with grep example.

2011-12-23 Thread Pat Flaherty
Hi, Installed 0.22.0 on CentOS 5.7. I can start dfs and mapred and see their processes (3 dfs and 3 mapred). Ran the first grep example: bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'. It seems the correct jar name is hadoop-mapred-examples-0.22.0.jar - there are no other

Re: Another newbie - problem with grep example

2011-12-23 Thread Prashant Kommireddi
Seems like you do not have "/user/MyId/input/conf" on HDFS. Try this. cd $HADOOP_HOME_DIR (this should be your hadoop root dir) hadoop fs -put conf input/conf And then run the MR job again. -Prashant Kommireddi On Fri, Dec 23, 2011 at 3:40 PM, Pat Flaherty wrote: > Hi, > > Installed 0.22.0 o

Question on deprecations in Hadoop 0.20.203.0 API

2011-12-23 Thread Sanjeev Verma
Hey everyone: I am going through the "hadoop in action" book, and I guess the version of hadoop that book refers to is already old :-). The installation I have is 0.20.203.0, and in this version, a few key base classes have been deprecated, like: Interface InputSplit is deprecated in favor of Inp

Re: Question on deprecations in Hadoop 0.20.203.0 API

2011-12-23 Thread Harsh J
These items have been un-deprecated in 0.20.205+, and are also supported in 0.22/0.23+. The deprecated APIs are now the stable ones again, and you shouldn't carry further confusion while using them. On 24-Dec-2011, at 6:53 AM, Sanjeev Verma wrote: > Hey everyone: > > I am going through the "hadoo

Re: Another newbie - problem with grep example

2011-12-23 Thread Harsh J
Pat, Perhaps for some reason your program isn't picking up the right filesystem as it starts. What does "hadoop classpath" print? As a workaround, you can also pass an explicit FS to your command: input -> hdfs://host:port/user/path/to/input output -> hdfs://host:port/user/path/to/output And th
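
The workaround above amounts to spelling out the filesystem in both paths. A minimal sketch of building such a command line follows; the host name, port, and user directory are placeholders to be replaced with your own NameNode address and paths, and the helper names are hypothetical. The jar name and pattern are the ones quoted earlier in the thread.

```python
def qualify(path, host, port):
    """Prefix an absolute HDFS path with an explicit hdfs://host:port."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % path)
    return f"hdfs://{host}:{port}{path}"


def grep_command(jar, input_uri, output_uri, pattern):
    """Build the argument list for the grep example job."""
    return ["bin/hadoop", "jar", jar, "grep", input_uri, output_uri, pattern]


if __name__ == "__main__":
    # Placeholder host, port, and user directory; substitute your own.
    host, port = "namenode.example.com", 8020
    cmd = grep_command(
        "hadoop-mapred-examples-0.22.0.jar",
        qualify("/user/MyId/input", host, port),
        qualify("/user/MyId/output", host, port),
        "dfs[a-z.]+",
    )
    print(" ".join(cmd))
```

Keeping the command as a list avoids shell quoting problems with the pattern if it is later passed to subprocess.run.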

Re: Re: DN limit

2011-12-23 Thread Harsh J
Bourne, You have 14 million files, each taking up a single block or are these files multi-blocked? What does the block count come up as in the live nodes list of the NN web UI? 2011/12/23 bourne1900 : > Sorry, a detailed description: > I wanna know how many files a datanode can hold, so there is

Re: cassandra data to hadoop

2011-12-23 Thread Sanjeev Verma
I am probably stating the obvious again - have you looked at the DBInputFormat class? Or, another option might be to programmatically move data to hdfs using the FileSystem api. On Dec 23, 2011 10:50 AM, "ravikumar visweswara" wrote:

Re: Question on deprecations in Hadoop 0.20.203.0 API

2011-12-23 Thread Sanjeev Verma
Thanks Harsh! On Dec 23, 2011 11:21 PM, "Harsh J" wrote: > These items have been un-deprecated in 0.20.205+, and are also supported in > 0.22/0.23+. The deprecated APIs are now the stable ones again, and you > shouldn't carry further confusion while using them. > > On 24-Dec-2011, at 6:53 AM, Sanjee