Re: how to dump data from a mysql cluster to hdfs?

2009-08-05 Thread Yang Zhou
Write a Java program which will dump data from the mysql cluster and save it into HDFS at the same time. Run it on the namenode. I assume the namenode should be able to connect to the mysql gateway. Will it work? On Thu, Aug 6, 2009 at 12:02 PM, Min Zhou wrote: > Hi Aaron, > > We could not run mysqldump on the
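A minimal sketch of what such a program could look like, assuming a JDBC-reachable mysql gateway and a Hadoop client configuration on the machine running it; the host, database, table, credentials and output path below are placeholders, not details from the thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MysqlToHdfs {
      public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");   // register the JDBC driver
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://mysql-gateway:3306/mydb", "dbuser", "dbpass");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT id, payload FROM mytable");

        // Stream the rows straight into HDFS as tab-separated text.
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/user/hadoop/mytable.tsv"));
        while (rs.next()) {
          out.writeBytes(rs.getLong(1) + "\t" + rs.getString(2) + "\n");
        }
        out.close();
        rs.close();
        stmt.close();
        conn.close();
      }
    }

Whether a single machine running this (namenode or otherwise) can keep up with the volume coming out of a 50-node mysql cluster is the open question in the thread.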

Help in running hadoop from eclipse

2009-08-05 Thread ashish pareek
Hi everybody, I am trying to run Hadoop from Eclipse, but when I run NameNode.java as a Java application I get the following error. Please help in getting rid of this problem. 2009-08-05 23:42:00,760 INFO dfs.NameNode (StringUtils.java:startupShutdownMessage(464)) - STARTUP_M

Re: how to dump data from a mysql cluster to hdfs?

2009-08-05 Thread Min Zhou
Hi Aaron, We could not run mysqldump on the nodes mysqld runs on. The only way is to open a connection to a gateway of the mysql cluster. Our hadoop cluster is likewise fronted by gateways; hadoop datanodes are not allowed to connect directly to the mysql gateway. Min On Thu, Aug 6, 2009 at 1:27 AM, Aaro

Re: Maps running - how to increase?

2009-08-05 Thread Zeev Milin
I now see that mapred.tasktracker.map.tasks.maximum=32 at the job level, and still only 6 maps are running and 5000+ are pending. Not sure how to force the cluster to run more maps.

Re: Maps running - how to increase?

2009-08-05 Thread Zeev Milin
This is the setting in the hadoop-site.xml file: mapred.tasktracker.map.tasks.maximum = 32. When I look at the job configuration file (XML), I see that this parameter is set to 2. Not sure why the hadoop-site value is not being used.
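For reference, the property would normally live in hadoop-site.xml on every TaskTracker node, roughly as sketched below; the TaskTrackers read it at startup, so a value set only in the submitting job's configuration is ignored, which would explain the "2" seen in the job's XML:

    <!-- hadoop-site.xml on each TaskTracker node; restart the TaskTrackers after changing it -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>32</value>
    </property>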

Re: Maps running - how to increase?

2009-08-05 Thread Tim Sell
Have you set mapred.tasktracker.map.tasks.maximum? That specifies the number of maps that can run on a single node at a time. 2009/8/5 Zeev Milin : > I have a map/reduce job that has a total of 6000 map tasks. The issue is > that the number of maps that is "running" at any given time is 6 (number

Maps running - how to increase?

2009-08-05 Thread Zeev Milin
I have a map/reduce job that has a total of 6000 map tasks. The issue is that the number of maps "running" at any given time is 6 (the number of nodes) and the rest are pending. Does anyone know how to force the cluster to run more maps in parallel to increase the throughput? This is the only job t

Re: distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread charles du
Thanks a lot. It solved my problem. Chuang On Wed, Aug 5, 2009 at 2:04 PM, Tsz Wo (Nicholas), Sze < s29752-hadoopu...@yahoo.com> wrote: > >hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest > The problem in the command above is hftp://nn2:50070/dest while hftp (i.e. > HftpFileSystem

Re: distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread Tsz Wo (Nicholas), Sze
>hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest The problem in the command above is the destination hftp://nn2:50070/dest, since hftp (i.e. HftpFileSystem) is a read-only file system. You may change it to hdfs://nn2:<port>/dest, where <port> is a different port. You may find the port number from t
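A sketch of the corrected command, run from the 0.18.3 (destination) side so the 0.17 source is read over the read-only hftp interface and the destination is written over hdfs; the destination port below is only a placeholder for whatever fs.default.name on nn2 actually uses:

    hadoop distcp -i hftp://nn1:50070/src hdfs://nn2:9000/dest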

Re: distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread charles du
Hi Nicholas: The command I used is hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest I ran "hadoop ls" on both src and destination, and it lists files just fine. nn1 is 0.17.0, and nn2 is 0.18.3 Thanks. tp On Wed, Aug 5, 2009 at 1:49 PM, Tsz Wo (Nicholas), Sze < s29752-hadoopu...

Re: distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread Tsz Wo (Nicholas), Sze
Hi tp, distcp definitely supports copying files from a 0.17 cluster to a 0.18 cluster. The error message is saying that the delete operation is not supported in HftpFileSystem. Would you mind showing me the actual command you used? Nicholas Sze - Original Message > From: charles du >

distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread charles du
Hi: I tried to use "distcp" to copy files from one cluster running hadoop 0.17.0 to another cluster running hadoop 0.18.3, and got the following errors. With failures, global counters are inaccurate; consider running with -i Copy failed: java.io.IOException: Not supported at org.apache.ha

Re: namenode -upgrade problem

2009-08-05 Thread Aaron Kimball
The only time you would need to upgrade is if you've increased the Hadoop version but are retaining the same HDFS :) So, that's the normal case. What does "netstat --listening --numeric --program" report? - Aaron On Wed, Aug 5, 2009 at 10:53 AM, bharath vissapragada < bharathvissapragada1...@gmai

Re: namenode -upgrade problem

2009-08-05 Thread bharath vissapragada
Yes, I have stopped all the daemons... when I use jps I get only "Jps". Actually, I upgraded the version from 0.18.2 to 0.19.x on the same HDFS path... is that a problem? On Wed, Aug 5, 2009 at 11:02 PM, Aaron Kimball wrote: > Are you sure you stopped all the daemons? Use 'sudo jps'

Re: Questions on "dfs.datanode.du.reserved"

2009-08-05 Thread jchernandez
Hi all, Does anyone have an answer to this question? I've searched forums, Hadoop change logs, and issues, and it seems it is still an open issue. Has anyone seen this parameter working, and under what circumstances? Thank you, Julian Taeho Kang wrote: > > Dear All, > I have few questions on "d

Re: namenode -upgrade problem

2009-08-05 Thread Aaron Kimball
Are you sure you stopped all the daemons? Use 'sudo jps' to make sure :) - Aaron On Mon, Aug 3, 2009 at 7:26 PM, bharath vissapragada < bharathvissapragada1...@gmail.com> wrote: > Todd thanks for replying .. > > I stopped the cluster and issued the command > > "bin/hadoop namenode -upgrade" and i

Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle

2009-08-05 Thread Bradford Stephens
A big "thanks" to everyone who came out despite the heat! Hope to see you again the last week of August, probably at UW. On Wed, Jul 29, 2009 at 4:52 PM, Bradford Stephens wrote: > Don't forget this is tonight! Excited to see everyone there. > > On Tue, Jul 28, 2009 at 11:25 AM, Bradford > Stephen

Re: how to dump data from a mysql cluster to hdfs?

2009-08-05 Thread Aaron Kimball
mysqldump to local files on all 50 nodes, scp them to datanodes, and then bin/hadoop fs -put? - Aaron On Mon, Aug 3, 2009 at 8:15 PM, Min Zhou wrote: > hi all, > > We need to dump data from a mysql cluster with about 50 nodes to a hdfs > file. Considered about the issues on security , we can't u
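Sketched out, with host names, database and paths as placeholders, that suggestion would look roughly like:

    # on each of the 50 mysql nodes
    mysqldump -u dbuser -p mydb mytable > /tmp/mytable_$(hostname).sql
    scp /tmp/mytable_$(hostname).sql a-datanode:/tmp/dumps/

    # on the datanode (or any machine with a configured Hadoop client)
    bin/hadoop fs -put /tmp/dumps/mytable_*.sql /user/hadoop/mysql-dumps/

As the rest of the thread notes, this only works if mysqldump can be run on (or against) the mysql nodes, which is exactly what the gateway setup rules out.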

Re: questions about HDFS file access synchronization

2009-08-05 Thread Aaron Kimball
On Wed, Aug 5, 2009 at 6:09 AM, Zhang Bingjun (Eddy) wrote: > Hi All, > > I am quite new to Hadoop. May I ask a simple question about HDFS file > access > synchronization? > > For some very typical scenarios below, how does HDFS respond? Is there a > way > to synchronize file access in HDFS? > > A

Re: how to get out of Safe Mode?

2009-08-05 Thread Aaron Kimball
For future reference, $ bin/hadoop dfsadmin -safemode leave will also just cause HDFS to exit safe mode forcibly. - Aaron On Wed, Aug 5, 2009 at 1:04 AM, Amandeep Khurana wrote: > Two alternatives: > > 1. Do bin/hadoop namenode -format. That'll format the metadata and you can > start afresh. > >

Re: network interface trunking

2009-08-05 Thread Todd Lipcon
Hi Ryan, Yes, you can do this -- the term is called "interface bonding" and isn't too hard to set up in Linux as long as your switch supports it. However, it is pretty rare that it provides an appreciable performance benefit on typical hardware and workloads -- probably not worth the doubled switch

Eclipse plugin - user permissions

2009-08-05 Thread John Clarke
Hi, I have installed Eclipse Europa and the Hadoop 0.18.3 plugin. I am running Windows XP. My Hadoop test environment is Ubuntu 9.04 running via VirtualBox. I have a user called "hadoop" in Ubuntu that I use to run Hadoop, and I launched the namenode/datanode etc. with this user. Back in Ecli

questions about HDFS file access synchronization

2009-08-05 Thread Zhang Bingjun (Eddy)
Hi All, I am quite new to Hadoop. May I ask a simple question about HDFS file access synchronization? For some very typical scenarios below, how does HDFS respond? Is there a way to synchronize file access in HDFS? A tries to read a file currently being written by B. A tries to write a file curr

network interface trunking

2009-08-05 Thread Ryan Smith
Hello everyone, If I have a machine (DN) with 2 network cards, can I get double bandwidth for my datanode in Hadoop? Or is the preferred solution to link-aggregate the 2 interfaces at the OS layer? Any thoughts on this would be appreciated. Thanks in advance. -Ryan

Re: Some tasks fail to report status between the end of the map and the beginning of the merge

2009-08-05 Thread Mathias De Maré
> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan > wrote: > Hi, > > Could you please try setting this parameter > mapred.merge.recordsBeforeProgress to a lower number? > See https://issues.apache.org/jira/browse/HADOOP-4714 > > Cheers > Jothi Hm, that bug looks like it's applicable during the
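A minimal sketch of lowering that parameter on the job configuration (old org.apache.hadoop.mapred API; the value 1000 is only an illustration, not a recommendation from the thread):

    import org.apache.hadoop.mapred.JobConf;

    public class MergeProgressExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(MergeProgressExample.class);
        // Report progress after fewer records during the merge (see HADOOP-4714).
        conf.setInt("mapred.merge.recordsBeforeProgress", 1000);
        // ... set mapper/reducer/paths and submit the job as usual ...
      }
    }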

Re: Hadoop 0.19.2 + eclipse 3.5

2009-08-05 Thread John Clarke
Thanks for the quick reply! Hmm, that's a shame, as I have my Galileo environment set up nicely :/ I'll go get Europa then! cheers 2009/8/5 Puri, Aseem > John, > > You need Eclipse Europa for that. > > You can download it from http://archive.eclipse.org/eclipse/downloads/ > > Regards, > Aseem Puri >

RE: Hadoop 0.19.2 + eclipse 3.5

2009-08-05 Thread Puri, Aseem
John, You need Eclipse Europa for that. You can download it from http://archive.eclipse.org/eclipse/downloads/ Regards, Aseem Puri -Original Message- From: John Clarke [mailto:clarke...@gmail.com] Sent: Wednesday, August 05, 2009 3:12 PM To: common-user@hadoop.apache.org Subject: H

Hadoop 0.19.2 + eclipse 3.5

2009-08-05 Thread John Clarke
Hi, I am trying to get my Eclipse Galileo 3.5 environment working with the Hadoop 0.19.2 Eclipse plugin. I am running Windows XP. Previously, I tried to use Hadoop 0.18.3 with this Eclipse and it partly worked: I could browse the DFS but could not run a project, because when I clicked Run on Hadoop

RE: Some tasks fail to report status between the end of the map and the beginning of the merge

2009-08-05 Thread Amogh Vasekar
10 minutes reminds me of the parameter mapred.task.timeout. This is configurable. Alternatively, you might just do a sysout to let the tracker know the task is still alive (not an ideal solution, though). Thanks, Amogh -Original Message- From: Mathias De Maré [mailto:mathias.dem...@gmail.com] Sent: We
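A sketch of both suggestions using the old (org.apache.hadoop.mapred) API; the mapper below is purely illustrative, and the timeout value is an arbitrary example rather than anything prescribed in the thread:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

      public void map(LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> output,
                      Reporter reporter) throws IOException {
        // ... expensive per-record work here ...
        reporter.progress();        // heartbeat so the TaskTracker doesn't assume the task hung
        output.collect(key, value);
      }

      public static void configureTimeout(JobConf conf) {
        // Blunter alternative: raise the task timeout (milliseconds; 20 minutes as an example).
        conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);
      }
    }

Raising the timeout only masks missing progress reports, so explicit progress reporting is the more direct fix where the code allows it.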

Re: how to get out of Safe Mode?

2009-08-05 Thread Amandeep Khurana
Two alternatives: 1. Do bin/hadoop namenode -format. That'll format the metadata and you can start afresh. 2. If that doesn't work, manually delete everything in the directories your Namenode and Datanodes are configured to store their data in. On Tue, Aug 4, 2009

Re: Re: FYI X-RIME: Hadoop based large scale social network analysis released

2009-08-05 Thread Bin Cai
Hi Edward J. Yoon, Sorry, I found I was not on the hama-user mailing list. Just joined. X-RIME is based on Map/Reduce and uses HDFS to store graph information. It is distributed and parallel. We will release some documents and examples soon. I also noticed the project Hamburg. It is interesti

how to get out of Safe Mode?

2009-08-05 Thread Phil Whelan
Hi, In setting up my cluster I brought a few machines up and down. I did have some data, which I moved to Trash. Now that data is not 100% available, which is fine, because I didn't want it. But now I'm stuck in "Safe Mode", because it cannot find the data. I cannot purge the Trash because it'

Some tasks fail to report status between the end of the map and the beginning of the merge

2009-08-05 Thread Mathias De Maré
Hi, I'm having some problems (Hadoop 0.20.0) where map tasks fail to report status for 10 minutes and eventually get killed. All of the tasks output around the same amount of data; some only take a few seconds before starting the 'merge' on the segments, but some seem to fail by just stopping to w