Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-02 Thread Nitin Pawar
I can think of the following options:
1) Write a simple get-and-put program that reads the data from the old DFS and loads it into the new DFS.
2) See if distcp between the two versions is compatible.
3) This is what I had done (and my data was hardly a few hundred GB): did a dfs -copyToLocal and then in the new grid did
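Nitin's options 2 and 3 can be sketched roughly as follows. This is a minimal sketch, not a definitive procedure: the NameNode host names and ports are placeholders, and cross-version copies are conventionally run over the read-only, HTTP-based hftp protocol, which is version-independent.

```shell
# Option 2 sketch: cross-version distcp, run from the destination (CDH3u3)
# cluster, reading the source cluster over hftp (read-only, HTTP-based).
# Host names, ports, and paths below are placeholders.
hadoop distcp hftp://old-nn:50070/user/data hdfs://new-nn:8020/user/data

# Option 3 sketch: stage through the local filesystem (only practical for
# modest amounts of data, as in the "few hundred GB" case above).
hadoop fs -copyToLocal /user/data /staging/data     # on the old cluster
hadoop fs -copyFromLocal /staging/data /user/data   # on the new cluster
```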

Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-02 Thread Austin Chungath
Hi, I am migrating from Apache Hadoop 0.20.205 to CDH3u3. I don't want to lose the data that is in the HDFS of Apache Hadoop 0.20.205. How do I migrate to CDH3u3 but keep the data that I have on 0.20.205? What are the best practices/techniques to do this? Thanks & Regards, Austin

Re: Where is the Hadoop installation directory

2012-05-02 Thread zillou
Thank you very much

On Wednesday, May 2, 2012, Harsh J wrote:
> The very same will do fine.
>
> On Wed, May 2, 2012 at 7:14 PM, zillou wrote:
> > Thank you Harsh J,
> >
> > The directory /usr/share/hadoop is right. But another question is which
> > directory I can use for $HADOOP_HOME for Hiv

Reduce Hangs at 66%

2012-05-02 Thread Keith Thompson
I am running a task which gets to 66% of the Reduce step and then hangs indefinitely. Here is the log file (I apologize if I am putting too much here, but I am not exactly sure what is relevant):

2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_2012022

Re: Splitting data input to Distcp

2012-05-02 Thread Pedro Figueiredo
On 2 May 2012, at 18:29, Himanshu Vijay wrote:
> Hi,
>
> I have 100 files each of ~3 GB. I need to distcp them to S3 but copying
> fails because of large size of files. The files are not gzipped so they are
> splittable. Is there a way or property to tell Distcp to first split the
> input files

Splitting data input to Distcp

2012-05-02 Thread Himanshu Vijay
Hi, I have 100 files, each of ~3 GB. I need to distcp them to S3, but copying fails because of the large size of the files. The files are not gzipped, so they are splittable. Is there a way or a property to tell Distcp to first split the input files into, let's say, 200 MB or N lines each before copying to desti
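DistCp itself copies whole files and (in the versions of this era) has no option to split them first, so one workaround is to pre-split the text files into line-aligned chunks before the copy. A minimal sketch with plain `split`, using hypothetical file names: the `-C` option caps each chunk's size while keeping lines intact.

```shell
# Make a sample text file, then split it into line-aligned chunks of at
# most 1 MB each (something like -C 200m would suit real 3 GB inputs).
seq 1 200000 > input.txt
split -C 1m input.txt chunk_

ls chunk_*                  # the resulting pieces (chunk_aa, chunk_ab, ...)
cat chunk_* | wc -l         # concatenating the chunks preserves every line
```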

Re: Where is the Hadoop installation directory

2012-05-02 Thread Harsh J
The very same will do fine.

On Wed, May 2, 2012 at 7:14 PM, zillou wrote:
> Thank you Harsh J,
>
> The directory /usr/share/hadoop is right. But another question is which
> directory I can use for $HADOOP_HOME for Hive or other applications.
>
> Thanks,
> robot zillou
>
> On Wednesday, May 2, 201
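In other words, the packaged install location discussed in this thread can double as $HADOOP_HOME for clients such as Hive. A sketch of the environment setup, assuming the /usr/share/hadoop path from the thread (it may differ per distribution or package):

```shell
# Point Hive and other client tools at the packaged Hadoop install.
export HADOOP_HOME=/usr/share/hadoop
export PATH="$HADOOP_HOME/bin:$PATH"
```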

Re: Where is the Hadoop installation directory

2012-05-02 Thread zillou
Thank you Harsh J,

The directory /usr/share/hadoop is right. But another question is which directory I can use for $HADOOP_HOME for Hive or other applications.

Thanks,
robot zillou

On Wednesday, May 2, 2012, Harsh J wrote:
> Zillou,
>
> The .deb packages from Hadoop itself would install jars to

Re: hadoop streaming using a java program as mapper

2012-05-02 Thread Robert Evans
Do you have the error message from running java? You can use myMapper.sh to help you debug what is happening by logging it. Stderr of myMapper.sh is logged and you can get to it. You can run shell commands like find and ls, and you can probably look at any error messages that java produced whil
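A wrapper along these lines illustrates the debugging approach Robert describes. This is a hypothetical sketch (the script name comes from the thread; the `MyMapper` class name is an assumption): everything written to stderr ends up in the task attempt's stderr log, which you can read from the JobTracker web UI.

```shell
#!/bin/sh
# myMapper.sh -- hypothetical streaming wrapper around the real mapper.
# Anything sent to stderr lands in the task attempt's stderr log.
echo "working dir: $(pwd)" >&2
ls -l >&2                             # show which files shipped with the task
which java >&2 || echo "java not found on PATH" >&2

# Run the real mapper (reads stdin, writes key/value pairs to stdout).
java MyMapper
status=$?
echo "java exited with status $status" >&2
exit $status
```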

Re: Where is the Hadoop installation directory

2012-05-02 Thread Harsh J
Zillou, The .deb packages from Hadoop itself would install jars to /usr/share/hadoop, if I am right. List that path out to see for sure. You can alternatively use the Apache BigTop 0.3.0 repos for DEBs/RPMs from https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+f
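To "list that path out" and confirm what the package installed where, something like the following works. A sketch, assuming the Debian package is simply named `hadoop` (the actual package name may differ):

```shell
# Show which installed files under /usr/share/hadoop came from the package,
# then inspect the directory contents directly.
dpkg -L hadoop | grep '/usr/share/hadoop' | head
ls -l /usr/share/hadoop
```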

Re: Need to run common,mapred and HDFS test separately

2012-05-02 Thread Harsh J
Hi,

Run the tests from their top-level project directories:

$ cd hadoop-common-project; mvn clean test

On Wed, May 2, 2012 at 12:06 PM, Amith D K wrote:
> Hi
>
> Currently all the tests in HDFS, common and MR will run when we execute
> pom.xml
> I want to separate the Common, MR and HDFS tests ru
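The same pattern extends to the other sub-projects. A sketch, assuming the module directory names of the trunk layout of that era (they may differ in your checkout):

```shell
# From the source checkout root, run each project's tests in isolation.
(cd hadoop-common-project    && mvn clean test)
(cd hadoop-hdfs-project      && mvn clean test)
(cd hadoop-mapreduce-project && mvn clean test)

# Or target a single module without changing directory, via Maven's
# reactor "projects" option:
mvn clean test -pl hadoop-common-project/hadoop-common
```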