Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-09 Thread Austin Chungath
$DuplicationException: Invalid input, there are duplicated files in the sources: hftp://ub13:50070/tmp/Rtmp1BU9Kb/file6abc6ccb6551/_logs/history, hftp://ub13:50070/tmp/Rtmp3yCJhu/file1ca96d9331/_logs/history. Any idea what the problem is here? They are different files, so how are they conflicting?
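The two paths are distinct at the source; a conflict can only arise at the destination. A minimal sketch of one likely mechanism, assuming distcp is treating each of the two `file…` directories as its own root (for example because several sources were combined with -update or -overwrite; this is an assumption, since the command line is not shown). In that case both files map to the same relative target, `_logs/history`. The function name `find_dup_targets` and its tab-separated input format are hypothetical, for illustration only.

```shell
#!/bin/sh
# find_dup_targets reads lines of the form "<source-root><TAB><file-path>"
# on stdin. For each file it strips the source root to get the path the file
# would have under the destination, then prints every relative path that
# occurs more than once (i.e. two sources would write the same target file).
find_dup_targets() {
  tab="$(printf '\t')"
  while IFS="$tab" read -r root path; do
    # ${path#"$root"/} removes the "<root>/" prefix, leaving the relative path.
    printf '%s\n' "${path#"$root"/}"
  done | sort | uniq -d
}
```

Feeding it the two paths from the error, each paired with its `file…` directory as the root, prints `_logs/history` once, which is exactly the collision the exception reports.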

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-08 Thread Adam Faris
Hi Austin, I'm glad that helped out. Regarding the -p flag for distcp, here's the online documentation: http://hadoop.apache.org/common/docs/current/distcp.html#Option+Index You can also get this info by running 'hadoop distcp' without any flags. -p[rbugp] Preserve
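The letters after -p select which file attributes distcp keeps, per the -p[rbugp] entry in the DistCp option index: r replication, b block size, u user, g group, p permission. So the -ppgu asked about in this thread preserves permission, group and user. A small decoder sketch; the function name `describe_preserve` is hypothetical, and it handles only the letters listed in that option index.

```shell
#!/bin/sh
# describe_preserve takes a distcp preserve option such as "-ppgu" or "-pugp"
# and prints one attribute name per letter (r, b, u, g, p).
describe_preserve() {
  letters="${1#-p}"              # drop the leading "-p", keep the letters
  while [ -n "$letters" ]; do
    rest="${letters#?}"          # everything after the first character
    c="${letters%"$rest"}"       # the first character itself
    case "$c" in
      r) echo "replication" ;;
      b) echo "block size" ;;
      u) echo "user" ;;
      g) echo "group" ;;
      p) echo "permission" ;;
      *) echo "unknown preserve letter: $c" >&2 ;;
    esac
    letters="$rest"
  done
}
```

For example, `describe_preserve -ppgu` prints `permission`, `group` and `user` on separate lines.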

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Austin Chungath
Thanks. So I decided to try moving the data using distcp. $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp] 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy With failures,

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Austin Chungath
OK, that was a lame mistake. $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy I had written hdfs instead of hftp. $ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop 12/05/07 16:38:09 INFO tools.DistCp:
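HFTP is a read-only filesystem, so it can serve as the distcp source but not the destination; the usual cross-version pattern is to read the old cluster through hftp:// (NameNode HTTP port) and write the new one through hdfs:// (NameNode RPC port), running distcp on the destination cluster. A dry-run sketch: the helper names are hypothetical, and `localhost:8020` merely stands in for the new NameNode's RPC address (take the real one from fs.default.name on the new cluster).

```shell
#!/bin/sh
# run: print the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# copy_via_hftp SRC_HTTP_ADDR SRC_PATH DST_RPC_ADDR DST_PATH
# Reads from the old cluster over hftp:// (read-only, NameNode HTTP address)
# and writes into the new cluster over hdfs:// (NameNode RPC address).
# Intended to be run from the destination (newer) cluster.
copy_via_hftp() {
  run hadoop distcp "hftp://$1$2" "hdfs://$3$4"
}
```

With `DRY_RUN=1`, `copy_via_hftp localhost:50070 /tmp localhost:8020 /tmp_copy` prints `hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:8020/tmp_copy` instead of running it.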

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Nitin Pawar
Things to check: 1) when you launch distcp jobs, all the datanodes of the older HDFS are live and connected; 2) when you launch distcp, no data is being written/moved/deleted in HDFS; 3) you can use the -log option to log errors into a directory and use -i to ignore errors. Also, you can try using distcp with
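Point 3 combines two distcp options: -i (ignore failures, so one bad file does not abort the whole job) and -log <logdir> (write distcp's logs to that directory so failures can be reviewed afterwards). A dry-run sketch; the function name and the paths in the example are hypothetical.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distcp_tolerant SRC DST LOGDIR
#   -i   : ignore failures instead of failing the whole copy
#   -log : write distcp's logs under LOGDIR for later inspection
distcp_tolerant() {
  run hadoop distcp -i -log "$3" "$1" "$2"
}
```

After such a run, the log directory is where to look for the files that were skipped.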

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Adam Faris
Hi Austin, I don't know about using CDH3, but we use distcp to move data between different versions of Apache grids, and several things come to mind. 1) You should use the -i flag to ignore checksum differences on the blocks. I'm not 100% sure, but I want to say hftp doesn't support checksums on

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Austin Chungath
Thanks Adam, that was very helpful. Your second point solved my problems :-) The hdfs port number was wrong. I didn't use the -ppgu option; what does it do?

Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Hi, I am migrating from Apache Hadoop 0.20.205 to CDH3u3. I don't want to lose the data that is in the HDFS of Apache Hadoop 0.20.205. How do I migrate to CDH3u3 but keep the data that I have on 0.20.205? What are the best practices/techniques for doing this? Thanks, Regards, Austin

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Nitin Pawar
I can think of the following options: 1) write a simple get-and-put program that reads the data from the old DFS and loads it into the new DFS; 2) check whether distcp is compatible between both versions; 3) this is what I had done (and my data was barely a few hundred GB): did a dfs -copyToLocal and then in the new grid did
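Option 3 is a two-step staging copy through local disk, so it only works when the data fits locally. Each step should be run with the client that matches the cluster it talks to. A dry-run sketch; the function names and the staging path are hypothetical.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: run with the old (0.20.205) client against the old cluster.
# Copies HDFS_PATH out of HDFS into LOCAL_DIR.
export_to_local() {
  run hadoop fs -copyToLocal "$1" "$2"
}

# Step 2: run with the new (CDH3u3) client against the new cluster.
# Copies LOCAL_DIR back into HDFS at HDFS_PATH.
import_from_local() {
  run hadoop fs -copyFromLocal "$1" "$2"
}
```

For example, `export_to_local /data /mnt/staging/data` on the old grid, then `import_from_local /mnt/staging/data /data` on the new one.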

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Thanks for the suggestions. My concern is that I can't actually copyToLocal from the DFS because the data is huge. Say my Hadoop was 0.20 and I was upgrading to 0.20.205: I could do a namenode upgrade and wouldn't have to copy data out of the DFS. But here I have Apache Hadoop 0.20.205 and I want
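For the same-lineage case described here (0.20 to 0.20.205), the in-place HDFS upgrade is started with start-dfs.sh -upgrade, monitored with dfsadmin -upgradeProgress, and made permanent with dfsadmin -finalizeUpgrade. Whether this path is safe from Apache 0.20.205 to CDH3u3 is exactly what the thread is unsure about, so this dry-run sketch covers only the ordinary upgrade; the helper names are hypothetical.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: with the new binaries installed, start HDFS in upgrade mode.
# The pre-upgrade state is kept so the upgrade can still be rolled back.
start_upgrade() { run start-dfs.sh -upgrade; }

# Step 2: check whether the upgrade has completed.
upgrade_status() { run hadoop dfsadmin -upgradeProgress status; }

# Step 3: once the data has been verified, finalize. This discards the
# pre-upgrade backup, so a rollback is no longer possible afterwards.
finalize_upgrade() { run hadoop dfsadmin -finalizeUpgrade; }
```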

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Nitin Pawar
You can actually look at distcp (http://hadoop.apache.org/common/docs/r0.20.0/distcp.html), but this means that you need two separate clusters available to do the migration.

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
There is only one cluster. I am not copying between clusters. Say I have a cluster running Apache 0.20.205 with 10 TB of storage capacity that holds about 8 TB of data. Now how can I migrate the same cluster to CDH3 and use that same 8 TB of data? I can't copy 8 TB of data using distcp because I

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Prashant Kommireddi
This seems like a matter of upgrading. I am not a Cloudera user, so I would not know much, but you might find some help by moving this to the Cloudera mailing list.

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Yes. This was first posted on the Cloudera mailing list. There were no responses. But this is not related to Cloudera as such: CDH3 uses Apache Hadoop 0.20 as its base. My data is in Apache Hadoop 0.20.205. There is an upgrade namenode option when we are migrating to a higher version, say

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Michel Segel
Well, you've kind of painted yourself into a corner... Not sure why you didn't get a response from the Cloudera lists, but it's a generic question... 8 out of 10 TB. Are you talking effective storage or actual disks? And please tell me you've already ordered more hardware... Right? And please

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Yeah, I know :-) This is not a production cluster ;-) and yes, there is more hardware coming :-)

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Michel Segel
OK... When you get your new hardware, set up one server as your new NN, JT and SN, and set up the others as DNs (Cloudera CDH3u3). On your existing cluster, remove your old log files, temp files on HDFS and anything else you don't need. This should give you some more space. Start copying some of the

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Edward Capriolo
Honestly, that is a hassle; going from 205 to CDH3u3 is probably more of a cross-grade than an upgrade or downgrade. I would just stick it out. But yes, like Michael said: two clusters on the same gear, and distcp. If you are using RF=3 you could also lower your replication to RF=2 'hadoop dfs
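Raw disk usage scales linearly with the replication factor, so the space freed by going from RF=3 to RF=2 is easy to estimate. A back-of-the-envelope sketch, assuming (per Michael's "effective storage or actual disks?" question) that the 8 TB is raw usage at RF=3; if it is logical data, the figures differ. It ignores non-DFS usage and job scratch space, and the function names are hypothetical.

```shell
#!/bin/sh
# raw_after_rf RAW_GB OLD_RF NEW_RF
#   logical = raw / old_rf ; new_raw = logical * new_rf
# Integer GB arithmetic; multiply before dividing to limit truncation.
raw_after_rf() {
  echo $(( $1 * $3 / $2 ))
}

# freed_by_rf RAW_GB OLD_RF NEW_RF: raw GB released by the change.
freed_by_rf() {
  echo $(( $1 - $1 * $3 / $2 ))
}
```

Under those assumptions, `raw_after_rf 8000 3 2` gives 5333 and `freed_by_rf 8000 3 2` gives 2667, i.e. roughly 2.7 TB released.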

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Suresh Srinivas
This is probably a more relevant question for the CDH mailing lists. That said, what Edward is suggesting seems reasonable: reduce the replication factor, decommission some of the nodes, create a new cluster with those nodes and do a distcp. Could you share with us the reasons you want to migrate from
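Decommissioning is driven by the NameNode's exclude file: list the hosts there, then have the NameNode re-read it with dfsadmin -refreshNodes. The NameNode re-replicates those nodes' blocks onto the remaining DataNodes before marking them decommissioned, which is why lowering the replication factor first matters on a nearly full cluster. A sketch, assuming dfs.hosts.exclude in the NameNode's hdfs-site.xml already points at the file passed in; the function name, file path and hostnames are hypothetical.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# decommission_hosts EXCLUDE_FILE HOST...
# Appends each HOST to EXCLUDE_FILE (done even when DRY_RUN is set), then
# asks the NameNode to re-read its include/exclude lists.
decommission_hosts() {
  exclude_file="$1"
  shift
  for h in "$@"; do
    echo "$h" >> "$exclude_file"
  done
  run hadoop dfsadmin -refreshNodes
}
```

For example, `decommission_hosts /etc/hadoop/conf/dfs.exclude dn09 dn10`, then wait until those nodes report as decommissioned before shutting them down.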

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Michel Segel
Ok... So riddle me this... I currently have a replication factor of 3. I reset it to two. What do you have to do to get the replication factor of 3 down to 2? Do I just try to rebalance the nodes? The point is that you are looking at a very small cluster. You may want to start the be cluster
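On the question of getting from 3 down to 2: changing dfs.replication only sets the default for files written afterwards; existing files keep their replication factor until it is changed explicitly with setrep. The NameNode then deletes the excess replicas on its own, so no rebalancing is needed for that part. A dry-run sketch; the function name is hypothetical, and fsck is included only to confirm the result (its summary reports the average block replication).

```shell
#!/bin/sh
# run: print the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# lower_replication PATH RF
#   -R : apply recursively to every file under PATH
#   -w : wait until the new replication level has been reached
lower_replication() {
  run hadoop dfs -setrep -R -w "$2" "$1"
  run hadoop fsck "$1"
}
```

For example, `lower_replication / 2` on the whole namespace; on a large tree, -w can take a while to return.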