$DuplicationException: Invalid input, there are duplicated files in the
sources: hftp://ub13:50070/tmp/Rtmp1BU9Kb/file6abc6ccb6551/_logs/history,
hftp://ub13:50070/tmp/Rtmp3yCJhu/file1ca96d9331/_logs/history
Any idea what the problem is here?
They are different files, so how are they conflicting?
Hi Austin,
I'm glad that helped out. Regarding the -p flag for distcp, here's the online
documentation
http://hadoop.apache.org/common/docs/current/distcp.html#Option+Index
You can also get this info from running 'hadoop distcp' without any flags.
-p[rbugp] Preserve
Thanks,
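The preserve letters in -p[rbugp] map to file attributes (r = replication factor, b = block size, u = user, g = group, p = permissions). A hedged example of how it is typically used — hostnames, ports, and paths here are illustrative, not from the thread:

```shell
# Copy from the old cluster while preserving replication, block size,
# user, group, and permissions on every copied file.
# (oldnn/newnn and the paths are hypothetical placeholders.)
hadoop distcp -prbugp hftp://oldnn:50070/src hdfs://newnn:8020/dst
```

Passing -p with no letters preserves all of the attributes at once.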
So I decided to try and move using distcp.
$ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
With failures,
OK, that was a lame mistake:
$ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
I had typed hdfs instead of hftp.
$ hadoop distcp hftp://localhost:50070/docs/index.html
hftp://localhost:60070/user/hadoop
12/05/07 16:38:09 INFO tools.DistCp:
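For copies between incompatible Hadoop versions, the usual pattern (an assumption about the setup here, with illustrative hostnames and ports) is to run distcp on the newer cluster, reading the old one over the read-only hftp interface and writing over hdfs:

```shell
# Run this FROM the newer (destination) cluster:
#   source is read over hftp (version-independent, read-only),
#   destination is written over the local cluster's native hdfs.
hadoop distcp hftp://oldnn:50070/data hdfs://newnn:8020/data_copy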
things to check:
1) when you launch distcp jobs, all the datanodes of the older HDFS are live and
connected
2) when you launch distcp, no data is being written/moved/deleted in HDFS
3) you can use the -log option to log errors into a directory, and use -i to
ignore errors
also you can try using distcp with
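Points 3) above can be sketched as follows — the log directory and data paths are hypothetical:

```shell
# -i keeps the job going past individual copy failures;
# -log writes per-file copy status into the given HDFS directory
# so failed files can be inspected afterwards.
hadoop distcp -i -log hdfs://newnn:8020/distcp_logs \
    hftp://oldnn:50070/data hdfs://newnn:8020/data_copy
```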
Hi Austin,
I don't know about using CDH3, but we use distcp for moving data between
different versions of Apache grids, and several things come to mind.
1) you should use the -i flag to ignore checksum differences on the blocks.
I'm not 100% sure, but I want to say hftp doesn't support checksums on
Thanks Adam,
That was very helpful. Your second point solved my problems :-)
The hdfs port number was wrong.
I didn't use the -ppgu option. What does it do?
On Mon, May 7, 2012 at 8:07 PM, Adam Faris afa...@linkedin.com wrote:
Hi,
I am migrating from Apache hadoop 0.20.205 to CDH3u3.
I don't want to lose the data that is in the HDFS of Apache hadoop
0.20.205.
How do I migrate to CDH3u3 while keeping the data I have on 0.20.205?
What are the best practices/techniques to do this?
Thanks & Regards,
Austin
I can think of the following options:
1) write a simple get-and-put job which gets the data from the old DFS and loads
it into the new DFS
2) see if distcp between both versions is compatible
3) this is what I had done (and my data was hardly a few hundred GB): did a
dfs -copyToLocal and then in the new grid did
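Option 3) above can be sketched as follows — the paths are hypothetical, and this is only practical when the data fits on local/staging disk:

```shell
# On the OLD cluster: pull the data down to local disk.
hadoop dfs -copyToLocal /user/data /mnt/staging/data

# On the NEW grid: push the staged copy back up.
hadoop dfs -copyFromLocal /mnt/staging/data /user/data
```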
Thanks for the suggestions,
My concern is that I can't actually copyToLocal from the DFS because the
data is huge.
Say my Hadoop was 0.20 and I was upgrading to 0.20.205; I could do a
namenode upgrade and wouldn't have to copy data out of the DFS.
But here I am on Apache Hadoop 0.20.205 and I want
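The in-place namenode upgrade path mentioned above, for the case where the HDFS layout versions are compatible, typically looks like this on 0.20-era Hadoop (a sketch under that assumption, not a recipe for the 0.20.205-to-CDH3 case being debated):

```shell
# Stop the old cluster cleanly.
stop-all.sh

# Install the new Hadoop version, pointing it at the same dfs.name.dir,
# then bring HDFS up in upgrade mode so the namenode converts its metadata.
start-dfs.sh -upgrade

# Watch the upgrade, and finalize only once the data has been verified
# (rollback is impossible after finalizing).
hadoop dfsadmin -upgradeProgress status
hadoop dfsadmin -finalizeUpgrade
```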
You can actually look at distcp:
http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
but this means you need two different clusters available to do
the migration.
On Thu, May 3, 2012 at 12:51 PM, Austin Chungath austi...@gmail.com wrote:
There is only one cluster. I am not copying between clusters.
Say I have a cluster running apache 0.20.205 with 10 TB storage capacity
and has about 8 TB of data.
Now how can I migrate the same cluster to use cdh3 and use that same 8 TB
of data.
I can't copy 8 TB of data using distcp because I
This seems like a matter of upgrading. I am not a Cloudera user so I wouldn't
know much, but you might find some help by moving this to the Cloudera mailing list.
On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote:
Yes, this was first posted on the Cloudera mailing list. There were no
responses.
But this is not related to Cloudera as such.
CDH3 uses Apache Hadoop 0.20 as its base. My data is in Apache Hadoop
0.20.205.
There is an upgrade namenode option when we are migrating to a higher
version, say
Well, you've kind of painted yourself into a corner...
Not sure why you didn't get a response from the Cloudera lists, but it's a
generic question...
8 out of 10 TB. Are you talking effective storage or actual disks?
And please tell me you've already ordered more hardware.. Right?
Yeah, I know :-)
And this is not a production cluster ;-) and yes, there is more hardware
coming :-)
On Thu, May 3, 2012 at 4:10 PM, Michel Segel michael_se...@hotmail.comwrote:
OK... when you get your new hardware...
Set up one server as your new NN, JT, SN.
Set up the others as DNs.
(Cloudera CDH3u3)
On your existing cluster...
Remove your old log files and temp files on HDFS, anything you don't need.
This should give you some more space.
Start copying some of the
Honestly, that is a hassle; going from 205 to CDH3u3 is probably more
of a cross-grade than an upgrade or downgrade. I would just stick it
out. But yes, like Michael said, two clusters on the same gear and
distcp. If you are using RF=3 you could also lower your replication to
RF=2: 'hadoop dfs
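The trailing command in Edward's message is truncated, so this is an assumption about what he meant; lowering replication is typically done with dfs -setrep:

```shell
# Lower the replication factor of everything under / to 2.
# -R recurses into directories; -w blocks until the datanodes have
# actually dropped the extra replicas, which can take a while.
hadoop dfs -setrep -R -w 2 /
```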
This is probably a more relevant question for the CDH mailing lists. That said,
what Edward is suggesting seems reasonable. Reduce the replication factor,
decommission some of the nodes, create a new cluster with those nodes,
and do a distcp.
Could you share with us the reasons you want to migrate from
OK... so riddle me this...
I currently have a replication factor of 3, and I reset it to 2.
What do you have to do to actually get the replication factor down from 3 to 2?
Do I just try to rebalance the nodes?
The point is that you are looking at a very small cluster.
You may want to start the new cluster
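To see why dropping the replication factor frees enough room for a second cluster on the same disks, a back-of-the-envelope calculation using the thread's figure of 8 TB of logical data:

```shell
logical_tb=8                      # logical data size from the thread, in TB
raw_rf3=$((logical_tb * 3))       # raw disk used at replication factor 3
raw_rf2=$((logical_tb * 2))       # raw disk used at replication factor 2
freed=$((raw_rf3 - raw_rf2))      # raw disk freed by going from RF=3 to RF=2
echo "RF=3 uses ${raw_rf3} TB raw; RF=2 uses ${raw_rf2} TB; ${freed} TB freed"
```

On a 10 TB cluster that is already 8 TB full (24 TB raw at RF=3), those freed terabytes are what make room to stage the copy.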