Hi Kevin,

Have a look at Apache Flume. It is designed for efficiently collecting, aggregating, and moving large amounts of data into HDFS.

http://flume.apache.org/FlumeUserGuide.html
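
A minimal sketch of what that could look like in your case, assuming a spooling-directory source watching the NFS mount and an HDFS sink (the agent name, paths, and NameNode URI below are placeholders you would adjust):

    # Hypothetical agent config; adjust the spool directory and NameNode URI.
    cat > nfs-to-hdfs.conf <<'EOF'
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    # Watch the NFS mount for new files to ingest.
    a1.sources.r1.type     = spooldir
    a1.sources.r1.spoolDir = /mnt/nfs/data
    a1.sources.r1.channels = c1

    # Durable channel so events survive an agent restart.
    a1.channels.c1.type = file

    # Write events to HDFS as plain data rather than SequenceFiles.
    a1.sinks.k1.type          = hdfs
    a1.sinks.k1.hdfs.path     = hdfs://namenode:8020/data/incoming
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.channel       = c1
    EOF

    flume-ng agent --conf conf --conf-file nfs-to-hdfs.conf --name a1

One thing to keep in mind: Flume is event-oriented, so the HDFS sink rolls its output files by size, time, or event count rather than preserving the original file boundaries.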

On 02/13/2015 03:28 PM, Kevin wrote:
Hi,

I am setting up a Hadoop cluster (CDH 5.1.3) and I need to copy a thousand or so files, totaling roughly 1 TB, into HDFS. The cluster will be isolated on its own private LAN, with a single client machine connected to both the Hadoop cluster and the public network. The data that needs to be copied into HDFS is on an NFS mount on the client machine.

I can run `hadoop fs -put` concurrently on the client machine to try to increase throughput.
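
Something along these lines is what I had in mind (the paths and the degree of parallelism are just placeholders):

    # Run 8 uploads at a time from the NFS mount into an HDFS directory.
    find /mnt/nfs/data -type f -print0 \
      | xargs -0 -P 8 -I{} hadoop fs -put {} /data/incoming

Since the thousand or so files average around 1 GB each, the per-file JVM startup cost should be small relative to the transfer time.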

If these files could be accessed by each node in the Hadoop cluster, then I could write a MapReduce job to copy a number of files from the network into HDFS. I could not find anything in the documentation saying that `distcp` works with locally hosted files (its code in the tools package doesn't show any sign of it either), but I wouldn't expect it to.
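
Purely for illustration, if the NFS export were mounted at the same path on every node, I would expect that idea to look something like this (the NameNode address and paths are made up):

    # Hypothetical: assumes /mnt/nfs/data is visible at the same path on every node,
    # which is not the case in my setup.
    hadoop distcp file:///mnt/nfs/data hdfs://namenode:8020/data/incoming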

In general, are there any other ways of copying a very large number of client-local files to HDFS? I searched the mail archives for a similar question but didn't come across one. I'm sorry if this is a duplicate question.

Thanks for your time,
Kevin

--
Regards,
Ahmed Ossama
