1. In a real production environment, do we copy these 10 files into HDFS under a folder one by one? If this is the case, how many mappers do we specify, 10 mappers? And do we use the put command of Hadoop to transfer these files?
Ans: This depends on what you want to do with the files. There is no rule that says all files need to go into one folder. Uploading files to HDFS via a DFS client (the native Hadoop CLI or your own Java DFS client) does not need mappers; it is a file system operation. Remember, mappers are involved only if you call the MapReduce framework to process or write the files. A normal file upload is purely a DFS operation (see the sketch after the quoted message below).

2. If the above is not the case, do we pre-process to merge these 10 files, making one file of size 10 TB, and copy this into HDFS?

Ans: You do not need to merge the files outside and then put them on HDFS, as long as the individual files are reasonably sized. Once the data is on HDFS, whether you want to merge it again depends on the purpose you want to use it for.

On Mon, Apr 14, 2014 at 7:28 PM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:

> Hi,
>
> Please can somebody clarify my doubts. Say I have a cluster of 30 nodes
> and I want to put the files in HDFS. All the files combined together are
> 10 TB in size, but each file is roughly, say, 1 GB only, and the total
> number of files is 10.
>
> 1. In a real production environment, do we copy these 10 files into HDFS
> under a folder one by one? If this is the case, how many mappers do we
> specify, 10 mappers? And do we use the put command of Hadoop to transfer
> these files?
>
> 2. If the above is not the case, do we pre-process to merge these 10
> files, making one file of size 10 TB, and copy this into HDFS?
>
> Regards
> Shashidhar

--
Nitin Pawar
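
P.S. For illustration only, a minimal sketch of the "Java DFS client" path mentioned above, using the standard Hadoop FileSystem API (the paths here are hypothetical). Note there is no mapper or MapReduce job anywhere; it is a plain file-system call, the same thing `hadoop fs -put` does from the CLI:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsUpload {
      public static void main(String[] args) throws Exception {
          // Picks up core-site.xml / hdfs-site.xml from the classpath,
          // so fs.defaultFS should already point at the cluster.
          Configuration conf = new Configuration();

          // This is a DFS client connection, not a MapReduce job.
          FileSystem fs = FileSystem.get(conf);

          Path src = new Path("/local/data/file1.dat");    // hypothetical local file
          Path dst = new Path("/user/shashidhar/input/");  // hypothetical HDFS folder

          // Equivalent of `hadoop fs -put`: streams the file into HDFS blocks.
          fs.copyFromLocalFile(src, dst);

          fs.close();
      }
  }

You would simply loop over your 10 files (or pass them all to one -put command); no mapper count is specified anywhere, because no job is launched.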