@Tariq can you point me to a resource which shows how distcp is used to
upload files from the local filesystem to HDFS?
Isn't distcp an MR job? Wouldn't it need the data to already be present in
Hadoop's filesystem?
Rahul
On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq donta...@gmail.com wrote:
You're
you can do that using file:///
example:
hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
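and it works the other way round too (local to HDFS), as long as the local
path is visible to the node running the copy. A sketch, with the NN address
assumed:
hadoop distcp file:///Users/myhome/Desktop/somefile hdfs://localhost:8020/user/myhome/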
On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
@Tariq can you point me to a resource which shows how distcp is used to
upload files
@Rahul : I'm sorry I answered this on a wrong thread by mistake. You could
do that as Nitin has shown.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
you can do that using file:///
example:
hadoop distcp
Thanks to both of you!
Rahul
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
you can do that using file:///
example:
hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee
No. distcp is actually a mapreduce job under the hood.
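Since it runs as an MR job you can steer it like one, e.g. capping the
number of maps with distcp's -m option. A sketch (hosts and paths are made
up):
hadoop distcp -m 20 hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dst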
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Thanks to both of you!
Rahul
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
I had said that if you use distcp to copy data *from localFS to HDFS* then
you won't be able to exploit parallelism, as the entire file is present on a
single machine. So no multiple TTs.
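In that case a plain put does the same serial copy anyway. A sketch (local
path assumed):
hadoop fs -put /home/rahul/bigfile.log /user/rahul/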
Please comment if you think I am wrong somewhere.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12,
Yeah, you are right. I misread your earlier post.
Thanks,
Rahul
On Sun, May 12, 2013 at 6:25 PM, Mohammad Tariq donta...@gmail.com wrote:
I had said that if you use distcp to copy data *from localFS to HDFS* then
you won't be able to exploit parallelism, as the entire file is present on
a single
This is what I would say:
The number of maps is decided as follows. Since it’s a good idea to get
each map to copy a reasonable amount of data to minimize overheads in task
setup, each map copies at least 256 MB (unless the total size of the input
is less, in which case one map handles it all).
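As rough arithmetic: a 10 GB input would get 10240/256 = 40 maps, while a
100 MB input would be handled by a single map.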
Hi All,
Can anyone help me understand how companies like Facebook, Yahoo etc. upload
bulk files, say to the tune of 100 petabytes, to a Hadoop HDFS cluster for
processing,
and how, after processing, they download those files from HDFS to the local
file system?
I don't think they would be using the command
First of all, most companies do not get 100 PB of data in one go.
It's an accumulating process, and most companies have a data pipeline in
place where the data is written to HDFS on a regular frequency,
and then it's retained on HDFS for some duration as needed, and from
there it's
@Nitin Pawar, thanks for clearing my doubts.
But I have one more question, say I have 10 TB of data in the pipeline.
Is it perfectly OK to use the hadoop fs -put command to upload these files of
size 10 TB, and is there any limit to the file size using the hadoop command
line? Can the hadoop put command
Is it safe? There is no direct yes-or-no answer.
When you say you have files worth 10 TB and you want to upload them to
HDFS, several factors come into the picture:
1) Is the machine in the same network as your hadoop cluster?
2) Is there a guarantee that the network will not go down?
and most
@Nitin, parallel dfs writes to HDFS are great, but I could not understand
the meaning of a capable NN. As far as I know, the NN would not be a part of
the actual data write pipeline, meaning that the data would not travel
through the NN; the dfs client would contact the NN from time to time to get
locations of
The NN would still be in the picture because it will be writing a lot of
metadata for each individual file. So you will need an NN capable enough to
store the metadata for your entire dataset. Data will never go to the NN,
but a lot of metadata about the data will be on the NN, so it's always a
good idea to have a
Sorry for barging in, guys. I think Nitin is talking about this:
Every file and block in HDFS is treated as an object, and for each object
around 200 B of metadata gets created. So the NN should be powerful enough to
handle that much metadata, since it is going to be in-memory. Actually
memory is
Absolutely right, Mohammad.
On Sat, May 11, 2013 at 9:33 PM, Mohammad Tariq donta...@gmail.com wrote:
Sorry for barging in, guys. I think Nitin is talking about this:
Every file and block in HDFS is treated as an object, and for each object
around 200 B of metadata gets created. So the NN should
@Thoihen: If the data that you are trying to load is not streaming, or the
data loading is not real-time in nature, then why don't you use
Sqoop? It's relatively easy to use, with not much of a learning curve.
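A minimal sketch of a Sqoop pull (the connection string, table, and paths
here are made up; assumes the JDBC driver is on Sqoop's classpath):
sqoop import --connect jdbc:mysql://dbhost/warehouse --username etl -P \
  --table events --target-dir /user/thoihen/events -m 4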
Regards,
Shahab
On Sat, May 11, 2013 at 12:03 PM, Mohammad Tariq donta...@gmail.com wrote:
Sorry
IMHO, I think the statement about the NN with regard to block metadata is
more like a general statement. Even if you put lots of small files of
combined size 10 TB, you need to have a capable NN.
Can distcp be used to copy local-to-HDFS?
Thanks,
Rahul
On Sat, May 11, 2013 at 9:35 PM, Nitin
@Rahul : Yes, distcp can do that.
And the bigger the files, the less the metadata, hence less memory
consumption.
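Rough numbers with the ~200 B/object figure from earlier, assuming a 64 MB
block size: a single 10 TB file is ~163,840 blocks + 1 file = ~163,841
objects, i.e. only ~33 MB of NN heap, while the same 10 TB as ten million
1 MB files is ~20 million objects, i.e. around 4 GB of heap.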
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sat, May 11, 2013 at 9:40 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
IMHO, I think the statement about the NN with regard to
Thanks Tariq!
On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq donta...@gmail.com wrote:
@Rahul : Yes, distcp can do that.
And the bigger the files, the less the metadata, hence less memory
consumption.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sat, May 11, 2013 at 9:40 PM, Rahul
You're welcome :)
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Thanks Tariq!
On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq donta...@gmail.com wrote:
@Rahul : Yes, distcp can do that.
And the bigger the
In our case we have our own hand-written HDFS client to write the data and
download it.
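For anyone rolling their own client outside the Java API, WebHDFS is one
option. A sketch, assuming webhdfs is enabled and the stock 1.x NN HTTP port:
curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/me/file.txt?op=CREATE&user.name=me"
# the NN answers with a 307 redirect whose Location header points at a
# datanode; PUT the actual bytes to that URL:
curl -i -X PUT -T /local/path/file.txt "<Location-URL-from-first-call>"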
*Thanks & Regards*
∞
Shashwat Shriparv
On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq donta...@gmail.com wrote:
You're welcome :)
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sat, May 11, 2013 at