Is there a Java API to invoke the copy command?
Thanks a lot
Regards,
JD
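If the question is literally how to kick off the copy from Java, one option is to shell out to the hadoop CLI with ProcessBuilder; the HDFS-native alternative would be FileSystem.copyFromLocalFile from the Hadoop Java API. A minimal sketch of the shell-out route (the paths here are placeholders, not from this thread):

```java
import java.io.IOException;

public class HdfsCopy {
    // Run an external command, inheriting stdout/stderr, and return its exit code.
    static int run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Invokes the same copy the CLI would do; source and target are placeholders.
        int rc = run("hadoop", "fs", "-copyFromLocal", "/local/dir/part-00000", "/user/app1/input/");
        if (rc != 0) {
            System.err.println("copy failed with exit code " + rc);
        }
    }
}
```

The upside of shelling out is that it needs no Hadoop jars on the classpath of the calling process; the FileSystem API is the better fit if the copier is itself a Hadoop/Java application.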
From: Dmitriy Ryaboy
To: user@pig.apache.org; jagaran das
Sent: Saturday, 16 July 2011 8:38 PM
Subject: Re: Hadoop Production Issue
1) Correct.
2) Copy to the cluster from any machine, just have the
Our Config:
72 GB RAM, 4 quad-core processors, 1.8 TB local storage
10-node CDH3 cluster
From: jagaran das
To: "user@pig.apache.org"
Sent: Saturday, 16 July 2011 11:00 AM
Subject: Re: Hadoop Production Issue
OK, then:
1. We have to write a Pig job for
From: Dmitriy Ryaboy
To: user@pig.apache.org; jagaran das
Sent: Saturday, 16 July 2011 7:58 AM
Subject: Re: Hadoop Production Issue
Merging: doesn't actually speed things up all that much; reduces load
on the Namenode, and speeds up job initialization some. You don't have
to do it on the namenode itself. Neither do you have to do copying on
the NN. In fact, don't do anything but run the NameNode on the
namenode.
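To put rough numbers on the NameNode-load point: a commonly quoted rule of thumb (an estimate, not a figure from this thread) is about 150 bytes of NameNode heap per file/block object. A back-of-envelope calculation for the workload described in the original mail (11,520 files across 20 applications), assuming each small file occupies a single block:

```java
public class NamenodeEstimate {
    public static void main(String[] args) {
        long filesPerApp = 11_520;
        long apps = 20;
        long objectsPerFile = 2;    // one file entry + one block entry per small file
        long bytesPerObject = 150;  // rough rule-of-thumb estimate

        long totalFiles = filesPerApp * apps;  // 230,400 files per run
        long heapBytes = totalFiles * objectsPerFile * bytesPerObject;
        System.out.println(totalFiles + " files -> ~" + (heapBytes / (1024 * 1024))
                + " MB of NameNode heap");
    }
}
```

Tens of megabytes per run is survivable, but it accumulates across runs and is all resident in one JVM heap, which is why merging before (or after) loading helps.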
Pig jobs can t
One thing that we use is filecrush to merge small files below a threshold. It
works pretty well.
http://www.jointhegrid.com/hadoop_filecrush/index.jsp
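The merge itself is conceptually simple: concatenate files below a size threshold into one larger file. A plain-Java sketch of that idea on a local directory (this is an illustration, not filecrush's actual API):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class Crush {
    // Concatenate every regular file in `dir` smaller than `threshold` bytes
    // into a single output file `out`; returns the number of files merged.
    static int crush(Path dir, Path out, long threshold) throws IOException {
        int merged = 0;
        try (OutputStream sink = Files.newOutputStream(out);
             Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files.sorted()::iterator) {
                if (Files.isRegularFile(p) && !p.equals(out)
                        && Files.size(p) < threshold) {
                    Files.copy(p, sink);  // append this file's bytes to the sink
                    merged++;
                }
            }
        }
        return merged;
    }
}
```

filecrush does considerably more than this (it runs on HDFS, preserves record formats, and handles compression), but the core effect is the same: many small files become a few large ones, shrinking the NameNode's object count.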
On Jul 16, 2011, at 1:17 AM, jagaran das wrote:
Hi,
Due to requirements in our current production CDH3 cluster, we need to copy
around 11,520 small files (total size 12 GB) to the cluster for one
application.
We have 20 such applications that would run in parallel.
So one set would have 11,520 files with a total size of 12 GB.
Like this we w