Re: Hadoop Production Issue

2011-07-16 Thread jagaran das
java api to invoke copy command?? Thanks a lot Regards, JD From: Dmitriy Ryaboy To: user@pig.apache.org; jagaran das Sent: Saturday, 16 July 2011 8:38 PM Subject: Re: Hadoop Production Issue 1) Correct. 2) Copy to the cluster from any machine, just have the

Re: Hadoop Production Issue

2011-07-16 Thread Dmitriy Ryaboy
aboy > To: user@pig.apache.org; jagaran das > Sent: Saturday, 16 July 2011 7:58 AM > Subject: Re: Hadoop Production Issue > > Merging: doesn't actually speed things up all that much; reduces load > on the Namenode, and speeds up job initialization some. You don't

Re: Hadoop Production Issue

2011-07-16 Thread jagaran das
Our Config: 72 GB RAM, 4 quad-core processors, 1.8 TB local memory, 10-node CDH3 cluster. From: jagaran das To: "user@pig.apache.org" Sent: Saturday, 16 July 2011 11:00 AM Subject: Re: Hadoop Production Issue ok then 1. We have to write a pig job for

Re: Hadoop Production Issue

2011-07-16 Thread jagaran das
; jagaran das Sent: Saturday, 16 July 2011 7:58 AM Subject: Re: Hadoop Production Issue Merging: doesn't actually speed things up all that much; reduces load on the Namenode, and speeds up job initialization some. You don't have to do it on the namenode itself. Neither do you have to do

Re: Hadoop Production Issue

2011-07-16 Thread Dmitriy Ryaboy
Merging: doesn't actually speed things up all that much; reduces load on the Namenode, and speeds up job initialization some. You don't have to do it on the namenode itself. Neither do you have to do copying on the NN. In fact, don't do anything but run the NameNode on the namenode. Pig jobs can t

Re: Hadoop Production Issue

2011-07-16 Thread Jeremy Hanna
One thing that we use is filecrush to merge small files below a threshold. It works pretty well. http://www.jointhegrid.com/hadoop_filecrush/index.jsp On Jul 16, 2011, at 1:17 AM, jagaran das wrote: > > > Hi, > > Due to requirements in our current production CDH3 cluster we need to copy > a

Hadoop Production Issue

2011-07-15 Thread jagaran das
Hi, Due to requirements in our current production CDH3 cluster, we need to copy around 11520 small files (total size 12 GB) to the cluster for one application. We have 20 such applications that would run in parallel, so each set would have 11520 files with a total size of 12 GB. Like this we w
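
For a sense of scale, the figures in the original post can be worked through with a short calculation. This is only a rough sketch: the file count, size, and application count come from the post, while the 64 MB block size and the `summarize` helper are assumptions made for illustration (check `dfs.block.size` on the actual cluster). Because the NameNode keeps metadata for every file, the file count is the number that merging is meant to reduce.

```python
# Figures quoted in the original post.
FILES_PER_APP = 11520
GB_PER_APP = 12
APPS = 20

# Assumption: 64 MB HDFS block size; verify dfs.block.size on the real cluster.
ASSUMED_BLOCK_MB = 64


def summarize(files_per_app, gb_per_app, apps, block_mb):
    """Hypothetical helper: compare small-file counts with block-sized merged files."""
    mb_per_app = gb_per_app * 1024
    return {
        # Average size of one small file, in MB.
        "avg_file_mb": mb_per_app / files_per_app,
        # Files across all applications if nothing is merged.
        "total_files": files_per_app * apps,
        # Total data volume across all applications, in GB.
        "total_gb": gb_per_app * apps,
        # Files per application if data were packed into block-sized files
        # (ceiling division).
        "merged_files_per_app": -(-mb_per_app // block_mb),
    }


if __name__ == "__main__":
    for key, value in summarize(
        FILES_PER_APP, GB_PER_APP, APPS, ASSUMED_BLOCK_MB
    ).items():
        print(f"{key}: {value}")
```

Under these assumptions the average file is a little over 1 MB, the 20 applications together bring 230400 files, and packing each application's 12 GB into block-sized files would leave 192 files per application.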