RE: Hadoop or HBase

2012-08-27 Thread Kushal Agrawal
As the data is very large (tens of terabytes), it is difficult to back it up; each backup takes 1.5 days. If we use a distributed file system instead, we don't need to do that. Thanks & Regards, Kushal Agrawal kushalagra...@teledna.com
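A minimal sketch of the idea being relied on here: HDFS keeps multiple copies of every block (three by default) and the NameNode re-replicates automatically when a DataNode dies, which is what makes a separate backup pass unnecessary for node-failure protection. The path and replication value below are illustrative, not from the original message.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Ask HDFS to keep three copies of every block of this file;
            // lost replicas are re-created automatically by the NameNode.
            fs.setReplication(new Path("/data/archive/part-00000"), (short) 3);
        }
    }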

Re: Hadoop or HBase

2012-08-27 Thread Kai Voigt
Typically, CMSs require an RDBMS, which Hadoop and HBase are not. Which CMS do you plan to use, and what's wrong with MySQL or other open-source RDBMSs? Kai

Hadoop or HBase

2012-08-27 Thread Kushal Agrawal
Hi, I want to use a DFS for a Content-Management-System (CMS); in it I just want to store and retrieve files. Please suggest what I should use: Hadoop or HBase? Thanks & Regards, Kushal Agrawal kushalagra...@teledna.com
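For the store-and-retrieve-files use case, a minimal sketch with the HDFS FileSystem API (the file names are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CmsStore {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Store: copy a local document into HDFS.
            fs.copyFromLocalFile(new Path("/tmp/doc.pdf"), new Path("/cms/doc.pdf"));
            // Retrieve: stream it back out.
            try (FSDataInputStream in = fs.open(new Path("/cms/doc.pdf"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }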

Number of reducers

2012-08-27 Thread Abhishek
Hi all, I just want to know: based on what factor does the MapReduce framework decide the number of reducers to launch for a job? By default only one reducer is launched for a given job if we do not explicitly set the number via the command line or the driver class, is this right? If I choo
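For reference: the framework does not compute a reducer count; it defaults to 1 unless set explicitly in the driver class or on the command line. A minimal Hadoop 1.x-style driver sketch (job name and reducer count are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "my-job");   // Hadoop 1.x API
            job.setJarByClass(MyDriver.class);
            job.setNumReduceTasks(10);           // default is 1 if never set
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The same value can be set from the command line with -D mapred.reduce.tasks=10 (the Hadoop 1.x property name), provided the driver parses generic options via ToolRunner/GenericOptionsParser.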

Re: Measuring Shuffle time for MR job

2012-08-27 Thread Raj Vishwanathan
You can extract the shuffle time from the job log. Take a look at https://github.com/rajvish/hadoop-summary Raj
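As a rough illustration of what such a script does, here is a sketch that prints per-attempt shuffle durations from a Hadoop 1.x job-history file. It assumes ReduceAttempt records carry START_TIME and SHUFFLE_FINISHED fields; the exact log format varies by Hadoop version, so treat the field names as assumptions and see the repository above for a worked version.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ShuffleTime {
        // Field names assumed from the Hadoop 1.x job-history format.
        static final Pattern ATTEMPT = Pattern.compile("TASK_ATTEMPT_ID=\"([^\"]+)\"");
        static final Pattern START   = Pattern.compile("START_TIME=\"(\\d+)\"");
        static final Pattern SHUFFLE = Pattern.compile("SHUFFLE_FINISHED=\"(\\d+)\"");

        public static void main(String[] args) throws Exception {
            Map<String, Long> starts = new HashMap<>();
            try (BufferedReader r = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = r.readLine()) != null) {
                    if (!line.startsWith("ReduceAttempt")) continue;
                    Matcher id = ATTEMPT.matcher(line);
                    if (!id.find()) continue;
                    // Launch record: remember when the attempt started.
                    Matcher s = START.matcher(line);
                    if (s.find()) starts.put(id.group(1), Long.parseLong(s.group(1)));
                    // Completion record: shuffle time = shuffle end - attempt start.
                    Matcher f = SHUFFLE.matcher(line);
                    Long t0 = starts.get(id.group(1));
                    if (f.find() && t0 != null)
                        System.out.println(id.group(1) + " shuffle ms: "
                                + (Long.parseLong(f.group(1)) - t0));
                }
            }
        }
    }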

Re: Measuring Shuffle time for MR job

2012-08-27 Thread Bertrand Dechoux
Shuffle time is considered part of the reduce step. Without a reduce phase, there is no need for shuffling. One way to measure it would be to use the full reduce time with a '/dev/null' reducer. I am not aware of any way to measure it directly. Regards Bertrand
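A minimal sketch of such a '/dev/null' reducer (the Text key/value types are placeholders): it drains the shuffled values and emits nothing, so the job's reduce-phase time approximates the shuffle/merge cost alone.

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class DevNullReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) {
            // Iterate so the shuffled data is actually pulled and merged,
            // but write no output.
            for (Text ignored : values) {
                // discard
            }
        }
    }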