Re: Loading Data to HDFS

2012-10-30 Thread sumit ghosh
Hi Bertrand, A gateway machine is one that is typically used to connect to the Hadoop cluster; however, the machine itself does not run a DataNode or TaskTracker.   Warm Regards, Sumit From: Bertrand Dechoux To: common-user@hadoop.apache.org; sumit ghosh Sent

Re: Loading Data to HDFS

2012-10-30 Thread sumit ghosh
s Bertrand On Tue, Oct 30, 2012 at 11:07 AM, sumit ghosh wrote: > Hi, > > I have data on a remote machine accessible over ssh. I have Hadoop CDH4 > installed on RHEL. I am planning to load quite a few petabytes of data onto > HDFS. > > Which will be the fastest method to u

Loading Data to HDFS

2012-10-30 Thread sumit ghosh
OC/Deploying+HDFS+on+a+Cluster#DeployingHDFSonaCluster-EnablingWebHDFS > 3: https://ccp.cloudera.com/display/CDH4DOC/Mountable+HDFS > > On Wed, Oct 24, 2012 at 01:33:29AM -0700, Sumit Ghosh wrote: >> >> >> Hi, >> >> I have data on a remote machine accessible ove
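For the question in this thread, a few common loading approaches can be sketched as shell commands. This is a hedged sketch, not a benchmarked recommendation: host names, paths, the WebHDFS port 50070, and the user name are all placeholders/assumptions; WebHDFS must be enabled as described in the CDH4 docs linked in the thread.

```shell
# 1) Stage on the gateway machine first, then copy into HDFS:
scp remote-host:/data/bigfile /staging/
hadoop fs -put /staging/bigfile /user/sumit/input/

# 2) Stream over SSH without staging locally ('-' makes put read stdin):
ssh remote-host 'cat /data/bigfile' | hadoop fs -put - /user/sumit/input/bigfile

# 3) WebHDFS over plain HTTP (op=CREATE initiates a file write):
curl -i -X PUT \
  "http://namenode:50070/webhdfs/v1/user/sumit/input/bigfile?op=CREATE&user.name=sumit"
```

For petabyte-scale transfers, option 2 avoids double-writing the data on the gateway's local disk, at the cost of funneling everything through one machine.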

Re: Multiple reducers

2011-12-07 Thread sumit ghosh
Hi, Try setting the parameter mapred.reduce.tasks: hadoop jar hadoop-0.20.2-examples.jar wordcount -D mapred.reduce.tasks=4 Thanks. Sumit From: Hoot Thompson To: hadoop-u...@lucene.apache.org Sent: Tuesday, 29 November 2011 8:03 PM Subject: Multiple red
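As a note on the command above: with mapred.reduce.tasks=4, Hadoop's default HashPartitioner routes each key to hash(key) mod 4, so the job writes four output files, part-00000 through part-00003. A rough local illustration of that routing follows, using cksum purely as a stand-in hash (it is not Java's hashCode, and the word list is hypothetical):

```shell
# Which of the 4 reducers would each key land on?
# cksum is only an illustrative hash, not Hadoop's actual partitioner.
for w in hadoop hdfs mapreduce yarn; do
  h=$(printf '%s' "$w" | cksum | cut -d' ' -f1)
  printf '%s -> reducer %d\n' "$w" $(( h % 4 ))
done
```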

how to sort the output by value in reduce instead of by key?

2011-04-11 Thread sumit ghosh
Your field1 data can be split over multiple reducers. Is it possible to emit field1 as the key from the reducer (in case you do not need the ip anymore)? From: leibnitz To: hadoop-u...@lucene.apache.org Sent: Mon, 11 April, 2011 12:02:46 PM Subject: how to sor
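A common workaround for the question in this thread is to sort the reducer output by value as a post-processing step. A minimal sketch, assuming Hadoop's default tab-separated key/value text output and a hypothetical part file name:

```shell
# Sort part-00000 by its numeric value column, descending.
# -t selects the tab delimiter; -k2,2nr sorts column 2 numerically, reversed.
sort -t "$(printf '\t')" -k2,2nr part-00000 > sorted-by-value.txt
```

This works when the output fits on one machine; otherwise a second MapReduce job that swaps key and value (as suggested in the reply) is the usual approach.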

Re: Architectural question

2011-04-11 Thread sumit ghosh
The original posting said: "The app does simple match every line of input data with every line of persistent data." Hence the "key" should be replaced by a string from the 10 GB store, or a hash of it, and we can then match it against the hash or string from the persistent store.
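The matching described above has a small-scale shell analogue, shown here for illustration only (the file names persistent.txt and input.txt are hypothetical): print the input lines that also occur, verbatim, in the persistent store.

```shell
# -F: fixed strings (no regex), -x: whole-line match, -f: patterns from file
grep -F -x -f persistent.txt input.txt
```

At a 10 GB store this only works on a single beefy machine; the MapReduce version distributes the same comparison by keying both datasets on the line (or its hash).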