Re: How does mapper process partial records?

2013-01-24 Thread Praveen Sripati
http://stackoverflow.com/users/614157/praveen-sripati If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data. On Fri, Jan 25, 2013 at 12:52 AM, Harsh J wrote: > Hi Praveen, > > This is explained at http://wiki.apache.org/hadoop/HadoopM

Ant offline build

2012-03-31 Thread Praveen Sripati
Hi, I have got the code for 0.22 and did the build successfully using the 'ant clean compile eclipse' command. But the ant command downloads the dependent jar files every time. How do I make ant use the local jar files instead of downloading from the internet, so that the build can be done offline? Here is

Re: Not able to start the NodeManager

2012-01-10 Thread Praveen Sripati
Is your yarn-env.sh just the standard one from > ./hadoop-mapreduce-project/hadoop-yarn/conf/yarn-env.sh? > > Tom > > > > On 1/9/12 6:16 AM, "Praveen Sripati" wrote: > > Hi, > > I am trying to set up 0.23 on a cluster and am stuck with errors while > starting

Not able to start the NodeManager

2012-01-09 Thread Praveen Sripati
Hi, I am trying to set up 0.23 on a cluster and am stuck with errors while starting the NodeManager. The slaves file is proper and I am able to do a password-less ssh from the master to the slaves. The ResourceManager also starts properly. On running the below command from the master node: >> bin

Re: Queries on next gen MR architecture

2012-01-07 Thread Praveen Sripati
Regards, Praveen On Sun, Jan 8, 2012 at 12:08 AM, Arun C Murthy wrote: > > On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote: > > Hi, > > I had been going through the MRv2 documentation and have the following > queries > > 1) Let's say that an InputSplit is o

Re: Problem setting up 0.23 on a Cluster

2012-01-07 Thread Praveen Sripati
/slaves: No such file or directory Regards, Praveen On Sat, Jan 7, 2012 at 3:23 PM, Praveen Sripati wrote: > Ronald, > > Here is the output > > uid=1000(praveensripati) gid=1000(praveensripati) > groups=1000(praveensripati),4(adm),20(dialout),24(cdrom),46(plugdev),116(lpadm

Re: Problem setting up 0.23 on a Cluster

2012-01-07 Thread Praveen Sripati
'id' output? > > Kindest regards. > > Ron > > > On Fri, Jan 6, 2012 at 9:51 AM, Praveen Sripati > wrote: > >> Hi, >> >> I am able to run 0.23 on a single node and trying to set it up on a >> cluster and getting errors. >> >> When

Re: Queries on next gen MR architecture

2012-01-06 Thread Praveen Sripati
Could someone please clarify on the below queries? Regards, Praveen On Thu, Jan 5, 2012 at 9:59 PM, Praveen Sripati wrote: > Hi, > > I had been going through the MRv2 documentation and have the following > queries > > 1) Let's say that an InputSplit is on Node1

Problem setting up 0.23 on a Cluster

2012-01-06 Thread Praveen Sripati
Hi, I am able to run 0.23 on a single node and am trying to set it up on a cluster, and I am getting errors. When I try to start the data nodes, I get the below errors. I have also tried adding `export HADOOP_LOG_DIR=/home/praveensripati/Installations/hadoop-0.23.0/logs` to .bashrc and there hadn't been an

Queries on next gen MR architecture

2012-01-05 Thread Praveen Sripati
Hi, I had been going through the MRv2 documentation and have the following queries 1) Let's say that an InputSplit is on Node1 and Node2. Can the ApplicationMaster ask the ResourceManager for a container either on Node1 or Node2 with an OR condition? 2) > The Scheduler receives periodic informa

Re: Is it possible to use hadoop archive to specify third party libs

2012-01-03 Thread Praveen Sripati
Check this article from Cloudera for different options. http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ Praveen On Tue, Jan 3, 2012 at 7:41 AM, Harsh J wrote: > Samir, > > I believe HARs won't work there. But you can use a regular tar instead,

Re: output files written by reducers

2012-01-02 Thread Praveen Sripati
1- Does hadoop automatically use the content of the files written by reducers? No. If Job1 and Job2 are run in sequence, then the output of Job1 can be the input to Job2. This has to be done programmatically. 2- Are these files (files written by reducers) discarded? If so, when and how? No, if the o/p of t

Re: 0.22 Release and Security

2011-12-29 Thread Praveen Sripati
support Kerberos. > > -Joey > > On Thu, Dec 29, 2011 at 9:41 AM, Praveen Sripati > wrote: > > Hi, > > > > The release notes for 0.22 > > ( > http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available > ) >

0.22 Release and Security

2011-12-29 Thread Praveen Sripati
Hi, The release notes for 0.22 ( http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available ) say: >The following features are not supported in Hadoop 0.22.0. >Security. >Latest optimizations of the MapReduce framework introduced in the Hadoop 0.20.se

Re: External libraries usage

2011-12-28 Thread Praveen Sripati
Check this article from Cloudera on different ways of distributing a jar file to the job. http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ Praveen On Wed, Dec 28, 2011 at 5:40 AM, Eyal Golan wrote: > Hello, > Another newbie question. > Suppose I

Re: how does Hadoop Yarn support different programming models?

2011-12-26 Thread Praveen Sripati
Bing, FYI ... here are some applications ported to YARN. http://wiki.apache.org/hadoop/PoweredByYarn Praveen On Tue, Dec 27, 2011 at 5:27 AM, Mahadev Konar wrote: > Hi Bing, > These links should give you more info: > > > http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-sit

Re: how to access a mapper counter in reducer

2011-12-09 Thread Praveen Sripati
> On 12/6/11 9:14 AM, "Mapred Learn" wrote: > > Hi Praveen, > Could you share here so that we can use? > > Thanks, > > Sent from my iPhone > > On Dec 6, 2011, at 6:29 AM, Praveen Sripati > wrote: > > Robert, > > > I have made the above

Re: Which Hadoop version has Adaptive scheduler ?

2011-12-07 Thread Praveen Sripati
The resolution of the JIRA says unresolved, so it's not yet in any of the releases. The best bet is to download the patch attached to the JIRA and look at the code changes if interested. Regards, Praveen On Wed, Dec 7, 2011 at 8:06 PM, arun k wrote: > Hi guys ! > > In which Hadoop Version can i find t

Re: how to access a mapper counter in reducer

2011-12-06 Thread Praveen Sripati
Robert, > I have made the above thing work. Any plans to make it into the Hadoop framework? There had been similar queries about it in other forums also. If you need any help testing/documenting or anything, please let me know. Regards, Praveen On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans wrote: >

Re: determining what files made up a failing task

2011-12-04 Thread Praveen Sripati
Matt, I could not find the properties in the documentation, so I mentioned this feature as hidden. As Harsh mentioned there is an API. There was a blog entry on ' Automatically Documenting Apache Hadoop Configuration' from Cloudera. It would be great if it is contributed to Apache and made part o

Re: output to file

2011-12-04 Thread Praveen Sripati
Matt, You can extend ArrayWritable. Also use TextOutputFormat as the output format. In TextOutputFormat, key.toString() and value.toString() are called, so override toString() in the subclass of ArrayWritable to get the desired output format for the array. If toString() is not overridden then
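The toString() override described above can be sketched in plain Java. A stand-in class is used here instead of Hadoop's ArrayWritable so the example runs without Hadoop on the classpath, and the tab-separated format is just an illustrative choice:

```java
// Sketch of the idea: TextOutputFormat emits value.toString(), so controlling
// how an array appears in the output file is a matter of overriding toString().
// This stand-in class mimics an ArrayWritable subclass without needing Hadoop.
class TabSeparatedArray {
    private final String[] values;

    TabSeparatedArray(String[] values) {
        this.values = values;
    }

    // Without this override, the default Object.toString() would print
    // something like "TabSeparatedArray@1b6d3586" into the output file.
    @Override
    public String toString() {
        return String.join("\t", values);
    }
}
```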

Re: determining what files made up a failing task

2011-12-04 Thread Praveen Sripati
Matt, There is no need to know the input data which caused the task, and finally the job, to fail. Set 'mapreduce.map.failures.maxpercent' and 'mapreduce.reduce.failures.maxpercent' to the failure tolerance for the job to complete irrespective of some task failures. Again, this is one of the hi
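The two failure-tolerance properties mentioned above go into the job configuration; a sketch with illustrative values:

```xml
<!-- Tolerate task failures: the job still succeeds as long as no more than
     20% of map tasks and 20% of reduce tasks fail (20 is an example value) -->
<property>
  <name>mapreduce.map.failures.maxpercent</name>
  <value>20</value>
</property>
<property>
  <name>mapreduce.reduce.failures.maxpercent</name>
  <value>20</value>
</property>
```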

Re: Distributing our jars to all machines in a cluster

2011-11-19 Thread Praveen Sripati
Hi, Here are the different ways of distributing 3rd party jars with the application. http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ Thanks, Praveen On Wed, Nov 16, 2011 at 11:30 PM, Dmitriy Ryaboy wrote: > Libjars works if your MR job is init

Re: Different ways of configuring the memory to the TaskTracker child process (Mapper and Reduce Tasks)

2011-11-07 Thread Praveen Sripati
Hi, Can someone please clarify the below query? Thanks, Praveen On Sun, Nov 6, 2011 at 8:47 PM, Praveen Sripati wrote: > > Hi, > > What is the difference between setting the mapred.job.map.memory.mb and > mapred.child.java.opts using -Xmx to control the maximum memory use

Different ways of configuring the memory to the TaskTracker child process (Mapper and Reduce Tasks)

2011-11-06 Thread Praveen Sripati
Hi, What is the difference between setting the mapred.job.map.memory.mb and mapred.child.java.opts using -Xmx to control the maximum memory used by a Mapper and Reduce task? Which one takes precedence? Thanks, Praveen
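As a rough sketch of the two knobs in question (values illustrative): mapred.job.map.memory.mb is the total memory the scheduler accounts for the map task, while the -Xmx in mapred.child.java.opts bounds only the child JVM heap, so the heap is usually set somewhat below the task budget:

```xml
<!-- Total memory budget the scheduler/TaskTracker accounts for each
     map task (illustrative value) -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1536</value>
</property>
<!-- Maximum JVM heap for the child task process; kept below the task
     budget so stack, native and JVM overhead still fit -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```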

Re: Streaming question.

2011-11-02 Thread Praveen Sripati
Dan, It is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-1888) which was identified in the 0.21.0 release. Which Hadoop release are you using? Thanks, Praveen On Thu, Nov 3, 2011 at 10:22 AM, Dan Young wrote: > I'm a total newbie @ Hadoop and am trying to follow an example (a

Specifying the MR jar file

2011-10-11 Thread Praveen Sripati
Hi, What is the difference between specifying the jar file using JobConf API and the 'hadoop jar' command? JobConf conf = new JobConf(getConf(), getClass()); bin/hadoop jar /home/praveensripati/Hadoop/MaxTemperature/MaxTemperature.jar MaxTemperature /user/praveensripati/input /user/praveensripat

Re: How to stop a MR Job when a certain condition is met?

2011-09-30 Thread Praveen Sripati
nputs to your map when the > mapper/recordreader finds the needle in the haystack. > > Arun > > Sent from my iPhone > > On Sep 30, 2011, at 8:39 PM, Praveen Sripati > wrote: > > Hi, > > Is there a way to stop an entire job when a certain condition is met in the

How to stop a MR Job when a certain condition is met?

2011-09-30 Thread Praveen Sripati
Hi, Is there a way to stop an entire job when a certain condition is met in the map/reduce function? Like looking for a particular key or value. Thanks, Praveen
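One approach from the replies is to stop feeding records to map() once the "needle" is found, via a custom record reader. A minimal pure-Java sketch of that idea follows; the class and method names are illustrative, and there is no Hadoop dependency so it stays runnable on its own:

```java
import java.util.Iterator;

// Sketch: a record reader that reports "no more records" as soon as a
// condition is met, so the map task finishes early instead of scanning on.
class EarlyStopReader {
    private final Iterator<String> source;
    private final String needle;
    private boolean stopped = false;
    private String current;

    EarlyStopReader(Iterator<String> source, String needle) {
        this.source = source;
        this.needle = needle;
    }

    // Analogous to RecordReader.nextKeyValue(): returns false once the
    // needle has been seen, even if more input remains.
    boolean next() {
        if (stopped || !source.hasNext()) return false;
        current = source.next();
        if (current.equals(needle)) stopped = true;
        return true;
    }

    String get() { return current; }
}
```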

Re: How to pull data in the Map/Reduce functions?

2011-09-24 Thread Praveen Sripati
ion and Hadoop In Action > covered the new api. > > Matt > > *From:* Praveen Sripati [mailto:praveensrip...@gmail.com] > *Sent:* Saturday, September 24, 2011 8:43 AM > *To:* mapreduce-user@hadoop.apache.org > *Subject:* How to pull data in the Map/Reduc

How to pull data in the Map/Reduce functions?

2011-09-24 Thread Praveen Sripati
Hi, Normally the Hadoop framework calls map()/reduce() for each record in the input split. I read in 'Hadoop : The Definitive Guide' that data can be pulled using the new MR API. What is the new API for pulling the data in map()/reduce(), or is there a sample code? Thanks, Pravee
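The "pull" style being asked about refers to overriding run() in the new API, where the task iterates over records itself instead of having map() invoked once per record. A Hadoop-free sketch of that control flow (names illustrative):

```java
import java.util.Iterator;

// Sketch of the new-API run() idea: the mapper pulls records from the
// context in its own loop, so it can batch, skip, or stop as it likes.
class PullStyleMapper {
    StringBuilder output = new StringBuilder();

    void map(String record) {
        output.append(record).append('\n');
    }

    // Analogous to Mapper.run(Context): pull each record, then call map().
    void run(Iterator<String> context) {
        while (context.hasNext()) {
            map(context.next());
        }
    }
}
```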

Re: Running Hadoop in different modes

2011-09-23 Thread Praveen Sripati
1:10 PM, Harsh J wrote: > Hello Praveen, > > Is your question from a test-case perspective? > > Cause otherwise is it not clear what you gain in 'Distributed' vs. > 'Standalone'? > > On Fri, Sep 23, 2011 at 12:15 PM, Praveen Sripati > wrote: > >

Running Hadoop in different modes

2011-09-22 Thread Praveen Sripati
Hi, What are the features available in the Fully-Distributed Mode and the Pseudo-Distributed Mode that are not available in the Local (Standalone) Mode? Local (Standalone) Mode is very fast and I am able to get it running in Eclipse also. Thanks, Praveen
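For reference, the switch from Local (Standalone) to Pseudo-Distributed Mode is purely configuration; a typical minimal sketch for the 0.20-era property names (ports are the conventional defaults):

```xml
<!-- core-site.xml: point the filesystem at a local HDFS daemon
     instead of the default file:/// used in standalone mode -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<!-- mapred-site.xml: run against a JobTracker instead of the
     in-process "local" runner used in standalone mode -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
```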

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
f filtering, so there > isn't too much intermediate data. > > -Joey > > On Thu, Sep 22, 2011 at 6:38 AM, Praveen Sripati > wrote: > > Joey, > > > > Thanks for the response. > > > > 'mapreduce.job.reduce.slowstart.completedmaps' is def

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
slower job, and you haven't configured > mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch > of idle reduce tasks which would starve J2. > > In general, it's best to configure the slow start property and to use > the fair scheduler or capacity scheduler. > > -Joey

Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Hi, Let's assume that there are two jobs J1 (100 map tasks) and J2 (200 map tasks), the cluster has a capacity of 150 map tasks (15 nodes with 10 map tasks per node), and Hadoop is using the default FIFO scheduler. If I submit J1 first and then J2, will the jobs run in parallel or the job J1 has
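The slow-start property discussed in the replies controls when reducers launch relative to map completion; a sketch, with an illustrative value (0.05 is the usual default):

```xml
<!-- Fraction of map tasks that must complete before reducers are
     scheduled; raising it (e.g. to 0.80) keeps a long job's idle
     reducers from starving a later job under FIFO -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```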

Error running Hadoop in VirtualBox

2011-09-09 Thread Praveen Sripati
Hi, I have the following configuration - Ubuntu 11.04 as Guest and Host using VirtualBox, and I am trying to run Hadoop 0.21.0. The host is acting as namenode/data node/job tracker/task tracker and the guest is acting as a data node/task tracker. Everything works fine in 'Bridged Adapter' mode, but

Re: Need Help in Setting up the NextGen MapReduce.

2011-08-02 Thread Praveen Sripati
Vinay, https://issues.apache.org/jira/browse/MAPREDUCE-279 http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/yarn/README Thanks, Praveen On Tue, Aug 2, 2011 at 3:43 PM, Vinayakumar B wrote: >

NodeManager not able to connect to the ResourceManager (MRv2)

2011-07-21 Thread Praveen Sripati
Hi, I followed the below instructions to compile the MRv2 code. http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL I start the resourcemanager and then the nodemanager and see the following error in the yarn-praveensripati-nodemanager-master.log file. 2011-07-21 19:

Hadoop Jar Files

2011-05-30 Thread Praveen Sripati
Hi, I have extracted the hadoop-0.20.2, hadoop-0.20.203.0 and hadoop-0.21.0 files. In the hadoop-0.21.0 folder the hadoop-hdfs-0.21.0.jar, hadoop-mapred-0.21.0.jar and the hadoop-common-0.21.0.jar files are there. But in the hadoop-0.20.2 and the hadoop-0.20.203.0 releases the same files are mis

Re: Number of map tasks spawned

2010-04-26 Thread Praveen Sripati
Could someone please answer this? Thanks, Praveen On Sun, Apr 25, 2010 at 4:28 PM, Praveen Sripati wrote: > > Hi, > > The MapReduce tutorial specifies that > > >> The Hadoop Map/Reduce framework spawns one map task for each InputSplit > generated by the InputForma

Number of map tasks spawned

2010-04-25 Thread Praveen Sripati
Hi, The MapReduce tutorial specifies that >> The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. But, the mapred.map.tasks definition is >> The default number of map tasks per job. Ignored when mapred.job.tracker is "local". S
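The interplay described above can be stated in configuration terms: mapred.map.tasks is only a hint, since the InputFormat's split count ultimately decides how many map tasks are spawned (value illustrative):

```xml
<!-- A hint, not a hard limit: the framework still spawns one map task
     per InputSplit generated by the InputFormat, and the value is
     ignored entirely when mapred.job.tracker is "local" -->
<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
</property>
```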