Re: How does mapper process partial records?

2013-01-24 Thread Praveen Sripati
http://stackoverflow.com/users/614157/praveen-sripati If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data. On Fri, Jan 25, 2013 at 12:52 AM, Harsh J ha...@cloudera.com wrote: Hi Praveen, This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce

Re: Modifying Hadoop For join Operation

2013-01-24 Thread Praveen Sripati
Hadoop CDH4 (95%) http://www.thecloudavenue.com/ http://stackoverflow.com/users/614157/praveen-sripati If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data. On Thu, Jan 24, 2013 at 8:39 PM, Harsh J ha...@cloudera.com wrote: Hi, Can you also

Ant offline build

2012-03-31 Thread Praveen Sripati
Hi, I have got the code for 0.22 and did the build successfully using the 'ant clean compile eclipse' command. But the ant command downloads the dependent jar files every time. How to make ant use the local jar files and not download from the internet, so that the build can be done offline? Here

Kerberos and Delegation Tokens

2012-03-17 Thread Praveen Sripati
Hi, According to the 'Hadoop - The Definitive Guide' In a distributed system like HDFS or MapReduce, there are many client-server interactions, each of which must be authenticated. For example, an HDFS read operation will involve multiple calls to the namenode and calls to one or more

Re: Security at file level in Hadoop

2012-02-22 Thread Praveen Sripati
According to this (http://goo.gl/rfwy4), prior to 0.22 Hadoop used the 'whoami' and 'id' commands to determine the user and groups of the running process. How does this work now? Praveen On Wed, Feb 22, 2012 at 6:03 PM, Joey Echeverria j...@cloudera.com wrote: HDFS supports POSIX style file

Re: Rack Awareness behaviour - Loss of rack

2012-02-10 Thread Praveen Sripati
I have rack awareness configured and it seems to work fine. My default rep count is 2. Now I lost one rack due to switch failure. Here is what I observe: HDFS continues to write in the existing available rack. It still keeps two copies of each block, but now these blocks are being stored in

Re: Setting up Federated HDFS

2012-02-09 Thread Praveen Sripati
Chandra, In the namenode hdfs*xml, dfs.federation.nameservice.id is set to ns1, but ns1 is not being used in the xml for defining the namenode properties. Here are the instructions for getting started with HDFS federation and mount tables.
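For reference, a minimal federation sketch (assuming the 0.23-era property names; hosts are placeholders): the id declared in dfs.federation.nameservices has to reappear as the suffix of each per-namenode property.

    <property>
      <name>dfs.federation.nameservices</name>
      <value>ns1,ns2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns1</name>
      <value>nn1.example.com:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns2</name>
      <value>nn2.example.com:9000</value>
    </property>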

Re: Can't achieve load distribution

2012-02-02 Thread Praveen Sripati
I have a simple MR job, and I want each Mapper to get one line from my input file (which contains further instructions for lengthy processing). Use the NLineInputFormat class. http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
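A minimal driver sketch along these lines (new API from the linked Javadoc; the input path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    // Hand each mapper exactly one line of the input file; one split
    // (and hence one map task) is generated per line.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "one-line-per-mapper");
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path("instructions.txt"));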

Re: Can't achieve load distribution

2012-02-02 Thread Praveen Sripati
the right thing, but it's API 0.21 (I googled about the problems with it), so I have to use either the next Cloudera release, or Hortonworks, or something, am I right? Mark On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati praveensrip...@gmail.com wrote: I have a simple MR job, and I want each

Re: HDFS Federation Exception

2012-01-11 Thread Praveen Sripati
/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html Praveen On Wed, Jan 11, 2012 at 3:40 PM, Praveen Sripati praveensrip...@gmail.comwrote: Hi, Got the latest code to see if any bugs were fixed and did try federation with the same configuration, but was getting similar exception. 2012-01-11 15

Re: HDFS Federation Exception

2012-01-11 Thread Praveen Sripati
Suresh, Here is the JIRA - https://issues.apache.org/jira/browse/HDFS-2778 Regards, Praveen On Wed, Jan 11, 2012 at 9:28 PM, Suresh Srinivas sur...@hortonworks.comwrote: Thanks for figuring that. Could you create an HDFS Jira for this issue? On Wednesday, January 11, 2012, Praveen Sripati

Re: Not able to start the NodeManager

2012-01-10 Thread Praveen Sripati
-env.sh? Is your yarn-env.sh just the standard one from ./hadoop-mapreduce-project/hadoop-yarn/conf/yarn-env.sh? Tom On 1/9/12 6:16 AM, Praveen Sripati praveensrip...@gmail.com wrote: Hi, I am trying to setup 0.23 on a cluster and am stuck with errors while starting the NodeManager

Re: connection between slaves and master

2012-01-10 Thread Praveen Sripati
Mark, [mark@node67 ~]$ telnet node77 You need to specify the port number along with the server name like `telnet node77 1234`. 2012-01-09 10:04:03,436 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 0 time(s). Slaves are not able to

Fwd: HDFS Federation Exception

2012-01-10 Thread Praveen Sripati
Hi, I am trying to setup a HDFS federation and getting the below error. Also, pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I miss something in the configuration files? 2012-01-11 12:12:15,759 ERROR namenode.NameNode (NameNode.java:main(803)) - Exception in namenode

Not able to start the NodeManager

2012-01-09 Thread Praveen Sripati
Hi, I am trying to setup 0.23 on a cluster and am stuck with errors while starting the NodeManager. The slaves file is proper and I am able to do a password-less ssh from the master to the slaves. The ResourceManager also starts properly. On running the below command from the master node.

Re: Problem setting up 0.23 on a Cluster

2012-01-07 Thread Praveen Sripati
, What does 'id' output? Kindest regards. Ron On Fri, Jan 6, 2012 at 9:51 AM, Praveen Sripati praveensrip...@gmail.comwrote: Hi, I am able to run 0.23 on a single node and trying to setup it on a cluster and getting errors. When I try to start the data nodes, I get the below errors. I

Re: Problem setting up 0.23 on a Cluster

2012-01-07 Thread Praveen Sripati
/slaves: No such file or directory Regards, Praveen On Sat, Jan 7, 2012 at 3:23 PM, Praveen Sripati praveensrip...@gmail.comwrote: Ronald, Here is the output uid=1000(praveensripati) gid=1000(praveensripati) groups=1000(praveensripati),4(adm),20(dialout),24(cdrom),46(plugdev),116(lpadmin),118

Re: Queries on next gen MR architecture

2012-01-07 Thread Praveen Sripati
8, 2012 at 12:08 AM, Arun C Murthy a...@hortonworks.com wrote: On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote: Hi, I had been going through the MRv2 documentation and have the following queries 1) Let's say that an InputSplit is on Node1 and Node2. Can the ApplicationMaster ask

NameNode Safe Mode and CheckPointing

2012-01-07 Thread Praveen Sripati
When the checkpointing starts, the primary namenode starts a new edits file. During the checkpointing process will the namenode go into safe mode? According to 'Hadoop - The Definitive Guide', The schedule for checkpointing is controlled by two configuration parameters. The secondary namenode

Re: NameNode Safe Mode and CheckPointing

2012-01-07 Thread Praveen Sripati
During the time the NN stops writing to the old edits file and creates a new edits file, will file modifications work or not? Curious how this is handled in the code. Praveen On Sun, Jan 8, 2012 at 9:34 AM, Harsh J ha...@cloudera.com wrote: Praveen, On 08-Jan-2012, at 9:13 AM, Praveen

Problem setting up 0.23 on a Cluster

2012-01-06 Thread Praveen Sripati
Hi, I am able to run 0.23 on a single node and trying to setup it on a cluster and getting errors. When I try to start the data nodes, I get the below errors. I have also tried adding `export HADOOP_LOG_DIR=/home/praveensripati/Installations/hadoop-0.23.0/logs` to .bashrc and there hadn't been

Re: Queries on next gen MR architecture

2012-01-06 Thread Praveen Sripati
Could someone please clarify the queries below? Regards, Praveen On Thu, Jan 5, 2012 at 9:59 PM, Praveen Sripati praveensrip...@gmail.comwrote: Hi, I had been going through the MRv2 documentation and have the following queries 1) Let's say that an InputSplit is on Node1 and Node2

Queries on next gen MR architecture

2012-01-05 Thread Praveen Sripati
Hi, I had been going through the MRv2 documentation and have the following queries 1) Let's say that an InputSplit is on Node1 and Node2. Can the ApplicationMaster ask the ResourceManager for a container either on Node1 or Node2 with an OR condition? 2) The Scheduler receives periodic

Re: Is it possible to user hadoop archive to specify third party libs

2012-01-03 Thread Praveen Sripati
Check this article from Cloudera for different options. http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ Praveen On Tue, Jan 3, 2012 at 7:41 AM, Harsh J ha...@cloudera.com wrote: Samir, I believe HARs won't work there. But you can use a

Re: Allowing multiple users to submit jobs in hadoop 0.20.205 ?

2012-01-03 Thread Praveen Sripati
By default `security.job.submission.protocol.acl` is set to * in the hadoop-policy.xml, so it will allow any/multiple users to submit/query job status. Check this (1) for more details. <property> <name>security.job.submission.protocol.acl</name> <value>*</value> <description>ACL for

Re: Hive starting error

2012-01-03 Thread Praveen Sripati
http://hive.apache.org/releases.html#21+June%2C+2011%3A+release+0.7.1+available 21 June, 2011: release 0.7.1 available This release is the latest release of Hive and it works with Hadoop 0.20.1 and 0.20.2 I don't see the method thrown in the exception in 0.20.205. Praveen On Fri,

Re: output files written by reducers

2012-01-02 Thread Praveen Sripati
1- Does hadoop automatically use the content of the files written by reducers? No. If Job1 and Job2 are run in sequence, then the o/p of Job1 can be i/p to Job2. This has to be done programmatically. 2-Are these files (files written by reducers) discarded? If so, when and how? No, if the o/p of
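A minimal driver sketch of that hand-off (new API; paths and job names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Run Job1, then feed its output directory to Job2 as input.
    Configuration conf = new Configuration();
    Path intermediate = new Path("/tmp/job1-output");

    Job job1 = new Job(conf, "job1");
    FileInputFormat.addInputPath(job1, new Path("/data/input"));
    FileOutputFormat.setOutputPath(job1, intermediate);
    if (!job1.waitForCompletion(true)) {
      System.exit(1); // stop the chain if Job1 fails
    }

    Job job2 = new Job(conf, "job2");
    FileInputFormat.addInputPath(job2, intermediate); // Job1 o/p is Job2 i/p
    FileOutputFormat.setOutputPath(job2, new Path("/data/output"));
    System.exit(job2.waitForCompletion(true) ? 0 : 1);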

Re: Remote access to namenode is not allowed despite the services are already started.

2012-01-02 Thread Praveen Sripati
Changing the VM settings won't help. Change the value of fs.default.name to hdfs://106.77.211.187:9000 from hdfs://localhost:9000 in core-site.xml for both the client and the NameNode. Replace the IP address with the IP address of the node on which the NameNode is running or with the hostname.
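That is, in core-site.xml on both the client and the NameNode (the IP is the one from the thread):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://106.77.211.187:9000</value>
    </property>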

Re: 0.22 Release and Security

2011-12-29 Thread Praveen Sripati
. -Joey On Thu, Dec 29, 2011 at 9:41 AM, Praveen Sripati praveensrip...@gmail.com wrote: Hi, The release notes for 0.22 ( http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available ) it says The following features are not supported in Hadoop

0.22 Release and Security

2011-12-29 Thread Praveen Sripati
Hi, The release notes for 0.22 ( http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available) it says The following features are not supported in Hadoop 0.22.0. Security. Latest optimizations of the MapReduce framework introduced in the Hadoop

Re: Hadoop MySQL database access

2011-12-29 Thread Praveen Sripati
Check the `mapreduce.job.reduce.slowstart.completedmaps` parameter. The reducers cannot start processing the data from the mappers until all the map tasks are complete, but the reducers can start fetching the data from the nodes on which the map tasks have completed. Praveen On Thu, Dec 29,
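For example, to hold the reducers back until 80% of the maps have finished (set in mapred-site.xml or the job configuration; 0.05 is the usual default):

    <property>
      <name>mapreduce.job.reduce.slowstart.completedmaps</name>
      <value>0.80</value>
    </property>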

Re: External libraries usage

2011-12-28 Thread Praveen Sripati
Check this article from Cloudera on different ways of distributing a jar file to the job. http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ Praveen On Wed, Dec 28, 2011 at 5:40 AM, Eyal Golan egola...@gmail.com wrote: Hello, Another newbie
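One of the options from that article, sketched with the DistributedCache API (the jar is assumed to have been copied to HDFS already; the path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    // Put a third-party jar on the classpath of every task via the
    // distributed cache.
    Configuration conf = new Configuration();
    DistributedCache.addFileToClassPath(new Path("/libs/third-party.jar"), conf);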

Re: how does Hadoop Yarn support different programming models?

2011-12-26 Thread Praveen Sripati
Bing, FYI ... here are some applications ported to YARN. http://wiki.apache.org/hadoop/PoweredByYarn Praveen On Tue, Dec 27, 2011 at 5:27 AM, Mahadev Konar maha...@hortonworks.comwrote: Hi Bing, These links should give you more info:

Re: Configuring Fully-Distributed Operation

2011-12-26 Thread Praveen Sripati
At the minimum you need to specify the location of the namenode and the jobtracker in the configuration files for all the nodes and the client, rest of the properties are defaulted. Also, based on the # of data nodes you also need to specify the hdfs replication factor. Praveen On Sun, Dec 25,
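Concretely, something like the following (pre-0.23 property names; hostnames and the replication factor are placeholders):

    <!-- core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker-host:9001</value>
    </property>

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>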

Re: Which Hadoop version has Adaptive scheduler ?

2011-12-07 Thread Praveen Sripati
The resolution of the JIRA says unresolved, so it's not yet in any of the releases. The best bet is to download the patch attached to the JIRA and review the code changes if interested. Regards, Praveen On Wed, Dec 7, 2011 at 8:06 PM, arun k arunk...@gmail.com wrote: Hi guys ! In which Hadoop

Re: how to access a mapper counter in reducer

2011-12-06 Thread Praveen Sripati
Robert, I have made the above thing work. Any plans to make it into the Hadoop framework? There have been similar queries about it in other forums also. If you need any help testing/documenting or anything, please let me know. Regards, Praveen On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans

Re: Automate Hadoop installation

2011-12-06 Thread Praveen Sripati
Also, check out Ambari (http://incubator.apache.org/ambari/), which is still in Incubator status. How do Ambari and Puppet compare? Regards, Praveen On Tue, Dec 6, 2011 at 1:00 PM, alo alt wget.n...@googlemail.com wrote: Hi, to deploy software I suggest pulp:

Re: Multiple Mappers for Multiple Tables

2011-12-06 Thread Praveen Sripati
MultipleInputs takes multiple Paths (files) and not a DB as input. As mentioned earlier, export the tables into HDFS either using Sqoop or a native DB export tool and then do the processing. Sqoop is configured to use the native DB export tool whenever possible. Regards, Praveen On Tue, Dec 6, 2011 at 3:44 AM,
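Once the tables are in HDFS, the wiring looks roughly like this (new-API MultipleInputs sketch; the paths and mapper classes are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // One dedicated mapper per exported table file.
    Job job = new Job(new Configuration(), "multi-table-join");
    MultipleInputs.addInputPath(job, new Path("/exports/table_a"),
        TextInputFormat.class, TableAMapper.class);
    MultipleInputs.addInputPath(job, new Path("/exports/table_b"),
        TextInputFormat.class, TableBMapper.class);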

Re: Running a job continuously

2011-12-06 Thread Praveen Sripati
If the requirement is for real time data processing, using Flume will not suffice as there is a time lag between the collection of files by Flume and the processing done by Hadoop. Consider frameworks like S4, Storm (from Twitter), HStreaming etc which suit realtime processing. Regards, Praveen On

Re: HDFS HTTP client

2011-12-06 Thread Praveen Sripati
Also check WebHDFS (1). I think both Hoop and WebHDFS are not in Hadoop yet. Check the HDFS-2178 and HDFS-2316 JIRAs for the status. (1) - http://hortonworks.com/webhdfs-%E2%80%93-http-rest-access-to-hdfs/ Regards, Praveen On Tue, Dec 6, 2011 at 4:39 PM, alo alt wget.n...@googlemail.com wrote:

Re: determining what files made up a failing task

2011-12-04 Thread Praveen Sripati
Mat, If there is no need to know the input data which caused the task (and finally the job) to fail, set the 'mapreduce.map.failures.maxpercent' and 'mapreduce.reduce.failures.maxpercent' properties to the failure tolerance, so the job completes irrespective of some task failures. Again, this is one of the
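For instance, to let a job succeed with up to 10% of its map and reduce tasks failing (property names as used in this thread; set them in the job configuration):

    <property>
      <name>mapreduce.map.failures.maxpercent</name>
      <value>10</value>
    </property>
    <property>
      <name>mapreduce.reduce.failures.maxpercent</name>
      <value>10</value>
    </property>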

Re: determining what files made up a failing task

2011-12-04 Thread Praveen Sripati
Matt, I could not find the properties in the documentation, so I mentioned this feature as hidden. As Harsh mentioned there is an API. There was a blog entry on 'Automatically Documenting Apache Hadoop Configuration' from Cloudera. It would be great if it were contributed to Apache and made part

Re: Availability of Job traces or logs

2011-12-04 Thread Praveen Sripati
Arun, I want to control the split placements. InputSplits are logical references to parts of the input data; there is no placement of the InputSplits as such. InputSplits are calculated on a client by the InputFormat class when a job is submitted and the InputSplit metadata is put in HDFS to

Re: How do I programmatically get total job execution time?

2011-12-02 Thread Praveen Sripati
Hi, Ran a job using the new MR API in standalone mode on 0.21. Both Job#getFinishTime and Job#getStartTime are returning 0. Not sure if this is a bug. Thanks, Praveen On Sat, Dec 3, 2011 at 6:14 AM, Raj V rajv...@yahoo.com wrote: As Harsh said, I don't think there is a simple way to
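A client-side workaround sketch until those getters behave (valid only because waitForCompletion() blocks until the job finishes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Measure wall-clock job duration from the submitting client.
    Job job = new Job(new Configuration(), "timed-job");
    // ... set input/output paths, mapper, reducer ...
    long start = System.currentTimeMillis();
    boolean ok = job.waitForCompletion(true);
    long millis = System.currentTimeMillis() - start;
    System.out.println("Job took " + millis + " ms, success = " + ok);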

Re: Hadoop Security

2011-11-28 Thread Praveen Sripati
Hi, 3. Is any kind of encryption handled in hadoop at the time of storing the files in HDFS? You could define a compression codec that does the encryption. Check the below thread for more details. http://www.mail-archive.com/common-user@hadoop.apache.org/msg06229.html Thanks, Praveen On

Re: Streaming question.

2011-11-02 Thread Praveen Sripati
Dan, It is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-1888) which was identified in the 0.21.0 release. Which Hadoop release are you using? Thanks, Praveen On Thu, Nov 3, 2011 at 10:22 AM, Dan Young danoyo...@gmail.com wrote: I'm a total newbie @ Hadoop and trying to

Specifying the MR jar file

2011-10-11 Thread Praveen Sripati
Hi, What is the difference between specifying the jar file using the JobConf API and the 'hadoop jar' command? JobConf conf = new JobConf(getConf(), getClass()); bin/hadoop jar /home/praveensripati/Hadoop/MaxTemperature/MaxTemperature.jar MaxTemperature /user/praveensripati/input
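The two are complementary rather than alternatives. A sketch of what each does (assuming a driver that extends Configured, as implied by getConf() above):

    import org.apache.hadoop.mapred.JobConf;

    // 'hadoop jar' only runs the jar's main class on the client with the
    // jar on the local classpath; the JobConf constructor below (which
    // calls setJarByClass internally) records which jar the framework
    // should ship to the tasktrackers.
    JobConf conf = new JobConf(getConf(), MaxTemperature.class);
    // equivalent explicit form:
    // conf.setJarByClass(MaxTemperature.class);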

Re: How to stop a MR Job when a certain condition is met?

2011-09-30 Thread Praveen Sripati
inputs to your map when the mapper/recordreader finds the needle in the haystack. Arun Sent from my iPhone On Sep 30, 2011, at 8:39 PM, Praveen Sripati praveensrip...@gmail.com wrote: Hi, Is there a way to stop an entire job when a certain condition is met in the map/reduce function? Like

How to pull data in the Map/Reduce functions?

2011-09-24 Thread Praveen Sripati
Hi, Normally the Hadoop framework calls the map()/reduce() for each record in the input split. I read in 'Hadoop: The Definitive Guide' that data can be pulled using the new MR API. What is the new API for pulling the data in the map()/reduce() or is there a sample code? Thanks,
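A minimal sketch of the pull style in the new API: override Mapper#run() and drive the record reader yourself instead of having map() pushed one record at a time (the types here are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class PullMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      @Override
      public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        // Pull records at your own pace; the loop may consume several
        // records per iteration if the logic needs look-ahead.
        while (context.nextKeyValue()) {
          map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
      }
    }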

Re: How to pull data in the Map/Reduce functions?

2011-09-24 Thread Praveen Sripati
*From:* Praveen Sripati [mailto:praveensrip...@gmail.com] *Sent:* Saturday, September 24, 2011 8:43 AM *To:* mapreduce-user@hadoop.apache.org *Subject:* How to pull data in the Map/Reduce functions? Hi, Normally the Hadoop framework calls the map()/reduce() for each record

Running Hadoop in different modes

2011-09-23 Thread Praveen Sripati
Hi, What are the features available in the Fully-Distributed Mode and the Pseudo-Distributed Mode that are not available in the Local (Standalone) Mode? Local (Standalone) Mode is very fast and I am able to get it running in Eclipse also. Thanks, Praveen

Re: Running Hadoop in different modes

2011-09-23 Thread Praveen Sripati
PM, Harsh J ha...@cloudera.com wrote: Hello Praveen, Is your question from a test-case perspective? Cause otherwise is it not clear what you gain in 'Distributed' vs. 'Standalone'? On Fri, Sep 23, 2011 at 12:15 PM, Praveen Sripati praveensrip...@gmail.com wrote: Hi, What

Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Hi, Let's assume that there are two jobs J1 (100 map tasks) and J2 (200 map tasks) and the cluster has a capacity of 150 map tasks (15 nodes with 10 map tasks per node) and Hadoop is using the default FIFO scheduler. If I submit first J1 and then J2, will the jobs run in parallel or the job J1 has

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
of filtering, so there isn't too much intermediate data. -Joey On Thu, Sep 22, 2011 at 6:38 AM, Praveen Sripati praveensrip...@gmail.com wrote: Joey, Thanks for the response. 'mapreduce.job.reduce.slowstart.completedmaps' defaults to 0.05 and says 'Fraction of the number of maps

Error running Hadoop in VirtualBox

2011-09-09 Thread Praveen Sripati
Hi, I have the following configuration - Ubuntu 11.04 as Guest and Host using VirtualBox and trying to run Hadoop 0.21.0. The host is acting as namenode/data node/job tracker/task tracker and the guest is acting as a data node/task tracker. Everything works fine in a 'Bridged Adapter' mode, but

Re: Binary content

2011-09-02 Thread Praveen Sripati
Mohit, Hadoop: The Definitive Guide (Chapter 3 - Hadoop I/O) has a section on SequenceFile and is worth reading. http://oreilly.com/catalog/9780596521981 Thanks, Praveen On Thu, Sep 1, 2011 at 9:15 PM, Owen O'Malley o...@hortonworks.com wrote: On Thu, Sep 1, 2011 at 8:37 AM, Mohit Anchlia
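A small writer sketch for binary payloads along those lines (pre-0.21 createWriter signature; the path and data are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Store opaque binary records (e.g. images) as key/value pairs.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/mohit/binary.seq");
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, path, Text.class, BytesWritable.class);
    try {
      byte[] payload = "stand-in binary data".getBytes();
      writer.append(new Text("record-1"), new BytesWritable(payload));
    } finally {
      writer.close();
    }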

Re: set reduced block size for a specific file

2011-08-27 Thread Praveen Sripati
Hi, There are tons of parameters for mapreduce. How to know if a property is a client- or server-side property? Thanks, Praveen On Sun, Aug 28, 2011 at 4:53 AM, Aaron T. Myers a...@cloudera.com wrote: Hey Ben, I just filed this JIRA to add this feature:

NodeManager not able to connect to the ResourceManager (MRv2)

2011-07-21 Thread Praveen Sripati
Hi, I followed the below instructions to compile the MRv2 code. http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL I start the resourcemanager and then the nodemanager and see the following error in the yarn-praveensripati-nodemanager-master.log file. 2011-07-21

Hadoop Jar Files

2011-05-30 Thread Praveen Sripati
Hi, I have extracted the hadoop-0.20.2, hadoop-0.20.203.0 and hadoop-0.21.0 files. In the hadoop-0.21.0 folder the hadoop-hdfs-0.21.0.jar, hadoop-mapred-0.21.0.jar and the hadoop-common-0.21.0.jar files are there. But in the hadoop-0.20.2 and the hadoop-0.20.203.0 releases the same files are

Eclipse Hadoop Plugin Error creating New Hadoop location ....

2011-05-30 Thread Praveen Sripati
Hi, I am trying to run Hadoop from Eclipse using the Eclipse Hadoop Plugin and stuck with the following problem. First copied the hadoop-0.21.0-eclipse-plugin.jar to the Eclipse Plugin folder, started eclipse and switched to the Map/Reduce perspective. In the Map/Reduce Locations View when I try

Number of map tasks spawned

2010-04-25 Thread Praveen Sripati
Hi, The MapReduce tutorial specifies that The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. But, the mapred.map.tasks definition is The default number of map tasks per job. Ignored when mapred.job.tracker is local. So, is
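For what it's worth, the split count wins: one map task is spawned per InputSplit, and mapred.map.tasks acts only as a hint. A sketch of influencing the map count through split sizes instead (new API; the path and sizes are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Capping the split size at 64 MB makes a 640 MB input yield ~10
    // splits, and hence ~10 map tasks.
    Job job = new Job(new Configuration(), "split-count-demo");
    FileInputFormat.addInputPath(job, new Path("input"));
    FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);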