http://stackoverflow.com/users/614157/praveen-sripati
If you aren’t taking advantage of big data, then you don’t have big data,
you have just a pile of data.
On Fri, Jan 25, 2013 at 12:52 AM, Harsh J wrote:
> Hi Praveen,
>
> This is explained at http://wiki.apache.org/hadoop/HadoopM
Hi,
I have got the code for 0.22 and did the build successfully using 'ant
clean compile eclipse' command. But, the ant command is downloading the
dependent jar files every time. How can I make ant use the local jar files
and not download them from the internet, so that the build can be done offline?
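One hedged approach, assuming the 0.22 ant build resolves its dependencies
through Apache Ivy (which caches downloaded jars under ~/.ivy2/cache by
default): keep that cache intact after one online build, so later builds can
resolve jars locally.

```shell
# Sketch (assumption: the build uses Apache Ivy with its default cache dir).
# Once the cache is populated by one online build, later builds can reuse it.
IVY_CACHE="${HOME}/.ivy2/cache"
echo "Ivy cache expected at: ${IVY_CACHE}"
# Then build without wiping the cache, e.g.:
#   ant compile eclipse
```

If the build still reaches the network, the resolvers configured in the
source tree's ivy settings may need to be pointed at the local cache; treat
the exact file and property names as something to verify against the 0.22
build files.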
Here is
Is your yarn-env.sh just the standard one from
> ./hadoop-mapreduce-project/hadoop-yarn/conf/yarn-env.sh?
>
> Tom
>
>
>
> On 1/9/12 6:16 AM, "Praveen Sripati" wrote:
>
> Hi,
>
> I am trying to set up 0.23 on a cluster and am stuck with errors while
> starting
Hi,
I am trying to set up 0.23 on a cluster and am stuck with errors while
starting the NodeManager. The slaves file is correct and I am able to do a
password-less ssh from the master to the slaves. The ResourceManager also
starts properly.
On running the below command from the master node:
>> bin
Regards,
Praveen
On Sun, Jan 8, 2012 at 12:08 AM, Arun C Murthy wrote:
>
> On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote:
>
> Hi,
>
> I had been going through the MRv2 documentation and have the following
> queries
>
> 1) Let's say that an InputSplit is o
/slaves: No such
file or directory
Regards,
Praveen
On Sat, Jan 7, 2012 at 3:23 PM, Praveen Sripati wrote:
> Ronald,
>
> Here is the output
>
> uid=1000(praveensripati) gid=1000(praveensripati)
> groups=1000(praveensripati),4(adm),20(dialout),24(cdrom),46(plugdev),116(lpadm
'id' output?
>
> Kindest regards.
>
> Ron
>
>
> On Fri, Jan 6, 2012 at 9:51 AM, Praveen Sripati
> wrote:
>
>> Hi,
>>
>> I am able to run 0.23 on a single node and am trying to set it up on a
>> cluster and getting errors.
>>
>> When
Could someone please clarify the below queries?
Regards,
Praveen
On Thu, Jan 5, 2012 at 9:59 PM, Praveen Sripati wrote:
> Hi,
>
> I had been going through the MRv2 documentation and have the following
> queries
>
> 1) Let's say that an InputSplit is on Node1
Hi,
I am able to run 0.23 on a single node and am trying to set it up on a
cluster, and I am getting errors.
When I try to start the data nodes, I get the below errors. I have also
tried adding `export
HADOOP_LOG_DIR=/home/praveensripati/Installations/hadoop-0.23.0/logs` to
.bashrc and there hadn't been an
Hi,
I had been going through the MRv2 documentation and have the following
queries
1) Let's say that an InputSplit is on Node1 and Node2.
Can the ApplicationMaster ask the ResourceManager for a container either on
Node1 or Node2 with an OR condition?
2) > The Scheduler receives periodic informa
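On query 1: in later YARN client APIs there is a way to name several
candidate nodes in a single request, which effectively gives the OR
behaviour asked about. A sketch (the AMRMClient class shown is from releases
after the initial 0.23 API, so treat the names as assumptions):

```java
// Sketch: ask for one container on either Node1 or Node2.
// The nodes array lists acceptable hosts; the scheduler may pick any of them.
Resource capability = Resource.newInstance(1024, 1);   // 1 GB, 1 vcore
Priority priority = Priority.newInstance(0);
String[] nodes = new String[] {"Node1", "Node2"};
AMRMClient.ContainerRequest request =
    new AMRMClient.ContainerRequest(capability, nodes, /* racks */ null, priority);
amrmClient.addContainerRequest(request);
```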
Check this article from Cloudera for different options.
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Praveen
On Tue, Jan 3, 2012 at 7:41 AM, Harsh J wrote:
> Samir,
>
> I believe HARs won't work there. But you can use a regular tar instead,
1- Does hadoop automatically use the content of the files written by
reducers?
No. If Job1 and Job2 are run in sequence, then the output of Job1 can be the
input to Job2. This has to be done programmatically.
2-Are these files (files written by reducers) discarded? If so, when and
how?
No, if the o/p of t
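The Job1-then-Job2 sequencing described above can be sketched with driver
code like the following (paths and job names are hypothetical):

```java
// Sketch: Job1 writes to an intermediate directory, which Job2 then reads.
Job job1 = new Job(conf, "job1");
FileInputFormat.addInputPath(job1, new Path("/data/input"));
FileOutputFormat.setOutputPath(job1, new Path("/data/intermediate"));
if (!job1.waitForCompletion(true)) {
  System.exit(1);               // don't start Job2 if Job1 failed
}

Job job2 = new Job(conf, "job2");
FileInputFormat.addInputPath(job2, new Path("/data/intermediate"));
FileOutputFormat.setOutputPath(job2, new Path("/data/final"));
System.exit(job2.waitForCompletion(true) ? 0 : 1);
```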
support Kerberos.
>
> -Joey
>
> On Thu, Dec 29, 2011 at 9:41 AM, Praveen Sripati
> wrote:
> > Hi,
> >
> > The release notes for 0.22
> > (
> http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available
> )
>
Hi,
The release notes for 0.22 (
http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available)
say:
>The following features are not supported in Hadoop 0.22.0.
>Security.
>Latest optimizations of the MapReduce framework introduced in the
Hadoop 0.20.se
Check this article from Cloudera on different ways of distributing a jar
file to the job.
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Praveen
On Wed, Dec 28, 2011 at 5:40 AM, Eyal Golan wrote:
> Hello,
> Another newbie question.
> Suppose I
Bing,
FYI ... here are some applications ported to YARN.
http://wiki.apache.org/hadoop/PoweredByYarn
Praveen
On Tue, Dec 27, 2011 at 5:27 AM, Mahadev Konar wrote:
> Hi Bing,
> These links should give you more info:
>
>
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-sit
>
> On 12/6/11 9:14 AM, "Mapred Learn" wrote:
>
> Hi Praveen,
> Could you share it here so that we can use it?
>
> Thanks,
>
> Sent from my iPhone
>
> On Dec 6, 2011, at 6:29 AM, Praveen Sripati
> wrote:
>
> Robert,
>
> > I have made the above
The resolution field of the JIRA says unresolved, so it's not yet in any of
the releases. Your best bet is to download the patch attached to the JIRA and
review the code changes if interested.
Regards,
Praveen
On Wed, Dec 7, 2011 at 8:06 PM, arun k wrote:
> Hi guys !
>
> In which Hadoop version can I find t
Robert,
> I have made the above thing work.
Any plans to make it into the Hadoop framework? There have been similar
queries about it in other forums as well. If you need any help testing,
documenting or anything else, please let me know.
Regards,
Praveen
On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans wrote:
>
Matt,
I could not find the properties in the documentation, so I described this
feature as hidden. As Harsh mentioned, there is an API.
There was a blog entry on 'Automatically Documenting Apache Hadoop
Configuration' from Cloudera. It would be great if it is contributed to
Apache and made part o
Matt,
You can extend ArrayWritable. Also use TextOutputFormat as the output
format.
In the TextOutputFormat key.toString() and value.toString() are called, so
override toString() in the subclass of ArrayWritable to get the desired
output format for the array. If toString() is not overridden then
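A sketch of the ArrayWritable subclass described above (class name and
separator are illustrative):

```java
// Sketch: TextOutputFormat calls value.toString(), so overriding toString()
// controls how the array is rendered in the job output.
public class TextArrayWritable extends ArrayWritable {
  public TextArrayWritable() {
    super(Text.class);
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    for (String s : toStrings()) {   // toStrings() is inherited from ArrayWritable
      if (sb.length() > 0) {
        sb.append('\t');             // tab-separated; pick any separator
      }
      sb.append(s);
    }
    return sb.toString();
  }
}
```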
Mat,
There is no need to know which input data caused the task, and finally the
job, to fail.
Set the 'mapreduce.map.failures.maxpercent' and
'mapreduce.reduce.failures.maxpercent' properties to the failure tolerance
for the job to complete irrespective of some task failures.
Again, this is one of the hi
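As a sketch, the tolerance properties mentioned above would sit in
mapred-site.xml; the values below are illustrative, and the exact property
names should be checked against the release in use:

```xml
<!-- Sketch: let the job succeed even if up to 5% of map tasks and
     5% of reduce tasks fail (illustrative values). -->
<property>
  <name>mapreduce.map.failures.maxpercent</name>
  <value>5</value>
</property>
<property>
  <name>mapreduce.reduce.failures.maxpercent</name>
  <value>5</value>
</property>
```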
Hi,
Here are the different ways of distributing 3rd party jars with the
application.
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Thanks,
Praveen
On Wed, Nov 16, 2011 at 11:30 PM, Dmitriy Ryaboy wrote:
> Libjars works if your MR job is init
Hi,
Can someone please clarify the below query?
Thanks,
Praveen
On Sun, Nov 6, 2011 at 8:47 PM, Praveen Sripati wrote:
>
> Hi,
>
> What is the difference between setting the mapred.job.map.memory.mb and
> mapred.child.java.opts using -Xmx to control the maximum memory use
Hi,
What is the difference between setting the mapred.job.map.memory.mb and
mapred.child.java.opts using -Xmx to control the maximum memory used by a
Mapper and Reduce task? Which one takes precedence?
Thanks,
Praveen
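For illustration, the two settings in question appear together in
mapred-site.xml like this (values are illustrative): mapred.job.map.memory.mb
is the scheduler-visible memory for the whole task, while the -Xmx in
mapred.child.java.opts bounds only the child JVM's heap, so the heap value is
normally kept below the task limit.

```xml
<!-- Sketch (illustrative values): total task memory vs. child JVM heap. -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```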
Dan,
It is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-1888)
which has been identified in 0.21.0 release. Which Hadoop release are you
using?
Thanks,
Praveen
On Thu, Nov 3, 2011 at 10:22 AM, Dan Young wrote:
> I'm a total newbie @ Hadoop and am trying to follow an example (a
Hi,
What is the difference between specifying the jar file using the JobConf API
and
the 'hadoop jar' command?
JobConf conf = new JobConf(getConf(), getClass());
bin/hadoop jar /home/praveensripati/Hadoop/MaxTemperature/MaxTemperature.jar
MaxTemperature /user/praveensripati/input /user/praveensripat
nputs to your map when the
> mapper/recordreader finds the needle in the haystack.
>
> Arun
>
> Sent from my iPhone
>
> On Sep 30, 2011, at 8:39 PM, Praveen Sripati
> wrote:
>
> Hi,
>
> Is there a way to stop an entire job when a certain condition is met in the
Hi,
Is there a way to stop an entire job when a certain condition is met in the
map/reduce function? Like looking for a particular key or value.
Thanks,
Praveen
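One hedged way to get this behaviour (Hadoop has no built-in 'stop on
match'): have the map tasks increment a user-defined counter when the
condition is met, and have the driver poll that counter and kill the job.
The counter enum and polling interval below are hypothetical; Job.killJob()
and user-defined counters are part of the new API.

```java
// Sketch: driver-side loop that stops the whole job once any task reports
// a match via the (hypothetical) SearchCounters.FOUND counter.
job.submit();
while (!job.isComplete()) {
  long found = job.getCounters().findCounter(SearchCounters.FOUND).getValue();
  if (found > 0) {
    job.killJob();                 // terminates all remaining tasks
    break;
  }
  Thread.sleep(5000);              // poll every 5 seconds
}
```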
ion and Hadoop In Action
> covered the new api.
>
> Matt
>
> *From:* Praveen Sripati [mailto:praveensrip...@gmail.com]
> *Sent:* Saturday, September 24, 2011 8:43 AM
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* How to pull data in the Map/Reduc
Hi,
Normally the Hadoop framework calls the map()/reduce() for each record in
the input split. I read in 'Hadoop : The Definitive Guide' that the
data can be pulled using the new MR API.
What is the new API for pulling the data in the map()/reduce() or is there a
sample code?
Thanks,
Praveen
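As I understand the new API, the 'pull' style comes from overriding
Mapper.run(), which receives the Context and can iterate over records
itself, rather than having map() pushed one record at a time. A sketch:

```java
// Sketch: pull records explicitly via Context instead of relying on the
// default run() loop; the same pattern exists on Reducer.
public class PullMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {               // pull the next record
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
  }
}
```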
1:10 PM, Harsh J wrote:
> Hello Praveen,
>
> Is your question from a test-case perspective?
>
> Cause otherwise is it not clear what you gain in 'Distributed' vs.
> 'Standalone'?
>
> On Fri, Sep 23, 2011 at 12:15 PM, Praveen Sripati
> wrote:
> >
Hi,
What are the features available in the Fully-Distributed Mode and the
Pseudo-Distributed Mode that are not available in the Local (Standalone)
Mode? Local (Standalone) Mode is very fast and I am able to run it in
Eclipse as well.
Thanks,
Praveen
f filtering, so there
> isn't too much intermediate data.
>
> -Joey
>
> On Thu, Sep 22, 2011 at 6:38 AM, Praveen Sripati
> wrote:
> > Joey,
> >
> > Thanks for the response.
> >
> > 'mapreduce.job.reduce.slowstart.completedmaps' is def
slower job, and you haven't configured
> mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch
> of idle reduce tasks which would starve J2.
>
> In general, it's best to configure the slow start property and to use
> the fair scheduler or capacity scheduler.
>
> -Joey
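For reference, a sketch of the slow-start property in mapred-site.xml (the
thread spells it two ways; mapred.reduce.slowstart.completed.maps is the
0.20-era name, and the value below is illustrative):

```xml
<!-- Sketch: don't launch reducers until 80% of the job's maps have
     completed (illustrative value; the default is much lower). -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```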
Hi,
Let's assume that there are two jobs J1 (100 map tasks) and J2 (200 map
tasks) and the cluster has a capacity of 150 map tasks (15 nodes with 10 map
tasks per node) and Hadoop is using the default FIFO scheduler. If I submit
first J1 and then J2, will the jobs run in parallel or the job J1 has
Hi,
I have the following configuration - Ubuntu 11.04 as Guest and Host using
VirtualBox and trying to run Hadoop 0.21.0. The host is acting as
namenode/data node/job tracker/task tracker and the guest is acting as a
data node/task tracker.
Everything works fine in 'Bridged Adapter' mode, but
Vinay,
https://issues.apache.org/jira/browse/MAPREDUCE-279
http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL
http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/yarn/README
Thanks,
Praveen
On Tue, Aug 2, 2011 at 3:43 PM, Vinayakumar B wrote:
>
Hi,
I followed the below instructions to compile the MRv2 code.
http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL
I start the resourcemanager and then the nodemanager and see the following
error in the yarn-praveensripati-nodemanager-master.log file.
2011-07-21 19:
Hi,
I have extracted the hadoop-0.20.2, hadoop-0.20.203.0 and hadoop-0.21.0
files.
In the hadoop-0.21.0 folder the hadoop-hdfs-0.21.0.jar,
hadoop-mapred-0.21.0.jar and the hadoop-common-0.21.0.jar files are there.
But in the hadoop-0.20.2 and the hadoop-0.20.203.0 releases the same files
are mis
Could someone please answer this?
Thanks,
Praveen
On Sun, Apr 25, 2010 at 4:28 PM, Praveen Sripati
wrote:
>
> Hi,
>
> The MapReduce tutorial specifies that
>
> >> The Hadoop Map/Reduce framework spawns one map task for each InputSplit
> generated by the InputForma
Hi,
The MapReduce tutorial specifies that
>> The Hadoop Map/Reduce framework spawns one map task for each
InputSplit generated by the InputFormat for the job.
But, the mapred.map.tasks definition is
>> The default number of map tasks per job. Ignored when
mapred.job.tracker is "local".
S