http://stackoverflow.com/users/614157/praveen-sripati
If you aren’t taking advantage of big data, then you don’t have big data,
you have just a pile of data.
On Fri, Jan 25, 2013 at 12:52 AM, Harsh J ha...@cloudera.com wrote:
Hi Praveen,
This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce
http://www.thecloudavenue.com/
http://stackoverflow.com/users/614157/praveen-sripati
If you aren’t taking advantage of big data, then you don’t have big data,
you have just a pile of data.
On Thu, Jan 24, 2013 at 8:39 PM, Harsh J ha...@cloudera.com wrote:
Hi,
Can you also
Hi,
I got the code for 0.22 and did the build successfully using the 'ant
clean compile eclipse' command. But the ant command downloads the
dependent jar files every time. How can I make ant use the local jar files
instead of downloading them from the internet, so that the build can be done offline?
Here
Hi,
According to the 'Hadoop - The Definitive Guide'
In a distributed system like HDFS or MapReduce, there are many
client-server interactions, each of which must be authenticated. For
example, an HDFS read operation will involve multiple calls to the namenode
and calls to one or more
According to this (http://goo.gl/rfwy4)
Prior to 0.22, Hadoop uses the 'whoami' and id commands to determine the
user and groups of the running process.
How does this work now?
Praveen
On Wed, Feb 22, 2012 at 6:03 PM, Joey Echeverria j...@cloudera.com wrote:
HDFS supports POSIX style file
I have rack awareness configured and it seems to work fine. My default
replication count is 2. Now I have lost one rack due to a switch failure. Here is
what I observe:
HDFS continues to write to the remaining available rack. It still keeps
two copies of each block, but now these blocks are being stored in
Chandra,
In the namenode hdfs*xml, dfs.federation.nameservice.id is set to ns1, but
ns1 is not being used in the xml when defining the namenode properties.
Here are the instructions to getting started with HDFS federation and mount
tables.
I have a simple MR job, and I want each Mapper to get one line from my
input file (which contains further instructions for lengthy processing).
Use the NLineInputFormat class.
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
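For example, a minimal driver sketch against that class (new API, matching the linked javadoc); the class name, job name, and input path are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class OneLinePerMapper {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "one-line-per-mapper");
    job.setJarByClass(OneLinePerMapper.class);
    job.setInputFormatClass(NLineInputFormat.class);
    // N = 1: each InputSplit, and hence each map task, gets a single line.
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path(args[0])); // instructions file
    // ... set mapper/output types as usual, then job.waitForCompletion(true);
  }
}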
the right thing, but it's API 0.21 (I googled about
the problems with it), so I have to use either the next Cloudera release,
or Hortonworks, or something, am I right?
Mark
On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati praveensrip...@gmail.com
wrote:
I have a simple MR job, and I want each
/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html
Praveen
On Wed, Jan 11, 2012 at 3:40 PM, Praveen Sripati
praveensrip...@gmail.comwrote:
Hi,
Got the latest code to see if any bugs were fixed and tried federation
with the same configuration, but got a similar exception.
2012-01-11 15
Suresh,
Here is the JIRA - https://issues.apache.org/jira/browse/HDFS-2778
Regards,
Praveen
On Wed, Jan 11, 2012 at 9:28 PM, Suresh Srinivas sur...@hortonworks.comwrote:
Thanks for figuring that out. Could you create an HDFS JIRA for this issue?
On Wednesday, January 11, 2012, Praveen Sripati
-env.sh? Is your yarn-env.sh just the standard one from
./hadoop-mapreduce-project/hadoop-yarn/conf/yarn-env.sh?
Tom
On 1/9/12 6:16 AM, Praveen Sripati praveensrip...@gmail.com wrote:
Hi,
I am trying to set up 0.23 on a cluster and am stuck with errors while
starting the NodeManager
Mark,
[mark@node67 ~]$ telnet node77
You need to specify the port number along with the server name like `telnet
node77 1234`.
2012-01-09 10:04:03,436 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:12123. Already tried 0 time(s).
Slaves are not able to
Hi,
I am trying to set up HDFS federation and am getting the below error. I have
also pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I
miss something in the configuration files?
2012-01-11 12:12:15,759 ERROR namenode.NameNode (NameNode.java:main(803)) -
Exception in namenode
Hi,
I am trying to set up 0.23 on a cluster and am stuck with errors while
starting the NodeManager. The slaves file is correct and I am able to do a
password-less ssh from the master to the slaves. The ResourceManager also
starts properly.
On running the below command from the master node.
What does 'id' output?
Kindest regards.
Ron
On Fri, Jan 6, 2012 at 9:51 AM, Praveen Sripati
praveensrip...@gmail.comwrote:
Hi,
I am able to run 0.23 on a single node and am now trying to set it up on a
cluster, where I am getting errors.
When I try to start the data nodes, I get the below errors. I
/slaves: No such
file or directory
Regards,
Praveen
On Sat, Jan 7, 2012 at 3:23 PM, Praveen Sripati praveensrip...@gmail.comwrote:
Ronald,
Here is the output
uid=1000(praveensripati) gid=1000(praveensripati)
groups=1000(praveensripati),4(adm),20(dialout),24(cdrom),46(plugdev),116(lpadmin),118
8, 2012 at 12:08 AM, Arun C Murthy a...@hortonworks.com wrote:
On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote:
Hi,
I had been going through the MRv2 documentation and have the following
queries
1) Let's say that an InputSplit is on Node1 and Node2.
Can the ApplicationMaster ask
When checkpointing starts, the primary namenode starts a new edits
file. During the checkpointing process, will the namenode go into safe
mode? According to 'Hadoop - The Definitive Guide':
The schedule for checkpointing is controlled by two configuration
parameters. The secondary namenode
During the time the NN stops writing to the old edits file and creates a
new edits file, will file modifications still work? I am curious how this
is handled in the code.
Praveen
On Sun, Jan 8, 2012 at 9:34 AM, Harsh J ha...@cloudera.com wrote:
Praveen,
On 08-Jan-2012, at 9:13 AM, Praveen
Hi,
I am able to run 0.23 on a single node and am now trying to set it up on a
cluster, where I am getting errors.
When I try to start the data nodes, I get the below errors. I have also
tried adding `export
HADOOP_LOG_DIR=/home/praveensripati/Installations/hadoop-0.23.0/logs` to
.bashrc and there hadn't been
Could someone please clarify the below queries?
Regards,
Praveen
On Thu, Jan 5, 2012 at 9:59 PM, Praveen Sripati praveensrip...@gmail.comwrote:
Hi,
I had been going through the MRv2 documentation and have the following
queries
1) Let's say that an InputSplit is on Node1 and Node2
Hi,
I had been going through the MRv2 documentation and have the following
queries
1) Let's say that an InputSplit is on Node1 and Node2.
Can the ApplicationMaster ask the ResourceManager for a container either on
Node1 or Node2 with an OR condition?
2) The Scheduler receives periodic
Check this article from Cloudera for different options.
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Praveen
On Tue, Jan 3, 2012 at 7:41 AM, Harsh J ha...@cloudera.com wrote:
Samir,
I believe HARs won't work there. But you can use a
By default `security.job.submission.protocol.acl` is set to * in
hadoop-policy.xml, so it will allow any user to submit jobs and query job
status.
Check this (1) for more details.
<property>
  <name>security.job.submission.protocol.acl</name>
  <value>*</value>
  <description>ACL for …</description>
</property>
http://hive.apache.org/releases.html#21+June%2C+2011%3A+release+0.7.1+available
21 June, 2011: release 0.7.1 available
This release is the latest release of Hive and it works with Hadoop
0.20.1 and 0.20.2
I don't see the method named in the exception in 0.20.205.
Praveen
On Fri,
1- Does Hadoop automatically use the content of the files written by
reducers?
No. If Job1 and Job2 are run in sequence, then the o/p of Job1 can be the i/p
to Job2. This has to be done programmatically, as in the sketch below.
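A minimal sketch of that sequencing, assuming job1 and job2 are already configured Jobs; the intermediate path is illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
  public static void run(Job job1, Job job2) throws Exception {
    Path intermediate = new Path("/tmp/job1-output"); // illustrative path
    FileOutputFormat.setOutputPath(job1, intermediate);
    if (job1.waitForCompletion(true)) {                 // Job1 runs first
      FileInputFormat.addInputPath(job2, intermediate); // Job2 reads Job1's o/p
      job2.waitForCompletion(true);
    }
  }
}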
2-Are these files (files written by reducers) discarded? If so, when and
how?
No, if the o/p of
Changing the VM settings won't help.
Change the value of fs.default.name to hdfs://106.77.211.187:9000 from
hdfs://localhost:9000 in core-site.xml for both the client and the
NameNode. Replace the IP address with the IP address of the node on which
the NameNode is running or with the hostname.
-Joey
On Thu, Dec 29, 2011 at 9:41 AM, Praveen Sripati
praveensrip...@gmail.com wrote:
Hi,
The release notes for 0.22
(
http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available
)
it says
The following features are not supported in Hadoop
Hi,
The release notes for 0.22 (
http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available)
it says
The following features are not supported in Hadoop 0.22.0.
Security.
Latest optimizations of the MapReduce framework introduced in the
Hadoop
Check the `mapreduce.job.reduce.slowstart.completedmaps` parameter. The
reducers cannot start processing the data from the mappers until all
the map tasks are complete, but the reducers can start fetching the data
from the nodes on which the map tasks have completed.
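For example, a small sketch that delays reducer launch until 80% of the maps are done (the 0.80 value is illustrative; the default is 0.05):

import org.apache.hadoop.conf.Configuration;

public class SlowstartExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Launch reducers (and their fetches) only after 80% of maps complete.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
    // ... pass conf to the Job as usual.
  }
}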
Praveen
On Thu, Dec 29,
Check this article from Cloudera on different ways of distributing a jar
file to the job.
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Praveen
On Wed, Dec 28, 2011 at 5:40 AM, Eyal Golan egola...@gmail.com wrote:
Hello,
Another newbie
Bing,
FYI ... here are some applications ported to YARN.
http://wiki.apache.org/hadoop/PoweredByYarn
Praveen
On Tue, Dec 27, 2011 at 5:27 AM, Mahadev Konar maha...@hortonworks.comwrote:
Hi Bing,
These links should give you more info:
At a minimum you need to specify the locations of the namenode and the
jobtracker in the configuration files for all the nodes and the client;
the rest of the properties have defaults. Also, based on the number of data
nodes, you may need to set the HDFS replication factor.
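For a client, a sketch of the same properties set programmatically; the host names and the replication value are illustrative:

import org.apache.hadoop.conf.Configuration;

public class MinimalClientConf {
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode-host:9000"); // NameNode
    conf.set("mapred.job.tracker", "jobtracker-host:9001");   // JobTracker
    conf.setInt("dfs.replication", 3); // based on the number of datanodes
    return conf;
  }
}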
Praveen
On Sun, Dec 25,
The JIRA says unresolved, so the fix is not yet in any release. Your best
bet is to download the patch attached to the JIRA and review the code
changes if you are interested.
Regards,
Praveen
On Wed, Dec 7, 2011 at 8:06 PM, arun k arunk...@gmail.com wrote:
Hi guys !
In which Hadoop
Robert,
I have made the above work.
Are there any plans to get it into the Hadoop framework? There have been
similar queries about it in other forums as well. If you need any help
testing, documenting, or anything else, please let me know.
Regards,
Praveen
On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans
Also, check out Ambari (http://incubator.apache.org/ambari/), which is still
in Incubator status. How do Ambari and Puppet compare?
Regards,
Praveen
On Tue, Dec 6, 2011 at 1:00 PM, alo alt wget.n...@googlemail.com wrote:
Hi,
to deploy software I suggest pulp:
MultipleInputs takes multiple Paths (files) as input, not a DB. As mentioned
earlier, export the tables into HDFS using either Sqoop or a native DB export
tool and then do the processing. Sqoop can be configured to use the native DB
export tool whenever possible.
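For example, a sketch using the new-API MultipleInputs, assuming the tables were already exported to the HDFS paths shown; the mapper classes are illustrative stand-ins for your own:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class JoinDriver {
  static class CustomerMapper extends Mapper<LongWritable, Text, Text, Text> {}
  static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {}

  public static void wire(Job job) {
    // One mapper per exported table; both feed the same shuffle and reduce.
    MultipleInputs.addInputPath(job, new Path("/data/customers"),
        TextInputFormat.class, CustomerMapper.class);
    MultipleInputs.addInputPath(job, new Path("/data/orders"),
        TextInputFormat.class, OrderMapper.class);
  }
}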
Regards,
Praveen
On Tue, Dec 6, 2011 at 3:44 AM,
If the requirement is real-time data processing, Flume alone will not
suffice, as there is a time lag between the collection of files by Flume
and the processing done by Hadoop. Consider frameworks like S4, Storm
(from Twitter), HStreaming, etc., which suit real-time processing.
Regards,
Praveen
On
Also check WebHDFS (1). I don't think either Hoop or WebHDFS is in Hadoop
yet. Check the HDFS-2178 and HDFS-2316 JIRAs for the status.
(1) - http://hortonworks.com/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
Regards,
Praveen
On Tue, Dec 6, 2011 at 4:39 PM, alo alt wget.n...@googlemail.com wrote:
Mat,
There is no need to know the input data that caused the task, and finally
the job, to fail.
Set the `mapreduce.map.failures.maxpercent` and
`mapreduce.reduce.failures.maxpercent` properties to your failure tolerance
so the job completes irrespective of some task failures; a sketch follows
below. Again, this is one of the
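A sketch via the old-API JobConf setters that back these properties; the 10% tolerance is illustrative:

import org.apache.hadoop.mapred.JobConf;

public class TolerantJob {
  public static void main(String[] args) {
    JobConf conf = new JobConf(TolerantJob.class);
    // Let the job succeed even if up to 10% of map/reduce tasks fail.
    conf.setMaxMapTaskFailuresPercent(10);
    conf.setMaxReduceTaskFailuresPercent(10);
    // ... submit with JobClient.runJob(conf) as usual.
  }
}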
Matt,
I could not find the properties in the documentation, so I described this
feature as hidden. As Harsh mentioned, there is an API.
There was a blog entry on 'Automatically Documenting Apache Hadoop
Configuration' from Cloudera. It would be great if it were contributed to
Apache and made part
Arun,
I want to control the split placements.
InputSplits are logical divisions of the input data; there is nothing to
place. InputSplits are calculated on the client by the InputFormat class
when a job is submitted, and the InputSplit metadata is put in HDFS to
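To make the 'logical' point concrete, a sketch (new API; the JobContext comes from the submitted job): a split only names the hosts holding the data, nothing is moved or placed:

import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ShowSplits {
  public static void listSplits(JobContext jobContext) throws Exception {
    List<InputSplit> splits = new TextInputFormat().getSplits(jobContext);
    for (InputSplit split : splits) {
      // getLocations() is only a data-locality hint for the scheduler;
      // the bytes themselves stay wherever HDFS put them.
      System.out.println(Arrays.toString(split.getLocations()));
    }
  }
}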
Hi,
Ran a job using the new MR API in standalone mode on 0.21. Both
Job#getFinishTime and Job#getStartTime are returning 0. Not sure if this
is a bug.
Thanks,
Praveen
On Sat, Dec 3, 2011 at 6:14 AM, Raj V rajv...@yahoo.com wrote:
As Harsh said, I don't think there is a simple way to way to
Hi,
3. Is any kind of encryption handled in Hadoop at the time of storing
files in HDFS?
You could define a compression codec that does the encryption. Check the
below thread for more details.
http://www.mail-archive.com/common-user@hadoop.apache.org/msg06229.html
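For illustration, a minimal sketch of the underlying idea, encrypting on the client before the bytes reach HDFS; the key handling and path are illustrative, and a real codec would wrap the same cipher streams behind the CompressionCodec interface:

import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EncryptedHdfsWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] key = "0123456789abcdef".getBytes("UTF-8"); // demo key only
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
    // Record cipher.getIV() somewhere; it is needed again for decryption.
    OutputStream out =
        new CipherOutputStream(fs.create(new Path("/secure/data.enc")), cipher);
    out.write("sensitive record\n".getBytes("UTF-8"));
    out.close(); // flushes the final padded block
  }
}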
Thanks,
Praveen
On
Dan,
It is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-1888)
which has been identified in 0.21.0 release. Which Hadoop release are you
using?
Thanks,
Praveen
On Thu, Nov 3, 2011 at 10:22 AM, Dan Young danoyo...@gmail.com wrote:
I'm a total newbie @ Hadoop and am trying to
Hi,
What is the difference between specifying the jar file using the JobConf API
and using the 'hadoop jar' command?
JobConf conf = new JobConf(getConf(), getClass());
bin/hadoop jar /home/praveensripati/Hadoop/MaxTemperature/MaxTemperature.jar
MaxTemperature /user/praveensripati/input
inputs to your map when the
mapper/recordreader finds the needle in the haystack.
Arun
Sent from my iPhone
On Sep 30, 2011, at 8:39 PM, Praveen Sripati praveensrip...@gmail.com
wrote:
Hi,
Is there a way to stop an entire job when a certain condition is met in the
map/reduce function? Like
Hi,
Normally the Hadoop framework calls map()/reduce() for each record in
the input split. I read in 'Hadoop: The Definitive Guide' that the
data can instead be pulled using the new MR API.
What is the new API for pulling the data in map()/reduce(), and is there any
sample code?
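For reference, the hook the book is alluding to is Mapper.run() in the new API: override it and you pull records yourself instead of having map() pushed one record at a time. A minimal sketch with illustrative types:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PullMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) { // pull the next record on demand
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
  }
}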
Thanks,
*From:* Praveen Sripati [mailto:praveensrip...@gmail.com]
*Sent:* Saturday, September 24, 2011 8:43 AM
*To:* mapreduce-user@hadoop.apache.org
*Subject:* How to pull data in the Map/Reduce functions?
Hi,
Normally the Hadoop framework calls the map()/reduce() for each record
Hi,
What are the features available in the Fully-Distributed and
Pseudo-Distributed modes that are not available in the Local (Standalone)
mode? Local (Standalone) mode is very fast and I am able to run it in
Eclipse as well.
Thanks,
Praveen
PM, Harsh J ha...@cloudera.com wrote:
Hello Praveen,
Is your question from a test-case perspective?
Because otherwise, is it not clear what you gain in 'Distributed' vs.
'Standalone'?
On Fri, Sep 23, 2011 at 12:15 PM, Praveen Sripati
praveensrip...@gmail.com wrote:
Hi,
What
Hi,
Let's assume that there are two jobs J1 (100 map tasks) and J2 (200 map
tasks), and the cluster has a capacity of 150 map tasks (15 nodes with 10 map
slots per node), and Hadoop is using the default FIFO scheduler. If I submit
J1 first and then J2, will the jobs run in parallel, or does job J1 have
of filtering, so there
isn't too much intermediate data.
-Joey
On Thu, Sep 22, 2011 at 6:38 AM, Praveen Sripati
praveensrip...@gmail.com wrote:
Joey,
Thanks for the response.
'mapreduce.job.reduce.slowstart.completedmaps' is default set to 0.05 and
says 'Fraction of the number of maps
Hi,
I have the following configuration: Ubuntu 11.04 as guest and host using
VirtualBox, trying to run Hadoop 0.21.0. The host is acting as
namenode/datanode/jobtracker/tasktracker and the guest is acting as a
datanode/tasktracker.
Everything works fine in 'Bridged Adapter' mode, but
Mohit,
Hadoop: The Definitive Guide (Chapter 3 - Hadoop I/O) has a section on
SequenceFile and is worth reading.
http://oreilly.com/catalog/9780596521981
Thanks,
Praveen
On Thu, Sep 1, 2011 at 9:15 PM, Owen O'Malley o...@hortonworks.com wrote:
On Thu, Sep 1, 2011 at 8:37 AM, Mohit Anchlia
Hi,
There are tons of parameters for MapReduce. How do I know whether a property
is a client-side or a server-side property?
Thanks,
Praveen
On Sun, Aug 28, 2011 at 4:53 AM, Aaron T. Myers a...@cloudera.com wrote:
Hey Ben,
I just filed this JIRA to add this feature:
Hi,
I followed the below instructions to compile the MRv2 code.
http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL
I start the resourcemanager and then the nodemanager and see the following
error in the yarn-praveensripati-nodemanager-master.log file.
2011-07-21
Hi,
I have extracted the hadoop-0.20.2, hadoop-0.20.203.0 and hadoop-0.21.0
files.
In the hadoop-0.21.0 folder the hadoop-hdfs-0.21.0.jar,
hadoop-mapred-0.21.0.jar and the hadoop-common-0.21.0.jar files are there.
But in the hadoop-0.20.2 and the hadoop-0.20.203.0 releases the same files
are
Hi,
I am trying to run Hadoop from Eclipse using the Eclipse Hadoop Plugin and
am stuck with the following problem.
I first copied the hadoop-0.21.0-eclipse-plugin.jar to the Eclipse plugin
folder, started Eclipse, and switched to the Map/Reduce perspective. In the
Map/Reduce Locations view, when I try
Hi,
The MapReduce tutorial specifies that
The Hadoop Map/Reduce framework spawns one map task for each
InputSplit generated by the InputFormat for the job.
But, the mapred.map.tasks definition is
The default number of map tasks per job. Ignored when
mapred.job.tracker is local.
So, is