Hi,
The stack trace indicates an access denied error. Could
you check the permissions of the directory /folder/timezone? Also,
are you using the local job tracker, and not a cluster?
In general, please ensure your configuration is pointing to the right
cluster where the job needs
Hi,
Do both input files contain data that needs to be processed by the
mapper in the same fashion? If so, you could just put the
input files under a directory in HDFS and provide that as the input. The
-input option does accept a directory as an argument.
Otherwise, can you please explain a
Hi,
The error is about the hadoop configuration. So you probably need to
put the hadoop core jar in the lib folder. That said, there might be other
dependencies you need as well. But you can try this first.
Thanks
hemanth
On Thu, Aug 30, 2012 at 3:53 PM, Visioner Sadak
mapred.local.dir specifies local directories on the file system of the slave
nodes. In pseudo-distributed mode, this would be your own machine. If
you've specified any configuration for it, it should be in your
mapred-site.xml. If not, it defaults to the value of
hadoop.tmp.dir/mapred/local. The default
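If you want to check what the value resolves to on your setup, here is a
minimal sketch, assuming the old mapred API on 0.20 (Configuration expands
the ${hadoop.tmp.dir} reference when the value is read):

    import org.apache.hadoop.mapred.JobConf;

    public class ShowLocalDir {
        public static void main(String[] args) {
            // JobConf picks up core-site.xml/mapred-site.xml from the classpath
            JobConf conf = new JobConf();
            // Prints the expanded value; if unset, this is
            // ${hadoop.tmp.dir}/mapred/local
            System.out.println(conf.get("mapred.local.dir"));
        }
    }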
Hi,
The number of maps specified to any map reduce program (including
those that are part of MRBench) is generally only a hint, and the actual
number of maps will in typical cases be influenced by the amount of data
being processed. You can take a look at this wiki link to understand
more:
Not exactly what you may want - but could you try using an HTTP client
in Java? Some of them have the ability to automatically follow
redirects, manage cookies, etc.
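For instance, a minimal sketch using just the JDK (the URL is illustrative);
HttpURLConnection follows redirects by default, and a CookieManager takes
care of cookies:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.CookieHandler;
    import java.net.CookieManager;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FetchPage {
        public static void main(String[] args) throws Exception {
            // Keep cookies across requests in this JVM
            CookieHandler.setDefault(new CookieManager());
            HttpURLConnection conn =
                (HttpURLConnection) new URL("http://example.com/").openConnection();
            conn.setInstanceFollowRedirects(true); // the default, shown for clarity
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }

Note that HttpURLConnection will not follow redirects across protocols (e.g.
http to https); a library like Apache HttpClient handles more such cases.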
Thanks
hemanth
On Thu, Dec 9, 2010 at 4:35 PM, edward choi mp2...@gmail.com wrote:
Excuse me for asking a general Java question
Hi,
On Sat, Dec 4, 2010 at 4:50 AM, yogeshv yogeshv.i...@gmail.com wrote:
Dear all,
Which file in the hadoop svn tree processes/receives the hadoop command line
arguments?
For example, during execution of: hadoop jar jar_file_path package_namespace
inputfolderpath outputfolderpath.
'hadoop' in the
Hi,
Changing the parameter for a specific job works better for me.
But I was asking in general, in which configuration file(s) should I change
the value of the parameters.
For parameters in hdfs-site.xml, I should change the configuration file on
each machine. But for parameters in
Amandeep,
On Fri, Nov 5, 2010 at 11:54 PM, Amandeep Khurana ama...@gmail.com wrote:
On Fri, Nov 5, 2010 at 2:00 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Hi,
On Fri, Nov 5, 2010 at 2:23 PM, Amandeep Khurana ama...@gmail.com wrote:
Right. I meant I'm not using fair or capacity
Amandeep,
Which scheduler are you using?
Thanks
hemanth
On Tue, Nov 2, 2010 at 2:44 AM, Amandeep Khurana ama...@gmail.com wrote:
How are the following configs supposed to be used?
mapred.cluster.map.memory.mb
mapred.cluster.reduce.memory.mb
mapred.cluster.max.map.memory.mb
of the parameters are different, though you
can see the correspondence with similar variables in Hadoop 0.20.
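A hedged sketch of how I believe these relate on 0.20: the mapred.cluster.*
values are cluster-wide slot sizes set by admins in mapred-site.xml, while a
job requests memory within those limits via the corresponding mapred.job.*
parameters:

    // Values illustrative; mapred.cluster.max.map.memory.mb caps what a
    // job may request. MyJob is an illustrative class name.
    JobConf conf = new JobConf(MyJob.class);
    conf.setLong("mapred.job.map.memory.mb", 2048);    // per map task
    conf.setLong("mapred.job.reduce.memory.mb", 2048); // per reduce task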
Thanks
Hemanth
-Amandeep
On Fri, Nov 5, 2010 at 12:21 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Amandeep,
Which scheduler are you using?
Thanks
hemanth
On Tue, Nov 2, 2010
and
the task trackers. Then any submission by the job would not override
the settings.
Thanks
Hemanth
-Amandeep
On Nov 5, 2010, at 1:43 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Hi,
I'm not using any scheduler. I don't have multiple jobs running at the same
time on the cluster.
Hi,
On Tue, Oct 26, 2010 at 8:14 PM, siddharth raghuvanshi
track009.siddha...@gmail.com wrote:
Hi,
While running Terrior on Hadoop, I am getting the following error again
and again. Can someone please point out where the problem is?
attempt_201010252225_0001_m_09_2: WARN - Error running
Hi,
On Wed, Oct 27, 2010 at 2:19 AM, Bibek Paudel eternalyo...@gmail.com wrote:
[Apologies for cross-posting]
HI all,
I am rewriting a hadoop java code for the new (0.20.2) API - the code
was originally written for versions <= 0.19.
1. What is the equivalent of the getCounter() method? For
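If the question is about a job's counters, a minimal sketch of the new API,
assuming a user-defined enum (MyCounters is illustrative):

    // Inside a mapper or reducer (org.apache.hadoop.mapreduce API):
    context.getCounter(MyCounters.RECORDS).increment(1);

    // On the client, after the job completes:
    long n = job.getCounters().findCounter(MyCounters.RECORDS).getValue();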
Hi,
On Thu, Oct 28, 2010 at 5:11 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
Dear all,
I am listing all the HDFS details through the -fs shell. I know the superuser
owns the privileges to list files. But now I want to grant all read and
write privileges to two new users (e.g. Tom and
Hi,
On Sat, Oct 23, 2010 at 1:44 AM, Burhan Uddin burhan...@gmail.com wrote:
Hello,
I am a beginner with the hadoop framework. I am trying to create a distributed
crawling application. I have googled a lot, but the resources are too few.
Can anyone please help me on the following topics?
I suppose
Hi,
You mentioned you'd like to configure different memory settings for
the process depending on which nodes the tasks run on. Which process
are you referring to here - the Hadoop daemons, or your map/reduce
program?
An alternative approach could be to see if you can get only those
nodes in
Hi,
On Mon, Sep 6, 2010 at 9:49 AM, Rita Liu crystaldol...@gmail.com wrote:
Hi! :)
I added some Log4j loggers to the mapper, the reducer, and the main method
of WordCount.java. However, after I run this application on the cluster, I
couldn't find any of my log messages from WordCount in any
Hi,
The optimization of one Hadoop job I'm running would benefit from knowing
the
maximum number of map slots in the Hadoop cluster.
This number can be obtained (if my understanding is correct) by:
* parsing the mapred-site.xml file to get
the mapred.tasktracker.map.tasks.maximum value
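An alternative that avoids parsing configuration files (and copes with
heterogeneous nodes) is to ask the JobTracker directly - a minimal sketch,
assuming the old mapred client API:

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MaxMapSlots {
        public static void main(String[] args) throws Exception {
            JobClient client = new JobClient(new JobConf());
            ClusterStatus status = client.getClusterStatus();
            // Total map slot capacity across all live tasktrackers
            System.out.println("max map slots: " + status.getMaxMapTasks());
        }
    }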
Hi,
On Mon, Sep 6, 2010 at 1:47 AM, Neil Ghosh neil.gh...@gmail.com wrote:
Hi,
I am trying to sort a list of numbers (one per line) using hadoop
mapreduce.
Kindly suggest any reference and code.
How do I implement a custom input format and record reader so that both key
and value are the
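As a starting point, a minimal sketch of the map side, assuming the 0.20
mapreduce API: emitting each number as an IntWritable key lets the
framework's shuffle do the sorting, and with a single reducer the output is
totally ordered.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SortMapper
            extends Mapper<LongWritable, Text, IntWritable, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // One number per line; the shuffle sorts by the IntWritable key
            int value = Integer.parseInt(line.toString().trim());
            context.write(new IntWritable(value), NullWritable.get());
        }
    }

An identity reducer (the default) then writes the keys back out in sorted
order.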
Hi,
On Mon, Aug 30, 2010 at 8:19 AM, Gang Luo lgpub...@yahoo.com.cn wrote:
Hi all,
I am trying to configure and start a hadoop cluster on EC2. I have run into
some problems here.
1. Can I share hadoop code and its configuration across nodes? Say I have a
distributed file system running in the
Hi,
Can you please confirm if you've set JAVA_HOME in
conf-dir/hadoop-env.sh on all the nodes?
Thanks
Hemanth
On Tue, Aug 31, 2010 at 6:21 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
Hi,
I am running some basic setup and tests to learn about hadoop. When I
try to start the nodes I get this
Hi,
On Sun, Aug 29, 2010 at 10:14 PM, Gang Luo lgpub...@yahoo.com.cn wrote:
HI all,
I am setting up a hadoop cluster where I have to specify the local directory
for temp files/logs, etc. Should I allow everybody to have write permission to
these directories? Who actually does the write
Hmm. Without the / in the closing property tag, isn't the file malformed XML?
I am pretty sure Hadoop complains in such cases.
On Wed, Aug 25, 2010 at 4:44 AM, cliff palmer palmercl...@gmail.com wrote:
Thanks Allen - that has resolved the problem. Good catch!
Cliff
On Tue, Aug 24, 2010 at 3:05
Mark,
On Wed, Aug 18, 2010 at 10:59 PM, Mark static.void@gmail.com wrote:
What is the preferred way of managing multiple configurations, i.e.
development, production, etc.?
Is there some way I can tell hadoop to use a separate conf directory other
than ${hadoop_home}/conf? I think I've
. That in turn is because
JobConf is still a preferred way of setting parameters in the Hadoop
0.20 major release. Later versions of the documentation will hopefully
correct this.
Thanks
hemanth
On Thu, Aug 12, 2010 at 10:16 PM, Hemanth Yamijala yhema...@gmail.com
wrote:
Hi,
I recently
Hi,
Hi, Hemanth. Thanks for your reply!
I tried your recommendation, an absolute path, and it worked; I was able to run
the jobs successfully. Thank you!
I was wondering why hadoop.tmp.dir (or mapred.local.dir?) with a relative
path didn't work.
I am not entirely sure, but when the daemon is
Hi,
1. I can log in through SSH without a password between the master and slaves;
it's all right :-)
2.
<property>
  <name>hadoop.tmp.dir</name>
  <value>tmp</value>
</property>
In fact, 'tmp' is what I want :-)
$HADOOP_HOME
+ tmp
+ dfs
Hi,
On Thu, Aug 12, 2010 at 10:31 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Hi,
On Thu, Aug 12, 2010 at 3:35 AM, Bobby Dennett
bdennett+softw...@gmail.com wrote:
From what I've read/seen, it appears that, if not the default
scheduler, most installations are using Hadoop's Fair
. This is *not* to be used by client code, and is not
guaranteed to work. In the latter versions of Hadoop (0.21 and trunk),
these methods have been deprecated in the public API and will be
removed altogether.
Thanks
hemanth
Thanks,
-Gang
- Original Message -
From: Hemanth Yamijala yhema...@gmail.com
To
Hi,
It would also be worthwhile to look at the Tool interface
(http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Tool),
which is used by example programs in the MapReduce examples as well.
This would allow any arguments to be passed using the
-Dvar.name=var.value convention on
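A minimal sketch of the pattern, assuming Hadoop 0.20 (my.param is an
illustrative name); ToolRunner moves the generic -D options into the
Configuration before run() is invoked:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyTool extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            // Set on the command line via -Dmy.param=value
            System.out.println(conf.get("my.param"));
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new MyTool(), args));
        }
    }

Invoked, for example, as: hadoop jar mytool.jar MyTool -Dmy.param=value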
Hi,
On Tue, Aug 3, 2010 at 9:42 AM, saurabhsuman8989
saurabhsuman...@gmail.com wrote:
By 'tasks' I mean the different tasks under one job. When a job is distributed
into different tasks, can I add priority to those tasks?
It would be interesting to know why you want to do this. Can you please
Hi,
Thanks Hemanth. Is there any way to invalidate the reuse and ask Hadoop to
re-send exactly the same files to the cache for every job?
I may be able to answer this better if I understand the use case. If
you need the same files for every job, why would you need to send them
afresh each time? If
Hi,
Actually, I had enabled logging at all levels. But I didn't realize I should
check the logs in the .out files and only looked at the .log file, where I
didn't see any error msgs. Now I opened the .out file and saw the following
logged exception:
Exception in thread IPC Server handler 5 on 50002
Hi,
If I use the distributed cache to send some files to all the nodes in one MR
job, can I reuse these cached files locally in my next job, or will hadoop
re-send these files again?
Cache files are reused across Jobs. From trunk onwards, they will be
restricted to be reused across jobs of the
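For reference, a minimal sketch of adding a cache file on 0.20 (the path is
illustrative). As far as I recall, localization is keyed on the file's HDFS
modification time, so tasktrackers reuse their local copy across jobs until
the file changes in HDFS:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The file must already exist in HDFS; tasks read the
            // localized copy on their own node.
            DistributedCache.addCacheFile(new URI("/user/me/lookup.dat"), conf);
        }
    }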
Hi,
Is there a list of configuration parameters that can be set per job?
I'm almost certain there's no list that documents per-job settable
parameters that well. From 0.21 onwards, I think a convention adopted
is to name all job-related or task-related parameters to include 'job'
or 'map' or
Hi,
I'd like to run a Hadoop (0.20.2) job
from within another application, using ToolRunner.
One class of this other application implements the Tool interface.
The implemented run() method:
* constructs a Job()
* sets the input/output/mapper/reducer
* sets the jar file by calling
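For what it's worth, a minimal sketch of those steps, assuming the 0.20.2
mapreduce API (MyMapper and MyReducer are illustrative; imports come from
org.apache.hadoop.mapreduce and its lib.input/lib.output packages):

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "my-job");
        job.setJarByClass(MyMapper.class); // one way to point at the job jar
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }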
Hi,
I am trying to use hadoop's datajoin for joining two relations. According to
the Readme file of datajoin, it gives the following syntax:
$HADOOP_HOME/bin/hadoop jar hadoop-datajoin-examples.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
datajoin/output
Text
John,
Can you please redirect this to pig-u...@hadoop.apache.org? You're
more likely to get good responses there.
Thanks
hemanth
On Thu, Jul 8, 2010 at 7:01 AM, John Seer pulsph...@yahoo.com wrote:
Hello, is there any way to share a schema file in Pig for the same table
between projects?
--
Edward,
Overall, I think the consideration should be about how much load you
expect to support on your cluster. For HDFS, there's a good amount
of information about how much RAM is required to support a certain
amount of data stored in DFS; something similar can be found for
Map/Reduce as
Alex,
I don't think this is what I am looking for. Essentially, I wish to run both
the mapper and the reducer. But at the same time, I wish to make sure that
the temp files that are used between mappers and reducers are of my choice.
Here, the choice means that I can specify the files in HDFS
Michael,
Configuration is not reloaded by the daemons. There is currently no way
to refresh configuration once the cluster is started. Some specific
aspects - like queue configuration or blacklisted nodes - can be reloaded
with commands like hadoop mradmin -refreshQueues or some such.
Thanks
Hemanth
On
Hi,
I am running a mapreduce job on my hadoop cluster.
I am processing 10 gigabytes of data, and one tiny failed task crashes the
whole operation.
I am up to 98% complete, and throwing away all the finished data seems just
like an awful waste.
I'd like to save the finished data and run again
Michael,
In addition to the default FIFO scheduler, there are the fair scheduler and the
capacity scheduler. In some sense, the fair scheduler can be considered
user-based scheduling while the capacity scheduler does queue-based
scheduling. Is there, or will there be, a hybrid scheduler that combines the
Shashank,
Hi,
Setup Info:
I have a 2-node hadoop (0.20.2) cluster on Linux boxes.
HW info: 16 CPUs (hyperthreaded)
RAM: 32 GB
I am trying to configure capacity scheduling. I want to use the memory
management provided by the capacity scheduler. But I am facing a few issues.
I have added
Hi,
I've been getting the following error when trying to run a very simple
MapReduce job.
The Map phase finishes without problems, but the error occurs as soon as the
job enters the Reduce phase.
10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
attempt_201006241812_0001_r_00_0, Status : FAILED
Shuffle
23, 2010 at 10:52 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Pierre,
I have a program that generates the data that's supposed to be processed by
hadoop.
It's a java program that should write directly to hdfs.
So as a test, I do this:
Configuration config = new
Vidhya,
Hi
This looks like a trivial problem, but I would be glad if someone can help.
I have been trying to run an m-r job on my cluster. I had modified my configs
(primarily reduced the heap sizes for the task tracker and the data nodes)
and restarted my hadoop cluster, and the job won't
to point to this path.
On Wed, Jun 23, 2010 at 10:52 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Pierre,
I have a program that generates the data that's supposed to be processed by
hadoop.
It's a java program that should write directly to hdfs.
So as a test, I do
There was also https://issues.apache.org/jira/browse/MAPREDUCE-1316
whose cause hit clusters at Yahoo! very badly last year. The situation
was particularly noticeable in the face of lots of jobs with failed
tasks and a specific fix that enabled OutOfBand heartbeats. The latter
(i.e. the OOB
Felix,
I'm using the new Job class:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html
There is a way to set the number of reduce tasks:
setNumReduceTasks(int tasks)
However, I don't see how to set the number of MAP tasks.
I tried to set it through
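As far as I know, there is no setter because the number of maps is determined
by the number of input splits; it can only be influenced. A hedged sketch,
assuming the 0.20 mapreduce API:

    // mapred.map.tasks is only a hint to the framework
    job.getConfiguration().setInt("mapred.map.tasks", 10);
    // A smaller max split size yields more splits, hence more maps
    // (FileInputFormat here is org.apache.hadoop.mapreduce.lib.input's)
    FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);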
Ted,
When the user calling FileSystem.copyFromLocalFile() doesn't have permission
to write to certain hdfs path:
Thread [main] (Suspended (exception AccessControlException))
DFSClient.mkdirs(String, FsPermission) line: 905
DistributedFileSystem.mkdirs(Path, FsPermission) line: 262
Edward,
If it's an option to copy the libraries to a fixed location on all the
cluster nodes, you could do that and configure them in the library
path via mapred.child.java.opts. Please look at http://bit.ly/ab93Z8
(MapReduce tutorial on Hadoop site) to see how to use this config
option for
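For example, a hedged sketch (the path is illustrative); the same value can
equally be set in mapred-site.xml:

    // Task JVMs are launched with these options; -Djava.library.path lets
    // JNI code find the .so files copied to every node. MyJob is illustrative.
    JobConf conf = new JobConf(MyJob.class);
    conf.set("mapred.child.java.opts",
             "-Xmx512m -Djava.library.path=/opt/hadoop/native/lib");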
Greg,
Does anybody know whether or not speculative execution works with Hadoop
streaming?
If so, I have a script that does not appear to ever launch redundant mappers
for the slow performers. This may be due to the fact that each mapper
quickly reports (inaccurately) that it is 100%
Erik,
I've been unable to resolve this problem on my own, so I've decided to ask
for help. I've pasted the logs I have for the DataNode on one of the slave
nodes. The logs for the TaskTracker are essentially the same (i.e. the same
exception causing a shutdown).
Any suggestions or hints as to what
Keith,
On Sat, May 22, 2010 at 5:01 AM, Keith Wiley kwi...@keithwiley.com wrote:
On May 21, 2010, at 16:07 , Mikhail Yakshin wrote:
On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
My Java mapper hands its processing off to C++ through JNI. On the C++
side I need to access a file. I
Jim,
I have two machines; one is Windows XP and the other one is Windows Vista. I
did the same thing on both machines. The Hadoop Eclipse Plugin works fine in
Windows XP, but I got an error when I ran it in Windows Vista.
I copied hadoop-0.20.2-eclipse-plugin into the Eclipse/plugins folder and
Song,
I guess you are very close to my point. I mean, is there a way
to set the qsub parameter ppn?
From what I could see in the HOD code, it appears you cannot override
the ppn value with HOD. You could look at
src/contrib/hod/hodlib/NodePools/torque.py, and specifically the
Song,
I know it is the way to set the capacity of each node; however, I want to
know how we can tell the Torque manager that we will run more than one mapred
task on each machine. Because if we don't do this, torque will assign the other
cores on this machine to other tasks, which may cause a
Song,
HOD is good, and can manage a large virtual cluster on a huge physical
cluster. But the problem is, it doesn't request more than one core for each
machine, and I have already received complaints from our admin!
I assume what you want is for the Map/Reduce cluster that is started by
HOD to
Manish,
The pre-emption code in the capacity scheduler was found to require a good
relook, and due to the inherent complexity of the problem it is likely to
have issues of the type you have noticed. We have decided to rework
the pre-emption code from scratch, and to this effect removed it from the
Craig,
Hello,
We have two HOD questions:
(1) For our current Torque PBS setup, the number of nodes requested by
HOD (-l nodes=X) corresponds to the number of CPUs allocated; however,
these nodes can be spread across various partially used or empty nodes.
Unfortunately, HOD does not appear to
executes the jobtracker on the first node always, which also seems
useful to me. It will be nice if you can still try HOD and see if it
makes your life simpler in any way. :-)
Sorry for my english :-P
Regards
2008/9/2 Hemanth Yamijala [EMAIL PROTECTED]
Allen Wittenauer wrote:
On 8
Allen Wittenauer wrote:
On 8/18/08 11:33 AM, Filippo Spiga [EMAIL PROTECTED] wrote:
Well, but I haven't understood how I should configure HOD to work in this
manner.
For HDFS I follow this sequence of steps:
- conf/master contains only the master node of my cluster
- conf/slaves contains all
Jiaqi,
Hi,
I have a question about using HoD and the locality of the assigned
TaskTrackers to the data.
Suppose I have a long-running HDFS installation with
TaskTrackers/JobTracker nodes dynamically allocated by HoD, and I
uploaded my data to HDFS prior to running my job/allocating nodes
using
Luca,
Luca wrote:
Hello everyone,
I wonder what is the meaning of hodring.log-destination-uri versus
hodring.log-dir. I'd like to collect MapReduce UI logs after a job has
been run and the only attribute seems to be hod.hadoop-ui-log-dir, in
the hod section.
log-destination-uri is a
Luca,
#!/bin/bash
hadoop --config /home/luca/hod-test jar
/mnt/scratch/hadoop/hadoop-0.16.0-examples.jar wordcount
file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out
Can you try removing the --config from this script? While running
scripts, HOD automatically allocates a directory
Mahadev Konar wrote:
Hi Luca,
Can you do an ls -l on /mapredsystem and send the output? According to
the permissions for mapreduce, the system directories created by the
jobtracker should be world writable, so permissions should have worked as
they do for hod.
No, it doesn't appear to be working that
Jason Venner wrote:
As you have all read from my previous emails, we are still pretty low
on the HOD learning curve.
That is understandable. It is new software, so we will improve over time with
feedback from our users, like you :-)
We are having jobs that terminate and the virtual mapred cluster
Jason Venner wrote:
I have found that HOD writes a series of log files to directories on
the virtual cluster master, if you specify log directories.
The interesting part is figuring out which machine was the virtual
cluster master, if you have a decent sized pool of machines.
Can you explain
Jason Venner wrote:
My hadoop jobs don't start.
This is configured to use an existing DFS and to unpack a tarball with
a cut-down 0.16.0 config.
I have looked in the mom logs on the client machines and am not
getting anything meaningful.
What is your hod command line? Specifically, how did
Luca wrote:
[hod]
xrs-port-range = 1-11000
http-port-range = 1-11000
the Mapred UI port is chosen outside this range.
There's no port range option for Mapred and HDFS sections currently. You
seem to have a use-case for specifying the range within which