Are you hitting HADOOP-2771?
-Amareshwari
Sandy wrote:
Hello all,
For the sake of benchmarking, I ran the standard hadoop wordcount example on
an input file using 2, 4, and 8 mappers and reducers for my job.
In other words, I do:
time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m
I haven't used Eucalyptus, but you could start by trying out the
Hadoop EC2 scripts (http://wiki.apache.org/hadoop/AmazonEC2) with your
Eucalyptus installation.
Cheers,
Tom
On Tue, Mar 3, 2009 at 2:51 PM, falcon164 mujahid...@gmail.com wrote:
I am new to hadoop. I want to run hadoop on
Hi Richa,
Yes there is. Please see http://wiki.apache.org/hadoop/AmazonEC2.
Tom
On Thu, Mar 5, 2009 at 4:13 PM, Richa Khandelwal richa...@gmail.com wrote:
Hi All,
Is there an existing Hadoop AMI for EC2 which has Hadoop set up on it?
Thanks,
Richa Khandelwal
University Of California,
Yep,
A good starting read: http://wiki.apache.org/hadoop/AmazonEC2
These are the AMIs:
$ ec2-describe-images -a | grep hadoop
IMAGE   ami-245db94d   cloudbase-1.1-hadoop-fc64/image.manifest.xml   247610401714   available   public   x86_64   machine
IMAGE ami-791ffb10
Hi All,
I am trying to log MapReduce jobs in HADOOP_LOG_DIR by setting its
value in hadoop-env.sh, but the directory has no log records when the job
finishes running. I am adding JobConf.setProfileEnabled(true) in my job. Can
anyone point out how to get logging working in Hadoop?
Thanks,
Richa
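For reference, a minimal sketch of the setup being described, assuming Hadoop 0.18.x; the class name and input/output paths are made up for illustration, and HADOOP_LOG_DIR itself is exported in conf/hadoop-env.sh on each node (e.g. export HADOOP_LOG_DIR=/var/log/hadoop):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ProfiledJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ProfiledJob.class);
    conf.setJobName("profiled-identity-job");
    // Uses the default identity mapper/reducer; only profiling is configured here.
    conf.setProfileEnabled(true);            // collect profiler output for sampled tasks
    conf.setProfileTaskRange(true, "0-2");   // profile only the first few map tasks
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}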
On Thu,
I used three different sample.txt files, and was able to replicate the
error. The first was 1.5MB, the second 66MB, and the last 428MB. I get the
same problem regardless of what size of input file I use: the running time of
wordcount increases with the number of mappers and reducers specified. If it
is
Hi David,
I don't know if you've seen this already, but this might be of some help:
http://hadoop.apache.org/core/docs/r0.18.3/cluster_setup.html
Near the bottom, there is a section called "Real-World Cluster
Configurations" with some sample configuration parameters that were used to
run a very
Sandy wrote:
I used three different sample.txt files, and was able to replicate the
error. The first was 1.5MB, the second 66MB, and the last 428MB. I get the
same problem regardless of what size of input file I use: the running time of
wordcount increases with the number of mappers and reducers
I specified a directory containing my 428MB file split into 8 files. Same
results.
I should summarize my hadoop-site.xml file:
mapred.tasktracker.tasks.maximum = 4
mapred.line.input.format.linespermap = 1
mapred.task.timeout = 0
mapred.min.split.size = 1
mapred.child.java.opts = -Xmx2M
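For illustration only (not Sandy's actual code), a sketch assuming the Hadoop 0.18.x API: the job-level settings above can also be set programmatically on a JobConf, while the mapred.tasktracker.* limits are read by each tasktracker from its own hadoop-site.xml at startup and are not per-job parameters.

import org.apache.hadoop.mapred.JobConf;

public class JobSettingsSketch {
  public static JobConf configure() {
    JobConf conf = new JobConf();
    conf.setInt("mapred.line.input.format.linespermap", 1);
    conf.setLong("mapred.task.timeout", 0);        // 0 disables the per-task timeout
    conf.setLong("mapred.min.split.size", 1);
    conf.set("mapred.child.java.opts", "-Xmx2M");  // heap value copied from the message above
    return conf;
  }
}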
We're using JSON serialization for all our data, but we can't seem to find a
good library. We just discovered that the root cause of our out-of-memory errors
is a leak in the net.sf.json library. Can anyone out there recommend a Java
JSON library that they have actually used successfully within
We've used Jackson (http://jackson.codehaus.org/), which we've found to be easy
to use and faster than any other option. We've also had problems with net.sf
in terms of memory and performance.
You can see a performance comparison here:
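For anyone who hasn't tried it, a minimal round-trip sketch with Jackson's ObjectMapper, assuming the 1.x (org.codehaus.jackson) line; the Event bean is invented for the example:

import java.io.StringReader;
import java.io.StringWriter;
import org.codehaus.jackson.map.ObjectMapper;

public class JacksonSketch {
  // A made-up bean, just to show the round trip.
  public static class Event {
    private String id;
    private long timestamp;
    public Event() {}
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long t) { this.timestamp = t; }
  }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();  // reuse one instance; it is thread-safe once configured
    Event e = new Event();
    e.setId("click-42");
    e.setTimestamp(1236270000L);

    StringWriter out = new StringWriter();
    mapper.writeValue(out, e);                 // serialize the bean to JSON
    String json = out.toString();

    Event back = mapper.readValue(new StringReader(json), Event.class);  // bind it back
    System.out.println(json + " -> " + back.getId());
  }
}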
I had discovered a memory leak in net.sf.json as well. I filed an issue and
it got fixed in the latest release:
http://sourceforge.net/tracker/?func=detail&atid=857928&aid=2063201&group_id=171425
Have you tried the latest version 2.2.3?
On Thu, Mar 5, 2009 at 9:48 AM, Kevin Peterson
Normally I dislike writing about problems without being able to provide
some more information, but unfortunately in this case I just can't find
anything.
Here is the situation - DFS cluster running Hadoop version 0.19.0. The
cluster is running on multiple servers with practically identical
I assume you have only 2 map and 2 reduce slots per tasktracker -
which totals to 2 maps/reduces for your cluster. This means that with more
maps/reduces, they are serialized to run 2 at a time.
Also, -m is only a hint to the JobTracker; you might see fewer or more
maps than the number you have
Ian Swett wrote:
We've used Jackson (http://jackson.codehaus.org/), which we've found to be easy
to use and faster than any other option.
I also use Jackson and recommend it.
Doug
This is unexpected unless some other process is eating up space.
A couple of things to collect next time (along with the log):
- All the contents under datanode-directory/ (especially including
'tmp' and 'current')
- Does the 'du' of this directory match what is reported to the NameNode
(shown on
That's what I saw just yesterday on one of the data nodes with this
situation (will confirm also next time it happens):
- Tmp and current were either empty or almost empty last time I checked.
- du on the entire data directory matched exactly with reported used
space in NameNode web UI and it did
Hi All,
Does anyone know how to run MapReduce jobs using Pipes, or how to batch
process MapReduce jobs?
Thanks,
Richa Khandelwal
University Of California,
Santa Cruz.
Ph:425-241-7763
I was trying to control the maximum number of tasks per tasktracker by using
the
mapred.tasktracker.tasks.maximum parameter
I am interpreting your comment to mean that maybe this parameter is
misnamed and should read:
mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.map.tasks.maximum
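For what it's worth, a small sketch (assuming Hadoop 0.18/0.19) of the two per-tasktracker keys and their defaults; these are cluster-side settings read from each tasktracker's hadoop-site.xml at startup, not per-job parameters:

import org.apache.hadoop.mapred.JobConf;

public class SlotSettingsSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();  // loads hadoop-default.xml and hadoop-site.xml from the classpath
    int maxMapSlots = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
    int maxReduceSlots = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    System.out.println("map slots per tasktracker: " + maxMapSlots
        + ", reduce slots per tasktracker: " + maxReduceSlots);
  }
}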
As I mentioned above, you should at least try it like this:
map2 reduce1
map4 reduce1
map8 reduce1
map4 reduce1
map4 reduce2
map4 reduce4
instead of:
map2 reduce2
map4 reduce4
map8 reduce8
2009/3/6 Sandy snickerdoodl...@gmail.com
I was trying to control the maximum number of tasks per
Just trying to understand this better: are you observing that the task
which failed with the IOException is not getting marked as killed? If yes,
that does not look right...
Jothi
On 3/6/09 8:12 AM, Saptarshi Guha saptarshi.g...@gmail.com wrote:
Hello,
I have given a case where my mapper
Is your job a streaming job?
If so, which version of Hadoop are you using? What is the configured
value for stream.non.zero.exit.is.failure? Can you set
stream.non.zero.exit.is.failure to true and try again?
Thanks
Amareshwari
Saptarshi Guha wrote:
Hello,
I have given a case where my mapper
Right, there's no sense in freezing your Hadoop version forever :)
But if you're an ops team tasked with keeping a production cluster running
24/7, running on 0.19 (or even more daringly, TRUNK) is not something that I
would consider a Best Practice. Ideally you'll be able to carve out some
spare
Richa,
Since the mappers run independently, you'd have a hard time
determining whether a record in mapper A would be joined with a record
in mapper B. The solution, as it were, would be to do this in two
separate MapReduce passes:
* Take an educated guess at which table is the smaller data set.
*
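Since the rest of that recipe is cut off above, here is only a generic sketch of the common reduce-side join idea (tag each record with its source table in the mapper, pair the two sides in the reducer), not necessarily the exact two-pass approach the reply went on to describe; the field layout and class names are invented, and the 0.18/0.19 mapred API is assumed:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class TaggedJoinSketch {

  // Emits (joinKey, "A\t<rest of line>") for table A; a twin mapper would tag table B with "B".
  public static class TableAMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      String[] fields = line.toString().split("\t", 2);   // assume the join key is the first column
      String rest = fields.length > 1 ? fields[1] : "";
      out.collect(new Text(fields[0]), new Text("A\t" + rest));
    }
  }

  // All tagged records for one key arrive at the same reduce() call; pair every A with every B.
  public static class JoinReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      List<String> aSide = new ArrayList<String>();
      List<String> bSide = new ArrayList<String>();
      while (values.hasNext()) {
        String v = values.next().toString();
        if (v.startsWith("A\t")) aSide.add(v.substring(2));
        else bSide.add(v.substring(2));
      }
      for (String a : aSide) {
        for (String b : bSide) {
          out.collect(key, new Text(a + "\t" + b));        // note: buffers one side in memory
        }
      }
    }
  }
}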
Try throwing a RuntimeException, or any other unchecked exception (e.g., any
descendant class of RuntimeException).
- Aaron
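A tiny sketch of that suggestion against the 0.18/0.19 mapred API (the mapper and the failure condition are invented for illustration): an unchecked exception propagates out of map() and fails the task attempt even though it is not declared in the method signature.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FailFastMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    if (value.toString().length() == 0) {
      // Unchecked, so it needs no throws clause and still kills the task attempt.
      throw new RuntimeException("giving up on record at offset " + key);
    }
    out.collect(new Text("ok"), value);
  }
}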
On Thu, Mar 5, 2009 at 4:24 PM, Saptarshi Guha saptarshi.g...@gmail.comwrote:
hello,
I'm not that comfortable with Java, so here is my question. In the
MapReduceBase
Song, you should be able to use 'nice' to reprioritize the MPI task
below that of your Hadoop jobs.
- Aaron
On Thu, Mar 5, 2009 at 8:26 PM, 柳松 lamfeel...@126.com wrote:
Dear all:
I run my Hadoop program with another MPI program on the same cluster.
Here is the result of top.
PID USER
Hello to all.
I have 2 nodes in the cluster - master + slave.
The names master1 and slave1 are stored in /etc/hosts on both hosts and they
are 100% correct.
conf/masters:
master1
conf/slaves:
master1
slave1
conf/slaves and conf/masters are empty on the slave1 node. I tried to fill
them in many ways - it
So, is there currently no solution to my problem?
Should I live with it? Or should we file a JIRA for this?
What do you think?
2009/3/4 Nick Cen cenyo...@gmail.com
Thanks. About the Secondary Sort, can you provide an example? What do
the intermediate keys stand for?
Assume I have
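Since the example being asked about is cut off here, the following is only a generic sketch of the usual composite-key pattern behind secondary sort (Hadoop 0.18/0.19 API assumed): the intermediate key is a composite of the natural grouping key plus the field you want the values ordered by.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class CompositeKey implements WritableComparable<CompositeKey> {
  private String group;   // the "natural" key records are grouped by
  private long order;     // the secondary field the values should be sorted on

  public CompositeKey() {}
  public CompositeKey(String group, long order) { this.group = group; this.order = order; }

  public void write(DataOutput out) throws IOException {
    out.writeUTF(group);
    out.writeLong(order);
  }

  public void readFields(DataInput in) throws IOException {
    group = in.readUTF();
    order = in.readLong();
  }

  // Sort by group first, then by the secondary field, so values reach the
  // reducer in the desired order.
  public int compareTo(CompositeKey other) {
    int c = group.compareTo(other.group);
    if (c != 0) return c;
    return order < other.order ? -1 : (order == other.order ? 0 : 1);
  }

  public String getGroup() { return group; }
}

To wire it up (names hypothetical), you would also give the job a Partitioner that hashes only getGroup(), and set a grouping comparator via JobConf.setOutputValueGroupingComparator that likewise compares only getGroup(), so all records for one group land in a single reduce() call while still arriving sorted by the secondary field.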