On 08/09/2012 01:24 AM, Mike Lyon wrote:
I'm taking a wild guess, but I think they are legitimately trying to
unsubscribe but are clueless and don't RTFM
The scariest part is that these are supposedly people with enough
technical chops to set up and program a Hadoop cluster.
I suddenly do
On 07/26/2012 09:20 PM, Steve Armstrong wrote:
Do you mean I need to deploy the mahout jars to the lib directory of
the master node? Or all the data nodes? Or is there a way to simply
tell the hadoop job launcher to upload the jars itself?
Every node that runs a Task (mapper or reducer) needs the Mahout jars on
its classpath. You can install them under lib/ on every node, or have the
framework ship them per job via -libjars or the distributed cache.
On 06/18/2012 10:19 AM, Mark Kerzner wrote:
If only reducers could be told to start their work on the first
maps that they see, my processing would begin to show results much earlier,
before all the mappers are done.
The sort/shuffle phase isn't just about ordering the keys; it's about
grouping. A reducer can't safely start on a key until every mapper has
finished, because any still-running mapper might emit more values for that key.
On 06/18/2012 10:40 AM, Mark Kerzner wrote:
that sounds very interesting, and I may implement such a workflow, but
can I write back to HDFS in the mapper? In the reducer it is a standard
context.write(), but it is a different context.
Both Mapper.Context and Reducer.Context descend from
TaskInputOutputContext, so context.write() is available in the mapper too;
a map-only job (zero reducers) writes its output directly to HDFS.
Another important note: the combiner runs can stack.
Let's say Prashant is right that the default spill count that triggers
the combiner is 3, and that we have a mapper that generates 9 spills.
These spills trigger 3 combiner runs, whose outputs meet the threshold
again, and so we get a further combiner pass over the already-combined
output. That stacking is why a combiner must be associative and
commutative, and must emit the same types it consumes.
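Stacked combiner runs are also why a combiner has to be associative and commutative. Here is a plain-Java sketch (no Hadoop involved; the spill contents are made up) showing that re-combining already-combined output gives the same totals as one pass over the raw records:

```java
import java.util.*;

public class CombinerStacking {
    // A word-count style combine: sum the counts for each key.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return out;
    }

    static List<Map.Entry<String, Integer>> toList(Map<String, Integer> m) {
        return new ArrayList<>(m.entrySet());
    }

    public static void main(String[] args) {
        // Three "spills", each holding raw (word, 1) records.
        List<Map.Entry<String, Integer>> spill1 =
            List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
        List<Map.Entry<String, Integer>> spill2 =
            List.of(Map.entry("a", 1), Map.entry("c", 1));
        List<Map.Entry<String, Integer>> spill3 =
            List.of(Map.entry("b", 1), Map.entry("b", 1));

        // First combiner pass: one run per spill.
        Map<String, Integer> c1 = combine(spill1), c2 = combine(spill2), c3 = combine(spill3);

        // Second (stacked) pass: combine the combined outputs.
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        merged.addAll(toList(c1)); merged.addAll(toList(c2)); merged.addAll(toList(c3));
        Map<String, Integer> stacked = combine(merged);

        // Same answer as combining all raw records once -- but only
        // because summing is associative and commutative.
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        all.addAll(spill1); all.addAll(spill2); all.addAll(spill3);
        System.out.println(stacked.equals(combine(all))); // true
    }
}
```

A combiner like "take the average" would fail this test, which is exactly why averages have to be carried as (sum, count) pairs through the combine stage.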
On 02/27/2012 11:53 AM, W.P. McNeill wrote:
You don't need any virtualization. Mac OS X is Linux and runs Hadoop as is.
Nitpick: OS X descends from NeXTSTEP and is built on the Mach kernel; it is
POSIX-compliant, but it is a different system from Linux.
Actually, I think this is what Oozie is for. It seems to leap out as a
great example of a forked workflow.
hth
On 02/15/2012 02:23 PM, W.P. McNeill wrote:
Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
another job, C, that takes the output of both A and B as input.
On Fri, 13 Jan 2012 13:59:12 -0800 (PST), vvkbtnkr vvkbt...@yahoo.com
wrote:
I am running a hadoop jar and keep getting this error -
java.lang.NoSuchMethodError:
org.codehaus.jackson.JsonParser.getValueAsLong()
Nobody seems to have answered this while I was on vacation, so...
Okay, here's the usual cause: a NoSuchMethodError that appears at run time
but not at compile time almost always means an older copy of the jar (here,
Jackson) sits earlier on the runtime classpath than the one you built against.
On Thu, 15 Sep 2011 12:51:48 -0700, Frank Astier fast...@yahoo-inc.com
wrote:
I’m using IntelliJ and the WordCount example in Hadoop (which uses
MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint
straight into the map function of the mapper? - I’ve tried, but so far,
the
One of my colleagues has noticed this problem for a while, and now it's
biting me. Jobs seem to be failing before ever really starting. It seems
to be limited (so far) to running in pseudo-distributed mode, since that's
where he saw the problem and where I'm now seeing it; it hasn't come up on
On Fri, 26 Aug 2011 11:46:42 -0700, Ramya Sunil ra...@hortonworks.com
wrote:
How many tasktrackers do you have? Can you check if your tasktrackers
are
running and the total available map and reduce capacity in your cluster?
In pseudo-distributed there's one tasktracker, which is running, and
On Fri, 26 Aug 2011 12:20:47 -0700, Ramya Sunil ra...@hortonworks.com
wrote:
Can you also post the configuration of the scheduler you are using? You
might also want to check the jobtracker logs. It would help in further
debugging.
Where would I find the scheduler configuration? I haven't
On Mon, 22 Aug 2011 11:01:23 -0700, W.P. McNeill bill...@gmail.com
wrote:
If it is, what is the proper way to make MyJar.jar available to both the
Job Client and the Task Trackers?
Do you mean the task trackers, or the tasks themselves? What process do
you want to be able to run the code in?
On Fri, 5 Aug 2011 08:50:02 +0800 (CST), Daniel,Wu hadoop...@163.com
wrote:
The book also mentioned the value is mutable. I think the key might also be
mutable, meaning that as we loop over each value in the
Iterable<NullWritable>, the content of the key object is reset.
The mutability of the value is one of Hadoop's object-reuse optimizations,
and yes, it applies to the key too: the framework reuses a single key
instance and rewrites it in place as you iterate over the values, so the
key object you are holding changes under you.
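The same object-reuse trap bites anyone who stores references from inside the loop. A plain-Java sketch (Holder is a made-up stand-in for a reused Writable, not a Hadoop class) of one instance being rewritten on every step:

```java
import java.util.*;

public class ObjectReuse {
    // Stand-in for a reused Writable: one instance, mutated in place.
    static class Holder { int value; }

    // Returns {value seen via the first stored reference, first copied value}.
    static int[] run() {
        int[] data = {10, 20, 30};
        Holder reused = new Holder();
        List<Holder> storedRefs = new ArrayList<>();
        List<Integer> copies = new ArrayList<>();
        for (int v : data) {
            reused.value = v;          // framework-style in-place rewrite
            storedRefs.add(reused);    // trap: every element is the same object
            copies.add(reused.value);  // safe: copy the contents out
        }
        return new int[] { storedRefs.get(0).value, copies.get(0) };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("stored reference sees: " + r[0]); // 30 -- rewritten
        System.out.println("copied value sees: " + r[1]);     // 10 -- preserved
    }
}
```

This is why the usual advice is to clone a Writable (or pull out its primitive contents) before stashing it in a collection.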
On Thu, 4 Aug 2011 14:07:12 +0800 (CST), Daniel,Wu hadoop...@163.com
wrote:
I am using the new API (the release is from Cloudera). We can see from the
output that for each call of the reduce function, 100 records were
processed, but the reduce is defined as
reduce(IntPair key, Iterable<NullWritable> values, Context context)
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), Daniel,Wu hadoop...@163.com
wrote:
So the key of a group is determined by the first record to arrive in the
group. If we have 3 records in a group:
1: (1900,35)
2: (1900,34)
3: (1900,33)
and (1900,35) comes in as the first row, then the result key will be (1900,35)?
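That is indeed how secondary sort behaves: the sort order decides which record arrives first, and the grouping comparator folds the rest into that record's group. A plain-Java sketch of the sort-then-group logic (IntPair here is a hypothetical stand-in, not Hadoop's writable):

```java
import java.util.*;

public class SecondarySortDemo {
    // Stand-in for the composite key: (year, temperature).
    record IntPair(int year, int temp) {}

    // Returns the first record of each year-group after a
    // (year ascending, temp descending) sort -- i.e. the key the
    // reducer would observe for that group.
    static Map<Integer, IntPair> groupKeys(List<IntPair> records) {
        List<IntPair> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingInt(IntPair::year)
                .thenComparing(Comparator.comparingInt(IntPair::temp).reversed()));
        Map<Integer, IntPair> first = new LinkedHashMap<>();
        for (IntPair p : sorted) {
            // Grouping comparator: compare on year only.
            first.putIfAbsent(p.year(), p);
        }
        return first;
    }

    public static void main(String[] args) {
        List<IntPair> recs = List.of(
            new IntPair(1900, 34), new IntPair(1900, 35), new IntPair(1900, 33));
        // The group's visible key is the pair that sorted first.
        System.out.println(groupKeys(recs).get(1900)); // IntPair[year=1900, temp=35]
    }
}
```

With temp sorted descending, the max-temperature record always sorts first, which is the whole point of the max-temperature secondary-sort example.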
On Tue, 2 Aug 2011 21:49:22 +0800 (CST), Daniel,Wu hadoop...@163.com
wrote:
we usually use something like values.next() to loop over every row in a
specific group, but I didn't see any code that loops over the list; at the
least it needs to get the first row in the list, with something like
values.get().
On Mon, 1 Aug 2011 12:11:27 -0400, Aquil H. Abdullah
aquil.abdul...@gmail.com wrote:
but it still isn't clear to me how the -libjars option is parsed, whether
or not I need to explicitly add it to the classpath inside my run method,
or where it needs to be placed on the command line?
IIRC
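For what it's worth, the usual shape of the command line looks like this (the jar names and paths below are hypothetical). -libjars is one of the generic options consumed by GenericOptionsParser, so it must come after the main class and before your application's own arguments, and it is only parsed at all when the driver runs through ToolRunner:

```
# Generic options such as -libjars sit between the main class and the
# application's own arguments; multiple jars are comma-separated.
hadoop jar myapp.jar com.example.MyDriver \
  -libjars /home/me/lib/extra1.jar,/home/me/lib/extra2.jar \
  /input/path /output/path
```

You do not add the jars to the classpath yourself inside run(); the framework ships them to the tasks for you.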
On Mon, 1 Aug 2011 13:21:27 -0400, Aquil H. Abdullah
aquil.abdul...@gmail.com wrote:
[AA] I am currently invoking my application as follows:
hadoop jar /home/test/hadoop/test.option.demo.jar
test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar
I believe the problem might be the option itself: the flag is -libjars
(plural, with an s), and it is only recognized when your driver runs
through ToolRunner/GenericOptionsParser.
On Mon, 1 Aug 2011 15:30:49 -0400, Aquil H. Abdullah
aquil.abdul...@gmail.com wrote:
Don't I feel sheepish...
Happens to the best, or so they tell me.
OK, so I've hacked this sample code below, from the ConfigurationPrinter
example in Hadoop: The Definitive Guide. If -libjars had been added
On Thu, 28 Jul 2011 10:05:57 -0400, Kumar, Ranjan
ranjan.kum...@morganstanleysmithbarney.com wrote:
I have a class to define data I am reading from a MySQL database.
Following the online tutorials, I created a class called MyRecord that
implements Writable and DBWritable. While running it with
On Tue, 19 Jul 2011 20:47:31 +0100, Kobina Kwarko
kobina.kwa...@gmail.com
wrote:
Hello,
Could I please get some assistance? I am using Hadoop for a school project
and managed to install it on two computers, testing with the wordcount
example. However, after stopping Hadoop and restarting the computers
On Tue, 05 Jul 2011 08:09:16 -0700, Mark static.void@gmail.com
wrote:
Is there any way I can write out the results of my mapreduce job into one
local file, i.e., the opposite of getmerge?
I don't know about writing directly to the local filesystem, but if you
specify a single reducer you'll get a single output file in HDFS, which
you can then copy down in one step.
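A sketch of that route (the jar, class, and paths are hypothetical, and the driver must implement Tool for -D to be honored):

```
# One reducer => a single part-r-00000 file in the job output directory.
hadoop jar myjob.jar com.example.MyDriver -D mapred.reduce.tasks=1 /in /out

# Copy the single file to the local filesystem.
hadoop fs -get /out/part-r-00000 /tmp/result.txt
```

Note that a single reducer serializes the whole reduce phase, so this only makes sense when the output is small.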
I've got my map/reduce programs implementing Tool, so I can pass various
parameters as configuration properties. I'm currently using an ad-hoc
method of verifying that a program has been passed all the configuration
properties it requires, but I'm wondering if there's a more Hadoopy way
to do this.
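One fail-fast pattern is to validate everything up front and report all the missing properties at once. A plain-Java sketch of the idea (checkRequired and the property names are made up; in a real driver you would read the keys from the job Configuration inside run()):

```java
import java.util.*;

public class RequiredProps {
    // Throw one aggregate error naming every missing property,
    // rather than failing on the first one.
    static void checkRequired(Map<String, String> conf, String... required) {
        List<String> missing = new ArrayList<>();
        for (String key : required) {
            String v = conf.get(key);
            if (v == null || v.isEmpty()) missing.add(key);
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "Missing required properties: " + missing);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("myjob.input.table", "events"); // hypothetical property
        try {
            checkRequired(conf, "myjob.input.table", "myjob.output.dir");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // names myjob.output.dir
        }
    }
}
```

Collecting every missing key before throwing saves the edit-resubmit-fail cycle that one-at-a-time checks cause.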
On Sun, 26 Jun 2011 17:34:34 -0700, Mark static.void@gmail.com
wrote:
Hello all,
We have a recommendation system that reads in similarity data via a
Spring context.xml as follows:
<bean id="similarity"
class="org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity">
On Tue, 7 Jun 2011 09:41:21 -0300, Juan P. gordoslo...@gmail.com
wrote:
Not 100% clear on what you meant. You are saying I should put the file
into
my HDFS cluster or should I use DistributedCache? If you suggest the
latter,
could you address my original question?
I mean that you can do both: you put the file into HDFS and then register
it with the DistributedCache, which copies it down to the local disk of
each task node.
On Mon, 06 Jun 2011 09:18:45 -0400, dar...@ontrenet.com wrote:
I never understood how hadoop can throttle an inter-rack fiber switch.
It's supposed to operate on the principle of move-the-code to the data
because of the I/O cost of moving the data, right?
But what happens when a reducer on rack
On Mon, 06 Jun 2011 09:26:11 -0400, dar...@ontrenet.com wrote:
I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop
trackers get bottlenecked to resolve those dependencies?
Again, this exposes the oddity of hadoop IMO: it tries NOT to
be I/O bound, but it seems it's very I/O bound in practice
On Mon, 06 Jun 2011 09:34:56 -0400, dar...@ontrenet.com wrote:
Yeah, that's a good point.
I wonder, though, what the load on the tracker nodes (ports et al.) would
be if an inter-rack fiber switch at tens of Gb/s is getting maxed out.
Seems to me that if there is that much traffic being mitigated
On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu sh...@uchicago.edu wrote:
I still don't understand, in a cluster you have a shared directory to
all the nodes, right? Just put the configuration file in that directory
and load it in all the mappers, isn't that simple?
So I still don't understand
On Thu, 02 Jun 2011 17:23:08 +0300, George Kousiouris
gkous...@mail.ntua.gr wrote:
Are there anywhere instructions on how to change from the default ports
of Hadoop and HDFS? My main interest is in default port 8020.
I think this is part of fs.default.name. You would go into core-site.xml
and change the port in that URI; the namenode will then listen on whatever
port you put there, and every client must use the same setting.
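Something like this in core-site.xml (the hostname and port are examples; pick your own, and keep every node and client on the same value):

```
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```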