The number of bytes read can exceed the block size somewhat because a block rarely starts or ends exactly on a record (e.g. line) boundary. So a task usually needs to read a bit before and/or after the actual block boundary in order to correctly read in all of the records it is supposed to.
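To make that concrete, here is a minimal sketch of the idea (not Hadoop's actual LineRecordReader, just the same logic; the file path and split offsets are made-up placeholders): a reader whose split doesn't start at byte 0 skips the first partial line, and the last readLine() is allowed to run past the split's end so the final record comes in whole. Those two behaviors are where the extra bytes come from.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class SplitReadSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/input.txt"); // placeholder path
        long splitStart = 64L * 1024 * 1024;     // hypothetical split offsets
        long splitEnd = 128L * 1024 * 1024;

        FSDataInputStream in = fs.open(file);
        in.seek(splitStart);
        LineReader reader = new LineReader(in);
        Text line = new Text();
        long pos = splitStart;

        // If we didn't start at the beginning of the file, the first (partial)
        // line belongs to the previous split, so skip it.
        if (splitStart != 0) {
            pos += reader.readLine(line);
        }
        // Read records until we pass the split end; the final readLine() may
        // run past splitEnd, which is exactly the extra read being asked about.
        while (pos < splitEnd) {
            int consumed = reader.readLine(line);
            if (consumed == 0) break; // end of file
            pos += consumed;
            // process `line` ...
        }
        reader.close();
    }
}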
Well, if you think about it, you'll have more/better locality if more nodes have the same blocks. It gives the scheduler more leeway to find a node that has a block that hasn't been processed yet. Have you tried it with a replication factor of 2 or 3 and seen what that does?
--Aaron
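If you want to experiment with that, a minimal sketch (the file path is a placeholder): dfs.replication controls the replication factor for newly created files, and FileSystem.setReplication() changes it for files that already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication for files created by this client.
        conf.setInt("dfs.replication", 3);

        // Or change an existing file's replication after the fact.
        FileSystem fs = FileSystem.get(conf);
        fs.setReplication(new Path("/data/input/part-00000"), (short) 3);
    }
}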
---
record, so only 1 that
is different per set) and it is my understanding that they would be grouped
together (without the primary key) if I didn't do anything different.
-Trevor
On Wed, Jun 29, 2011 at 2:07 PM, Aaron Baff wrote:
You probably need to implement a custom comparator that you use as the grouping
comparator that compares the primary key, and then if they are the same
compares the int part of the key.
--Aaron
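For what it's worth, here is a minimal sketch of the usual secondary-sort split, with a hypothetical composite key class (a Text primary key plus an int): the key's natural order sorts by primary key and then by the int part, while the grouping comparator looks only at the primary key, so records that differ just in the int part still arrive in one reduce() call.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical composite key: a primary key plus an int discriminator.
public class PrimaryIntKey implements WritableComparable<PrimaryIntKey> {
    private final Text primary = new Text();
    private int intPart;

    public void set(String p, int i) { primary.set(p); intPart = i; }

    @Override
    public void write(DataOutput out) throws IOException {
        primary.write(out);
        out.writeInt(intPart);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        primary.readFields(in);
        intPart = in.readInt();
    }

    // Natural (sort) order: primary key first, then the int part.
    @Override
    public int compareTo(PrimaryIntKey other) {
        int cmp = primary.compareTo(other.primary);
        return cmp != 0 ? cmp : Integer.compare(intPart, other.intPart);
    }

    // Grouping comparator: compares the primary key only.
    public static class GroupComparator extends WritableComparator {
        public GroupComparator() {
            super(PrimaryIntKey.class, true); // true => instantiate keys
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            Text pa = ((PrimaryIntKey) a).primary;
            Text pb = ((PrimaryIntKey) b).primary;
            return pa.compareTo(pb);
        }
    }
}

You'd register it with job.setGroupingComparatorClass(PrimaryIntKey.GroupComparator.class) in the new API, or conf.setOutputValueGroupingComparator(...) in the old one.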
-Original Message-
From: Trevor Adams
I believe that this data is removed by the JobTracker approximately an hour after the Job completes. That's the default timeout; it can be changed, but the parameter name escapes me at the moment.
--Aaron
-Original Message-
From: Pedro Costa [mailto:psdc1...@gmail.com]
Sent: Wednesday,
You need to use the RunningJob (old API) or Job (new API) object, and use those to get the Mapper & Reducer statuses. They return it as a float, 0.0 to 1.0.
--Aaron
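A minimal sketch of the old-API route (the JobTracker address and the job ID are placeholders): JobClient hands back a RunningJob, and mapProgress()/reduceProgress() each return a float between 0.0 and 1.0.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class ProgressPoll {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021"); // placeholder
        JobClient client = new JobClient(conf);
        RunningJob job = client.getJob(JobID.forName("job_201105230000_0001")); // placeholder ID
        // Both calls return a float between 0.0 and 1.0.
        System.out.printf("map %.1f%%, reduce %.1f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
    }
}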
From: praveen.pe...@nokia.com [mailto:praveen.pe...@nokia.com]
Sent: Monday, May 23, 2011
mapred.jar",[FILE]), but I
couldn't find a file format that works?
Lior
Didn't know one could do this, thanks. I'll give it a try.
On 18 May 2011 10:18, Aaron Baff wrote:
It's not terribly hard to submit MR jobs. Create a Hadoop Configuration object, set its fs.default.name and fs.defaultFS to the NameNode URI, and its mapreduce.jobtracker.address and mapred.job.tracker to the JobTracker URI. You can then easily set up and use a Job object (new API) or a JobConf object (old API).
As part of the job submission, once it's submitted, grab the JobID from that
object and print it out on STDOUT or to a file and have your startup script(s)
parse it out from there.
--Aaron
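Putting that together, a minimal sketch (all URIs and paths are placeholders, and the Mapper/Reducer setup is left at the identity defaults): set both the old and new property names, submit without blocking, and print the JobID for a wrapper script to capture.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Old and new names for the same settings; set both to be safe.
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");
        conf.set("mapreduce.jobtracker.address", "jobtracker.example.com:8021");

        Job job = new Job(conf, "example-job"); // 0.20/0.21-era constructor
        job.setJarByClass(SubmitJob.class);
        FileInputFormat.addInputPath(job, new Path("/data/in"));    // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/data/out")); // placeholder

        job.submit(); // returns immediately, unlike waitForCompletion()
        // Print the JobID so a startup script can parse it from STDOUT.
        System.out.println(job.getJobID());
    }
}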
-Original Message-
From: Adam Phelps [mailto:a...@opendns.com]
Sent: Tuesday, May 10, 2011 3:45 PM
Cross-post from common-users.
I'm using v0.21.0 with the old API, and I have a daemon that runs and monitors MR jobs and allows us to fetch data from the JobTracker about the MR jobs, etc. We're using Thrift as the API (so we can do PHP->Java). We're having an issue where some requests for MR Jo