The Mahout project has several tools for this class of problem.
http://mahout.apache.org
On Tue, Mar 8, 2011 at 9:07 AM, Chase Bradford wrote:
> How much smaller is the smaller dataset? If you can use the DC and
> precompute bigrams, locations, etc, and hold all the results in memory
> during se
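A rough sketch of the DistributedCache ("DC") approach Chase describes, using the old mapred API; the side-data file name, the tab-separated format, and the LookupMapper class are made up for illustration, not anyone's actual code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // In the driver (hypothetical path):
  //   DistributedCache.addCacheFile(new URI("/data/side_data.txt"), conf);

  private final Map<String, String> lookup = new HashMap<String, String>();

  @Override
  public void configure(JobConf job) {
    try {
      // Load the precomputed side data into memory once per task.
      Path[] cached = DistributedCache.getLocalCacheFiles(job);
      BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split("\t", 2);   // assumed tab-separated key/value
        if (parts.length == 2) {
          lookup.put(parts[0], parts[1]);
        }
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("Failed to load cached side data", e);
    }
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // Join each input record against the in-memory table.
    String hit = lookup.get(value.toString());
    if (hit != null) {
      out.collect(value, new Text(hit));
    }
  }
}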
Hi,
I was trying to use rename() in FileSystem.java to mv files through my Java
code, but found that it has been deprecated in FileSystem.java though not in
FsShell.java.
Is there a particular reason for this?
Should I use FsShell.java's rename() instead, or avoid it altogether
and implement "fs -mv
If I want to change how keys are sorted prior to the reduce, my
understanding is that I can do this with
JobConf.setOutputKeyComparatorClass(). I am trying to implement a job using
ChainReducer such that I have:
Map 1 | Reduce | Map 2
and I want to use my own comparator for Map 2 so that all keys
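As far as I can tell, the only sort in a chained job is the one ahead of the reduce phase; the mappers added after the reducer see its output directly with no extra sort, so setOutputKeyComparatorClass() governs the keys going into Reduce rather than into Map 2. A sketch of the wiring with the old mapred API (the pass-through classes and the reverse comparator are placeholders, not your actual job):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainJobSketch {

  // Stand-in for "Map 1": emits each line keyed by itself.
  public static class Map1 extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable k, Text v,
                    OutputCollector<Text, Text> out, Reporter r) throws IOException {
      out.collect(v, v);
    }
  }

  // Stand-ins for the single Reduce and for "Map 2": both pass records through.
  public static class PassReduce extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text k, Iterator<Text> vals,
                       OutputCollector<Text, Text> out, Reporter r) throws IOException {
      while (vals.hasNext()) out.collect(k, vals.next());
    }
  }

  public static class Map2 extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text k, Text v,
                    OutputCollector<Text, Text> out, Reporter r) throws IOException {
      out.collect(k, v);
    }
  }

  // Hypothetical comparator: sorts Text keys in reverse order.
  public static class ReverseTextComparator extends WritableComparator {
    public ReverseTextComparator() { super(Text.class, true); }
    public int compare(WritableComparable a, WritableComparable b) {
      return -super.compare(a, b);
    }
  }

  public static void wire(JobConf conf) {
    // Governs the one sort in the job: map output keys ahead of the reduce.
    conf.setOutputKeyComparatorClass(ReverseTextComparator.class);

    // Map 1 | Reduce | Map 2
    ChainMapper.addMapper(conf, Map1.class,
        LongWritable.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    ChainReducer.setReducer(conf, PassReduce.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    ChainReducer.addMapper(conf, Map2.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
  }
}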
On 3/9/2011 11:09 AM, Evert Lammerts wrote:
I didn't mention it but the complete filesystem is reported healthy by fsck.
I'm guessing that the java.io.EOFException indicates a problem caused by the
load of the job.
Any ideas?
It's very tricky work to debug a MapReduce job execution
From: Marcos Ortiz [mlor...@uci.cu]
Sent: Wednesday, March 09, 2011 4:31
On 3/9/2011 6:27 AM, Evert Lammerts wrote:
We see a lot of IOExceptions coming from HDFS during a job that does nothing
but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in
HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related.
(See stacktraces below.)
This job should not be able to ove
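To make the workload concrete, here is a rough sketch of what a one-tar-file-per-mapper untar job might look like; it assumes each map input value is the HDFS path of one tar file and uses Apache Commons Compress, so the class name, the untar.output.dir property, and the paths are illustrative only, not the poster's actual code:

import java.io.IOException;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UntarMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  private FileSystem fs;
  private Path outputDir;

  @Override
  public void configure(JobConf job) {
    try {
      fs = FileSystem.get(job);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    // Hypothetical job property naming the HDFS directory to extract into.
    outputDir = new Path(job.get("untar.output.dir", "/tmp/untarred"));
  }

  @Override
  public void map(LongWritable key, Text tarPath,
                  OutputCollector<Text, NullWritable> out, Reporter reporter)
      throws IOException {
    // Stream one tar file out of HDFS and write each entry back into HDFS.
    FSDataInputStream raw = fs.open(new Path(tarPath.toString()));
    TarArchiveInputStream tar = new TarArchiveInputStream(raw);
    try {
      TarArchiveEntry entry;
      byte[] buf = new byte[64 * 1024];
      while ((entry = tar.getNextTarEntry()) != null) {
        if (entry.isDirectory()) continue;
        Path dst = new Path(outputDir, entry.getName());
        FSDataOutputStream dstStream = fs.create(dst, true);
        int n;
        while ((n = tar.read(buf)) != -1) {
          dstStream.write(buf, 0, n);
        }
        dstStream.close();
        reporter.progress();                 // keep the long-running task alive
        out.collect(new Text(dst.toString()), NullWritable.get());
      }
    } finally {
      tar.close();
    }
  }
}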
Hi,
I'm running Hadoop MapReduce on a cluster, and I have a Reduce Task
that remains in the COMMIT_PENDING state and doesn't finish.
This is happening because I've made some changes to the Hadoop MR code. I'm
trying to solve my problem, but I don't understand what happens
after the COMMIT_PEND