Re: Dataset comparison and ranking - views

2011-03-09 Thread Lance Norskog
The Mahout project has several tools for this class of problem. http://mahout.apache.org On Tue, Mar 8, 2011 at 9:07 AM, Chase Bradford wrote: > How much smaller is the smaller dataset?  If you can use the DC and > precompute bigrams, locations, etc, and hold all the results in memory > during se

rename() removed from FileSystem.java but not FsShell.java

2011-03-09 Thread Mapred Learn
Hi, I was trying to use rename() in FileSystem.java to move files from my Java code but found that it has been deprecated in FileSystem.java but not in FsShell.java. Is there a particular reason for this? Should I use FsShell.java's rename() instead, or avoid it altogether and implement "fs -mv

changing key comparator used with chain mappers

2011-03-09 Thread John Sanda
If I want to change how keys are sorted prior to the reduce, my understanding is that I can do this with JobConf.setOutputKeyComparatorClass(). I am trying to implement a job using ChainReducer such that I have Map 1 | Reduce | Map 2, and I want to use my own comparator for Map 2 so that all keys
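
[Editor's sketch, not part of the original message.] As a rough illustration of what a key comparator changes, the plain-Java sketch below simulates the sort-then-reduce step: map output pairs are grouped by key, the keys are visited in the order the comparator dictates, and each group is summed. It does not use the Hadoop API; the class name, the helper methods, and the sample data are all hypothetical. In a real job the comparator passed to JobConf.setOutputKeyComparatorClass() must implement RawComparator, which this sketch leaves out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the sort phase that runs before a reducer.
final class SortBeforeReduceSketch {

    // Hypothetical custom comparator: String keys in descending order.
    static final Comparator<String> DESCENDING = Comparator.reverseOrder();

    // Made-up map output: (key, value) pairs in arbitrary order.
    static List<Map.Entry<String, Integer>> sampleMapOutput() {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(new SimpleEntry<>("b", 1));
        out.add(new SimpleEntry<>("a", 2));
        out.add(new SimpleEntry<>("c", 3));
        out.add(new SimpleEntry<>("a", 4));
        return out;
    }

    // Groups pairs by key, visits keys in comparator order, and sums each
    // group, roughly as a summing reducer would see them.
    static List<String> sortGroupAndSum(List<Map.Entry<String, Integer>> mapOutput,
                                        Comparator<String> keyComparator) {
        TreeMap<String, Integer> grouped = new TreeMap<>(keyComparator);
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        List<String> reduced = new ArrayList<>();
        for (Map.Entry<String, Integer> group : grouped.entrySet()) {
            reduced.add(group.getKey() + "=" + group.getValue());
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Default ordering: prints [a=6, b=1, c=3]
        System.out.println(sortGroupAndSum(sampleMapOutput(), Comparator.<String>naturalOrder()));
        // Custom ordering: prints [c=3, b=1, a=6]
        System.out.println(sortGroupAndSum(sampleMapOutput(), DESCENDING));
    }
}
```

Only the order in which the reduce step sees the keys changes; the per-key sums are the same under either comparator.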

Re: Could not obtain block

2011-03-09 Thread Marcos Ortiz
On 3/9/2011 11:09 AM, Evert Lammerts wrote: I didn't mention it but the complete filesystem is reported healthy by fsck. I'm guessing that the java.io.EOFException indicates a problem caused by the load of the job. Any ideas? It's very tricky work to debug a MapReduce job execution

RE: Could not obtain block

2011-03-09 Thread Evert Lammerts
I didn't mention it but the complete filesystem is reported healthy by fsck. I'm guessing that the java.io.EOFException indicates a problem caused by the load of the job. Any ideas? From: Marcos Ortiz [mlor...@uci.cu] Sent: Wednesday, March 09, 2011 4:31

Re: Could not obtain block

2011-03-09 Thread Marcos Ortiz
On 3/9/2011 6:27 AM, Evert Lammerts wrote: We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktr

Could not obtain block

2011-03-09 Thread Evert Lammerts
We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.) This job should not be able to ove

What happens after COMMIT_PENDING?

2011-03-09 Thread Pedro Costa
Hi, I'm running Hadoop MapReduce on a cluster, and I have a Reduce Task that remains in the COMMIT_PENDING state and doesn't finish. This is happening because I've made some changes to the Hadoop MR. I'm trying to solve my problem, but I don't understand what happens after the COMMIT_PEND