[Announcement] SearchWorkings.org is live!

2011-09-12 Thread Frank Scholten
Hi all, This is an announcement of the community site SearchWorkings.org [1]. SearchWorkings.org offers search professionals a point of contact or comprehensive resource to learn about and discuss all the new developments in the world of open source search and related subjects like Mahout and Hadoop. T

Distributed Indexing on MapReduce

2012-03-01 Thread Frank Scholten
Hi all, I am looking into reusing some existing code for distributed indexing to test a Mahout tool I am working on (https://issues.apache.org/jira/browse/MAHOUT-944). What I want is to index the Apache Public Mail Archives dataset (200G) via MapReduce on Hadoop. I have been going through the Nutch