Language plugin tokenizers in Indexer?

2009-06-18 Thread Aaron Binns
by the traditional Lucene indexer and I just overlooked it? Thanks! Aaron -- Aaron Binns Senior Software Engineer, Web Group Internet Archive aa...@archive.org

Re: The Future of Nutch, reactivated

2009-05-19 Thread Aaron Binns
es have some neat features, such as the "did you mean?" suggestions. However, the distributed search functionality is pretty rudimentary IMO, and I am concerned about reports that it doesn't scale beyond a few million or tens of millions of documents. Although it a

Re: The Future of Nutch, reactivated

2009-05-18 Thread Aaron Binns
drop-in replacement for Nutch. Solr certainly has some nice features; I'm just not convinced it can scale up to the billion-document level. Aaron -- Aaron Binns Senior Software Engineer, Web Group Internet Archive aa...@archive.org

[jira] Created: (NUTCH-708) NutchBean: OOM due to searcher.max.hits and dedup.

2009-03-01 Thread Aaron Binns (JIRA)
Affects Versions: 1.0.0. Environment: Ubuntu Linux, Java 5. Reporter: Aaron Binns. When searching an index we built for the National Archives (this one in particular: http://webharvest.gov/collections/congress110th/), we ran into an interesting situation. We were using