http://issues.apache.org/jira/browse/NUTCH-442
Haven't used Nutch. Can the Nutch-generated index be reverse-engineered into
a Solr schema? In that case, you can just copy the Lucene index files away
from Nutch and run them under Solr.
Thanks Lance! I have no idea whether the Nutch-generated index could be
converted to a Solr schema. I wonder what people are using NUTCH-442 for
(http://issues.apache.org/jira/browse/NUTCH-442).
So what crawler do you use to generate the index for Solr? Thanks a lot!!
On Fri, Jan 9, 2009 at 8:04
I don't know whether the Nutch index format can be mapped to a Solr schema
either. The NUTCH-442 system uses Solr for both indexing and searching, and
uses Nutch only for crawling.
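
To make that split concrete, here is a rough sketch (not taken from the
NUTCH-442 patch) of the hand-off: the crawler produces page records, and
something turns them into a Solr XML <add> message for Solr to index. The
function name build_solr_add and the field names id, title and content are
assumptions for illustration; the real field names have to match whatever
your Solr schema.xml defines.

```python
import xml.etree.ElementTree as ET


def build_solr_add(pages):
    """Build a Solr XML <add> message from a list of crawled-page dicts.

    Each dict maps a field name to a value. The field names are hypothetical
    and must match the fields declared in the target Solr schema.
    """
    add = ET.Element("add")
    for page in pages:
        doc = ET.SubElement(add, "doc")
        for name, value in page.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical record, standing in for what a crawler would hand over.
pages = [
    {
        "id": "http://example.com/video/1",
        "title": "Sample video",
        "content": "Text extracted by the crawler",
    }
]
message = build_solr_add(pages)
print(message)
```

The resulting string would then be POSTed to the Solr update handler,
followed by a <commit/> message, so that the documents become searchable.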
At my last job we had a custom scripting system that crawled the front page
of over 5000 sites. Each site had a configured script. Yes,
I would like to build a search engine that indexes online videos from
websites such as Metacafe, YouTube, etc. I want to use Solr, which I consider
the best open-source indexing tool, with Nutch as the web crawler. However, I
am having difficulty integrating these two open-source products, so I am seeking help