You should:
- definitely upgrade to 1.1 (1.2 is on the way), and
- subscribe to the Nutch mailing list for Nutch-specific questions.
On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote:
> In fact, I used Nutch version 0.9, but I am thinking of moving to the new
> version.
>
> If anybody has done something like that, I would like to learn from their
> experience.
>
> When indexing an XML file, there are specific fields, each separate from
> the others, so duplicates don't happen.
>
> I want to extract specific fields from the "content" field. With such an
> extraction, the new fields have to be indexed as well, and it seems to me
> that the content then gets indexed twice for every new field.
>
> By the way, any details about how to get new fields from the content would
> be helpful.
> --
>
> Yavuz Selim YILMAZ
>
>
> 2010/9/7 Markus Jelsma <markus.jel...@buyways.nl>
>
> > Depends on your version of Nutch. At least trunk and 1.1 obey the
> > solrmapping.xml file in Nutch's configuration directory. I'd suggest you
> > start with that mapping file and the Solr schema.xml file shipped with
> > Nutch, as it exactly matches the mapping file.
> >
> > Just restart Solr with the new schema (or if you change the mapping),
> > crawl, fetch, parse and update your DBs, and then push the index from
> > Nutch to your Solr instance.
> >
> > On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
> > > I tried to combine Nutch and Solr and want to ask something.
> > >
> > > After crawling, Nutch has certain fields such as content, tstamp and
> > > title.
> > >
> > > How can I map the "content" field after crawling? Do I have to change
> > > the Lucene code (such as adding an extra field)?
> > >
> > > Or can this be handled at the Solr stage?
> > >
> > > Any suggestions?
> > >
> > > Thx.
> > > --
> > >
> > > Yavuz Selim YILMAZ
> >
> > Markus Jelsma - Technisch Architect - Buyways BV
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
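As a rough illustration of the two ideas in this thread (mapping crawled fields onto index fields, and deriving extra fields from "content"), here is a minimal Python sketch. Everything in it is hypothetical: FIELD_MAPPING, EXTRACTORS, the "price" and "author" fields, and map_document are made up for the example and are not Nutch's mapping-file format or any Nutch/Solr API. It only shows that the new fields can hold the extracted snippets rather than a second copy of the whole content.

```python
import re

# Hypothetical mapping of crawled (Nutch-style) field names to index field
# names, in the spirit of a mapping file; NOT the real Nutch file format.
FIELD_MAPPING = {
    "content": "content",
    "title": "title",
    "tstamp": "tstamp",
}

# Hypothetical extraction rules: new field name -> regex whose first group
# is pulled out of the "content" text.
EXTRACTORS = {
    "price": re.compile(r"Price:\s*([\d.]+)"),
    "author": re.compile(r"Author:\s*(.+)"),
}


def map_document(crawled_doc: dict) -> dict:
    """Rename fields per FIELD_MAPPING and add fields extracted from content."""
    index_doc = {
        dest: crawled_doc[src]
        for src, dest in FIELD_MAPPING.items()
        if src in crawled_doc
    }
    content = crawled_doc.get("content", "")
    for field, pattern in EXTRACTORS.items():
        match = pattern.search(content)
        if match:
            # Store only the matched snippet, not another copy of the content.
            index_doc[field] = match.group(1).strip()
    return index_doc


if __name__ == "__main__":
    doc = {
        "content": "Author: Jane Doe\nPrice: 9.99\nSome page text",
        "title": "Example",
        "tstamp": "20100907",
    }
    print(map_document(doc))
```

Where the extraction actually happens (a Nutch parse/indexing plugin or on the Solr side) is a separate choice; the sketch only models the data flow.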