Re: Nutch/Solr

Markus Jelsma Tue, 07 Sep 2010 02:52:30 -0700

You should:
- definitely upgrade to 1.1 (1.2 is on the way), and
- subscribe to the Nutch mailing list for Nutch-specific questions.



On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote:
> In fact, I am using Nutch 0.9, but I am thinking of moving to the new
> version.
> 
> If anybody has done something like that, I would like to hear about their
> experience.
> 
> When indexing an XML file, there are specific fields and all of them
> depend on one another, so duplicates don't happen.
> 
> I want to extract specific fields from the "content" field. With such an
> extraction, the new fields should be indexed as well, so it seems to me
> that the content would be indexed twice for every new field.
> 
> By the way, any details about how to get new fields from the content would
> be helpful.
> --
> 
> Yavuz Selim YILMAZ
> 
> 
> 2010/9/7 Markus Jelsma <markus.jel...@buyways.nl>
> 
> > Depends on your version of Nutch. At least trunk and 1.1 obey the
> > solrmapping.xml file in Nutch's configuration directory. I'd suggest you
> > start with that mapping file and the Solr schema.xml file shipped with
> > Nutch, as it exactly matches the mapping file.
> >
> > Just restart Solr with the new schema (or change the mapping), crawl,
> > fetch, parse and update your DBs, and then push the index from Nutch to
> > your Solr instance.
> >
> > On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
> > > I tried to combine Nutch and Solr, and want to ask something.
> > >
> > > After crawling, Nutch has certain fields such as content, tstamp and
> > > title.
> > >
> > > How can I map the "content" field after crawling? Do I have to change
> > > the Lucene code (such as adding an extra field)?
> > >
> > > Or should I handle it at the Solr stage?
> > >
> > > Any suggestions?
> > >
> > > Thx.
> > > --
> > >
> > > Yavuz Selim YILMAZ
> >
> > Markus Jelsma - Technisch Architect - Buyways BV
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
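
The question in the quoted message, how to get new fields out of "content" without indexing the whole content again for each one, can be illustrated with a small sketch. This is plain Python, not Nutch or Solr API; in Nutch, logic like this would typically live in an indexing filter plugin. The field names "price" and "isbn", the patterns, and the sample document are all hypothetical, since the thread does not say which fields are wanted. The point is only that each new field stores the extracted value, while the full text stays in "content" once.

```python
import re

# Hypothetical patterns: the thread does not say which fields are wanted,
# so "price" and "isbn" are placeholders for illustration only.
FIELD_PATTERNS = {
    "price": re.compile(r"Price:\s*(\d+(?:\.\d+)?)"),
    "isbn": re.compile(r"ISBN:\s*([\d-]+)"),
}


def extract_fields(doc, source="content", patterns=FIELD_PATTERNS):
    """Return a copy of doc with one new field per pattern that matches.

    Only the captured value is stored in each new field; the full text
    stays in the source field once and is not copied.
    """
    enriched = dict(doc)
    text = doc.get(source, "")
    for name, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            enriched[name] = match.group(1)
    return enriched


# Made-up document standing in for one crawled page.
crawled = {
    "title": "Example book page",
    "content": "Example book page. ISBN: 978-0-00-000000-2 Price: 12.50 In stock.",
}
enriched = extract_fields(crawled)
```

Any field added this way would also need a matching entry in the Solr schema (and in the Nutch mapping file, where one is used) before it can be indexed.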
