In fact, I have been using Nutch 0.9, but I am thinking of moving to the new version.

If anybody has done something like that, I would like to hear about their experience.

When indexing an XML file, there are specific fields that all depend on one
another, so duplicates don't occur.

I want to extract specific fields from the "content" field. When doing such an
extraction, the new fields should be indexed as well, but it seems to me that
the content would then be indexed twice for every new field.

By the way, any details about how to derive new fields from the content would
be helpful.
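
For reference, the mapping file mentioned in the quoted reply below pairs Nutch
field names with Solr field names. A minimal sketch of what such a file might
look like follows; the exact filename and the extra field name (`myfield`) are
assumptions and depend on the Nutch version in use:

```xml
<!-- Hypothetical sketch of a Nutch-to-Solr mapping file.
     Field names besides content/title/tstamp are illustrative only. -->
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="tstamp" source="tstamp"/>
    <!-- a new field would need to exist in the Nutch document
         (e.g. via an indexing plugin) before it can be mapped -->
    <field dest="myfield" source="myfield"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```

Note that the mapping file only renames fields that Nutch already produces; it
does not itself extract new fields out of the raw content.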
--

Yavuz Selim YILMAZ


2010/9/7 Markus Jelsma <markus.jel...@buyways.nl>

> Depends on your version of Nutch. At least trunk and 1.1 obey the
> solrmapping.xml file in Nutch's configuration directory. I'd suggest you
> start with that mapping file and the Solr schema.xml file shipped with
> Nutch, as it exactly matches the mapping file.
>
> Just restart Solr with the new schema (or change the mapping), then crawl,
> fetch, parse, and update your DBs, and finally push the index from Nutch to
> your Solr instance.
>
>
> On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
> > I am trying to combine Nutch and Solr, and want to ask something.
> >
> > After crawling, Nutch has certain fields such as content, tstamp, and title.
> >
> > How can I map the "content" field after crawling? Do I have to change the
> > Lucene code (e.g. add an extra field)?
> >
> > Or can this be handled at the Solr stage?
> >
> > Any suggestion?
> >
> > Thx.
> > --
> >
> > Yavuz Selim YILMAZ
> >
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>
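
The workflow in the quoted reply (restart Solr with the matching schema, then
crawl, fetch, parse, update, and push the index) might be sketched as the
following command sequence. The paths, segment naming, and Solr URL are
assumptions for illustration and will vary per setup; some Nutch 1.x releases
also require a linkdb argument to solrindex:

```
# Hypothetical Nutch 1.1-style command sequence; paths and URL are examples only
bin/nutch inject crawldb urls/              # seed the crawl DB
bin/nutch generate crawldb segments         # create a fetch list
s=`ls -d segments/2* | tail -1`             # pick the newest segment
bin/nutch fetch $s                          # fetch pages
bin/nutch parse $s                          # parse fetched content
bin/nutch updatedb crawldb $s               # update the crawl DB
bin/nutch invertlinks linkdb $s             # build the link database
bin/nutch solrindex http://localhost:8983/solr crawldb linkdb $s   # push to Solr
```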
