Re: Changing from Indexing Filter

2012-04-26 Thread Lewis John Mcgibbney
Hi Jim, On Thu, Apr 26, 2012 at 2:23 PM, Jim Chandler wrote: > I am in the process of trying to change a plugin from an IndexingFilter to a Parser. Personally I wouldn't do this; I would pick up an existing parser and edit it into another parser! Do you have any specific reasons for doing this

Fields for each document

2012-04-26 Thread Ing. Eyeris Rodriguez Rueda
Hello, I'm using Nutch with Solr and I need to know, for each type of document crawled by Nutch (PDF, DOCX, PPT), which fields are recognized in each document. I know that the Tika parser is in charge of parsing the documents found during the crawl process, but I need to know for all documents supported

Generator OOM

2012-04-26 Thread Markus Jelsma
Hi, We sometimes see the generator run out of memory (OOM). This happens because we have either too high a topN value or too many segments to generate. In either case, a very large number of records are generated with the same (lowest) score and end up in a single reducer. We limit the generator by dom

Changing from Indexing Filter

2012-04-26 Thread Jim Chandler
Greetings, Nutch, Solr, Lucene and everything else are very new to me. I am in the process of trying to change a plugin from an IndexingFilter to a Parser. I am having difficulty understanding where in the Nutch process each one of these is run. I've been searching Google to see if I could fi

Re: Question related to NUTCH-1044 redirected URLs and invalid scores

2012-04-26 Thread Lewis John Mcgibbney
Hi Pravin, I won't have time until the weekend to get around to this. I'll try my best though when the time comes around. On Tue, Apr 24, 2012 at 4:19 PM, Pravin Agrawal wrote: > Hi Lewis, thanks for the reply. Sorry I couldn't get back to you sooner as I was on vacation. I tried out t