RE: invalid utf8 chars when indexing or cleaning

2017-09-01 Thread Markus Jelsma
e.org > Subject: Re: invalid utf8 chars when indexing or cleaning > > It sounds like a good suggestion, but I don't know what you mean by "verify > the output Nutch generates and inspect it manually." How do I get a look at > that XML? > > > From: >

Re: invalid utf8 chars when indexing or cleaning

2017-08-31 Thread Michael Coffey
e patch works as intended. Get the XML, pass it through the method and see what it does to the output. -Original message- > From: Jorge Betancourt <betancourt.jo...@gmail.com> > Sent: Tuesday 29th August 2017 21:54 > To: user@nutch.apache.org > Subject: Re: invalid utf

RE: invalid utf8 chars when indexing or cleaning

2017-08-31 Thread Markus Jelsma
the method and see what it does to the output. -Original message- > From: Jorge Betancourt <betancourt.jo...@gmail.com> > Sent: Tuesday 29th August 2017 21:54 > To: user@nutch.apache.org > Subject: Re: invalid utf8 chars when indexing or cleaning > > From the l

Re: invalid utf8 chars when indexing or cleaning

2017-08-29 Thread Jorge Betancourt
From the logs, it looks like the error is coming from the Solr side. Do you mind checking/sharing the logs on your Solr server? Can you pinpoint which URL is causing the issue? Best Regards, Jorge On Tue, Aug 29, 2017 at 9:25 PM, Michael Coffey wrote: Does anybody

Re: invalid utf8 chars when indexing or cleaning

2017-08-29 Thread Michael Coffey
Does anybody have any thoughts on this? It seems similar to the NUTCH-1016 bug that was fixed in version 1.4. Some more bits of information: the indexer job rarely fails (only 1 of the last 99 segments) but the cleaning job fails every time now. Once again, this is Nutch 1.12 and Solr 5.4.1. I
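
For readers following the thread: the patch and the method mentioned above are not shown in these excerpts, so the following is only an illustrative sketch of the general technique, assuming the failures come from characters that XML 1.0 does not allow in a document. The class and method names (XmlCharFilter, stripInvalidXmlChars) are hypothetical and are not Nutch or Solr API.

```java
// Hypothetical helper, not part of Nutch or Solr: drops code points that
// fall outside the XML 1.0 "Char" production before text is sent to an indexer.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point is permitted by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with disallowed code points removed. */
    static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt returns the surrogate value itself for an unpaired
            // surrogate, which the range check above then rejects.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Passing a suspect field value through stripInvalidXmlChars and comparing the result with the original string is one way to see whether a given document contains control characters, unpaired surrogates, or the noncharacters U+FFFE and U+FFFF.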