Re: java.lang.NullPointerException at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)

2014-08-13 Thread Steve Cohen
I forgot about the parsechecker and indexchecker command line options. When I run parsechecker with the default Nutch and the standard job file, it works. 14/08/13 11:35:28 INFO http.Http: http.proxy.host = null 14/08/13 11:35:28 INFO http.Http: http.proxy.port = 8080 14/08/13 11:35:28 INFO ht

Re: [VOTE] Apache Nutch 1.9 Release Candidate #1

2014-08-13 Thread feng lu
Great, all tests pass. +1 for release. On Wed, Aug 13, 2014 at 1:31 PM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi user@ & dev@, This thread is a VOTE for releasing Apache Nutch 1.9. > The release candidate comprises the following components. * A staging > repository [0] conta

Re: How to recrawl changing the seed.txt list

2014-08-13 Thread Julien Nioche
Hi, Yes, that should be fine. The only thing I would do differently would be: > 2. Change the list in regex-urlfilter.txt (add +^http://www.rlp.de/ i.e. > for every url) allow any URLs instead of specifying all the hostnames one by one but set the following property to true in nutch-site.xml :

Re: [VOTE] Apache Nutch 1.9 Release Candidate #1

2014-08-13 Thread Julien Nioche
Hi, +1 to release. Compilation and tests run fine. Signatures look good. Thanks Lewis! Julien On 13 August 2014 06:32, Lewis John Mcgibbney wrote: > VOTE'ing will be open for 'at-least' 72 hours to allow people enough time > to cast their VOTE's. > Thanks > Lewis > > > On Tue, Aug 12, 2014 a

Re: java.lang.NullPointerException at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)

2014-08-13 Thread Julien Nioche
Hi Steve, I tried with Nutch 1.9 RC1 and am not getting this exception. => ./nutch parsechecker -D http.agent.name=tralala http://www.my-ebenefits.com/PenguinRandomHouse/ This is probably something that we fixed since 1.5.1, which is rather outdated. Why don't you give 1.9 a try instead? Julien On 12