Re: Fetched pages has no content

2011-08-02 Thread webdev1977
both are in the list, but I guess since parse-html is listed first, it wins.. -- View this message in context: http://lucene.472066.n3.nabble.com/Fetched-pages-has-no-content-tp3171881p3218585.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Fetched pages has no content

2011-08-01 Thread Markus Jelsma
!? I have been wondering for a week or two what has changed between 1.2 and 1.3 that would have caused such a problem. Is there a JIRA open for the issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Fetched-pages-has-no-content-tp3171881p3216734.html Sent from

Re: Fetched pages has no content

2011-08-01 Thread webdev1977
/test.php?id=123 link .../html.. but then it does nothing with the links here. I have tried changing my filters multiple times and it just won't parse them. I also ran the ParseChecker class and I get 0 outlinks. -- View this message in context: http://lucene.472066.n3.nabble.com/Fetched-pages-has

Re: Fetched pages has no content

2011-07-20 Thread Julien Nioche
protocol-httpclient is broken and needs replacing On 19 July 2011 23:10, Anders Rask anr...@gmail.com wrote: Hi guys! I experimented some more, and it seems I'm only getting these problems when using protocol-httpclient. It works fine when I use protocol-http. Could you please try and see

Re: Fetched pages has no content

2011-07-18 Thread lewis john mcgibbney
Hi, If you have a look at your regex-urlfilter.txt it will by default be rejecting ? in the URL. Please test with that line edited (or commented out) and see if the problem fades. On Mon, Jul 18, 2011 at 10:11 AM, Anders Rask anr...@gmail.com wrote: Hi Markus! We are using a custom parser, but I
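For readers following the regex-urlfilter.txt suggestion above, here is a minimal Python sketch of how such a rule list could behave. It is not Nutch code. It assumes that each non-comment line is a `+` or `-` followed by a regex, that the first pattern found anywhere in the URL decides, and that a URL matching no rule is rejected. The two rules shown are an assumed excerpt of a default-style file, so check your own copy. The names `parse_rules` and `accepts` are made up for this illustration.

```python
import re

# Assumed excerpt of a default-style regex-urlfilter.txt (verify against your file).
DEFAULT_RULES = """
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything else
+.
"""

# Same rules with the query-character rule commented out, as suggested in the thread.
EDITED_RULES = """
# -[?*!@=]
+.
"""


def parse_rules(text):
    """Turn rule text into a list of (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption: no matching rule means the URL is dropped


if __name__ == "__main__":
    url = "http://www.uu.se/news/news_item.php?typ=pmid=1381"
    print(accepts(url, parse_rules(DEFAULT_RULES)))
    print(accepts(url, parse_rules(EDITED_RULES)))
```

Under these assumptions, the URL from this thread is rejected by the first rule list because it contains `?` and `=`, and accepted once that rule is commented out.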

Re: Fetched pages has no content

2011-07-18 Thread Markus Jelsma
Judging from the segment those URLs are fetched and parsed. I think maybe some HTML parse APIs have changed between your 1.1 and 1.2 versions. If ParserChecker shows the same issue then it's most likely a parse plugin problem for the new version. Can you check? Hi, If you have a look at

Re: Fetched pages has no content

2011-07-18 Thread Julien Nioche
As pointed out by Markus the logs show that the content has been properly fetched. Moreover ./nutch org.apache.nutch.parse.ParserChecker 'http://www.uu.se/news/news_item.php?typ=pmid=1381' works fine. Double check your custom parser, it is likely to be the source of the problem. BTW: what

Fetched pages has no content

2011-07-15 Thread Anders Rask
Hi! We are using Nutch to crawl a bunch of websites and index them to Solr. At the moment we are in the process of upgrading from Nutch 1.1 to Nutch 1.3 and at the same time going from one server to two servers. Unfortunately we are stuck with a problem which we haven't seen in the old