Hello Nutch Users:

I’m currently having an issue with Nutch 1.4, similar to the one logged here:

https://issues.apache.org/jira/browse/NUTCH-2319

Using the example in that JIRA issue, if I am on the following URL:
http://rssfeeds.azcentral.com/phoenix/asu

I expect Nutch to find the alternate URL specified in the following link tag:

<link rel="alternate" type="application/atom+xml"
href="http://rssfeeds.azcentral.com/phoenix/asu&amp;x=1" title="Phoenix - ASU">

However, it does not. I have tried changing the regular expressions in
suffix-urlfilter.txt, regex-normalize.xml, regex-urlfilter.txt, and
prefix-urlfilter.txt, but none of these changes has had any effect.
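
To make the expected outlink concrete, here is a minimal sketch using only the
Python standard library. It is not Nutch code and says nothing about how Nutch
parses pages; the names AlternateLinkParser, find_alternate_links, and
SAMPLE_TAG are made up for this illustration. It only shows which URL a
generic HTML parser reads from the tag above, with the &amp; entity decoded:

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.alternate_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "alternate nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("href"):
            # HTMLParser has already decoded entities such as &amp; here.
            self.alternate_hrefs.append(attributes["href"])


def find_alternate_links(html):
    """Return the href of every <link rel="alternate"> found in the HTML text."""
    parser = AlternateLinkParser()
    parser.feed(html)
    parser.close()
    return parser.alternate_hrefs


# The tag from this message, copied as-is.
SAMPLE_TAG = (
    '<link rel="alternate" type="application/atom+xml" '
    'href="http://rssfeeds.azcentral.com/phoenix/asu&amp;x=1" '
    'title="Phoenix - ASU">'
)

if __name__ == "__main__":
    for href in find_alternate_links(SAMPLE_TAG):
        print(href)
```

The decoded href contains a literal & followed by x=1, so that is the form of
the URL I would expect any filter or normalizer rule to have to accept.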

Any feedback would be appreciated.

Please let me know,

MA
