Hi Lewis, thanks for the reply. Sorry I couldn't get back to you sooner; I was 
on vacation.



I tried the NUTCH-1044 patch on Nutch 1.4 against a test website in which a JSP 
page sends a 302 redirect to another JSP page. However, the score of the 
redirect target URL is still set to 0 in the CrawlDatum, whereas with the patch 
applied it should be 1. Here is the crawl dump:



http://localhost/Test/302_Redirect.jsp Version: 7
Status: 4 (db_redir_temp)
Fetch time: Thu May 24 19:45:47 IST 2012
Modified time: Tue Apr 24 19:45:47 IST 2012
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 1.0
Signature: null
Metadata: _pst_: temp_moved(13), lastModified=0: http://localhost/Test/302_Redirect1.jsp



http://localhost/Test/302_Redirect1.jsp Version: 7
Status: 2 (db_fetched)
Fetch time: Thu May 24 19:45:48 IST 2012
Modified time: Tue Apr 24 19:45:48 IST 2012
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 0.0
Signature: 8fd921e73ae1bdd60fc509fba24548a5
Metadata: _pst_: success(1), lastModified=0_repr_: http://localhost/Test/302_Redirect.jsp
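
To make the symptom easier to spot in a larger dump, here is a minimal sketch that scans a text dump in the format shown above and lists every record that is db_fetched but has a score of 0. The class and method names (ZeroScoreFinder, fetchedWithZeroScore) are hypothetical and not part of Nutch; the sketch only assumes that each record starts with a "<url> Version: N" line followed by "Status:" and "Score:" lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: scans a CrawlDb text dump
// (format assumed to match the records quoted in this mail).
class ZeroScoreFinder {

    /** Returns the URLs of records that are db_fetched and have score 0. */
    static List<String> fetchedWithZeroScore(String dump) {
        List<String> hits = new ArrayList<>();
        String url = null;       // URL of the record currently being read
        boolean fetched = false; // true once "Status: ... (db_fetched)" is seen

        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("http") && line.contains("Version:")) {
                // A new record begins: "<url> Version: 7"
                url = line.substring(0, line.indexOf("Version:")).trim();
                fetched = false;
            } else if (line.startsWith("Status:")) {
                fetched = line.contains("(db_fetched)");
            } else if (line.startsWith("Score:") && url != null && fetched) {
                double score = Double.parseDouble(line.substring("Score:".length()).trim());
                if (score == 0.0) {
                    hits.add(url);
                }
            }
        }
        return hits;
    }
}
```

Run against the two records above, this should report only http://localhost/Test/302_Redirect1.jsp, since the redirect source has status db_redir_temp and a score of 1.0.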



I also ran the same test with 301-redirected URLs and a soft-redirected URL, 
but the results are the same as without the patch.
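
In case it helps with reproducing this without a JSP container, below is a minimal sketch of an equivalent test site. It is not my actual setup: it uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for the JSP pages. The class name RedirectTestServer, the port 8080, and the path /Test/301_Redirect.jsp are assumptions for illustration; the other two paths mirror the ones in the dump above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the JSP test site: two redirecting pages, one target page.
class RedirectTestServer {

    /** Registers a page that answers with the given redirect status and Location. */
    static void addRedirect(HttpServer server, String path, int status, String target) {
        server.createContext(path, exchange -> {
            String location = "http://localhost:" + server.getAddress().getPort() + target;
            exchange.getResponseHeaders().add("Location", location);
            exchange.sendResponseHeaders(status, -1); // -1: no response body
            exchange.close();
        });
    }

    /** Starts the test site on the given port (0 picks any free port). */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);

        // Temporary redirect, as in the dump above.
        addRedirect(server, "/Test/302_Redirect.jsp", 302, "/Test/302_Redirect1.jsp");
        // Permanent redirect; this path name is made up for the sketch.
        addRedirect(server, "/Test/301_Redirect.jsp", 301, "/Test/302_Redirect1.jsp");

        // Redirect target: a plain HTML page served with 200.
        server.createContext("/Test/302_Redirect1.jsp", exchange -> {
            byte[] body = "<html><body>target page</body></html>"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

With this running, the seed URL would be http://localhost:8080/Test/302_Redirect.jsp (or the 301 variant). It does not cover the soft-redirect case.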



Could you please verify this at your end and let me know?



Thanks,

Pravin

________________________________________

From: Lewis John Mcgibbney [[email protected]]

Sent: Friday, April 06, 2012 8:12 PM

To: [email protected]

Subject: Re: Question related to NUCTH 1044 redirected URLS and invalid scores



Hi Pravin,



By the looks of it, the commit should have included changes to 4 files, as
you state and as the patch shows.



Maybe we need to revisit this one and make a commit to trunk?



I would apply the patch, then try to reproduce the incorrect results and see
where you get.



Please let us know.



lewis



On Fri, Apr 6, 2012 at 9:50 AM, Pravin Agrawal <

[email protected]> wrote:



> Hi All,
>
> While working with the latest release, Nutch 1.4, I encountered a site whose
> homepage was itself redirected to another URL, and every outlink from
> that page was given a score of 0.
> I found the related issue in JIRA (NUTCH-1044), which says it is fixed in
> Nutch 1.4.
> However, while going through the attached patch, I observed that it changes
> four files, whereas the changes committed in
> revision 1156342 show only two files checked in.
> The latest source code for 1.4 is also missing the changes to
> OPICScoringFilter and LinkAnalysisScoringFilter given in the patch.
> Can anyone please clarify this?
> Is the patch attached to the NUTCH-1044 JIRA issue up to date?
> Can I apply the patch without worrying about side effects?
>
> Thanks,
> Pravin

>

> DISCLAIMER

> ==========

> This e-mail may contain privileged and confidential information which is

> the property of Persistent Systems Ltd. It is intended only for the use of

> the individual or entity to which it is addressed. If you are not the

> intended recipient, you are not authorized to read, retain, copy, print,

> distribute or use this message. If you have received this communication in

> error, please notify the sender and delete all copies of this message.

> Persistent Systems Ltd. does not accept any liability for virus infected

> mails.

>







--

*Lewis*

