[jira] Commented: (NUTCH-721) Fetcher2 Slow

2009-04-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695233#action_12695233 ] Hudson commented on NUTCH-721: -- Integrated in Nutch-trunk #772 (See [http://hudson.zones.apach

[jira] Commented: (NUTCH-721) Fetcher2 Slow

2009-04-02 Thread Roger Dunk (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695170#action_12695170 ] Roger Dunk commented on NUTCH-721: -- For the following tests I've used the same segment cont

[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19

2009-04-02 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695122#action_12695122 ] Doğacan Güney commented on NUTCH-692: - Thanks for the patch. Patch looks good to me. Ca

[jira] Updated: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19

2009-04-02 Thread Cosmin Lehene (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cosmin Lehene updated NUTCH-692: Attachment: NUTCH-692.patch This just checks the destination file existence before attempting to cre

Using keywords metatags

2009-04-02 Thread Rodrigo Reyes C.
Hi all. I would like to add keywords to the information that gets inserted into the Lucene indexes. I am thinking I need to insert them into the WebDB and later on insert them into the Lucene indexes. Am I right? Which extension points do I need to use? Thanks in advance -- Rodrigo Reyes

Re: Infinite loop bug in Nutch 0.9

2009-04-02 Thread Julien Nioche
George, Try using Nutch-1.0 instead. I have tested your example with the SVN version and it did not get into the problem you described. J. 2009/4/2 George Herlin > Indeed I have... that's how I found out. > > My test case: crawl > > http://www.purdue.ca/research/research_clinical.asp > > with

Re: Nutch Topical / Focused Crawl

2009-04-02 Thread Ken Krugler
Hi @ all, I'd like to turn Nutch into a focused / topical crawler. It's a part of my final year thesis. Further, I'd like that others can contribute from my work. I started to analyze the code and think that I found the right piece of code. I just wanted to know if I am on the right track. I

Nutch Topical / Focused Crawl

2009-04-02 Thread MyD
Hi @ all, I'd like to turn Nutch into a focused / topical crawler. It's a part of my final year thesis. Further, I'd like that others can contribute from my work. I started to analyze the code and think that I found the right piece of code. I just wanted to know if I am on the right track.

[jira] Issue Comment Edited: (NUTCH-721) Fetcher2 Slow

2009-04-02 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694986#action_12694986 ] Doğacan Güney edited comment on NUTCH-721 at 4/2/09 6:01 AM: - I'

[jira] Commented: (NUTCH-721) Fetcher2 Slow

2009-04-02 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694986#action_12694986 ] Doğacan Güney commented on NUTCH-721: - I've committed nutch 0.9 fetcher as OldFetcher. S

[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19

2009-04-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694942#action_12694942 ] Julien Nioche commented on NUTCH-692: - As I pointed out in my previous message the root

Re: Infinite loop bug in Nutch 0.9

2009-04-02 Thread George Herlin
Indeed I have... that's how I found out. My test case: crawl http://www.purdue.ca/research/research_clinical.asp with crawl-urlfilter and regex-urlfilter ending with #purdue +^http://www.purdue.ca/research/ +^http://www.purdue.ca/pdf/ # reject anything else -. The site is very small (which he