[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695233#action_12695233 ]
Hudson commented on NUTCH-721:
--
Integrated in Nutch-trunk #772 (See
[http://hudson.zones.apach
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695170#action_12695170 ]
Roger Dunk commented on NUTCH-721:
--
For the following tests I've used the same segment cont
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695122#action_12695122 ]
Doğacan Güney commented on NUTCH-692:
-
Thanks for the patch.
Patch looks good to me. Ca
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cosmin Lehene updated NUTCH-692:
Attachment: NUTCH-692.patch
This just checks the destination file existence before attempting to cre
Hi all. I would like to add keywords to the information that gets inserted
into the Lucene indexes. I think I need to insert them into the WebDB first
and later insert them into the Lucene indexes. Am I right? Which
extension points do I need to use?
Thanks in advance
--
Rodrigo Reyes
George,
Try using Nutch 1.0 instead. I have tested your example with the SVN version,
and it did not run into the problem you described.
J.
2009/4/2 George Herlin
> Indeed I have... that's how I found out.
>
> My test case: crawl
>
> http://www.purdue.ca/research/research_clinical.asp
>
> with
Hi all,
I'd like to turn Nutch into a focused (topical) crawler. It's
part of my final-year thesis, and I'd also like others to be able
to benefit from my work. I started to analyze the code and think
that I have found the right piece of code. I just wanted to know if I am
on the right track. I
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694986#action_12694986 ]
Doğacan Güney edited comment on NUTCH-721 at 4/2/09 6:01 AM:
-
I'
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694986#action_12694986 ]
Doğacan Güney commented on NUTCH-721:
-
I've committed nutch 0.9 fetcher as OldFetcher. S
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694942#action_12694942 ]
Julien Nioche commented on NUTCH-692:
-
As I pointed out in my previous message the root
Indeed I have... that's how I found out.
My test case: crawl
http://www.purdue.ca/research/research_clinical.asp
with crawl-urlfilter and regex-urlfilter ending with
#purdue
+^http://www.purdue.ca/research/
+^http://www.purdue.ca/pdf/
# reject anything else
-.
The site is very small (which he
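George's filter rules depend on the order in which they are evaluated. As the comments in Nutch's default regex-urlfilter.txt describe, the first pattern that matches a URL decides whether it is included (+) or ignored (-), and a URL that matches no pattern is ignored. The Java sketch below models only that first-match rule under that assumption; the class FirstMatchUrlFilter and its methods are hypothetical and not part of Nutch, and the pdf and about URLs in main are made-up examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-alone model of first-match-wins URL filtering with
 * '+'/'-' regex rules. Illustration only; this is not Nutch's implementation.
 */
public class FirstMatchUrlFilter {

    /** One rule: accept or reject URLs in which the pattern is found. */
    private record Rule(boolean accept, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    public FirstMatchUrlFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and '#' comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first matching rule decides; a URL matching no rule is rejected. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            // find() searches anywhere in the URL, so "^" is needed to anchor at the start.
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FirstMatchUrlFilter filter = new FirstMatchUrlFilter(List.of(
            "#purdue",
            "+^http://www.purdue.ca/research/",
            "+^http://www.purdue.ca/pdf/",
            "# reject anything else",
            "-."));
        System.out.println(filter.accepts("http://www.purdue.ca/research/research_clinical.asp")); // true
        System.out.println(filter.accepts("http://www.purdue.ca/pdf/report.pdf"));                  // true
        System.out.println(filter.accepts("http://www.purdue.ca/about/"));                          // false
    }
}
```

Because "-." matches any non-empty URL, it only takes effect for URLs that neither "+" rule accepted first. Note also that the dots in the "+" patterns are unescaped and therefore match any character; writing "\." would make the rules stricter.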