Hello,
I'm crawling sites that serve content with MIME types I don't want to fetch,
but the URLs themselves have no distinguishing pattern, so I can't use
the regex URL filter to skip them. As far as I know, there is
currently no way to filter fetched content by MIME type.
E.g. How can I
[
https://issues.apache.org/jira/browse/NUTCH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-615:
Attachment: NUTCH-615.patch
> Redirected URL are fetched without setting any FetchInterval
> ---
Redirected URL are fetched without setting any FetchInterval
Key: NUTCH-615
URL: https://issues.apache.org/jira/browse/NUTCH-615
Project: Nutch
Issue Type: Bug
Components
[
https://issues.apache.org/jira/browse/NUTCH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-616:
Attachment: NUTCH-616.patch
Patch provided
> Reset Fetch Retry counter when fetch is successful
> -
Reset Fetch Retry counter when fetch is successful
--
Key: NUTCH-616
URL: https://issues.apache.org/jira/browse/NUTCH-616
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-614:
---
Attachment: NUTCH-614-2-20080226.patch
Very, very messy patch. This is a first cut at both allowing
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/371/changes
--
[...truncated 2055 lines...]
AU src/plugin/parse-html/plugin.xml
AU src/plugin/parse-html/build.xml
A src/plugin/protocol-httpclient
A src/plugin/protocol-
Sorry, ignore this. I'm trying to fix the "whoami" test problem.
Nige
On Feb 26, 2008, at 5:06 PM, Apache Hudson Server wrote:
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/371/changes
--
[...truncated 2055 lines...]
AU src/plugin/pa
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/372/changes
--
[...truncated 4603 lines...]
copy-generated-lib:
[copy] Copying 1 file to
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
init:
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/373/changes
--
[...truncated 4603 lines...]
copy-generated-lib:
[copy] Copying 1 file to
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
init:
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/374/changes
--
[...truncated 4605 lines...]
copy-generated-lib:
[copy] Copying 1 file to
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
init:
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/375/changes
--
[...truncated 6123 lines...]
init-plugin:
deps-jar:
compile:
[echo] Compiling plugin: lib-regex-filter
jar:
init:
init-plugin:
deps-jar:
compile:
[echo] Compiling plug