[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512712 ]
Dennis Kubes commented on NUTCH-471:
Ah, sorry, my configuration was the problem. If you don't upgrade the
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes closed NUTCH-497.
Issue resolved and committed.
Extreme Nested Tags causes StackOverflowException in
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes resolved NUTCH-497.
Resolution: Fixed
Committed with revision 550669.
Extreme Nested Tags causes StackOverflowException
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: (was: nested-tags-trap2.patch)
Extreme Nested Tags causes StackOverflowException in
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: (was: nested-tags-trap3.patch)
Extreme Nested Tags causes StackOverflowException in
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: nested-tags-trap2.patch
Added nested-tags-trap2.patch with Apache grant.
Extreme Nested
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: nested-tags-trap3.patch
Added nested-tags-trap3.patch with Apache grant.
Extreme Nested
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506894 ]
Dennis Kubes commented on NUTCH-497:
I agree, I think it would be better to have something generic if we are
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: nested-tags-trap.patch
This patch reworks DomContentUtils.getOutlinks to use a stack
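The NUTCH-497 comments describe replacing recursion in DomContentUtils.getOutlinks with an explicit stack so that extremely nested tags no longer overflow the thread stack. The committed patch is not reproduced here; the following is only a minimal sketch of that general technique using the JDK's org.w3c.dom API. The class StackOutlinkWalker, its methods collectHrefs and buildNested, and the example.com URLs are hypothetical and not part of Nutch, and the sketch only collects href values from <a> elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration, not Nutch code.
public class StackOutlinkWalker {

  // Collects href values of <a> elements in document order. Pending nodes
  // are kept on an explicit heap-allocated stack instead of the call stack,
  // so nesting depth cannot cause a StackOverflowError.
  public static List<String> collectHrefs(Node root) {
    List<String> hrefs = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node node = stack.pop();
      if (node.getNodeType() == Node.ELEMENT_NODE
          && "a".equalsIgnoreCase(node.getNodeName())) {
        // getAttribute returns "" when the attribute is absent.
        String href = ((Element) node).getAttribute("href");
        if (!href.isEmpty()) {
          hrefs.add(href);
        }
      }
      // Push children last-to-first so the first child is popped next,
      // which preserves document order.
      for (Node child = node.getLastChild(); child != null;
           child = child.getPreviousSibling()) {
        stack.push(child);
      }
    }
    return hrefs;
  }

  // Builds <html><a href=top/><div><div>...<a href=deep/>...</div></div></html>
  // with `depth` nested <div> elements around the deep link.
  public static Document buildNested(int depth) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    Element current = doc.createElement("a");
    current.setAttribute("href", "http://example.com/deep");
    // Wrap bottom-up so every appendChild targets a fresh, parentless node.
    for (int i = 0; i < depth; i++) {
      Element div = doc.createElement("div");
      div.appendChild(current);
      current = div;
    }
    Element html = doc.createElement("html");
    Element top = doc.createElement("a");
    top.setAttribute("href", "http://example.com/top");
    html.appendChild(top);
    html.appendChild(current);
    doc.appendChild(html);
    return doc;
  }

  public static void main(String[] args) {
    Document doc = buildNested(100_000);
    // prints [http://example.com/top, http://example.com/deep]
    System.out.println(collectHrefs(doc));
  }
}
```

Because pending nodes live in an ArrayDeque on the heap, the maximum nesting depth this walk can handle is bounded by heap size rather than by the JVM thread stack size (-Xss).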
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506596 ]
Dennis Kubes commented on NUTCH-497:
The newest patch is the nested-tags-trap.patch file.
Extreme Nested Tags
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: nested-tags-trap2.patch
Patch with the curNodeDepth removed. The patch file is
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506725 ]
Dennis Kubes commented on NUTCH-497:
Doğacan, that is correct. By using the stack we shouldn't get a
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-497:
Attachment: ExtremeNestedTags.patch
This is a rudimentary fix for those that want a workaround for
DeleteDuplicate fails if Segment index directory has 0 documents
Key: NUTCH-467
URL: https://issues.apache.org/jira/browse/NUTCH-467
Project: Nutch
Issue Type: Bug
[ https://issues.apache.org/jira/browse/NUTCH-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-467:
Attachment: nutch-467.patch
Submitted by Andrzej Bialecki.
DeleteDuplicate fails if Segment index
[ https://issues.apache.org/jira/browse/NUTCH-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-333:
Attachment: use-nutch-job_patch.txt
Updated patch, submitted by Doğacan Güney.
SegmentMerger and
[ https://issues.apache.org/jira/browse/NUTCH-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes resolved NUTCH-333.
Resolution: Fixed
Issue resolved
SegmentMerger and SegmentReader should use NutchJob
[ https://issues.apache.org/jira/browse/NUTCH-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes closed NUTCH-333.
SegmentMerger and SegmentReader should use NutchJob
Upgrade Nutch to Hadoop 0.12.1
Key: NUTCH-459
URL: https://issues.apache.org/jira/browse/NUTCH-459
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.9.0
Environment: All platforms
[ https://issues.apache.org/jira/browse/NUTCH-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes resolved NUTCH-233.
Resolution: Fixed
The new regex has been added to both the regex-urlfilter.txt and the
[ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes closed NUTCH-436.
Issue closed.
Incorrect handling of relative paths when the embedded URL path is empty
[ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes resolved NUTCH-436.
Resolution: Fixed
Patch tested on 10,000 URL run with no apparent issues. Reviewed and committed.
[ https://issues.apache.org/jira/browse/NUTCH-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes closed NUTCH-233.
Issue closed
wrong regular expression hang reduce process for ever
[ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474713 ]
Dennis Kubes commented on NUTCH-447:
This tool is for people who need a defined category structure or want to
Allow Plugin Includes and Excludes from File
Key: NUTCH-448
URL: https://issues.apache.org/jira/browse/NUTCH-448
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.9.0
[ https://issues.apache.org/jira/browse/NUTCH-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-448:
Attachment: plugin-fromfile.patch
The plugin-fromfile.patch file contains the functionality for
Dmoz Structure Parser Tool
Key: NUTCH-447
URL: https://issues.apache.org/jira/browse/NUTCH-447
Project: Nutch
Issue Type: New Feature
Affects Versions: 0.9.0
Environment: all platforms
[ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-447:
Attachment: dmoz-structure.patch
Patch that contains the DmozStructureParser class.
Dmoz Structure
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-247:
Attachment: agent-names3.patch.txt
This patch logs and throws an exception if the agent name is not
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474355 ]
Dennis Kubes commented on NUTCH-247:
We could move the code to a utility class but if we want it to be called
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474068 ]
Dennis Kubes commented on NUTCH-247:
I agree, but then should we approach the check as a configurable option?
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes reassigned NUTCH-247:
Assignee: Dennis Kubes
robot parser to restrict.
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-247:
Attachment: agent-names.patch
This patch removes the checks and severe logging from the
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473295 ]
Dennis Kubes commented on NUTCH-247:
I think the idea here is to NOT allow people to run fetchers for which they
[ https://issues.apache.org/jira/browse/NUTCH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-437:
Description: The MapFile.Writer signature has changed in Hadoop trunk
(version 0.10.x+) to include a
MapFile in Hadoop 0.10.2 has changed, must update references
Key: NUTCH-437
URL: https://issues.apache.org/jira/browse/NUTCH-437
Project: Nutch
Issue Type: Bug
Affects
[ https://issues.apache.org/jira/browse/NUTCH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes updated NUTCH-437:
Attachment: nutch-hadoop-0.10.2-mapfile.patch
This patch changes the references to MapFile.Writer
More description for fetcher.threads.fetch property
Key: NUTCH-295
URL: http://issues.apache.org/jira/browse/NUTCH-295
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.8-dev
[ http://issues.apache.org/jira/browse/NUTCH-295?page=all ]
Dennis Kubes updated NUTCH-295:
Attachment: fetcher_threads_desc.patch
More description for fetcher.threads.fetch property as it relates to running in
distributed mode.
More description for
Regular Expression for RegexUrlNormalizer to remove jsessionid
Key: NUTCH-255
URL: http://issues.apache.org/jira/browse/NUTCH-255
Project: Nutch
Type: Improvement
Components: fetcher
Versions:
[ http://issues.apache.org/jira/browse/NUTCH-254?page=all ]
Dennis Kubes updated NUTCH-254:
Attachment: fetcher_filter_url_patch.txt
Patch to fix a NullPointerException in the fetcher for filtered URLs.
Fetcher throws NullPointer if redirect URL is filtered
Some meta-refresh urls get ignored due to matching regular expression
Key: NUTCH-243
URL: http://issues.apache.org/jira/browse/NUTCH-243
Project: Nutch
Type: Bug
Components: fetcher