[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-10-08 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-677: - Attachment: SegmentMergeFilter.java Added Apache License. Segment merge filtering based

[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-10-08 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-677: - Attachment: SegmentMergeFilters.java Added Apache license header. Segment merge

[jira] Commented: (NUTCH-677) Segment merge filtering based on segment content

2009-10-08 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763681#action_12763681 ] Marcin Okraszewski commented on NUTCH-677: -- Sorry, I didn't notice the request for

[jira] Updated: (NUTCH-740) Configuration option to override default language for fetched pages.

2009-06-09 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-740: - Attachment: AcceptLanguage_trunk_2009-06-09.patch It does apply, but with Fuzz factor set

[jira] Created: (NUTCH-740) Configuration option to override default language for fetched pages.

2009-05-28 Thread Marcin Okraszewski (JIRA)
Configuration option to override default language for fetched pages. Key: NUTCH-740 URL: https://issues.apache.org/jira/browse/NUTCH-740 Project: Nutch Issue Type:

[jira] Updated: (NUTCH-740) Configuration option to override default language for fetched pages.

2009-05-28 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-740: - Attachment: AcceptLanguage.patch The patch which allows overriding of Accept-Language

[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-05-27 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-677: - Attachment: MergeFilter_for_1.0.patch The patch ported to Nutch 1.0. The Java files

[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2009-05-27 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-490: - Attachment: NekoFilters_for_1.0.patch Patch ported to Nutch 1.0. It includes the two

[jira] Created: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)
Segment merge filtering based on segment content --- Key: NUTCH-677 URL: https://issues.apache.org/jira/browse/NUTCH-677 Project: Nutch Issue Type: Improvement Affects Versions: 0.9.0

[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-677: - Attachment: MergeFilter.patch The patch for 0.9 Segment merge filtering based on segment

[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-677: - Attachment: SegmentMergeFilter.java The filter interface (referred to by the patch).

[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-677: - Attachment: SegmentMergeFilters.java Merge filter aggregation which hides extension

[jira] Updated: (NUTCH-488) Avoid parsing unnecessary links and get a more relevant outlink list

2007-10-15 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-488: - Attachment: ignore_tags_v3.patch OK, yet another approach based on Doğacan's comments.

[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2007-05-22 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-490: - Attachment: HtmlParser.java.diff Patch for HtmlParser. Extension point with filters for

[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2007-05-22 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-490: - Attachment: nutch-extensionpoins_plugin.xml.diff Patch for plugin.xml in

[jira] Created: (NUTCH-487) Neko HTML parser goes on default settings.

2007-05-21 Thread Marcin Okraszewski (JIRA)
Neko HTML parser goes on default settings. -- Key: NUTCH-487 URL: https://issues.apache.org/jira/browse/NUTCH-487 Project: Nutch Issue Type: Bug Components: fetcher Affects Versions: