[jira] [Created] (NUTCH-1004) Do not index empty values for title field

2011-06-07 Thread Markus Jelsma (JIRA)
Do not index empty values for title field Key: NUTCH-1004 URL: https://issues.apache.org/jira/browse/NUTCH-1004 Project: Nutch Issue Type: Bug Components: indexer Affects Versions:

[jira] [Created] (NUTCH-1005) Index headings h1 and h2

2011-06-07 Thread Markus Jelsma (JIRA)
Index headings h1 and h2 Key: NUTCH-1005 URL: https://issues.apache.org/jira/browse/NUTCH-1005 Project: Nutch Issue Type: New Feature Components: indexer, parser Reporter: Markus Jelsma Very

[jira] [Updated] (NUTCH-1005) Index headings h1 and h2

2011-06-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1005: Attachment: HeadingsParseFilter.java HeadingsIndexingFilter.java Index headings

[jira] [Updated] (NUTCH-1004) Do not index empty values for title field

2011-06-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1004: Description: Tika can generate multiple values for the title field for some files such as

dated issues in JIRA

2011-06-07 Thread lewis john mcgibbney
Hi, I'm trying to get an idea of the type of issues which are currently being addressed and was looking through trivial issues in JIRA. There appear to be outstanding items from 2005 and 2008, e.g. NUTCH-62, NUTCH-623, etc. I'm assuming that these aren't being assigned as they are of no interest to

Re: dated issues in JIRA

2011-06-07 Thread Mattmann, Chris A (388J)
Hi Lewis, Thanks much! Feel free to resolve issues from '05 and '08 that you don't think anyone is tackling or is going to tackle and, more importantly, ones that no longer make sense to tackle given project direction, time, resources, etc. Thanks! Cheers, Chris On Jun 7, 2011, at 8:01 AM,

Re: dated issues in JIRA

2011-06-07 Thread Markus Jelsma
Great! I did this a few months ago as well, closing a lot of legacy issues. I obviously missed some, such as 62, which can now be addressed using other plugins. My criterion was mostly to close issues tied to the old Lucene legacy and, of course, some other weird entries. Cheers On Tuesday 07

[jira] [Commented] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-06-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045524#comment-13045524 ] Lewis John McGibbney commented on NUTCH-623: Having checked branch-1.3 in

Re: dated issues in JIRA

2011-06-07 Thread lewis john mcgibbney
to all On Tue, Jun 7, 2011 at 6:22 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: OK I will begin a basic clear out of redundant issues when my transition to project group status for JIRA is approved. Thanks On Tue, Jun 7, 2011 at 5:13 PM, Markus Jelsma

[jira] [Created] (NUTCH-1006) meta equiv with single quotes not accepted

2011-06-07 Thread Markus Jelsma (JIRA)
meta equiv with single quotes not accepted Key: NUTCH-1006 URL: https://issues.apache.org/jira/browse/NUTCH-1006 Project: Nutch Issue Type: Bug Components: parser Affects Versions:

[jira] [Closed] (NUTCH-1002) Want to be able to filter url's through code, rather than through configuration file - crawl-urlfilter.txt

2011-06-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-1002. You can use the existing URL filter plugins as an example:

[jira] [Assigned] (NUTCH-1000) Add option not to commit to Solr

2011-06-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-1000: Assignee: Markus Jelsma Add option not to commit to Solr

[RESULT] [VOTE] Apache Nutch 1.3 Release Candidate #3

2011-06-07 Thread Mattmann, Chris A (388J)
Hi Folks, This VOTE has passed with the following tallies: +1 Nutch PMC Chris Mattmann Markus Jelsma Julien Nioche Lewis John McGibbney I'll go ahead and push the release to the mirrors and release the Maven repo to Central and then send an ANNOUNCE. Thanks! Cheers, Chris

[ANNOUNCE] Apache Nutch 1.3 released

2011-06-07 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Nutch project is pleased to announce the release of Apache Nutch 1.3 The release contents have been pushed out to the main Apache release site so the releases should be available as soon as the mirrors get the syncs. Apache Nutch is an

Build failed in Jenkins: Nutch-trunk #1511

2011-06-07 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1511/ -- [...truncated 985 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java A