Do not index empty values for title field
-
Key: NUTCH-1004
URL: https://issues.apache.org/jira/browse/NUTCH-1004
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions:
Index headings h1 and h2
Key: NUTCH-1005
URL: https://issues.apache.org/jira/browse/NUTCH-1005
Project: Nutch
Issue Type: New Feature
Components: indexer, parser
Reporter: Markus Jelsma
Very
[
https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1005:
-
Attachment: HeadingsParseFilter.java
HeadingsIndexingFilter.java
Index headings
[
https://issues.apache.org/jira/browse/NUTCH-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1004:
-
Description: Tika can generate multiple values for the title field for some
files such as
Hi,
I'm trying to get an idea of the type of issues which are currently being
addressed and was looking through trivial issues in JIRA. There appear to be
outstanding items from 2005 and 2008, e.g. NUTCH-62, NUTCH-623, etc. I'm
assuming that these aren't being assigned as they are of no interest to
Hi Lewis,
Thanks much! Feel free to resolve issues from '05 and '08 that you don't think
anyone is tackling or is going to tackle and more importantly ones that no
longer make sense to tackle given project direction, time, resources, etc.
Thanks!
Cheers,
Chris
On Jun 7, 2011, at 8:01 AM,
Great! I did this a few months ago as well, closing a lot of legacy issues. I
obviously missed some, such as 62, which can now be addressed using other
plugins.
My criterion was mostly to close issues tied to the old Lucene legacy and,
of course, some other weird entries.
Cheers
On Tuesday 07
[
https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045524#comment-13045524
]
Lewis John McGibbney commented on NUTCH-623:
Having checked branch-1.3 in
to all
On Tue, Jun 7, 2011 at 6:22 PM, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
OK, I will begin a basic clear-out of redundant issues when my transition to
project group status for JIRA is approved.
Thanks
On Tue, Jun 7, 2011 at 5:13 PM, Markus Jelsma
meta equiv with single quotes not accepted
--
Key: NUTCH-1006
URL: https://issues.apache.org/jira/browse/NUTCH-1006
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions:
[
https://issues.apache.org/jira/browse/NUTCH-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-1002.
You can use the existing URL filter plugins as an example:
[
https://issues.apache.org/jira/browse/NUTCH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-1000:
Assignee: Markus Jelsma
Add option not to commit to Solr
Hi Folks,
This VOTE has passed with the following tallies:
+1 Nutch PMC
Chris Mattmann
Markus Jelsma
Julien Nioche
Lewis John McGibbney
I'll go ahead and push the release to the mirrors and release the Maven repo to
Central and then send an ANNOUNCE.
Thanks!
Cheers,
Chris
(...apologies for the cross posting...)
The Apache Nutch project is pleased to announce the release of Apache Nutch
1.3. The release contents have been pushed out to the main Apache release
site so the releases should be available as soon as the mirrors get the
syncs.
Apache Nutch is an
See https://builds.apache.org/job/Nutch-trunk/1511/
--
[...truncated 985 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A
src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A