[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12320763 ]
Stephan Strittmatter commented on NUTCH-21:
---
I will verify the unit tests by next week!
> parser plugin for MS PowerPoint slides
> -
Project URL in JIRA
---
Key: NUTCH-77
URL: http://issues.apache.org/jira/browse/NUTCH-77
Project: Nutch
Type: Task
Reporter: Stephan Strittmatter
Priority: Trivial
The project URL on JIRA should be updated from
http://incubator.apache.or
[ http://issues.apache.org/jira/browse/NUTCH-20?page=all ]
Stephan Strittmatter updated NUTCH-20:
--
Attachment: OutlinkExtractor.java
Anchor "null" causes an NPE. Changed the anchor to an empty String instead.
> Extract urls from plain texts
> ---
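A minimal sketch of the idea behind this update, assuming "anchor null" refers to a null anchor reference: pull absolute URLs out of plain text with a regular expression and store an empty String whenever no anchor is given. This is not the attached OutlinkExtractor.java; the class name PlainTextLinkSketch, the Link holder, and the URL pattern are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the attached OutlinkExtractor.java. */
final class PlainTextLinkSketch {

    /** Minimal stand-in for an outlink: target URL plus anchor text. */
    static final class Link {
        final String url;
        final String anchor;

        Link(String url, String anchor) {
            this.url = url;
            // Assumption: a missing anchor is stored as "" so that later
            // calls on the anchor cannot throw a NullPointerException.
            this.anchor = (anchor == null) ? "" : anchor;
        }
    }

    // Assumed, deliberately simple pattern: absolute http, https, or ftp
    // URLs, running up to whitespace, quotes, angle brackets, or ')'.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "\\b(?:https?|ftp)://[^\\s<>\"')]+", Pattern.CASE_INSENSITIVE);

    /** Returns one Link per absolute URL found in plainText. */
    static List<Link> extract(String plainText, String anchor) {
        List<Link> links = new ArrayList<>();
        if (plainText == null) {
            return links;
        }
        Matcher m = URL_PATTERN.matcher(plainText);
        while (m.find()) {
            // Drop sentence punctuation that trails the URL in running text.
            String url = m.group().replaceAll("[.,;:!?]+$", "");
            if (!url.isEmpty()) {
                links.add(new Link(url, anchor));
            }
        }
        return links;
    }
}
```

Calling PlainTextLinkSketch.extract(text, null) then yields links whose anchor is the empty String rather than null, so callers can use the anchor without a null check.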
[ http://issues.apache.org/jira/browse/NUTCH-20?page=all ]
Stephan Strittmatter updated NUTCH-20:
--
Description:
Some parsers, e.g. the Word parser, return no Outlinks.
This class is able to extract (absolute) hyperlinks from a plain String
(c
[ http://issues.apache.org/jira/browse/NUTCH-21?page=all ]
Stephan Strittmatter updated NUTCH-21:
--
Attachment: parse-mspowerpoint.zip
Updated the plugin sources to match the changed Nutch interface.
> parser plugin for MS PowerPoint slides
> --