[ 
https://issues.apache.org/jira/browse/TIKA-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved TIKA-106.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.1-incubator

Patch committed in revision 606140. Thanks!

> Remove dependency on Jakarta ORO - use JDK 1.4 Regex
> ----------------------------------------------------
>
>                 Key: TIKA-106
>                 URL: https://issues.apache.org/jira/browse/TIKA-106
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>            Reporter: Niall Pemberton
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-106-remove-ORO-dependency-v2.patch
>
>
> Jakarta ORO is used in only one place in Tika: the extract() method of 
> RegexUtils, which is itself called only from ParserPostProcessor. JDK 1.4 
> introduced built-in regular expression support (java.util.regex), and changing 
> RegexUtils to use it would remove the need for Jakarta ORO as a dependency.
> From the comments in RegexUtils, it appears that this code was copied from 
> Nutch's OutlinkExtractor[1]. Nutch made a similar move back in March in 
> r516754[2], but it was reverted the next day in r517015[3]. I couldn't find 
> anything on the Nutch dev list to explain the revert, except possibly this 
> post: http://tinyurl.com/2s2y9r
> [1] 
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/java/org/apache/nutch/parse/OutlinkExtractor.java
> [2] http://svn.apache.org/viewvc?view=rev&revision=516754
> [3] http://svn.apache.org/viewvc?view=rev&revision=517015
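
For readers unfamiliar with the change described above, here is a minimal
sketch of a regex-based extract method written against java.util.regex
instead of Jakarta ORO. It is not the committed patch: the class name
RegexExtractSketch, the method signature, and the case-insensitive flag are
assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for RegexUtils; the real class and signature may differ.
public class RegexExtractSketch {

    // Returns every substring of content that matches regex, in order of appearance.
    public static List<String> extract(String content, String regex) {
        List<String> matches = new ArrayList<String>();
        if (content == null || regex == null) {
            return matches;
        }
        // Assumption: matching is case-insensitive; drop the flag if that is not wanted.
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }
}
```

Because Pattern and Matcher ship with the JDK since 1.4, code along these
lines needs no third-party regex library on the classpath.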

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
