[ 
https://issues.apache.org/jira/browse/TIKA-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niall Pemberton updated TIKA-106:
---------------------------------

    Attachment: TIKA-106-remove-ORO-dependency-v2.patch

Attaching v2 of the patch (first version didn't remove the ORO dependency from 
pom.xml)

> Remove dependency on Jakarta ORO - use JDK 1.4 Regex
> ----------------------------------------------------
>
>                 Key: TIKA-106
>                 URL: https://issues.apache.org/jira/browse/TIKA-106
>             Project: Tika
>          Issue Type: Task
>          Components: general
>            Reporter: Niall Pemberton
>            Priority: Minor
>         Attachments: TIKA-106-remove-ORO-dependency-v2.patch
>
>
> Jakarta ORO is only used in one place in Tika - the RegexUtils's extract() 
> method (which is only called in one place in ParserPostProcessor). JDK 1.4 
> introduced built in regular expression support and changing the RegexUtils to 
> use this would remove the need for Jakarta ORO as a dependency.
> From the comments in RegexUtils it apears that this code was copied from 
> Nutch's OutlinkExtractor[1] - there seems to have been a similar move in 
> Nutch back in March in r516754[2] - however it was reverted the next day in 
> r517015[3] - I couldn't really see anything on the Nutch dev list to explain 
> this, except possibly this post http://tinyurl.com/2s2y9r
> [1] 
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/java/org/apache/nutch/parse/OutlinkExtractor.java
> [2] http://svn.apache.org/viewvc?view=rev&revision=516754
> [3] http://svn.apache.org/viewvc?view=rev&revision=517015

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to