[ https://issues.apache.org/jira/browse/TIKA-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Niall Pemberton updated TIKA-106:
---------------------------------

    Attachment: TIKA-106-remove-ORO-dependency-v2.patch

Attaching v2 of the patch (the first version didn't remove the ORO dependency from pom.xml).

> Remove dependency on Jakarta ORO - use JDK 1.4 Regex
> ----------------------------------------------------
>
>                 Key: TIKA-106
>                 URL: https://issues.apache.org/jira/browse/TIKA-106
>             Project: Tika
>          Issue Type: Task
>          Components: general
>            Reporter: Niall Pemberton
>            Priority: Minor
>         Attachments: TIKA-106-remove-ORO-dependency-v2.patch
>
>
> Jakarta ORO is only used in one place in Tika - the extract() method of
> RegexUtils (which is itself only called in one place, in ParserPostProcessor).
> JDK 1.4 introduced built-in regular expression support, and changing
> RegexUtils to use it would remove the need for Jakarta ORO as a dependency.
>
> From the comments in RegexUtils it appears that this code was copied from
> Nutch's OutlinkExtractor[1]. There seems to have been a similar move in
> Nutch back in March in r516754[2]; however, it was reverted the next day in
> r517015[3]. I couldn't really see anything on the Nutch dev list to explain
> this, except possibly this post: http://tinyurl.com/2s2y9r
>
> [1] http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/java/org/apache/nutch/parse/OutlinkExtractor.java
> [2] http://svn.apache.org/viewvc?view=rev&revision=516754
> [3] http://svn.apache.org/viewvc?view=rev&revision=517015
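
For illustration only, here is a minimal sketch of what an extract() method built on java.util.regex instead of Jakarta ORO could look like. It is not taken from the attached patch: the class name RegexExtractSketch and the (content, regex) signature are assumptions, and the generic List<String> needs Java 5 or later (on JDK 1.4 itself a raw List would be used).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Tika's RegexUtils; name and signature are assumptions.
class RegexExtractSketch {

    // Returns every substring of content matched by regex, in order of appearance.
    static List<String> extract(String content, String regex) {
        List<String> matches = new ArrayList<String>();
        if (content == null || regex == null) {
            return matches; // nothing to scan
        }
        // Pattern and Matcher ship with the JDK (java.util.regex, since 1.4),
        // so no third-party regex library is involved.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            matches.add(matcher.group()); // the full text of the current match
        }
        return matches;
    }
}
```

Calling RegexExtractSketch.extract(text, "http://\\S+") would collect each whitespace-delimited http:// URL in text. Note that Perl5 syntax as implemented by ORO and java.util.regex syntax are similar but not identical, so any existing pattern would need to be checked when switching.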