Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-27 Thread Bob Paulin
Hi, Currently the Apache POI dependency is in several modules and it's sort of a beast (> 2 MB in size). It appears many of the modules are only using the IOUtils library. The big exception is the office module which is responsible for parsing documents. These methods appear to also exist
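One of the commonest IOUtils calls, `toByteArray(InputStream)`, exists in both POI's and commons-io's IOUtils. The snippet does not say which methods the modules actually use, so this is only a hypothetical JDK-only sketch (the class name `StreamBytes` is invented). It shows how little code such a helper needs if it were kept inside Tika rather than pulled in through a large dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only helper; not part of Tika, POI or commons-io. */
public final class StreamBytes {
    private StreamBytes() {}

    /** Reads the stream until EOF and returns all bytes read. Does not close the stream. */
    public static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // read() returns -1 only at end of stream; any other value is a chunk to copy
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = toByteArray(
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(data.length); // prints 5
    }
}
```

At a call site, swapping libraries would mostly be a matter of changing the import. The behaviour of each method used should still be checked, because same-named methods do not always share semantics.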

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-27 Thread Bob Paulin
There is also org.apache.poi.util.StringUtil (in cad module) and org.apache.poi.util.LittleEndian (in code module). Neither of these seem to have Commons library replacements from what I can see. Given the small amount of code in the methods that are actually used would it make sense to mov
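To illustrate how small such helpers can be, here is a hypothetical sketch built on `java.nio.ByteBuffer`. It assumes the modules need POI-style little-endian reads (`getShort`, `getUShort`, `getInt`, `getUInt` at a byte offset) and a UTF-16LE string decode. The class name and `decodeUtf16LE` are invented, and which StringUtil method the cad module actually calls has not been checked.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-ins for a few LittleEndian / StringUtil style helpers. */
public final class LittleEndianSketch {
    private LittleEndianSketch() {}

    // Little-endian view over the array; absolute gets below do not move the position.
    private static ByteBuffer le(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Signed 16-bit value at offset. */
    public static short getShort(byte[] data, int offset) {
        return le(data).getShort(offset);
    }

    /** Unsigned 16-bit value at offset, widened to int. */
    public static int getUShort(byte[] data, int offset) {
        return le(data).getShort(offset) & 0xFFFF;
    }

    /** Signed 32-bit value at offset. */
    public static int getInt(byte[] data, int offset) {
        return le(data).getInt(offset);
    }

    /** Unsigned 32-bit value at offset, widened to long. */
    public static long getUInt(byte[] data, int offset) {
        return le(data).getInt(offset) & 0xFFFFFFFFL;
    }

    /** Hypothetical: decodes charCount UTF-16LE code units (2 bytes each) from offset. */
    public static String decodeUtf16LE(byte[] data, int offset, int charCount) {
        return new String(data, offset, charCount * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x04, (byte) 0xFF, (byte) 0xFF};
        System.out.println(getInt(data, 0));    // 67305985 (0x04030201)
        System.out.println(getUShort(data, 4)); // 65535
        System.out.println(getShort(data, 4));  // -1
    }
}
```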

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-27 Thread Nick Burch
On Sun, 27 Mar 2016, Bob Paulin wrote: Currently the Apache POI dependency is in several modules and it's sort of a beast (> 2 MB in size). You should've seen it before Jukka and Yegor spent a crazy ApacheCon hacking up the ooxml-lite support... ;-) It appears many of the modules are only u

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-27 Thread Bob Paulin
Hi Nick, On 3/27/2016 6:52 PM, Nick Burch wrote: On Sun, 27 Mar 2016, Bob Paulin wrote: Currently the Apache POI dependency is in several modules and it's sort of a beast (> 2 MB in size). You should've seen it before Jukka and Yegor spent a crazy ApacheCon hacking up the ooxml-lite support.

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-27 Thread Bob Paulin
Tika's IOUtils appears to be missing the readFully method. Should that be added? - Bob On 3/27/2016 6:52 PM, Nick Burch wrote: On Sun, 27 Mar 2016, Bob Paulin wrote: Currently the Apache POI dependency is in several modules and it's sort of a beast (> 2 MB in size). You should've seen it b
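If readFully were added to Tika's own IOUtils, the core of it is a loop, because a single `InputStream.read` may return fewer bytes than requested. This is a hypothetical sketch (the class name `ReadFullySketch` is invented). Its return convention is modelled on our understanding of POI's `IOUtils.readFully`: the number of bytes read, or -1 if the stream was already at EOF. It is not a copy of either library's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class ReadFullySketch {
    private ReadFullySketch() {}

    /**
     * Reads up to len bytes into b starting at off, looping over short reads.
     * Returns the number of bytes read, or -1 if EOF was hit before any byte.
     */
    public static int readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) {
                // End of stream: report -1 only if nothing at all was read
                return total == 0 ? -1 : total;
            }
            total += n;
        }
        return total;
    }

    /** Convenience overload that tries to fill the whole array. */
    public static int readFully(InputStream in, byte[] b) throws IOException {
        return readFully(in, b, 0, b.length);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        byte[] buf = new byte[5];
        System.out.println(readFully(in, buf)); // 3 (stream ended early)
        System.out.println(readFully(in, buf)); // -1 (already at EOF)
    }
}
```

Returning a count rather than throwing lets callers handle truncated input themselves, which suits parsers that should extract what they can from damaged files.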

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-28 Thread Nick Burch
On Sun, 27 Mar 2016, Bob Paulin wrote: Yes I think overall if these functions can live in somewhere either inside tika or a smaller dependent library we're in a better place. I'll take a look at Ogg-Vorbis. The two util classes there, that spring to mind, are: https://github.com/Gagravarr/Vorb

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-28 Thread Nick Burch
On Sun, 27 Mar 2016, Bob Paulin wrote: Tika's IOUtils appears to be missing the readFully method. Should that be added? There was discussion about getting rid of the Tika IOUtils method in favour of depending on commons-io. If that method is on commons-io, then we could use that without need
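One caveat when switching: to our understanding, commons-io's `IOUtils.readFully(InputStream, byte[])` is strict. It throws `EOFException` if the stream ends before the buffer is full, instead of returning a count as the POI-style helper does, so call sites may need a try/catch. The JDK's `DataInputStream.readFully` has the same strict contract. This sketch (class and method names invented) shows that behaviour using only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public final class StrictReadDemo {
    private StrictReadDemo() {}

    /**
     * Hypothetical helper: returns true if buf was filled completely,
     * false if the stream ended first. The stream is not closed.
     */
    public static boolean readExactly(InputStream source, byte[] buf) throws IOException {
        DataInputStream in = new DataInputStream(source);
        try {
            in.readFully(buf); // throws EOFException on a short stream
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        System.out.println(readExactly(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), buf)); // true
        System.out.println(readExactly(new ByteArrayInputStream(new byte[] {1, 2}), buf));       // false
    }
}
```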

RE: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-28 Thread Ken Krugler
Hi Bob, > From: Nick Burch > Sent: March 28, 2016 6:49:09am PDT > To: dev@tika.apache.org > Subject: Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils > > On Sun, 27 Mar 2016, Bob Paulin wrote: >> Tika's IOUtils appears to be missing the readFully meth

Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils

2016-03-28 Thread Bob Paulin
library or within tika-core. - Bob On Mon, Mar 28, 2016 at 9:18 AM, Ken Krugler wrote: > Hi Bob, > > > From: Nick Burch > > Sent: March 28, 2016 6:49:09am PDT > > To: dev@tika.apache.org > > Subject: Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils > >