Re: Error thrown with TikaConfig() constructor

2010-09-12 Thread Oleg Tikhonov
There are situations I can think of where you would want to implement a customized classloader: 1. You need a different hierarchy to load classes, as in OSGi for instance. The Hollywood principle, if you like. 2. You need to run different versions of classes or jars. For example, you want to

Re: Error thrown with TikaConfig() constructor

2010-09-12 Thread Jukka Zitting
Hi, On Sun, Sep 12, 2010 at 5:46 PM, Ken Krugler kkrugler_li...@transpac.com wrote: But that also seems clunky. Any other suggestions? A simpler approach would be to pass a list of already instantiated Parser objects to AutoDetectParser, like this: public AutoDetectParser(Detector

Re: Error thrown with TikaConfig() constructor

2010-09-11 Thread Ken Krugler
On Fri, Sep 10, 2010 at 10:31 PM, Nick Burch nick.bu...@alfresco.com wrote: Quite a lot of OfficeParser does depend on poifs code though, as well as a few bits that depend on some of the less common POI text extractors. It looks like a number of our other new parsers also have direct

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Nick Burch
On Thu, 9 Sep 2010, Ken Krugler wrote: I'm wondering how best to handle this type of configuration, in a way that's relatively resilient to Tika configuration changes and my target set of formats. Would it not make more sense to use the XML-based TikaConfig constructor (file, inputstream

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Ken Krugler
Hi Jukka, On Sep 10, 2010, at 5:35am, Jukka Zitting wrote: Hi, On Fri, Sep 10, 2010 at 5:22 AM, Ken Krugler kkrugler_li...@transpac.com wrote: With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and instantiates all Parser-based classes found on the classpath. Which, as

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Nick Burch
On Fri, 10 Sep 2010, Ken Krugler wrote: The issue is that the definitions of the types that are supported come from POI: Collections.unmodifiableSet(new HashSet<MediaType>(Arrays.asList( POIFSDocumentType.WORKBOOK.type, POIFSDocumentType.OLE10_NATIVE.type,

Error thrown with TikaConfig() constructor

2010-09-09 Thread Ken Krugler
Hi all, In the past, we'd build our Hadoop job jars using a dependency on Tika-parsers but excluding the supporting jars for types that we know we don't need to process (e.g. Microsoft docs, PDFs, etc). This dramatically reduces the size of the resulting Hadoop job jar. With