Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Jukka Zitting
Hi, On Fri, Sep 10, 2010 at 10:31 PM, Nick Burch wrote: > Quite a lot of OfficeParser does depend on poifs code though, as well as a > few bits that depend on some of the less common POI text extractors. It looks like a number of our other new parsers also have direct dependencies to external li

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Nick Burch
On Fri, 10 Sep 2010, Ken Krugler wrote: The issue is that the definitions of the types that are supported come from POI: Collections.unmodifiableSet(new HashSet(Arrays.asList( POIFSDocumentType.WORKBOOK.type, POIFSDocumentType.OLE10_NATIVE.type, POIFSDocumentType is actu

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Ken Krugler
Hi Jukka, On Sep 10, 2010, at 5:35am, Jukka Zitting wrote: Hi, On Fri, Sep 10, 2010 at 5:22 AM, Ken Krugler wrote: With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and instantiates all Parser-based classes found on the classpath. Which, as expected, triggers a storm of

Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java

2010-09-10 Thread Nick Burch
On Fri, 10 Sep 2010, Jukka Zitting wrote: Nice, good point! Even better, I'd simplify this to: // Ensure reliable mark support for type detection before parsing stream = TikaInputStream.get(stream); That would mean that BufferedInputStreams will end up double wrapped though? I'm tempted

[jira] Commented: (TIKA-509) Container contents extraction

2010-09-10 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908155#action_12908155 ] Jukka Zitting commented on TIKA-509: In revision 995946 I started drafting a possible sol

Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java

2010-09-10 Thread Jukka Zitting
Hi, On Fri, Sep 10, 2010 at 7:19 PM, wrote: > -        // We need (reliable!) mark support for type detection before parsing > -        stream = new BufferedInputStream(stream); > +        if(stream instanceof TikaInputStream || stream instanceof > BufferedInputStream) { > +           // Input
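For readers following the r995880 discussion, the wrapping question can be illustrated with a minimal sketch that uses only java.io classes. It is not the committed Tika code: the class name MarkSupportSketch and the method withMarkSupport are hypothetical, and the TikaInputStream branch from the diff is left out so the example needs no Tika dependency. It only shows the idea of skipping the extra BufferedInputStream when the caller already passed one in.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: illustrates avoiding a double wrap.
class MarkSupportSketch {

    // Returns a stream with mark/reset support, reusing the argument when it
    // is already a BufferedInputStream instead of wrapping it a second time.
    static InputStream withMarkSupport(InputStream stream) {
        if (stream instanceof BufferedInputStream) {
            return stream; // already buffered, so mark/reset is available
        }
        return new BufferedInputStream(stream);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = withMarkSupport(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        in.mark(16);            // remember the current position
        int first = in.read();  // peek at the first byte, as type detection would
        in.reset();             // rewind so a parser sees the stream from the start
        System.out.println(first == in.read());
    }
}
```

The instanceof test is deliberate here: markSupported() reports what a stream claims, whereas the commit comment asks for reliable mark support, which a known buffering wrapper provides.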

[jira] Commented: (TIKA-509) Container contents extraction

2010-09-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908103#action_12908103 ] Nick Burch commented on TIKA-509: - Support is now in place for .doc, .docx, .xls and .xlsx T

buildbot success in ASF Buildbot on tika-trunk

2010-09-10 Thread buildbot
The Buildbot has detected a restored build of tika-trunk on ASF Buildbot. Full details are available at: http://ci.apache.org/builders/tika-trunk/builds/127 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: Build Source Stamp: [branch tika/trunk] 995883 B

buildbot failure in ASF Buildbot on tika-trunk

2010-09-10 Thread buildbot
The Buildbot has detected a new failure of tika-trunk on ASF Buildbot. Full details are available at: http://ci.apache.org/builders/tika-trunk/builds/125 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: Build Source Stamp: [branch tika/trunk] 995876 Blam

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Jukka Zitting
Hi, On Fri, Sep 10, 2010 at 5:22 AM, Ken Krugler wrote: > With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and > instantiates all Parser-based classes found on the classpath. Which, as > expected, triggers a storm of Exceptions and Errors. Which errors are you seeing? In TIKA-3
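As a rough illustration of the problem Ken describes (instantiating every parser class found on the classpath when some of their dependencies are absent), here is a sketch of tolerant loading in plain Java. It is an assumption about one possible approach, not what TikaConfig does: the class name TolerantLoaderSketch, the method loadAvailable, and the class names passed to it are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika: skips classes that cannot be loaded.
class TolerantLoaderSketch {

    // Instantiates each named class through its no-argument constructor and
    // silently skips any class that is missing or whose dependencies are missing.
    static List<Object> loadAvailable(List<String> classNames) {
        List<Object> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                // ClassNotFoundException: the class itself is not on the classpath.
                // NoClassDefFoundError (a LinkageError): one of its dependencies is not.
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // "no.such.Parser" is a made-up name standing in for a parser whose jar is absent.
        List<Object> found = loadAvailable(Arrays.asList("java.util.ArrayList", "no.such.Parser"));
        System.out.println(found.size());
    }
}
```

Catching LinkageError as well as the reflective exceptions matters because a class that is present but references a missing library fails with an Error, not an Exception.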

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Oleg Tikhonov
+1 to Nick's suggestion. On Fri, Sep 10, 2010 at 12:35 PM, Nick Burch wrote: > On Thu, 9 Sep 2010, Ken Krugler wrote: > >> I'm wondering how best to handle this type of configuration, in a way >> that's relatively resilient to Tika configuration changes and my target set >> of formats. >> > > Wou

Re: Error thrown with TikaConfig() constructor

2010-09-10 Thread Nick Burch
On Thu, 9 Sep 2010, Ken Krugler wrote: I'm wondering how best to handle this type of configuration, in a way that's relatively resilient to Tika configuration changes and my target set of formats. Would it not make more sense to use the xml based TikaConfig constructor (file, inputstream etc)