Hi,
On Fri, Sep 10, 2010 at 10:31 PM, Nick Burch wrote:
> Quite a lot of OfficeParser does depend on poifs code though, as well as a
> few bits that depend on some of the less common POI text extractors.
It looks like a number of our other new parsers also have direct
dependencies to external li
On Fri, 10 Sep 2010, Ken Krugler wrote:
The issue is that the definitions of the types that are supported come from
POI:
Collections.unmodifiableSet(new HashSet(Arrays.asList(
POIFSDocumentType.WORKBOOK.type,
POIFSDocumentType.OLE10_NATIVE.type,
POIFSDocumentType is actu
Hi Jukka,
On Sep 10, 2010, at 5:35am, Jukka Zitting wrote:
Hi,
On Fri, Sep 10, 2010 at 5:22 AM, Ken Krugler wrote:
With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and
instantiates all Parser-based classes found on the classpath. Which, as
expected, triggers a storm of
On Fri, 10 Sep 2010, Jukka Zitting wrote:
Nice, good point! Even better, I'd simplify this to:
// Ensure reliable mark support for type detection before parsing
stream = TikaInputStream.get(stream);
That would mean that BufferedInputStreams will end up double wrapped
though? I'm tempted
[ https://issues.apache.org/jira/browse/TIKA-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908155#action_12908155 ]
Jukka Zitting commented on TIKA-509:
In revision 995946 I started drafting a possible sol
Hi,
On Fri, Sep 10, 2010 at 7:19 PM, wrote:
> - // We need (reliable!) mark support for type detection before parsing
> - stream = new BufferedInputStream(stream);
> + if(stream instanceof TikaInputStream || stream instanceof
> BufferedInputStream) {
> + // Input
[ https://issues.apache.org/jira/browse/TIKA-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908103#action_12908103 ]
Nick Burch commented on TIKA-509:
Support is now in place for .doc, .docx, .xls and .xlsx
T
The Buildbot has detected a restored build of tika-trunk on ASF Buildbot.
Full details are available at:
http://ci.apache.org/builders/tika-trunk/builds/127
Buildbot URL: http://ci.apache.org/
Buildslave for this Build: isis_ubuntu
Build Reason:
Build Source Stamp: [branch tika/trunk] 995883
B
The Buildbot has detected a new failure of tika-trunk on ASF Buildbot.
Full details are available at:
http://ci.apache.org/builders/tika-trunk/builds/125
Buildbot URL: http://ci.apache.org/
Buildslave for this Build: isis_ubuntu
Build Reason:
Build Source Stamp: [branch tika/trunk] 995876
Blam
Hi,
On Fri, Sep 10, 2010 at 5:22 AM, Ken Krugler wrote:
> With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and
> instantiates all Parser-based classes found on the classpath. Which, as
> expected, triggers a storm of Exceptions and Errors.
Which errors are you seeing? In TIKA-3
+1 to Nick's suggestion.
On Fri, Sep 10, 2010 at 12:35 PM, Nick Burch wrote:
> On Thu, 9 Sep 2010, Ken Krugler wrote:
>
>> I'm wondering how best to handle this type of configuration, in a way
>> that's relatively resilient to Tika configuration changes and my target set
>> of formats.
>>
>
> Wou
On Thu, 9 Sep 2010, Ken Krugler wrote:
I'm wondering how best to handle this type of configuration, in a way
that's relatively resilient to Tika configuration changes and my target
set of formats.
Would it not make more sense to use the xml based TikaConfig constructor
(file, inputstream etc)