[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095644#comment-13095644
]
Uwe Schindler commented on TIKA-683:
XML SAX Handling does not validate the element name
[
https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Curt Arnold updated TIKA-207:
-
Attachment: TIKA-207.patch
Replaces earlier patch which could throw a NullPointerException when rendering
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-683:
Attachment: TIKA-683.patch
Attached patch, with a first cut at using a simple (shallow) token
[
https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Curt Arnold updated TIKA-207:
-
Attachment: TIKA-207.patch
Refined fix to suppress deleted text in .doc files. Will follow up with test
ca
[
https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-687.
Resolution: Duplicate
Assignee: Jukka Zitting
Right, sorry for overlooking this issue! The prop
[
https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeremy Anderson updated TIKA-704:
-
Attachment: TestWithPdf.docx
TestWithOutlook.docx
recursiveUsage.txt
PDF and Outlook docs embedded in MS Word documents not parsed
-
Key: TIKA-704
URL: https://issues.apache.org/jira/browse/TIKA-704
Project: Tika
Issue Type: Bug
Components:
[
https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095422#comment-13095422
]
Paul Jakubik commented on TIKA-701:
---
This is a very important fix. Will it be released soo
Hi,
On Thu, Sep 1, 2011 at 5:08 PM, Michael McCandless
wrote:
> We might want to mark APIs like TemporaryResources "internal" in the
> javadocs, ie, that we reserve the right to suddenly change them and
> they are just public so that the sub-packages in Tika can use them.
The trouble is that we'l
On Sep 1, 2011, at 8:08 AM, Michael McCandless wrote:
> OK thanks Jukka.
>
> We might want to mark APIs like TemporaryResources "internal" in the
> javadocs, ie, that we reserve the right to suddenly change them and
> they are just public so that the sub-packages in Tika can use them.
> In Lucene
[
https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095371#comment-13095371
]
Jukka Zitting commented on TIKA-701:
The idea behind that logic is that if the stream we
[
https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095363#comment-13095363
]
Michael McCandless commented on TIKA-701:
-
These changes look great!
I like that TI
[
https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095359#comment-13095359
]
Michael McCandless commented on TIKA-687:
-
I think this may have been fixed by TIKA-
Thank you, will try it soon :)
Mark
On Thu, Sep 1, 2011 at 10:32 AM, Jukka Zitting (JIRA) wrote:
>
> [
> https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Jukka Zitting resolved TIKA-701.
>
>
[
https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-701.
Resolution: Fixed
Assignee: Jukka Zitting
Fixed in a series of recent commits.
To summarize, I
From this comment I see that one can tell whether this MS Word has "track
changes" on, is that true? -- Thank you.
Mark
On Thu, Sep 1, 2011 at 10:24 AM, Curt Arnold (JIRA) wrote:
>
> [
> https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comm
[
https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095346#comment-13095346
]
Curt Arnold commented on TIKA-207:
--
I also ran into this problem and at least the manifesta
On Thu, Sep 1, 2011 at 7:26 AM, Jukka Zitting wrote:
> Hi,
>
> [update subject, move to dev@]
>
> On Thu, Sep 1, 2011 at 12:41 PM, Uwe Schindler wrote:
>> With our internal Lucene IOUtils it's even simpler, see javadocs :-)
>
> Yep, Lucene's version is certainly better.
>
>> It's just a few line
OK thanks Jukka.
We might want to mark APIs like TemporaryResources "internal" in the
javadocs, ie, that we reserve the right to suddenly change them and
they are just public so that the sub-packages in Tika can use them.
In Lucene we added @lucene.internal javadoc tag for this (it expands
into des
Drop deprecated methods/classes/interfaces
--
Key: TIKA-703
URL: https://issues.apache.org/jira/browse/TIKA-703
Project: Tika
Issue Type: Improvement
Reporter: Jukka Zitting
Pri
Hi,
On Thu, Sep 1, 2011 at 12:23 PM, Michael McCandless
wrote:
> Can we just remove (not deprecate) TemporaryFiles...?
> (We are not at 1.0 release yet).
Yes, I think we should do that.
I didn't want to do this in the scope of TIKA-701 so I rather left a
deprecated backwards-compatible class th
Hi,
[update subject, move to dev@]
On Thu, Sep 1, 2011 at 12:41 PM, Uwe Schindler wrote:
> With our internal Lucene IOUtils it's even simpler, see javadocs :-)
Yep, Lucene's version is certainly better.
> It's just a few lines more code.
It's still at least 7 lines of wrapper code compared t
Can we just remove (not deprecate) TemporaryFiles...? (We are not at
1.0 release yet).
Mike McCandless
http://blog.mikemccandless.com
On Thu, Sep 1, 2011 at 5:38 AM, wrote:
> Author: jukka
> Date: Thu Sep 1 09:38:04 2011
> New Revision: 1163970
>
> URL: http://svn.apache.org/viewvc?rev=11639
Cannot compile Tika with Java 7 (ImageMetadataExtractor.java)
-
Key: TIKA-702
URL: https://issues.apache.org/jira/browse/TIKA-702
Project: Tika
Issue Type: Bug
Reporter:
On Tue, Aug 30, 2011 at 5:35 PM, Jukka Zitting wrote:
> Hi,
>
> On Tue, Aug 30, 2011 at 9:07 PM, wrote:
>> + assertContains("zażółć gęślą jaźń", content);
>> + assertContains("ZAŻÓŁĆ GĘŚLĄ JAŹŃ", content);
>
> I think it would be best if we used \u escapes for