[jira] [Created] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
Uwe Schindler created TIKA-1387: --- Summary: Add forbidden-apis checker to TIKA build Key: TIKA-1387 URL: https://issues.apache.org/jira/browse/TIKA-1387 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-1386) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
Uwe Schindler created TIKA-1386: --- Summary: Add forbidden-apis checker to TIKA build Key: TIKA-1386 URL: https://issues.apache.org/jira/browse/TIKA-1386 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1387: Attachment: TIKA-1387.patch Patch with Maven config. > Add forbidden-apis checker to TIKA build > -

[jira] [Closed] (TIKA-1386) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed TIKA-1386. --- Resolution: Duplicate JIRA hung and created the issue 2 times. > Add forbidden-apis checker to TIKA b

[jira] [Commented] (TIKA-1386) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087476#comment-14087476 ] Uwe Schindler commented on TIKA-1386: - It would be good if somebody can delete this dup

[jira] [Updated] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1387: Attachment: TIKA-1387.patch This patch refactors the tika-java7 module a bit, so the forbidden-api c

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087489#comment-14087489 ] Uwe Schindler commented on TIKA-1387: - One suggestion: The official name of the propert

[jira] [Updated] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1387: Attachment: TIKA-1387.patch Patch with renamed properties to conform to Maven standards. > Add forb

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088088#comment-14088088 ] Uwe Schindler commented on TIKA-1387: - Hi I left a comment in the review. Was out for d

[jira] [Reopened] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened TIKA-1387: - I disagree with some fixes, because they just work around the forbidden-checks by still using system de

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-13 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095798#comment-14095798 ] Uwe Schindler commented on TIKA-1387: - I think, for "messages" written in English langu

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-13 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095853#comment-14095853 ] Uwe Schindler commented on TIKA-1387: - Nick: in ImageMetadataExtractor.java, the date f

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-10-25 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184073#comment-14184073 ] Uwe Schindler commented on TIKA-1387: - I think this is already committed and working. I

[jira] [Commented] (TIKA-1457) NullPointerException in tika-app, parsing PDF content

2014-10-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186533#comment-14186533 ] Uwe Schindler commented on TIKA-1457: - Hi, the next version of Solr with TIKA 1.6 will

[jira] [Comment Edited] (TIKA-1457) NullPointerException in tika-app, parsing PDF content

2014-10-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186533#comment-14186533 ] Uwe Schindler edited comment on TIKA-1457 at 10/28/14 7:49 AM: --

[jira] [Comment Edited] (TIKA-1457) NullPointerException in tika-app, parsing PDF content

2014-10-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186533#comment-14186533 ] Uwe Schindler edited comment on TIKA-1457 at 10/28/14 7:50 AM: --

[jira] [Commented] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283089#comment-14283089 ] Uwe Schindler commented on TIKA-1523: - Hi, I downloaded the given file to Windows 7. Ri

[jira] [Updated] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1523: Attachment: screenshot-1.png > metadata extractor gets the wrong number of pages of some documents Mi

[jira] [Commented] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283092#comment-14283092 ] Uwe Schindler commented on TIKA-1523: - If I save the file with Office 2010, the page nu

[jira] [Commented] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283116#comment-14283116 ] Uwe Schindler commented on TIKA-1523: - Yes. It extracts just the metadata. So I think th

[jira] [Updated] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1523: Attachment: screenshot-2.png > metadata extractor gets the wrong number of pages of some documents Mi

[jira] [Updated] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1523: Attachment: (was: screenshot-2.png) > metadata extractor gets the wrong number of pages of some d

[jira] [Updated] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1523: Attachment: screenshot-2.png > metadata extractor gets the wrong number of pages of some documents Mi

[jira] [Comment Edited] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283116#comment-14283116 ] Uwe Schindler edited comment on TIKA-1523 at 1/19/15 10:50 PM: --

[jira] [Commented] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283148#comment-14283148 ] Uwe Schindler commented on TIKA-1523: - Hi, I did some research: This is a bug in Word

[jira] [Comment Edited] (TIKA-1523) metadata extractor gets the wrong number of pages of some documents Microsoft Word 9.0

2015-01-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283148#comment-14283148 ] Uwe Schindler edited comment on TIKA-1523 at 1/19/15 11:16 PM: --

[jira] [Commented] (TIKA-1435) Update rome dependency to 1.5

2015-01-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283723#comment-14283723 ] Uwe Schindler commented on TIKA-1435: - Indeed this confused me while doing the Apache S

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287820#comment-14287820 ] Uwe Schindler commented on TIKA-1526: - FYI: The underlying bug in the JVM will never be

[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287824#comment-14287824 ] Uwe Schindler edited comment on TIKA-1526 at 1/22/15 5:36 PM: --

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287824#comment-14287824 ] Uwe Schindler commented on TIKA-1526: - Tim: Linux does not use posix_spawn. You need Mac

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287850#comment-14287850 ] Uwe Schindler commented on TIKA-1526: - There is also a second problem: The bug is in th

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287944#comment-14287944 ] Uwe Schindler commented on TIKA-1526: - Commit looks fine! I can try later with MacOSX.

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288444#comment-14288444 ] Uwe Schindler commented on TIKA-1526: - Hi Tylor: The problem is explained above. To rep

[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288444#comment-14288444 ] Uwe Schindler edited comment on TIKA-1526 at 1/22/15 11:29 PM: --

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288963#comment-14288963 ] Uwe Schindler commented on TIKA-1526: - I tried it with maven, but this is all too funny

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289125#comment-14289125 ] Uwe Schindler commented on TIKA-1526: - [~grossws]: This bug is not in Maven itself, th

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289182#comment-14289182 ] Uwe Schindler commented on TIKA-1526: - To work around this bug you can in fact do this.

[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289182#comment-14289182 ] Uwe Schindler edited comment on TIKA-1526 at 1/23/15 12:32 PM: --

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289438#comment-14289438 ] Uwe Schindler commented on TIKA-1529: - If you just check for ASCII chars in some string

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290093#comment-14290093 ] Uwe Schindler commented on TIKA-1529: - Hi, I can confirm, the patch works here. Eclipse

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329272#comment-14329272 ] Uwe Schindler commented on TIKA-1555: - This is a duplicate of TIKA-1526. > posix_spawn

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329276#comment-14329276 ] Uwe Schindler commented on TIKA-1555: - Also, this issue in the JDK is already fixed in

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329282#comment-14329282 ] Uwe Schindler commented on TIKA-1555: - @UweSays: https://twitter.com/UweSays/status/501

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329344#comment-14329344 ] Uwe Schindler commented on TIKA-1555: - Hi David, can you try to compile Tika from curre

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329350#comment-14329350 ] Uwe Schindler commented on TIKA-1526: - I was not able to test this, because I have no M

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329364#comment-14329364 ] Uwe Schindler commented on TIKA-1555: - bq. BTW I wonder if we could add a setting which

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329474#comment-14329474 ] Uwe Schindler commented on TIKA-1555: - bq. You can also disable OCR by setting the Tess

[jira] [Commented] (TIKA-1557) Create TesseractOCR Option to Never Run

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329523#comment-14329523 ] Uwe Schindler commented on TIKA-1557: - I would not make this a special option only for

[jira] [Comment Edited] (TIKA-1557) Create TesseractOCR Option to Never Run

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329523#comment-14329523 ] Uwe Schindler edited comment on TIKA-1557 at 2/20/15 8:42 PM: --

[jira] [Comment Edited] (TIKA-1557) Create TesseractOCR Option to Never Run

2015-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329523#comment-14329523 ] Uwe Schindler edited comment on TIKA-1557 at 2/20/15 9:05 PM: --

[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333400#comment-14333400 ] Uwe Schindler commented on TIKA-1558: - Hi, Lucene uses SPI for its index codecs, so we

[jira] [Comment Edited] (TIKA-1558) Create a Parser Blacklist

2015-02-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333400#comment-14333400 ] Uwe Schindler edited comment on TIKA-1558 at 2/23/15 4:06 PM: --

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-02-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333628#comment-14333628 ] Uwe Schindler commented on TIKA-1526: - Thanks David! > ExternalParser should trap/igno

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-29 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385803#comment-14385803 ] Uwe Schindler commented on TIKA-1511: - Solr uses ANT + IVY to build. We don't use trans

[jira] [Commented] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-05-02 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525112#comment-14525112 ] Uwe Schindler commented on TIKA-1582: - Hi Chris, there is already forbidden-apis 1.8 av

[jira] [Comment Edited] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-05-02 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525112#comment-14525112 ] Uwe Schindler edited comment on TIKA-1582 at 5/2/15 7:35 AM: - H

[jira] [Commented] (TIKA-1628) ExternalParser.check should return false if it hits SecurityException

2015-05-12 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539838#comment-14539838 ] Uwe Schindler commented on TIKA-1628: - +1 to the patch. I don't think we need a test!

[jira] [Commented] (TIKA-1637) Oracle internal API jdeps request for information

2015-05-25 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558087#comment-14558087 ] Uwe Schindler commented on TIKA-1637: - Hi Dave, forbidden-apis already forbids use of

[jira] [Comment Edited] (TIKA-1637) Oracle internal API jdeps request for information

2015-05-25 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558087#comment-14558087 ] Uwe Schindler edited comment on TIKA-1637 at 5/25/15 10:18 AM: --

[jira] [Commented] (TIKA-1675) please avoid xmlbeans dependency

2015-07-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617578#comment-14617578 ] Uwe Schindler commented on TIKA-1675: - There was already an issue/discussion open on PO

[jira] [Comment Edited] (TIKA-1675) please avoid xmlbeans dependency

2015-07-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617578#comment-14617578 ] Uwe Schindler edited comment on TIKA-1675 at 7/7/15 10:53 PM: --

[jira] [Commented] (TIKA-1675) please avoid xmlbeans dependency

2015-07-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617588#comment-14617588 ] Uwe Schindler commented on TIKA-1675: - kiwiwings already proposed this for POI: [http

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012746#comment-13012746 ] Uwe Schindler commented on TIKA-623: From looking at the code of this library, it looks

[jira] [Updated] (TIKA-651) Unescaped attribute value generated

2011-05-01 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-651: --- Attachment: XHTMLSerializer.java Yes, as per SAX spec, the characters() event gets unescaped text (also

[jira] [Commented] (TIKA-651) Unescaped attribute value generated

2011-05-01 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027443#comment-13027443 ] Uwe Schindler commented on TIKA-651: One addition: This content handler can also be perf

[jira] [Commented] (TIKA-651) Unescaped attribute value generated

2011-05-01 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027482#comment-13027482 ] Uwe Schindler commented on TIKA-651: This is why I added the one I use quite everywhere

[jira] [Commented] (TIKA-651) Unescaped attribute value generated

2011-05-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028208#comment-13028208 ] Uwe Schindler commented on TIKA-651: I have no preference. I just provide the code here

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088213#comment-13088213 ] Uwe Schindler commented on TIKA-692: If this SAXTransformerFactory should disable the IN

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088216#comment-13088216 ] Uwe Schindler commented on TIKA-692: Jukka, that's the correct way. But in general there

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088222#comment-13088222 ] Uwe Schindler commented on TIKA-692: I just point again to this one: TIKA-651 We have a

[jira] [Commented] (TIKA-651) Unescaped attribute value generated

2011-08-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088246#comment-13088246 ] Uwe Schindler commented on TIKA-651: ...or XALAN :-) It's hard to implement without tha

[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

2011-09-01 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095644#comment-13095644 ] Uwe Schindler commented on TIKA-683: XML SAX Handling does not validate the element name

[jira] [Commented] (TIKA-722) Arabic PDF doesn't extract correctly

2011-09-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107988#comment-13107988 ] Uwe Schindler commented on TIKA-722: I don't think there is much we can do. Some PDF file

[jira] [Updated] (TIKA-722) Arabic PDF doesn't extract correctly

2011-09-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-722: --- Attachment: metadata.png I checked this file: That's exactly the type of file I am talking about, here

[jira] [Updated] (TIKA-722) Arabic PDF doesn't extract correctly

2011-09-19 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-722: --- Attachment: JUFO96.PDF Here is a non-persian example (which is actually a very-very early writeup from

[jira] [Commented] (TIKA-730) WriteOutContentHandler concatenates title tag and body text.

2011-09-25 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114295#comment-13114295 ] Uwe Schindler commented on TIKA-730: The content handler should be used after BodyConten

[jira] [Commented] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-04-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264369#comment-13264369 ] Uwe Schindler commented on TIKA-888: That was exactly my problem (it fails once it start

[jira] [Commented] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-04-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264374#comment-13264374 ] Uwe Schindler commented on TIKA-888: I am trying: with latest svn I get: {noformat} [d

[jira] [Issue Comment Edited] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-04-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264374#comment-13264374 ] Uwe Schindler edited comment on TIKA-888 at 4/28/12 6:12 PM: - I

[jira] [Commented] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-04-28 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264387#comment-13264387 ] Uwe Schindler commented on TIKA-888: Not all of them, I still get millions of String.isE

[jira] [Commented] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-04-29 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264604#comment-13264604 ] Uwe Schindler commented on TIKA-888: Passes now :-) > NetCDF parser use

[jira] [Commented] (TIKA-860) Make ZIP bomb detection configureable

2012-06-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404538#comment-13404538 ] Uwe Schindler commented on TIKA-860: OK! > Make ZIP bomb detection conf

[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-01-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545906#comment-13545906 ] Uwe Schindler commented on TIKA-1053: - Apache Solr also disabled the CLASS file parser

[jira] [Comment Edited] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-01-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545906#comment-13545906 ] Uwe Schindler edited comment on TIKA-1053 at 1/7/13 2:19 PM: - A

[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-02-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573537#comment-13573537 ] Uwe Schindler commented on TIKA-1053: - +1 One note from my experience with ASM: asm-de

[jira] [Commented] (TIKA-1080) Arabic characters under windows

2013-02-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573587#comment-13573587 ] Uwe Schindler commented on TIKA-1080: - In any case, I don't think the TIKA server shoul

[jira] [Commented] (TIKA-1074) Extraction should continue if an exception is hit visiting an embedded document

2013-02-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582408#comment-13582408 ] Uwe Schindler commented on TIKA-1074: - Hi, catching Throwable is bad practice, it shoul

[jira] [Commented] (TIKA-1145) classloaders issue loading resources when extending Tika

2013-07-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699793#comment-13699793 ] Uwe Schindler commented on TIKA-1145: - I think the main problem is ServiceLoader's defi

[jira] [Commented] (TIKA-1145) classloaders issue loading resources when extending Tika

2013-07-04 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699883#comment-13699883 ] Uwe Schindler commented on TIKA-1145: - OK, I misunderstood the original problem. If you

[jira] [Commented] (TIKA-1145) classloaders issue loading resources when extending Tika

2013-07-04 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699888#comment-13699888 ] Uwe Schindler commented on TIKA-1145: - It is still strange that you see this behaviour:

[jira] [Commented] (TIKA-1134) ContentHandler gets ignorable whitespace for tags when parsing HTML

2013-08-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733344#comment-13733344 ] Uwe Schindler commented on TIKA-1134: - Hi Hoss, the "rule" in TIKA is: - TIKA inserts i

[jira] [Commented] (TIKA-1134) ContentHandler gets ignorable whitespace for tags when parsing HTML

2013-08-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733348#comment-13733348 ] Uwe Schindler commented on TIKA-1134: - I think this issue is "Won't fix". The issues de

[jira] [Commented] (TIKA-1134) ContentHandler gets ignorable whitespace for tags when parsing HTML

2013-08-09 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734769#comment-13734769 ] Uwe Schindler commented on TIKA-1134: - Hoss: I agree to fix this in the documentation.

[jira] [Commented] (TIKA-1181) RTFParser not keeping HTML font colors and underscore tags.

2013-10-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788171#comment-13788171 ] Uwe Schindler commented on TIKA-1181: - Other parsers like OpenOffice do not preserve co

[jira] [Created] (TIKA-1211) OpenDocument (ODF) parser produces multipe startDocument() events

2013-12-17 Thread Uwe Schindler (JIRA)
Uwe Schindler created TIKA-1211: --- Summary: OpenDocument (ODF) parser produces multipe startDocument() events Key: TIKA-1211 URL: https://issues.apache.org/jira/browse/TIKA-1211 Project: Tika I

[jira] [Updated] (TIKA-1211) OpenDocument (ODF) parser produces multiple startDocument() events

2013-12-17 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1211: Summary: OpenDocument (ODF) parser produces multiple startDocument() events (was: OpenDocument (ODF

[jira] [Commented] (TIKA-1211) OpenDocument (ODF) parser produces multiple startDocument() events

2013-12-17 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850416#comment-13850416 ] Uwe Schindler commented on TIKA-1211: - There are multiple ways to fix this: - Make XHTM

[jira] [Commented] (TIKA-1240) IncompatibleClassChangeError with -> new Tika().parseToString(stream);

2014-02-13 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900650#comment-13900650 ] Uwe Schindler commented on TIKA-1240: - Hi Sudheshna, this is a well-known issue caused

[jira] [Commented] (TIKA-1252) Tika is not indexing all authors of a PDF

2014-03-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918634#comment-13918634 ] Uwe Schindler commented on TIKA-1252: - This could be a problem in Solr's DataImportHand

[jira] [Commented] (TIKA-1252) Tika is not indexing all authors of a PDF

2014-03-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918643#comment-13918643 ] Uwe Schindler commented on TIKA-1252: - I did a quick check in [https://svn.apache.org/
