[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-04-19 Thread Andreas Beeker (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325386#comment-17325386 ] Andreas Beeker commented on TIKA-3164: -- Added more .xsbs and classes - the old

[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Andreas Beeker (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307472#comment-17307472 ] Andreas Beeker commented on TIKA-3164: -- I'm now recursing through .xlsx and

[jira] [Closed] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2020-08-17 Thread Andreas Beeker (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Beeker closed TIKA-1707. close forgotten ticket > Upgrade to Apache POI 3.13 Bet

Re: Preferred logging implementation

2019-01-07 Thread Andreas Beeker
Would POI actually ship with an implementation (e.g. log4j2) or only with slf4j? I guess, in the src/bin bundles we need to provide everything you would need to get the library working - if this is only the log4j2-api or slf4j, then we probably don't provide the implementation part. But providi

Preferred logging implementation

2019-01-07 Thread Andreas Beeker
Hi *, we currently have a discussion on our logging implementation in POI [1]. Do you have any preference? ... also besides log4j2 / slf4j, but with integrating as a library in mind. Sorry, if this kind of discussion is bothersome to you, as there's often a bridge x-to-y when it comes to loggi

[jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files

2018-12-18 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724571#comment-16724571 ] Andreas Beeker commented on TIKA-2765: -- For the truncated/EOF case, I've fi

[jira] [Commented] (TIKA-2789) Apache tika - java.lang.NoClassDefFoundError

2018-12-12 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719038#comment-16719038 ] Andreas Beeker commented on TIKA-2789: -- Please share the name/s of the .

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-08-08 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573816#comment-16573816 ] Andreas Beeker commented on TIKA-2693: -- Karl or his user needs to use the snaps

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-08-08 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573783#comment-16573783 ] Andreas Beeker commented on TIKA-2693: -- I've (fixed/) modified the cl

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-07-26 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558986#comment-16558986 ] Andreas Beeker commented on TIKA-2693: -- The NoClassDefFoundError happens, becau

[jira] [Comment Edited] (TIKA-2666) Document last printed in the year 27321

2018-06-17 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515140#comment-16515140 ] Andreas Beeker edited comment on TIKA-2666 at 6/17/18 5:0

[jira] [Commented] (TIKA-2666) Document last printed in the year 27321

2018-06-17 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515140#comment-16515140 ] Andreas Beeker commented on TIKA-2666: -- I've fixed the unsigned handling in

[jira] [Commented] (TIKA-2666) Document last printed in the year 27321

2018-06-14 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512988#comment-16512988 ] Andreas Beeker commented on TIKA-2666: -- After I've finished [#60

[jira] [Commented] (TIKA-2523) Regression in ppt parsing -- "typeface can't be null or empty"

2017-12-09 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284710#comment-16284710 ] Andreas Beeker commented on TIKA-2523: -- So [#61881|https://bz.apache.org/bugz

POI 4.0 and Java 8

2017-08-22 Thread Andreas Beeker
Hi Tika devs, at POI, we are about to make a major change in the upcoming version after POI 3.17 is out, which will probably be in the next week or so. Up till now we have had a discussion [1] about the next Java SE, i.e. Java 7 or 8, and it looks like Java 8 is the preferred version - with the rest

[jira] [Commented] (TIKA-2164) HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file

2016-11-05 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15640898#comment-15640898 ] Andreas Beeker commented on TIKA-2164: -- I've provided a POI patch under [1

[jira] [Commented] (TIKA-2142) ArrayIndexOutOfBoundsException

2016-10-25 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605421#comment-15605421 ] Andreas Beeker commented on TIKA-2142: -- I'll have a look at it ... if this

[jira] [Commented] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2016-01-19 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107690#comment-15107690 ] Andreas Beeker commented on TIKA-1799: -- I have no idea how osgi bundling works,

[jira] [Comment Edited] (TIKA-1833) NoClassDefFoundError for POIXMLTypeLoader

2016-01-17 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103861#comment-15103861 ] Andreas Beeker edited comment on TIKA-1833 at 1/17/16 6:5

[jira] [Commented] (TIKA-1833) NoClassDefFoundError for POIXMLTypeLoader

2016-01-17 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103861#comment-15103861 ] Andreas Beeker commented on TIKA-1833: -- POIXMLTypeLoader comes with POI 3.14-B

WMF extraction

2016-01-14 Thread Andreas Beeker
Hi, POI will have a WMF module (org.apache.poi.hwmf.*) in the next beta. Looking over the govdocs collection, those embedded WMFs might contain interesting information for Tika. Although my main goal is to integrate the rendering for common sl, it shouldn't be too laborious to provide something a

[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-10-21 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Beeker updated TIKA-1707: - Attachment: dont_trim_and_bullets.patch Patch for trim-replacement and bullet lists > Upgrade

[jira] [Commented] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-10-21 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968118#comment-14968118 ] Andreas Beeker commented on TIKA-1707: -- I would replace it with the empty string

[jira] [Commented] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938860#comment-14938860 ] Andreas Beeker commented on TIKA-1755: -- I think the goal would be to modify co

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-29 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935840#comment-14935840 ] Andreas Beeker commented on TIKA-1748: -- at HSLFExtractor: - as the extractio

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-29 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935760#comment-14935760 ] Andreas Beeker commented on TIKA-1748: -- Hi Tim, I hope you used tika-1707 as a

[jira] [Commented] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-15 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698406#comment-14698406 ] Andreas Beeker commented on TIKA-1707: -- The affected test cases are ok now .

[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-14 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Beeker updated TIKA-1707: - Attachment: common_sl.diff ok ... I've changed the import settings and executed organize im

[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-14 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Beeker updated TIKA-1707: - Attachment: (was: common_sl.diff) > Upgrade to Apache POI 3.13 Bet

[jira] [Updated] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-13 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Beeker updated TIKA-1707: - Attachment: common_sl.diff patch for POI 3.13 Beta 2 integration. (Although its origin is the

[jira] [Created] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-08-13 Thread Andreas Beeker (JIRA)
Andreas Beeker created TIKA-1707: Summary: Upgrade to Apache POI 3.13 Beta 2 Key: TIKA-1707 URL: https://issues.apache.org/jira/browse/TIKA-1707 Project: Tika Issue Type: Improvement

Re: FW: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-03 Thread Andreas Beeker
Hi, similar to Dominik's approach of checking the file base for parsing errors, I'd like to scan for certain file constellations, for the typical "left over bytes" error or other record combinations which I can't reproduce with my MS/Libre office versions. I haven't thought about how it's actu

[jira] [Updated] (TIKA-1380) Upgrade to Apache POI 3.11 beta 1

2014-08-02 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Beeker updated TIKA-1380: - Attachment: tika-commentstable-missing.diff Nick's patch missed the comments table on a few lines

Tika regression test on POI 3.11 Beta 1

2014-08-01 Thread Andreas Beeker
Hi Tim, (thread base [1]) I found one regression in the handling of an xlsx file: http://digitalcorpora.org/corp/nps/files/govdocs1/598/598948.xlsx Tika 1.6 w/ POI 3.11 Beta 1 is not extracting the comments in this file, whereas Tika >1.5 (and Tika 1.6 w/ POI 3.10-Final) did extract the com