Re: [VOTE] Release Apache POI 5.2.5 (RC1)

2023-11-20 Thread Tim Allison
+1 Confirmed digests, built locally and integrated into a local build of Apache Tika's main branch. Ran regression tests earlier and found improvements on items identified in 5.2.4. Thank you, PJ, Dominik and team! On Sun, Nov 19, 2023 at 3:30 PM Dominik Stadler wrote: > Hi, > > Verified

Re: [DISCUSS] POI 5.2.5 release

2023-11-17 Thread Tim Allison
yahoo.com> wrote: > > > > > > The build is not stable at the moment. Looks like there are some build > fixes needed before we can get an RC ready. > > > > > > > On Thursday 16 November 2023 at 22:41:28 GMT+1, Tim Allison < > talli...@apache.org> wrote:

Re: [DISCUSS] POI 5.2.5 release

2023-11-16 Thread Tim Allison
pdate POI to use the new XMLBeans version. > > I think we can then create an RC1 for POI 5.2.5. I can do this. Maybe > tomorrow. > > According to Tim Allison, Apache Tika are waiting for this release [1]. > > The changes are listed here [2]. > > [1] https://lists.apache.org/thr

Re: [DISCUSS] XMLBeans 5.2.0 release

2023-11-16 Thread Tim Allison
Thank you, PJ, for running the XMLBeans 5.2.0 release! We are holding the Tika 3.0.0-BETA release for POI 5.2.5. I agree there's not a major rush, but it would be great to get that out. Let me know if/when I should run our regression tests with 5.2.5. Thank you, again! Best, Tim On

Re: [VOTE] Apache XmlBeans 5.2.0 release (RC1)

2023-11-15 Thread Tim Allison
+1 Thank you, PJ! I verified the checksums. I did get two rat failures that don't concern me (user error?) when I ran `gradle build test`: ...xmlbeans-5.2.0/javadocs/package-list ...xmlbeans-5.2.0/javadocs/script.js On Wed, Nov 8, 2023 at 4:48 PM Dominik Stadler wrote: > Hi, > > did a

Re: POI 5.2.5 release

2023-10-16 Thread Tim Allison
This just bit us on Tika: https://bz.apache.org/bugzilla/show_bug.cgi?id=67767 The fix is easy. I can patch it today. It would be great to get it into 5.2.5. I'm sorry that I didn't catch it during the earlier regression tests...my fault. On Sun, Oct 15, 2023 at 4:34 PM Dominik Stadler wrote:

Re: [VOTE] Release Apache POI 5.2.4 (RC1)

2023-09-22 Thread Tim Allison
+1 Reports are here: There's surprisingly little difference: https://corpora.tika.apache.org/base/reports/poi-reports.tgz I only had time to glance briefly. Thank you PJ and team! On Fri, Sep 22, 2023 at 4:09 AM PJ Fanning wrote: > Thanks Alex. The pdfbox issue is tracked at >

DirectoryNode's getEntry() and IllegalArgumentException

2023-09-22 Thread Tim Allison
All, First, I'm not proposing any changes for 5.2.4 (many thanks PJ for running the release!). In looking at DirectoryNode's getEntry, I see this: @Override public Entry getEntry(final String name) throws FileNotFoundException { Entry rval = null; if (name != null) { rval =

Re: poi 5.2.4 release

2023-09-21 Thread Tim Allison
Sounds great. I’ll try to make a run against our corpus as well. Thank you! On Thu, Sep 21, 2023 at 2:58 AM Dominik Stadler wrote: > Hi, > > Yes, I agree, a release soon would be good to get the many many > improvements out to users. > > P.J., could you run the process once more and maybe

Fwd: [jira] [Created] (TIKA-4015) Extract symbols as symbols from .docx

2023-04-12 Thread Tim Allison
there?)? And of course, in the email below the characters have been modified back to the underlying text, but they should be "alpha" "beta" "chi", etc... see the screenshot on the issue Thank you! Best, Tim -- Forwarded message - From: Tim All

Re: POI PMC roll call

2023-03-03 Thread Tim Allison
Similar to Nick and others. I have time to pay attention, but not as much as I'd like to contribute. Always hopeful... So, y, I'm still interested. Thank you for calling roll and all of your work on POI and beyond! On Fri, Mar 3, 2023 at 12:06 PM Nick Burch wrote: > > On Fri, 3 Mar 2023, PJ

docx attachment names only appear in EMF?!

2023-02-06 Thread Tim Allison
Fellow Devs, I recently came across this issue: https://issues.apache.org/jira/browse/TIKA-3968. Has anyone else seen this? Am I missing an easy way to associate embedded file names with the actual embedded file? I'm sure there's a reason to do this, but it feels to me like docx is giving

Re: [VOTE] Apache POI 5.2.3 release (RC1)

2022-09-15 Thread Tim Allison
+1 There's one new pptx exception, and a small number of fixed emf/wmf exceptions. Reports are here: https://corpora.tika.apache.org/base/reports/tika-2.5.0-poi-reports.tgz Let me know if you have any questions! Cheers, Tim On Fri, Sep 9, 2022 at 4:51 PM PJ Fanning wrote: > > Hi

Re: poi 5.2.3 release

2022-09-08 Thread Tim Allison
+1 I'll have time next week to run against our regression corpus, too. If there's interest. On Wed, Sep 7, 2022 at 4:35 PM PJ Fanning wrote: > Hi everyone, > > Is it time for new POI release? It's about 6 months since the last one and > the change list is fairly big -

Re: [VOTE] Apache POI 5.2.1 release (RC1)

2022-03-01 Thread Tim Allison
+1 I didn't have time to run any regression tests, but Tika builds with these artifacts. Thank you, PJ and team! On Sat, Feb 26, 2022 at 5:15 PM Andreas Beeker wrote: > > Hi, > > thank you for preparing the release, PJ! > > I've done some rudimentary checks - here is my +1. > > Andi > > On

Re: POI 5.1.0 RC2?

2021-10-19 Thread Tim Allison
Apologies for being absent... The xsb issue is why we haven't upgraded to 5.x on Tika yet. I _think_ we'd like to avoid the ooxml-full jar, but if that's the most robust option, we'll have to go with that. I'm also happy to grab new files, or run against our corpus if that'd be of any use. Many

Re: Tika, POI and PDFBOX used in Pandora Papers

2021-10-12 Thread Tim Allison
Autocorrect!!! Tika On Tue, Oct 12, 2021 at 4:42 PM Tim Allison wrote: > > https://www.wired.co.uk/article/pandora-papers-leak > > Repo: > https://github.com/ICIJ/datashare/

Tim’s, POI and PDFBOX used in Pandora Papers

2021-10-12 Thread Tim Allison
https://www.wired.co.uk/article/pandora-papers-leak Repo: https://github.com/ICIJ/datashare/

Re: Building with Java 11?

2021-05-11 Thread Tim Allison
the difference in build-setup is when they are created > differently. > > Thanks... Dominik. > > On Fri, May 7, 2021 at 6:13 PM Tim Allison wrote: >> >> Hi All, >>I recently tried to build with Java 11 because of [1], I found that >> the build was modifying modu

Building with Java 11?

2021-05-07 Thread Tim Allison
Hi All, I recently tried to build with Java 11 because of [1], I found that the build was modifying module-info.java and module-info.class. Is this expected? Is the combination of the Java issue and this item a sign I should put down the keyboard for the weekend a bit early? Cheers,

Re: missing oleobjectelement.xsb in ooxml-lite?

2021-03-23 Thread Tim Allison
All seems to work if I uncomment this line in build.xml: Any objections? On Tue, Mar 23, 2021 at 10:24 AM Tim Allison wrote: > > Going back to Andi's point [1]...trying this now. > > [1] > https://lists.apache.org/x/thread.html/ra9ff58e6af046a51ba459915fe536a2ea1fe71e85

Re: missing oleobjectelement.xsb in ooxml-lite?

2021-03-23 Thread Tim Allison
Going back to Andi's point [1]...trying this now. [1] https://lists.apache.org/x/thread.html/ra9ff58e6af046a51ba459915fe536a2ea1fe71e85329abc4e513711e@%3Cuser.poi.apache.org%3E On Tue, Mar 23, 2021 at 10:17 AM Tim Allison wrote: > > All, > Over on Tika [1], I'm getting an

missing oleobjectelement.xsb in ooxml-lite?

2021-03-23 Thread Tim Allison
All, Over on Tika [1], I'm getting an exception that oleobjectelement.xsb can't be found. When I look in the ooxml-lite.jar, I see there's an oleobjelement.xsb, but no oleobjectelement.xsb. I tried adding the triggering document (EmbeddedDocument.docx) to a poi unit test[2] and rebuilding
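
The jar inspection described above (looking for `.xsb` entries by name) can be sketched with the JDK's own zip support. This is a minimal illustration, not part of the POI build: the class name `JarEntryFinder` and the method `entriesEndingWith` are made up for this example, and the jar path is whatever file you point it at.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not a POI class): lists jar entries whose names end with a suffix.
public class JarEntryFinder {

    // Returns the sorted names of all entries ending with the suffix, ignoring case.
    static List<String> entriesEndingWith(Path jar, String suffix) throws IOException {
        String wanted = suffix.toLowerCase(Locale.ROOT);
        // A jar is a zip file, so ZipFile is enough to read its entry names.
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.stream()
                    .map(ZipEntry::getName)
                    .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(wanted))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarEntryFinder path/to/some.jar element.xsb
        for (String name : entriesEndingWith(Path.of(args[0]), args[1])) {
            System.out.println(name);
        }
    }
}
```

Searching for a shared suffix such as `element.xsb` would list both spellings side by side if both were present, which makes a near-miss name easy to spot.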

Re: build?

2021-02-23 Thread Tim Allison
org/apache/poi/xddf/usermodel/XDDFSolidFillProperties.java:38: error: recursive constructor invocation public XDDFSolidFillProperties(XDDFColor color) { ^ On Tue, Feb 23, 2021 at 7:53 AM Tim Allison wrote: > > ant test seems to be working (waiting for completion, but it at leas

Re: build?

2021-02-23 Thread Tim Allison
23, 2021 at 7:43 AM Tim Allison wrote: > > All, > Many apologies...it has been too long since I've worked with our > codebase. I recently did a fresh pull and can't get a clean > build...ant compile works, but I get a failure with ant test. See link > below for system, versi

build?

2021-02-23 Thread Tim Allison
All, Many apologies...it has been too long since I've worked with our codebase. I recently did a fresh pull and can't get a clean build...ant compile works, but I get a failure with ant test. See link below for system, versions and stacktrace [1]. User error? Thank you!

Fwd: [OT] Looking for Apache POI help

2020-10-20 Thread Tim Allison
-- Forwarded message - From: Sergey Beryozkin Date: Tue, Oct 20, 2020 at 7:54 AM Subject: [OT] Looking for Apache POI help To: Hi All, sorry for this off-topic post, it is a little bit relevant to Tika dev, but only a little bit :-), We are having some good interest in making

Re: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Tim Allison
Does this meet the needs? https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPPT_oleWorkbook.ppt On Sun, Oct 11, 2020 at 5:09 PM Andreas Beeker wrote: > Hi Nick, > > > Should we have WorkbookFactory spot this case,

Re: dependency on ooxml-schemas?

2020-08-14 Thread Tim Allison
wishes, > Andi > > > [1] > https://builds.apache.org/view/P/view/POI/job/POI-XMLBeans-DSL-1.8/lastSuccessfulBuild/artifact/build/ > > On 13.08.20 20:06, Tim Allison wrote: > > All, > > > > I've been away from POI for a bit, and Andi has done some amazing work.

dependency on ooxml-schemas?

2020-08-13 Thread Tim Allison
All, I've been away from POI for a bit, and Andi has done some amazing work. THANK YOU! The build works as it should on the commandline, but what's the recommendation for adding ooxml-schemas as a dependency in the IDE? Should I run a full build and then create my own

Re: Next version? - was Re: Missing commons-compress jar in dist

2020-06-23 Thread Tim Allison
his too. > I guess this will take another few weeks to be completed. > > Best wishes, > Andi > > > On 22.06.20 22:28, Tim Allison wrote: > > All, > >From a Tika perspective, I'm happy with 5.0.0 as well...any idea when > > the next release will be? Last relea

Re: Next version? - was Re: Missing commons-compress jar in dist

2020-06-22 Thread Tim Allison
All, From a Tika perspective, I'm happy with 5.0.0 as well...any idea when the next release will be? Last release was in February. Now that we have the regression testing vm back up and running, I can kick off tests whenever... Thank you! Cheers, Tim On

new mailing list for corpora vm

2020-06-05 Thread Tim Allison
All, If you have an interest in guiding the ongoing development of the regression corpus vm, please join the new mailing list: corpora-...@tika.apache.org via the usual means: corpora-dev-subscr...@tika.apache.org Unless there are objections, we can continue to use the regular Tika JIRA to

Fwd: New mailing list queued for creation: corpora-...@tika.apache.org

2020-06-04 Thread Tim Allison
Should have cc'd you all...this should be up and running in the next 24 hours. Please subscribe if you'd like to discuss/collaborate on the vm and regression corpora. -- Forwarded message - From: Tim Allison Date: Thu, Jun 4, 2020 at 8:56 AM Subject: Fwd: New mailing list

Vm slack channel

2020-02-29 Thread Tim Allison
All, I started #tika-vm on the ASF’s Slack for informal discussion/coordination of the regression corpus and vm. Cheers, Tim

Re: [COMPRESS and Tika/PDFBox/POI] files from bug trackers

2020-02-27 Thread Tim Allison
> On Fri, Feb 14, 2020 at 10:48 PM Tim Allison wrote: > >> All, >> >> I recently downloaded attachments from the following bug trackers: >> COMPRESS, TIKA, PDFBox, POI, Open Office, Libre Office and ghostscript: >> http://162.242.228.174/docs/bugtrackers/ >

[COMPRESS and Tika/PDFBox/POI] files from bug trackers

2020-02-14 Thread Tim Allison
All, I recently downloaded attachments from the following bug trackers: COMPRESS, TIKA, PDFBox, POI, Open Office, Libre Office and ghostscript: http://162.242.228.174/docs/bugtrackers/ I then unpackaged/uncompressed all of the package/compressed files so: COMPRESS-115-1.zip is the second

Re: [VOTE] Apache POI 4.1.2 release (RC3)

2020-02-11 Thread Tim Allison
+1 Thank you, Andi (and team)! http://162.242.228.174/reports/reports_poi_4.1.2-rc3.tgz On Mon, Feb 10, 2020 at 3:38 PM Andreas Beeker wrote: > Hi *, > > I've prepared artifacts for the release of Apache POI 4.1.2 (RC3). > > The most notable changes in this release are: > > - XDDF - some work

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-10 Thread Tim Allison
Sorry for the late reply. See Bug 64130 for a regression in parsing old excel spreadsheets that have worksheets without names. There were about 550 new exceptions caused by this in our regression corpus. On Sat, Feb 8, 2020 at 5:30 PM Tim Allison wrote: > I’m afk, but it looked l

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-08 Thread Tim Allison
afk. On Sat, Feb 8, 2020 at 1:21 PM Andreas Beeker wrote: > Hi *, > > just to be sure ... I'm waiting for Tims second +1 or should I release the > artifacts? > I.e. as far as I understand the reports we only have marginal differences. > > Andi > > On 07.02.20 13:05, Ti

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
nsion PixelAspectRatio": "1.0", "Dimension VerticalPhysicalPixelSpacing": "0.26462027", "X-Parsed-By": [ "org.apache.tika.parser.CompositeParser", "org.apache.tika.parser.DefaultParser", "org.apach

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
on different runs. The key for me is the rollup by parse time suggests _overall_ for ppt, the time is nearly identical. > On 07.02.20 13:05, Tim Allison wrote: > > Hi All,, > > I haven't had the chance to look, but will do so later today:: > > http://16

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
ave ASF infrastructure > provision > > a VM to be managed by POI PMC. > > > > Regards, > > Dave > > > > Sent from my iPhone > > > > > On Feb 5, 2020, at 3:38 PM, Andreas Beeker > wrote: > > > > > > Hi Tim, > > >

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-07 Thread Tim Allison
Hi All, I haven't had the chance to look, but will do so later today: http://162.242.228.174/reports/poi_4.1.2_reports.tgz On Wed, Feb 5, 2020 at 7:47 PM Tim Allison wrote: > Might be faster than I thought...results tomorrow...perhaps. > > On Wed, Feb 5, 2020 at 5:51 PM Tim Allis

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-05 Thread Tim Allison
Might be faster than I thought...results tomorrow...perhaps. On Wed, Feb 5, 2020 at 5:51 PM Tim Allison wrote: > I did not. I can kick it off now, but with travel and other stuff, > wouldn't have results until Monday. Happy to do so if desired. > > On Wed, Feb 5, 2020 at 12:3

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-05 Thread Tim Allison
unavailable. > > Andi > > On 05.02.20 01:05, Tim Allison wrote: > > +1 > > > > built without surprises, digests check out and Tika builds. Thank you, > > Andi and team! > > > > On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker > wrote: >

Re: [VOTE] Apache POI 4.1.2 release (RC2)

2020-02-04 Thread Tim Allison
+1 built without surprises, digests check out and Tika builds. Thank you, Andi and team! On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker wrote: > +1 ... the NOTICE file was still on 2019, but I don't think this matters. > Apart of it, my sample application works. > > On 03.02.20 22:55, PJ

Re: next release?

2020-01-23 Thread Tim Allison
en ... > > Andi > > > On 23.01.20 15:41, Tim Allison wrote: > > Hi All, > > We're getting pinged over on Tika for when the next release of POI will > > be available. Any plans? > > > > https://issues.apache.org/jira/browse/TIKA-3017 > > > > Thank you! > > > > >

next release?

2020-01-23 Thread Tim Allison
Hi All, We're getting pinged over on Tika for when the next release of POI will be available. Any plans? https://issues.apache.org/jira/browse/TIKA-3017 Thank you!

Re: [ANNOUNCE] Apache POI 4.1.1 released

2019-10-21 Thread Tim Allison
All, Thank you for this release! I'm sorry that I was mostly AWOL. Andi, Thank you for running this release! Cheers, Tim On Sun, Oct 20, 2019 at 3:52 PM Andreas Beeker wrote: > The Apache POI project is pleased to announce the release of POI 4.1.1. > Featured are a

Re: POI 4.1.1

2019-10-07 Thread Tim Allison
, Tim On Sat, Oct 5, 2019 at 8:38 AM Tim Allison wrote: > > Andi, > I’m sorry for my delay. I’ve booked a chunk of time on Monday to look at > this...data is prepped...just need to run latest code and compare. I don’t > want to hold up the release tho...please move forward

Re: POI 4.1.1

2019-10-05 Thread Tim Allison
wrote: > Hi Tim, > > On 20.09.19 13:55, Tim Allison wrote: > > I think I remember a regression in emf/wmf...could be spurious or my > fault > > at the Tika level. > > I've just checked my mails for the original emf/wmf issue, which you've > (partly) fixed via #63327.

Next release?

2019-07-24 Thread Tim Allison
Hi All, Do we have any sense of when the next release will be? IIRC I have a bit of work to do w emf[1], what else do we want to include? Thank you! Cheers, Tim [1] I have a vague memory of slight regressions in text extraction, but I have to test w latest.

[COMPRESS] zip-based entry names/metadata data set available

2019-04-22 Thread Tim Allison
All, For some recent work on Apache Tika, I used commons-compress to extract entry names and metadata via a streaming read from roughly 500k zip-based files we have in Tika's regression corpus. I was happy to see we have some POI-generated files in there. :) I noticed some areas for

regression results

2019-04-10 Thread Tim Allison
All, Again, my apologies for being late, but the results might still be useful for work towards 4.1.1. http://162.242.228.174/reports/poi-4.1.0-reports.zip Some tentative observations: 1) there was the new and non-replicable set of problems with the XSSFBParser. 2) The emf/wmf regressions

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
Hi Andi, Y, to be clear, I really like what you’ve done and it is all a bunch cleaner than my earlier stuff; I wasn’t at all questioning the design. The question was more to back compat. There was quite a bit of red when I made the upgrade and before I modernized our code on Tika. As long as

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
On Mon, Apr 8, 2019 at 4:55 PM Andreas Beeker wrote: > Hi Tim, > > I've made that changes on purpose, as I wanted to make the EMF API similar > to the WMF one. > > > oap.hemf.extractor.HemfExtractor -> oap.hemf.usermodel.HemfPicture > All (?) our user models are called by their content and being

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
ial 4.0.2 to 4.1.0, but that's not an area of code > I'm familiar with. > > On Mon, Apr 8, 2019 at 6:07 AM Tim Allison wrote: > > > Are we ok with the backward incompatibilities in EMF...These are just > > a few. I realize these classes are @Interna

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-08 Thread Tim Allison
Are we ok with the backward incompatibilities in EMF...These are just a few. I realize these classes are @Internal, and the updates look great. HwmfRecord.getRecordType() -> getWmfRecordType() oap.hemf.record.AbstractHemfComment -> oap.hemf.record.hemf.Comment oap.hemf.record.HemfRecord ->

Re: [VOTE] Apache POI 4.1.0 release (RC3)

2019-04-06 Thread Tim Allison
Sorry for being late to the game. I won’t have time to run regression tests until Monday or so... thank you Dominik and Greg! On Sat, Apr 6, 2019 at 4:27 AM Dominik Stadler wrote: > Hi Greg, > > thanks for running the release and removing all the obstacles on the way, > always good if as many

Re: Event Based APIs for parsing docx,doc,pptx,ppt files

2019-02-15 Thread Tim Allison
I've added SAX parsers for pptx and docx over on Apache Tika. These rely on POI for OPCPackage, a bunch of other classes and overall design. I've thought about moving that code into POI, but I haven't found the time or need, and the code is my typical kludgy-mess...and I don't want to pollute

Re: [VOTE] Apache POI 4.0.1 release (RC2)

2018-11-27 Thread Tim Allison
+1 Reports are available here: http://162.242.228.174/reports/reports_poi_4_0_1-rc2.tgz Thank you, Andi! On Mon, Nov 26, 2018 at 6:01 PM Andreas Beeker wrote: > > Hi, > > I've prepared artifacts for the release of Apache POI 4.0.1 (RC2). > > The most notable changes in this release are: > > -

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-23 Thread Tim Allison
Sorry, now that I've figured out what the problem was, I'm -1. Y, let's respin. On Thu, Nov 22, 2018 at 4:34 PM Andreas Beeker wrote: > > Hi Tim, > > On 21.11.18 19:26, Tim Allison wrote: > > This looks like a regression. > > > Please make your mind up, if this is a

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
a regression. On Wed, Nov 21, 2018 at 12:56 PM Tim Allison wrote: > > >These were in the header...I have to step away from the keyboard for > now...any ideas? > > I confirmed this by flipping btwn 4.0.0 and 4.0.1 in our dependencies > and using our Tika's SNAPSHOT for both

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
>These were in the header...I have to step away from the keyboard for now...any ideas? I confirmed this by flipping btwn 4.0.0 and 4.0.1 in our dependencies and using our Tika's SNAPSHOT for both. This is not caused by a different version of Tika. On Wed, Nov 21, 2018 at 12:53 PM Tim Alli

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
| 03: 1 | 06: 1 | 1: 1 | 16: 1 | 2009: 1 | 3: 1 | conciencia: 1 | despertar: 1 These were in the header...I have to step away from the keyboard for now...any ideas? On Wed, Nov 21, 2018 at 12:37 PM Tim Allison wrote: > > Reports are available here: > http://162.242.228.174/reports/reports_

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-21 Thread Tim Allison
Reports are available here: http://162.242.228.174/reports/reports_poi_4_0_1-rc1.tgz We have a bunch less content in ppt, but I _think_ this is because at the Tika level we used to duplicate notes content, and we've fixed that bug. So, I think this is an improvement, but I need to check. On Wed,

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-20 Thread Tim Allison
release process was too smooth. > Only my local version of the commons-openpgp needed to be used. [1] > > Andi > > [1] https://issues.apache.org/jira/browse/SANDBOX-508 > > > On 20.11.18 22:33, Tim Allison wrote: > > Andi, > >Thank you! I've built this locally

Re: [VOTE] Apache POI 4.0.1 release (RC1)

2018-11-20 Thread Tim Allison
Andi, Thank you! I've built this locally and integrated it into Tika, and I've kicked off the regression tests. The one small glitch I noticed so far is that poi-ooxml-schemas jar has an extra ".jar" in it: build/dist/maven/poi-ooxml-schemas/poi-ooxml-schemas-4.0.1.jar.jar I'll let you all

Dave's POI talk at COSCON

2018-11-06 Thread Tim Allison
W00t!!! Here's Dave's talk on POI at COSCON in Shenzhen, China on October 20, 2018: https://www.youtube.com/watch?v=N7_Y3zNb_-w

Re: Build failed in Jenkins: POI-DSL-OpenJDK #545

2018-11-02 Thread Tim Allison
Autoboxing?! On Fri, Nov 2, 2018 at 7:27 AM Tim Allison wrote: > > Colleagues, any idea what might be going on? How can -1 != -1?! > > Error: Test with 2/3: Should not find 3 but found it at -1 in 0 1 2 > at org.apache.poi.hwpf.usermodel.TestBug47563.test(TestBug47563.java:80)
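
For context on the autoboxing guess, here is a minimal Java sketch of how reference comparison of boxed `Integer` values differs from value comparison. It is illustrative only and makes no claim about what `TestBug47563` actually does; the class and method names are invented for the example. One caveat: Java caches boxed values from -128 to 127, so two autoboxed -1 values are the same object, and a reference mismatch would need larger values or explicitly constructed objects.

```java
import java.util.Objects;

// Hypothetical demo class, not taken from the POI test suite.
public class BoxedCompare {

    // Compares by value; null-safe, works for any pair of Integers.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    // Compares object identity, which is what == does on boxed types.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    public static void main(String[] args) {
        Integer small1 = -1, small2 = -1;           // within the guaranteed cache range
        Integer big1 = 100_000, big2 = 100_000;     // outside the guaranteed cache range

        System.out.println("small, by value:     " + sameValue(small1, small2));
        System.out.println("small, by reference: " + sameReference(small1, small2));
        System.out.println("big, by value:       " + sameValue(big1, big2));
        // The result of the next line depends on the JVM's boxing cache settings.
        System.out.println("big, by reference:   " + sameReference(big1, big2));
    }
}
```

Comparing with `equals` (or unboxing to `int` first) sidesteps the question entirely.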

Re: Build failed in Jenkins: POI-DSL-OpenJDK #545

2018-11-02 Thread Tim Allison
Colleagues, any idea what might be going on? How can -1 != -1?! Error: Test with 2/3: Should not find 3 but found it at -1 in 0 1 2 at org.apache.poi.hwpf.usermodel.TestBug47563.test(TestBug47563.java:80) assertTrue("Test with " + rows + "/" + columns + ": Should not find " + i + " but found it

Re: POI 4.0.1 release

2018-10-30 Thread Tim Allison
+1 "end of this week" that'll work well for my issues, too. I want to confirm I didn't break anything in my recent commits via large scale regression testing. On Tue, Oct 30, 2018 at 8:31 AM Yegor Kozlov wrote: > > +1 > > Bug 62836 is pending. I'm going to check in the code anyway, just waiting

Re: Apache POI

2018-10-10 Thread Tim Allison
Dejan, Thank you for letting us know about this problem. I was able to reproduce it, and I've opened a ticket: https://bz.apache.org/bugzilla/show_bug.cgi?id=62815 On Wed, Sep 12, 2018 at 5:58 AM dejan ikodinovic wrote: > > Hi guys, > > I m working on parsing Excel xlsb files using Apache POI

Re: EMF corpus

2018-10-09 Thread Tim Allison
Turns out that's a subset. It looks like there should be ~200k emfs. I'll try to dig up the extraction code and re-run. On Tue, Oct 9, 2018 at 8:55 AM Tim Allison wrote: > > Y. Turns out I extracted a bunch a while ago. See the 'emfs' > directory in this tar.bz2 file: > http://162

Re: EMF corpus

2018-10-09 Thread Tim Allison
Y. Turns out I extracted a bunch a while ago. See the 'emfs' directory in this tar.bz2 file: http://162.242.228.174/embedded_files/xmfs.tar.bz2 Let me know if you have any questions and/or if I can make that any more useful for you. Cheers, Tim On Mon, Oct 8, 2018 at 7:37 AM Tim

Re: EMF corpus

2018-10-08 Thread Tim Allison
At some point I extracted all emfs from our corpus. I’ll see if that data is still around and/or re-extract...prob have time tomorrow/ Wednesday On Sun, Oct 7, 2018 at 5:01 PM Dominik Stadler wrote: > Hi Andi > > It is easy to change CommonCrawlDocumentDownload to fetch other mime-types, > see

updating data on the regression corpus

2018-10-05 Thread Tim Allison
All, I opened https://issues.apache.org/jira/browse/TIKA-2750 to track updating data on the regression corpus. Please track/join the conversation there if you'd like to participate. Cheers, Tim

Re: Welcome to the regression vm!

2018-10-05 Thread Tim Allison
Tobias, I just gave you access to the vm and sent login stuff to you personally. I have to update some groups and permissions, but I'll let you know when that is ready. Let me know if you have problems getting on. Best, Tim > 1. Is it OK that 100% CPU is used

Welcome to the regression vm!

2018-09-28 Thread Tim Allison
Tobias, I'm sorry for my delay. We welcome you to use our regression vm hosted by Rackspace for fuzzing work to identify vulnerabilities. Our one request: we ask that you pause/stop your processes when we need to run regression tests before a release. Email me privately with your desired

Re: Worth doing a 4.0.1 release soon?

2018-09-24 Thread Tim Allison
All, I broke our mp3 parser w changes in Tika 1.19. We're about to roll a 1.19.1. Is there anything catastrophic in 4.0.0 that would lead us to wait for 4.0.1? I noticed the 62692 (wildfly xml parser)...is there anything else? Thank you! Cheers, Tim On Wed, Sep 19, 2018 at

Re: Speaking on POI at China Open Source Conference in October

2018-09-22 Thread Tim Allison
Let me know if these are of any use... https://github.com/centic9/CommonCrawlDocumentDownload http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/ https://events.static.linuxfound.org/sites/events/files/slides/ApacheConMiami2017_tallison_v2.pdf

Re: Apache POI

2018-09-15 Thread Tim Allison
Can you open an issue on our bugzilla and post a test file w a unit test? Thank you for sharing this w us! On Wed, Sep 12, 2018 at 5:58 AM dejan ikodinovic wrote: > Hi guys, > > I m working on parsing Excel xlsb files using Apache POI 3.17 version and > have problem for some numbers. > The

Re: Speaking on POI at China Open Source Conference in October

2018-09-15 Thread Tim Allison
Looks great! If at all possible, I’d appreciate a bullet or two on Dominik’s and my large scale regression tests... More input on test files for the corpus would be useful. Completely understand if this is off topic. Thank you! On Fri, Sep 14, 2018 at 5:27 PM Dave Fisher wrote: > Hi Team, > >

Re: [VOTE] Apache POI 4.0.0 release (RC1)

2018-09-05 Thread Tim Allison
+1 Reports are here: http://162.242.228.174/reports/poi-4.0.0-reports-e.tgz These reports compare 3.17 with 4.0.0-RC1. There are numerous fixed exceptions. The new exceptions appear to be caused by better exception reporting for truncated files. Two small issues that I'm ok with for now: 1)

Re: [VOTE] Apache POI 4.0.0 release (RC1)

2018-09-04 Thread Tim Allison
Sorry for my delay. I'm kicking off our regression tests now. On Sat, Sep 1, 2018 at 11:46 AM Dominik Stadler wrote: > > Hi, > > Content of release-archives look good compared to 3.17. > > Only found a very minor glitch: osgi/build.xml and sonar/**/pom.xml still > contain "4.0.0-SNAPSHOT", but I

Re: Remove OPOIFSFileSystem for 4.0.0?

2018-08-27 Thread Tim Allison
+1. Thank you, Andi! On Mon, Aug 27, 2018 at 5:52 AM Alain FAGOT BÉAREZ wrote: > > +1 for full refactoring to POIFS* > > Sent with BlueMail > > > Original Message > From: Andreas Beeker > Sent: Sun Aug 26 19:06:02 GMT-03:00 2018 > To: dev@poi.apache.org > Subject:

Re: Prepare POI 4.0.0 RC 1

2018-08-17 Thread Tim Allison
Despite that gaffe -- thank you, again, Andi -- I compared the output after some recent modifications, and there are no differences: http://162.242.228.174/reports/poi-4.0.0-reports-d.tgz On Fri, Aug 17, 2018 at 11:22 AM Tim Allison wrote: > > Ugh, and thank you! > On Fri, Aug 17, 201

Re: Prepare POI 4.0.0 RC 1

2018-08-17 Thread Tim Allison
Ugh, and thank you! On Fri, Aug 17, 2018 at 2:43 AM kiwiwings wrote: > > Hi, > > I've checked with kwright about #62564 / TIKA-2693 and he signaled green > light for the release. > I'll prepare a RC tomorrow evening - including the fixed commons-compress > links ;) > > Andi > > > > -- > Sent

Re: upgrading to 4.0.0

2018-08-14 Thread Tim Allison
else find any blockers or other issues? On Fri, Aug 10, 2018 at 2:29 PM Tim Allison wrote: > > Updated reports are here: > http://162.242.228.174/reports/poi-4.0.0-reports-b.tgz > > Will turn to them shortly... > On Thu, Aug 9, 2018 at 11:54 AM Tim Allison wrote: > >

Re: upgrading to 4.0.0

2018-08-10 Thread Tim Allison
Updated reports are here: http://162.242.228.174/reports/poi-4.0.0-reports-b.tgz Will turn to them shortly... On Thu, Aug 9, 2018 at 11:54 AM Tim Allison wrote: > > All, > I fixed the three areas for improvement that I found in my first run > of regression tests. I'm going to kick

Re: upgrading to 4.0.0

2018-08-09 Thread Tim Allison
in the regression tests? Thank you! Cheers, Tim On Wed, Aug 8, 2018 at 10:24 AM Tim Allison wrote: > > The reports from 3.17 vs 4.0.0-SNAPSHOT are here: > http://162.242.228.174/reports/poi-4.0.0_reports.tar.gz > > Aside from the two issues I've already identified

Re: Jenkins build is back to normal : POI-DSL-1.8 #490

2018-08-09 Thread Tim Allison
PJ, Thank you for fixing this! I should have realized that I still need to run `ant test-integration`. Tim On Thu, Aug 9, 2018 at 7:36 AM Apache Jenkins Server wrote: > > See > > > >

Re: upgrading to 4.0.0

2018-08-08 Thread Tim Allison
-ooxml than .docx or pptx. This may be a Tika-level issue, but I want to look into that. If anyone notices anything else, please let me know! On Wed, Aug 8, 2018 at 7:25 AM Tim Allison wrote: > > Hi Andi, > I think I'm mostly good. If you could take a look at: > https://bz.apache.

Re: [Bug 62591] Trivial regression in trunk in isPlaceHolder in newer sl

2018-08-08 Thread Tim Allison
Will attach it once bugzilla is back up. Meanwhile, you can grab it here: https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/test-documents/testPPT_masterText.ppt On Tue, Aug 7, 2018 at 5:07 PM wrote: > > https://bz.apache.org/bugzilla/show_bug.cgi?id=62591 > > Andreas

Re: upgrading to 4.0.0

2018-08-08 Thread Tim Allison
Hi Tim, > > On 7/31/18 9:49 PM, Tim Allison wrote: > > I'm trying to upgrade Tika to 4.0.0-SNAPSHOT. > > > > 2) To confirm OLEShape has become HSLFObjectShape? > > You are correct. > >

Re: upgrading to 4.0.0

2018-07-31 Thread Tim Allison
Doh. Please disregard 1). On Tue, Jul 31, 2018 at 3:49 PM Tim Allison wrote: > > All, > I'm trying to upgrade Tika to 4.0.0-SNAPSHOT. This is an amazing > amount of work you all did. Thank you! > Some questions: > > 1) What should I use instead of NPOIFSFile

upgrading to 4.0.0

2018-07-31 Thread Tim Allison
All, I'm trying to upgrade Tika to 4.0.0-SNAPSHOT. This is an amazing amount of work you all did. Thank you! Some questions: 1) What should I use instead of NPOIFSFileSystem.hasPOIFSHeader(is)? Should I try to create a HeaderBlock and catch exceptions (anything to differentiate not a header
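
As background to question 1, an OLE2 (POIFS) container starts with the eight-byte signature `D0 CF 11 E0 A1 B1 1A E1`. The sketch below checks for that signature using only the JDK and leaves the stream positioned where it started. It is an assumption-laden illustration, not POI's API: `Ole2HeaderCheck` and `hasOle2Header` are hypothetical names, and the stream must support mark/reset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical stand-alone check; not a class from Apache POI.
public class Ole2HeaderCheck {

    // The eight-byte signature at the start of every OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Peeks at the first eight bytes and resets the stream afterwards.
    static boolean hasOle2Header(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2_MAGIC.length);
        try {
            byte[] head = new byte[OLE2_MAGIC.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == head.length && Arrays.equals(head, OLE2_MAGIC);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] notOle2 = {'P', 'K', 3, 4}; // zip signature, too short and wrong bytes
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(notOle2))) {
            System.out.println(hasOle2Header(in));
        }
    }
}
```

A plain boolean check like this avoids using exceptions for control flow when the input simply is not an OLE2 file.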

Re: 2016 Excel Tree Map Charts excel file Not able to Read Using POI

2018-06-03 Thread Tim Allison
Any chance you can open an issue on our bug tracker and share a test file? Thank you! https://bz.apache.org/bugzilla/describecomponents.cgi?product=POI On Fri, Jun 1, 2018 at 6:05 AM arepalli.ram1...@gmail.com < arepalli.ram1...@gmail.com> wrote: > run: > org.apache.poi.POIXMLException:

SAXParser lock contention

2018-05-17 Thread Tim Allison
Useful observation from Sebastian Nagel and recommendation from Jukka over on the tika-user list: https://lists.apache.org/thread.html/54913f29cd83bba175f77a2d6a4902bb3a5cba2fa495bbfd6012024a@%3Cuser.tika.apache.org%3E Looks like we might want to add a SAXParser pool at some point.
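
One possible shape for such a SAXParser pool, sketched with JDK classes only. This is not the design adopted by POI or Tika; the class name `SaxParserPool`, its methods, and the fixed-size blocking strategy are all assumptions made for illustration. The idea is to create the parsers once up front, since building them per document is the step that showed lock contention, and then hand them out to worker threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical fixed-size pool; a sketch, not code from POI or Tika.
public class SaxParserPool {

    private final BlockingQueue<SAXParser> pool;

    // Creates all parsers eagerly so factory lookups happen once, not per document.
    public SaxParserPool(int size) throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(factory.newSAXParser());
        }
    }

    // Blocks until a parser is free.
    public SAXParser acquire() throws InterruptedException {
        return pool.take();
    }

    // Resets the parser to its initial configuration and returns it to the pool.
    // Note: reset() may throw UnsupportedOperationException on some implementations.
    public void release(SAXParser parser) {
        parser.reset();
        pool.offer(parser);
    }

    // Number of parsers currently idle in the pool.
    public int available() {
        return pool.size();
    }
}
```

Callers would wrap use in `try { ... } finally { pool.release(parser); }` so that a parse failure does not shrink the pool.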
