[jira] [Created] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1731:
Summary: Try to integrate java-hwp into Tika
Key: TIKA-1731
URL: https://issues.apache.org/jira/browse/TIKA-1731
Project: Tika
Issue Type: New Feature

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734808#comment-14734808 ] Tim Allison commented on TIKA-1728: --- Opened separate ticket for potential integration: TI

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734812#comment-14734812 ] Tim Allison edited comment on TIKA-1731 at 9/8/15 1:46 PM: --- One o

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734812#comment-14734812 ] Tim Allison commented on TIKA-1731: --- One other library [h2tlib|https://sites.google.com/s

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734821#comment-14734821 ] Tim Allison commented on TIKA-1726: --- My preference would be for {{getPath()}} and {{creat

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734823#comment-14734823 ] Tim Allison commented on TIKA-1726: --- I'll take a look at tika-batch and see what we can m

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734855#comment-14734855 ] Tim Allison commented on TIKA-1731: --- Opened https://github.com/ddoleye/java-hwp/issues/2

[jira] [Updated] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1731: -- Description: Now that we have detection working for hwp files, it would be great to add a parser. [java

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734927#comment-14734927 ] Tim Allison commented on TIKA-1513: --- Hi [~iryndin], I wanted to check in to see if you've

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736663#comment-14736663 ] Tim Allison commented on TIKA-1731: --- Thank you for the feedback! Are there other options

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736663#comment-14736663 ] Tim Allison edited comment on TIKA-1731 at 9/9/15 11:08 AM: Tha

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736676#comment-14736676 ] Tim Allison commented on TIKA-1731: --- [~mungeol], on another note...did hwp ever go the oo

[jira] [Commented] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736959#comment-14736959 ] Tim Allison commented on TIKA-1732: --- Odd...What happens if you call TikaInputStream.get()

[jira] [Comment Edited] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736959#comment-14736959 ] Tim Allison edited comment on TIKA-1732 at 9/9/15 2:56 PM: --- Odd..

[jira] [Commented] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737046#comment-14737046 ] Tim Allison commented on TIKA-1732: --- Any chance there's an old version of POI on your cla

[jira] [Commented] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737440#comment-14737440 ] Tim Allison commented on TIKA-1732: --- NP. Thank you for closing the loop! Your test doc

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738638#comment-14738638 ] Tim Allison commented on TIKA-1731: --- Thank you for looking into this. bq. can Tika+POI

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738660#comment-14738660 ] Tim Allison commented on TIKA-1731: --- Great. Thank you so much! It would be helpful to k

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738660#comment-14738660 ] Tim Allison edited comment on TIKA-1731 at 9/10/15 12:26 PM: - G

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738663#comment-14738663 ] Tim Allison commented on TIKA-1731: --- [~mungeol], out of curiosity, what is your gut feeli

[jira] [Commented] (TIKA-1733) RuntimeException when parsing some word (.doc) documents

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738809#comment-14738809 ] Tim Allison commented on TIKA-1733: --- Thank you for submitting a document that triggers th

[jira] [Commented] (TIKA-1733) RuntimeException when parsing some word (.doc) documents

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739005#comment-14739005 ] Tim Allison commented on TIKA-1733: --- Can't figure out what's going wrong, I've opened: h

[jira] [Commented] (TIKA-1733) RuntimeException when parsing some word (.doc) documents

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739022#comment-14739022 ] Tim Allison commented on TIKA-1733: --- And, y, in Tika 1.4 we grabbed footer text with this

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740625#comment-14740625 ] Tim Allison commented on TIKA-1731: --- Based on only a very cursory look at the examples+sp

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738660#comment-14738660 ] Tim Allison edited comment on TIKA-1731 at 9/16/15 10:52 AM: - G

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747292#comment-14747292 ] Tim Allison commented on TIKA-1731: --- Please don't stop watching. We can use your help!

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747338#comment-14747338 ] Tim Allison commented on TIKA-1607: --- Thank you, [~rgauss], for your thoughtful responses

[jira] [Comment Edited] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747338#comment-14747338 ] Tim Allison edited comment on TIKA-1607 at 9/16/15 11:31 AM: - T

[jira] [Updated] (TIKA-1736) Bouncy Castle version binary incompatibility

2015-09-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1736: -- Description: One file in our Common Crawl stash demonstrates a Bouncy Castle version conflict...incompat

[jira] [Created] (TIKA-1736) Bouncy Castle version binary incompatibility

2015-09-18 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1736:
Summary: Bouncy Castle version binary incompatibility
Key: TIKA-1736
URL: https://issues.apache.org/jira/browse/TIKA-1736
Project: Tika
Issue Type: Bug

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900508#comment-14900508 ] Tim Allison commented on TIKA-1737: --- Thank you for raising this issue. I don't think we'

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902522#comment-14902522 ] Tim Allison commented on TIKA-1737: --- Thank you, [~tilman]! > PDFBox 1.8.10 is still a ba

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902528#comment-14902528 ] Tim Allison commented on TIKA-1737: --- bq. there were many more that just had a single lin

[jira] [Commented] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902567#comment-14902567 ] Tim Allison commented on TIKA-1734: --- About to commit, unless you'd like to. :) > Use jav

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902591#comment-14902591 ] Tim Allison commented on TIKA-1740: --- How about we store a list of pairs instead of Metad

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902592#comment-14902592 ] Tim Allison commented on TIKA-1740: --- Oops. Nick beat me to it. That was plan B. [~gagr

[jira] [Resolved] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1734. --- Resolution: Fixed r1704620. Thank you, [~kunda]! > Use java.nio.file.Path in TemporaryResources > ---

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902613#comment-14902613 ] Tim Allison commented on TIKA-1726: --- Thank you, [~kkrugler]. [~kunda], is there enough c

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902622#comment-14902622 ] Tim Allison commented on TIKA-1737: --- Could we have done something at the Tika level to ca

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902659#comment-14902659 ] Tim Allison commented on TIKA-1737: --- bq. dating back as far as 1992 Y, I just confirmed

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902835#comment-14902835 ] Tim Allison commented on TIKA-1737: --- See PDFBOX-2986 for a resource leak discovered throu

[jira] [Comment Edited] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902528#comment-14902528 ] Tim Allison edited comment on TIKA-1737 at 9/22/15 4:16 PM: bq.

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904360#comment-14904360 ] Tim Allison commented on TIKA-1742: --- The HORROR! If it were a second rate conference, it

[jira] [Commented] (TIKA-1743) NetworkParser can create Unbounded Number of Threads

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904371#comment-14904371 ] Tim Allison commented on TIKA-1743: --- Oh, I wish I had time to finish off TIKA-1657 and TI

[jira] [Commented] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904388#comment-14904388 ] Tim Allison commented on TIKA-1744: --- Thank you, [~kunda]! I think this was part of the

[jira] [Comment Edited] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904388#comment-14904388 ] Tim Allison edited comment on TIKA-1744 at 9/23/15 12:06 PM: - T

[jira] [Created] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-23 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1747:
Summary: Change file->path in tika-batch throughout
Key: TIKA-1747
URL: https://issues.apache.org/jira/browse/TIKA-1747
Project: Tika
Issue Type: Sub-task

[jira] [Created] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-23 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1748:
Summary: Upgrade to POI 3.13-final when available
Key: TIKA-1748
URL: https://issues.apache.org/jira/browse/TIKA-1748
Project: Tika
Issue Type: Task Re

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906238#comment-14906238 ] Tim Allison commented on TIKA-1742: --- [~tilman] fixed this over in PDFBox 1.8.x (already n

[jira] [Commented] (TIKA-1667) Upgrade to POI 3.13-beta1 when available

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907914#comment-14907914 ] Tim Allison commented on TIKA-1667: --- Which issue? > Upgrade to POI 3.13-beta1 when avail

[jira] [Comment Edited] (TIKA-1667) Upgrade to POI 3.13-beta1 when available

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907914#comment-14907914 ] Tim Allison edited comment on TIKA-1667 at 9/25/15 11:02 AM: - W

[jira] [Commented] (TIKA-1753) Improper word concatenation when extracting pdf

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907917#comment-14907917 ] Tim Allison commented on TIKA-1753: --- Y. I defer to [~lehmi] on PDFBOX-2991 for whether t

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907925#comment-14907925 ] Tim Allison commented on TIKA-1657: --- Thank you, [~gagravarr], for moving this forward...a

[jira] [Commented] (TIKA-1667) Upgrade to POI 3.13-beta1 when available

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907988#comment-14907988 ] Tim Allison commented on TIKA-1667: --- Thank you for raising this. I agree, this is probab

[jira] [Commented] (TIKA-1736) Bouncy Castle version binary incompatibility

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933175#comment-14933175 ] Tim Allison commented on TIKA-1736: --- Should be fixed when [2.1.1|https://sourceforge.net

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933186#comment-14933186 ] Tim Allison commented on TIKA-1748: --- As [~kunda] pointed out, you're using a future versi

[jira] [Comment Edited] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933186#comment-14933186 ] Tim Allison edited comment on TIKA-1748 at 9/28/15 11:40 AM: - A

[jira] [Updated] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1748: -- Attachment: TIKA-1748.patch Y, not too much work. All tests pass, what could possibly go wrong? I added

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936698#comment-14936698 ] Tim Allison commented on TIKA-1748: --- Thank you! Will commit today. > Upgrade to POI 3.1

[jira] [Assigned] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1747: - Assignee: Tim Allison > Change file->path in tika-batch throughout > -

[jira] [Assigned] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1744: - Assignee: Tim Allison > Use java.nio.file.Path in TikaInputStream > --

[jira] [Resolved] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1744. --- Resolution: Fixed r1706056 Thank you, [~kunda]! > Use java.nio.file.Path in TikaInputStream > ---

[jira] [Created] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1754:
Summary: tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:
Key: TIKA-1754
URL: https://issues.apache.org/jira/browse/TIKA-1754

[jira] [Resolved] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1747. --- Resolution: Fixed r1706060 > Change file->path in tika-batch throughout >

[jira] [Resolved] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1754. --- Resolution: Fixed Fixed with TIKA-1747. > tika-batch's FileListCrawler truncates the first character o

[jira] [Commented] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14937440#comment-14937440 ] Tim Allison commented on TIKA-1754: --- Y, probably. This particular issue is fixed for now.

[jira] [Assigned] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1707: - Assignee: Tim Allison > Upgrade to Apache POI 3.13 Beta 2 > - > >

[jira] [Resolved] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1707. --- Resolution: Fixed r1706079. Thank you, [~kiwiwings]! Apologies to you and [~gagravarr] for not rememb

[jira] [Resolved] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1742. --- Resolution: Fixed r1706086 > StackOverflowError parsing a PDF with ExtractInlineImages=true >

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14937720#comment-14937720 ] Tim Allison commented on TIKA-1742: --- Thank you, [~nated], for raising this, and thank you

[jira] [Resolved] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1748. --- Resolution: Fixed > Upgrade to POI 3.13-final when available >

[jira] [Created] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1755:
Summary: Make ppt and pptx paragraph/div breaks more consistent
Key: TIKA-1755
URL: https://issues.apache.org/jira/browse/TIKA-1755
Project: Tika
Issue Type: Impro

[jira] [Commented] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938273#comment-14938273 ] Tim Allison commented on TIKA-1755: --- Current patch gets us this with PPTX: {noformat}

[jira] [Updated] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1755: -- Attachment: TIKA-1755.patch Initial patch > Make ppt and pptx paragraph/div breaks more consistent > ---

[jira] [Commented] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938788#comment-14938788 ] Tim Allison commented on TIKA-1744: --- Doh! Thank you. > Use java.nio.file.Path in TikaIn

[jira] [Commented] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938880#comment-14938880 ] Tim Allison commented on TIKA-1755: --- Y, you've got plenty of bigger fish to fry, and the

[jira] [Assigned] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1757: - Assignee: Tim Allison > tika-batch tests fail on systems with whitespace or special chars in folde

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938893#comment-14938893 ] Tim Allison commented on TIKA-1757: --- Sorry about that. Will fix shortly. Thank you! >

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938913#comment-14938913 ] Tim Allison commented on TIKA-1757: --- Y, won't be able to fix for a few hours, but I can r

[jira] [Resolved] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1757. --- Resolution: Fixed Mea culpa. Tests pass for me on Windows with space in path and Linux. Let me know i

[jira] [Resolved] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1758. --- Resolution: Fixed r1706178. Thank you, [~kunda]. > BatchCommandLineBuilder fails on systems with whit

[jira] [Comment Edited] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939102#comment-14939102 ] Tim Allison edited comment on TIKA-1758 at 10/1/15 12:27 AM: - r

[jira] [Resolved] (TIKA-1756) Update forbiddenapis to v2.0

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1756. --- Resolution: Fixed r1706242. Thank you, [~thetaphi]! > Update forbiddenapis to v2.0 >

[jira] [Commented] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939822#comment-14939822 ] Tim Allison commented on TIKA-1744: --- committed r1706249. Thank you! > Use java.nio.file

[jira] [Created] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1759:
Summary: Extract contributor metadata from supporting file formats
Key: TIKA-1759
URL: https://issues.apache.org/jira/browse/TIKA-1759
Project: Tika
Issue Type: Im

[jira] [Updated] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1759: -- Description: Many common file formats store information about contributors (broadly speaking) to a docum

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939860#comment-14939860 ] Tim Allison commented on TIKA-1759: --- Question #1: are there any other types of embedded c

[jira] [Updated] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1759: -- Attachment: contributors.zip I've created test files for MSOffice docs. If anyone would be willing to ad

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939881#comment-14939881 ] Tim Allison commented on TIKA-1759: --- If we don't want to call all of the above {{dc:contr

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940009#comment-14940009 ] Tim Allison commented on TIKA-1759: --- [~tilman], I think we're good with PDAnnotationMarku

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940091#comment-14940091 ] Tim Allison commented on TIKA-1759: --- Y, it is. I think it would be useful to try to get

[jira] [Commented] (TIKA-1760) PDF index fulltext fails.

2015-10-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940981#comment-14940981 ] Tim Allison commented on TIKA-1760: --- Thank you for raising this issue. I'm not sure ther

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940983#comment-14940983 ] Tim Allison commented on TIKA-1759: --- Will do. Thank you! > Extract contributor metadata

[jira] [Assigned] (TIKA-1761) Error Parsing PPT (97-2003) files with password protection against modification which were created using Office 2013

2015-10-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1761: - Assignee: Tim Allison > Error Parsing PPT (97-2003) files with password protection against > modi

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942248#comment-14942248 ] Tim Allison commented on TIKA-1285: --- Completely agree. If I update the PDFBox 2.0 branch

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943339#comment-14943339 ] Tim Allison commented on TIKA-1285: --- Thank you, [~b...@benmccann.com]! The more eyes we

[jira] [Comment Edited] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943339#comment-14943339 ] Tim Allison edited comment on TIKA-1285 at 10/5/15 1:14 PM: Tha

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-10-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944295#comment-14944295 ] Tim Allison commented on TIKA-1737: --- [~alanbur], over on TIKA-1285, I posted a link for m

[jira] [Commented] (TIKA-1764) Provide information on failed document parsing in ParsingEmbeddedDocumentExtractor

2015-10-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946775#comment-14946775 ] Tim Allison commented on TIKA-1764: --- Ha, I've been wanting to do this for a while. I'm n

[jira] [Created] (TIKA-1765) Some doc and docx store multiple authors as semi-colon delimited list

2015-10-07 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1765: - Summary: Some doc and docx store multiple authors as semi-colon delimited list Key: TIKA-1765 URL: https://issues.apache.org/jira/browse/TIKA-1765 Project: Tika I
