[jira] [Updated] (TIKA-2878) Update dependencies for 1.21.1 or 1.22

2019-05-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2878: -- Attachment: pom.xml > Update dependencies for 1.21.1 or 1.22 >

[jira] [Commented] (TIKA-2878) Update dependencies for 1.21.1 or 1.22

2019-05-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844275#comment-16844275 ] Tilman Hausherr commented on TIKA-2878: --- [^pom.xml] Here's the pom I use to build > Update

[jira] [Commented] (TIKA-2878) Update dependencies for 1.21.1 or 1.22

2019-05-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844146#comment-16844146 ] Tilman Hausherr commented on TIKA-2878: --- With the maven owasp plugin 5.0.0.M3 I get even more when

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-04 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810059#comment-16810059 ] Tilman Hausherr commented on TIKA-2749: --- You probably mean "vector graphics". > OCR on PDFs should

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808400#comment-16808400 ] Tilman Hausherr commented on TIKA-2749: --- See the accepted answer here:

[jira] [Commented] (TIKA-2832) Very slow large PDF text extraction

2019-03-02 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782344#comment-16782344 ] Tilman Hausherr commented on TIKA-2832: --- Bug in PDFBox has been fixed. > Very slow large PDF text

[jira] [Commented] (TIKA-2828) Your project apache/tika is using buggy third-party libraries [WARNING]

2019-02-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769651#comment-16769651 ] Tilman Hausherr commented on TIKA-2828: --- Sorry, corrected. > Your project apache/tika is using

[jira] [Comment Edited] (TIKA-2828) Your project apache/tika is using buggy third-party libraries [WARNING]

2019-02-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769574#comment-16769574 ] Tilman Hausherr edited comment on TIKA-2828 at 2/15/19 7:45 PM: See also

[jira] [Commented] (TIKA-2828) Your project apache/tika is using buggy third-party libraries [WARNING]

2019-02-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769574#comment-16769574 ] Tilman Hausherr commented on TIKA-2828: --- See also my comment in PDFBOX-4457, it applies to two of

[jira] [Commented] (TIKA-2689) *.ai type (Adobe illustrator ) files are not detected correctly.

2018-08-27 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594084#comment-16594084 ] Tilman Hausherr commented on TIKA-2689: --- Sorry, I don't have any ideas either. > *.ai type (Adobe

[jira] [Comment Edited] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477696#comment-16477696 ] Tilman Hausherr edited comment on TIKA-2643 at 5/16/18 4:36 PM: I don't

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477696#comment-16477696 ] Tilman Hausherr commented on TIKA-2643: --- I don't know anything about MapReduce. All I can tell is

[jira] [Commented] (TIKA-2124) IOException "expected number, actual=COSArray{...}" on a valid PDF

2018-04-05 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427438#comment-16427438 ] Tilman Hausherr commented on TIKA-2124: --- Due to the closing of the related PDFBox issue, this issue

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-04-02 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422239#comment-16422239 ] Tilman Hausherr commented on TIKA-2620: --- The subsampling is when decoding, but this would influence

[jira] [Comment Edited] (TIKA-2620) Set sys property to get better rendering speed by default

2018-04-02 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422239#comment-16422239 ] Tilman Hausherr edited comment on TIKA-2620 at 4/2/18 1:13 PM: --- The

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-30 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420503#comment-16420503 ] Tilman Hausherr commented on TIKA-2620: --- In most cases subsampling shouldn't be used. It might

[jira] [Comment Edited] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419487#comment-16419487 ] Tilman Hausherr edited comment on TIKA-2620 at 3/29/18 5:53 PM:

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419487#comment-16419487 ] Tilman Hausherr commented on TIKA-2620: --- [~gagravarr] KCMS is the legacy setting. It is much faster.

[jira] [Comment Edited] (TIKA-2442) Non-terminal interactive form fields not handled recursively

2018-03-09 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393296#comment-16393296 ] Tilman Hausherr edited comment on TIKA-2442 at 3/9/18 6:04 PM: --- Isn't this

[jira] [Commented] (TIKA-2442) Non-terminal interactive form fields not handled recursively

2018-03-09 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393296#comment-16393296 ] Tilman Hausherr commented on TIKA-2442: --- Isn't this issue solved? (I stumbled upon it while searching

[jira] [Commented] (TIKA-2492) Remove pdfdebugger from tika

2017-11-06 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240945#comment-16240945 ] Tilman Hausherr commented on TIKA-2492: --- This didn't work, you put the exclusion under pdfbox and not

[jira] [Created] (TIKA-2492) Remove pdfdebugger from tika

2017-11-04 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created TIKA-2492: - Summary: Remove pdfdebugger from tika Key: TIKA-2492 URL: https://issues.apache.org/jira/browse/TIKA-2492 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-2256) Japanese character substituted when reading PDF

2017-06-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059658#comment-16059658 ] Tilman Hausherr commented on TIKA-2256: --- Tim is correct. IMHO this issue should be closed as "not a

[jira] [Commented] (TIKA-2320) java.util.zip.DataFormatException when parsing a PDF

2017-04-13 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967844#comment-15967844 ] Tilman Hausherr commented on TIKA-2320: --- Fixed in PDFBox 2.0.6 despite the user not attaching a PDF

[jira] [Commented] (TIKA-2046) Can not read PDF correctly

2016-08-02 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404367#comment-15404367 ] Tilman Hausherr commented on TIKA-2046: --- I've closed the PDFBox issue as the behavior is correct. See

[jira] [Updated] (TIKA-1989) Weird sentence in website

2016-05-28 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1989: -- Description: https://tika.apache.org/1.13/configuring.html {quote} To override some parser

[jira] [Created] (TIKA-1989) Weird sentence in website

2016-05-28 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created TIKA-1989: - Summary: Weird sentence in website Key: TIKA-1989 URL: https://issues.apache.org/jira/browse/TIKA-1989 Project: Tika Issue Type: Bug Components:

[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-02-16 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149256#comment-15149256 ] Tilman Hausherr commented on TIKA-1857: --- Sorry, I have no experience with XFA. [~msahyoun] might know

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098412#comment-15098412 ] Tilman Hausherr commented on TIKA-1830: --- Another possibility is that the change I mentioned has

[jira] [Comment Edited] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096866#comment-15096866 ] Tilman Hausherr edited comment on TIKA-1830 at 1/14/16 5:05 PM: I can't

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098503#comment-15098503 ] Tilman Hausherr commented on TIKA-1830: --- Not that, but the change I mentioned

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098401#comment-15098401 ] Tilman Hausherr commented on TIKA-1830: --- {quote} On PDFBOX-3193, you've set affected versions to

[jira] [Comment Edited] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098401#comment-15098401 ] Tilman Hausherr edited comment on TIKA-1830 at 1/14/16 5:02 PM: {quote} On

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098418#comment-15098418 ] Tilman Hausherr commented on TIKA-1830: --- The line at {{BaseParser.java:1077}} is {code} COSInteger

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-13 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096866#comment-15096866 ] Tilman Hausherr commented on TIKA-1830: --- I can't reproduce the difference for the file 074531.pdf.

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-10-06 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944574#comment-14944574 ] Tilman Hausherr commented on TIKA-1737: --- And I'd be interested to hear whether the situation

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940086#comment-14940086 ] Tilman Hausherr commented on TIKA-1759: --- But you already have the author from /Info and from the XMP

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903537#comment-14903537 ] Tilman Hausherr commented on TIKA-1737: --- No, PDFBOX-2987 is another one I fixed for you. The NPE in

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-21 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901502#comment-14901502 ] Tilman Hausherr commented on TIKA-1737: --- We will definitely not be able to find the cause of memory

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-21 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901042#comment-14901042 ] Tilman Hausherr commented on TIKA-1737: --- Some of the exceptions (the classcastexceptions in the

[jira] [Comment Edited] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-21 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901042#comment-14901042 ] Tilman Hausherr edited comment on TIKA-1737 at 9/21/15 8:49 PM: Some of the

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637232#comment-14637232 ] Tilman Hausherr commented on TIKA-1678: --- API has changed again. This code works:

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633722#comment-14633722 ] Tilman Hausherr commented on TIKA-1678: --- Yes, such a string check would be useful. Or

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633687#comment-14633687 ] Tilman Hausherr commented on TIKA-1678: --- sure: {code} public class Tika1678 extends

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634065#comment-14634065 ] Tilman Hausherr commented on TIKA-1678: --- Yes please do and attach the file. It's late

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634045#comment-14634045 ] Tilman Hausherr commented on TIKA-1678: --- Likely a bug. I tried calling getTitle

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634045#comment-14634045 ] Tilman Hausherr edited comment on TIKA-1678 at 7/20/15 8:41 PM:

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632429#comment-14632429 ] Tilman Hausherr edited comment on TIKA-1678 at 7/19/15 11:21 AM:

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632429#comment-14632429 ] Tilman Hausherr edited comment on TIKA-1678 at 7/19/15 11:22 AM:

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-18 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632429#comment-14632429 ] Tilman Hausherr commented on TIKA-1678: --- I think this is two bytes. I.e. a 0x0 and a

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-18 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632432#comment-14632432 ] Tilman Hausherr commented on TIKA-1678: --- I get correct output for the non-XMP stuff

[jira] [Commented] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-07-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628890#comment-14628890 ] Tilman Hausherr commented on TIKA-1588: --- The weird thing is that I can't find any

[jira] [Issue Comment Deleted] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1575: -- Comment: was deleted (was: With the pure ExtractText, all is identical. Could you attach the

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368686#comment-14368686 ] Tilman Hausherr commented on TIKA-1575: --- With the pure ExtractText, all is identical.

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368687#comment-14368687 ] Tilman Hausherr commented on TIKA-1575: --- With the pure ExtractText, all is identical.

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-17 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364710#comment-14364710 ] Tilman Hausherr commented on TIKA-1575: --- Could you attach the TIKA output you get

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-17 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365807#comment-14365807 ] Tilman Hausherr commented on TIKA-1575: --- Can't tell, I don't know much about the

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-17 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365829#comment-14365829 ] Tilman Hausherr commented on TIKA-1575: --- Thanks. Re: OCR, you should know that there

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-17 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365524#comment-14365524 ] Tilman Hausherr commented on TIKA-1575: --- I can't understand how you get the extracted

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362365#comment-14362365 ] Tilman Hausherr commented on TIKA-1575: --- {code} b) might be actual modest regressions

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362406#comment-14362406 ] Tilman Hausherr commented on TIKA-1575: --- [~talli...@apache.org] please repeat the

[jira] [Commented] (TIKA-1174) Invalid characters in filtered PDF output

2015-03-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362552#comment-14362552 ] Tilman Hausherr commented on TIKA-1174: --- Can't comment, I'm not that good with font

[jira] [Comment Edited] (TIKA-1038) Parsing PDF with StackOverlowError

2015-03-04 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347377#comment-14347377 ] Tilman Hausherr edited comment on TIKA-1038 at 3/4/15 6:59 PM:

[jira] [Commented] (TIKA-1038) Parsing PDF with StackOverlowError

2015-03-04 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347377#comment-14347377 ] Tilman Hausherr commented on TIKA-1038: --- [~talli...@mitre.org] are you watching this

[jira] [Commented] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316723#comment-14316723 ] Tilman Hausherr commented on TIKA-1548: --- Sorry, no. We're not setting that one. It

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-12-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: PDFBox_1_8_6VPDFBox_1_8_8-CLASSIC-b162.xlsx I've now looked at the 1.8.6 vs 1.8.8

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-12-02 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: PDFBox_1_8_8-CLASSICVPDFBox_1_8_8-NONSEQ-b162.xlsx Thanks... one problem in both

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2014-12-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230193#comment-14230193 ] Tilman Hausherr commented on TIKA-1489: --- [~talli...@mitre.org] I can't tell you what

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-12-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230589#comment-14230589 ] Tilman Hausherr commented on TIKA-1442: --- Weird thing in the 1.8.6 vs 1.8.8 test:

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-12-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230589#comment-14230589 ] Tilman Hausherr edited comment on TIKA-1442 at 12/1/14 10:44 PM:

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-12-01 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230589#comment-14230589 ] Tilman Hausherr edited comment on TIKA-1442 at 12/1/14 10:49 PM:

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-30 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx Upgrade to PDFBox 1.8.8

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-30 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: (was: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx) Upgrade to PDFBox 1.8.8

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-30 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228968#comment-14228968 ] Tilman Hausherr edited comment on TIKA-1442 at 11/30/14 10:49 PM:

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-29 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx Here's my evaluation of the test. I

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2014-11-26 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226500#comment-14226500 ] Tilman Hausherr commented on TIKA-1489: --- No, permissions are connected to encryption.

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008 ] Tilman Hausherr commented on TIKA-1442: --- Thanks Tim! 892848.pdf and 892859.pdf

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008 ] Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 8:38 PM:

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: PDFBox_1_8_6VPDFBox_1_8_8-b145.zip Upgrade to PDFBox 1.8.8 ---

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225283#comment-14225283 ] Tilman Hausherr commented on TIKA-1442: --- [~talli...@apache.org] I'm really wondering

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008 ] Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 10:08 PM:

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008 ] Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 11:08 PM:

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-11-25 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225867#comment-14225867 ] Tilman Hausherr commented on TIKA-1442: --- Ok, will do. About the seq vs. nonSeq test:

[jira] [Created] (TIKA-1489) PDF Text extraction without permission

2014-11-25 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created TIKA-1489: - Summary: PDF Text extraction without permission Key: TIKA-1489 URL: https://issues.apache.org/jira/browse/TIKA-1489 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1467) pdf:encrypted:false with encrypted pdf

2014-11-07 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202456#comment-14202456 ] Tilman Hausherr commented on TIKA-1467: --- The old and the new parser have different

[jira] [Comment Edited] (TIKA-1467) pdf:encrypted:false with encrypted pdf

2014-11-07 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202456#comment-14202456 ] Tilman Hausherr edited comment on TIKA-1467 at 11/7/14 10:22 PM:

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-24 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173983#comment-14173983 ] Tilman Hausherr edited comment on TIKA-1442 at 10/24/14 11:02 AM:

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-23 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181779#comment-14181779 ] Tilman Hausherr commented on TIKA-1442: --- Thanks! I'm slowly starting, and here's the

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-23 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181779#comment-14181779 ] Tilman Hausherr edited comment on TIKA-1442 at 10/23/14 7:31 PM:

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-23 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181813#comment-14181813 ] Tilman Hausherr commented on TIKA-1442: --- The directory structure isn't a problem for

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-23 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip I'm done now; the result is two new issues,

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-23 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182047#comment-14182047 ] Tilman Hausherr commented on TIKA-1442: --- A few files have less metadata than before:

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180302#comment-14180302 ] Tilman Hausherr commented on TIKA-1442: --- {quote} and recommend other statistics that

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180440#comment-14180440 ] Tilman Hausherr commented on TIKA-1442: --- What's also missing this time is the token

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180302#comment-14180302 ] Tilman Hausherr edited comment on TIKA-1442 at 10/22/14 8:06 PM:

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180446#comment-14180446 ] Tilman Hausherr commented on TIKA-1442: --- Sorry, ignore my text re: 1st line only.

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180469#comment-14180469 ] Tilman Hausherr commented on TIKA-1442: --- {quote} Should I add token count? {quote}

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180636#comment-14180636 ] Tilman Hausherr commented on TIKA-1442: --- Which are the top10words? I ask because

[jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-16 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1442: -- Attachment: (was: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx) Upgrade to PDFBox 1.8.8
