[
https://issues.apache.org/jira/browse/TIKA-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-2878:
--
Attachment: pom.xml
> Update dependencies for 1.21.1 or 1.22
>
[
https://issues.apache.org/jira/browse/TIKA-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844275#comment-16844275
]
Tilman Hausherr commented on TIKA-2878:
---
[^pom.xml] Here's the pom I use to build
> Update
[
https://issues.apache.org/jira/browse/TIKA-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844146#comment-16844146
]
Tilman Hausherr commented on TIKA-2878:
---
With the maven owasp plugin 5.0.0.M3 I get even more when
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810059#comment-16810059
]
Tilman Hausherr commented on TIKA-2749:
---
You probably mean "vector graphics".
> OCR on PDFs should
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808400#comment-16808400
]
Tilman Hausherr commented on TIKA-2749:
---
See the accepted answer here:
[
https://issues.apache.org/jira/browse/TIKA-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782344#comment-16782344
]
Tilman Hausherr commented on TIKA-2832:
---
Bug in PDFBox has been fixed.
> Very slow large PDF text
[
https://issues.apache.org/jira/browse/TIKA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769651#comment-16769651
]
Tilman Hausherr commented on TIKA-2828:
---
Sorry, corrected.
> Your project apache/tika is using
[
https://issues.apache.org/jira/browse/TIKA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769574#comment-16769574
]
Tilman Hausherr edited comment on TIKA-2828 at 2/15/19 7:45 PM:
See also
[
https://issues.apache.org/jira/browse/TIKA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769574#comment-16769574
]
Tilman Hausherr commented on TIKA-2828:
---
See also my comment in PDFBOX-4457, it applies to two of
[
https://issues.apache.org/jira/browse/TIKA-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594084#comment-16594084
]
Tilman Hausherr commented on TIKA-2689:
---
Sorry, I don't have any ideas either.
> *.ai type (Adobe
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477696#comment-16477696
]
Tilman Hausherr edited comment on TIKA-2643 at 5/16/18 4:36 PM:
I don't
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477696#comment-16477696
]
Tilman Hausherr commented on TIKA-2643:
---
I don't know anything about MapReduce. All I can tell is
[
https://issues.apache.org/jira/browse/TIKA-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427438#comment-16427438
]
Tilman Hausherr commented on TIKA-2124:
---
Due to the closing of the related PDFBox issue, this issue
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422239#comment-16422239
]
Tilman Hausherr commented on TIKA-2620:
---
The subsampling is when decoding, but this would influence
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422239#comment-16422239
]
Tilman Hausherr edited comment on TIKA-2620 at 4/2/18 1:13 PM:
---
The
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420503#comment-16420503
]
Tilman Hausherr commented on TIKA-2620:
---
In most cases subsampling shouldn't be used. It might
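To make the trade-off concrete: subsampling while decoding keeps only every n-th pixel in each direction. Memory drops by roughly a factor of n², but fine detail is thrown away before anything downstream sees the image. The stdlib-only sketch below shows that idea on a plain row-major pixel array. It is an illustration under that assumption, not PDFBox's decoder, and the class and method names are hypothetical.

```java
/** Hypothetical illustration of nearest-pixel subsampling on a row-major pixel buffer. */
public final class Subsample {
    private Subsample() {}

    /** Keeps every {@code step}-th pixel in x and y; output is ceil(w/step) x ceil(h/step). */
    static int[] subsample(int[] pixels, int width, int height, int step) {
        if (step < 1) throw new IllegalArgumentException("step must be >= 1");
        if (pixels.length != width * height) throw new IllegalArgumentException("size mismatch");
        int outW = (width + step - 1) / step;
        int outH = (height + step - 1) / step;
        int[] out = new int[outW * outH];
        for (int y = 0; y < outH; y++) {
            for (int x = 0; x < outW; x++) {
                // Take the source pixel at (x*step, y*step); the other step*step - 1 are skipped.
                out[y * outW + x] = pixels[(y * step) * width + (x * step)];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int w = 4, h = 4;
        int[] img = new int[w * h];
        for (int i = 0; i < img.length; i++) img[i] = i;  // pixel value = its index
        int[] small = subsample(img, w, h, 2);
        System.out.println(java.util.Arrays.toString(small)); // [0, 2, 8, 10]
    }
}
```

With `step = 2` a 4x4 image becomes 2x2, so three quarters of the pixels are never stored.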
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419487#comment-16419487
]
Tilman Hausherr edited comment on TIKA-2620 at 3/29/18 5:53 PM:
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419487#comment-16419487
]
Tilman Hausherr commented on TIKA-2620:
---
[~gagravarr] KCMS is the legacy setting. It is much faster.
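For context, the legacy KCMS module is usually selected through the `sun.java2d.cmm` system property, which is the Java 8 workaround described in PDFBox's documentation. The property can be passed on the command line as `-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider` or set in code before any color conversion initializes the color management module. The minimal sketch below assumes Java 8; newer JDKs may no longer ship KCMS, so check your runtime before relying on it. The class name `KcmsSwitch` is hypothetical.

```java
/**
 * Hypothetical helper: selects the legacy KCMS color management module
 * (Java 8 workaround). Must run before any java.awt.color conversion
 * initializes the CMM, e.g. as the first statement in main().
 */
public final class KcmsSwitch {
    static final String CMM_PROPERTY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    private KcmsSwitch() {}

    /** Sets the property unless a CMM was already chosen (e.g. via -D); returns the effective value. */
    static String useKcmsIfUnset() {
        String current = System.getProperty(CMM_PROPERTY);
        if (current == null) {
            System.setProperty(CMM_PROPERTY, KCMS_PROVIDER);
            current = KCMS_PROVIDER;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println("sun.java2d.cmm = " + useKcmsIfUnset());
        // ... start PDF rendering / Tika parsing after this point ...
    }
}
```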
[
https://issues.apache.org/jira/browse/TIKA-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393296#comment-16393296
]
Tilman Hausherr edited comment on TIKA-2442 at 3/9/18 6:04 PM:
---
Isn't this
[
https://issues.apache.org/jira/browse/TIKA-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393296#comment-16393296
]
Tilman Hausherr commented on TIKA-2442:
---
Isn't this issue solved? (I stumbled upon it while searching
[
https://issues.apache.org/jira/browse/TIKA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240945#comment-16240945
]
Tilman Hausherr commented on TIKA-2492:
---
This didn't work, you put the exclusion under pdfbox and not
Tilman Hausherr created TIKA-2492:
-
Summary: Remove pdfdebugger from tika
Key: TIKA-2492
URL: https://issues.apache.org/jira/browse/TIKA-2492
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059658#comment-16059658
]
Tilman Hausherr commented on TIKA-2256:
---
Tim is correct. IMHO this issue should be closed as "not a
[
https://issues.apache.org/jira/browse/TIKA-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967844#comment-15967844
]
Tilman Hausherr commented on TIKA-2320:
---
Fixed in PDFBox 2.0.6 despite the user not attaching a PDF
[
https://issues.apache.org/jira/browse/TIKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404367#comment-15404367
]
Tilman Hausherr commented on TIKA-2046:
---
I've closed the PDFBox issue as the behavior is correct. See
[
https://issues.apache.org/jira/browse/TIKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1989:
--
Description:
https://tika.apache.org/1.13/configuring.html
{quote}
To override some parser
Tilman Hausherr created TIKA-1989:
-
Summary: Weird sentence in website
Key: TIKA-1989
URL: https://issues.apache.org/jira/browse/TIKA-1989
Project: Tika
Issue Type: Bug
Components:
[
https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149256#comment-15149256
]
Tilman Hausherr commented on TIKA-1857:
---
Sorry, I have no experience with XFA. [~msahyoun] might know
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098412#comment-15098412
]
Tilman Hausherr commented on TIKA-1830:
---
Another possibility is that the change I mentioned has
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096866#comment-15096866
]
Tilman Hausherr edited comment on TIKA-1830 at 1/14/16 5:05 PM:
I can't
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098503#comment-15098503
]
Tilman Hausherr commented on TIKA-1830:
---
Not that, but the change I mentioned
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098401#comment-15098401
]
Tilman Hausherr commented on TIKA-1830:
---
{quote}
On PDFBOX-3193, you've set affected versions to
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098401#comment-15098401
]
Tilman Hausherr edited comment on TIKA-1830 at 1/14/16 5:02 PM:
{quote}
On
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098418#comment-15098418
]
Tilman Hausherr commented on TIKA-1830:
---
The line at {{BaseParser.java:1077}} is
{code}
COSInteger
[
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096866#comment-15096866
]
Tilman Hausherr commented on TIKA-1830:
---
I can't reproduce the difference for the file 074531.pdf.
[
https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944574#comment-14944574
]
Tilman Hausherr commented on TIKA-1737:
---
And I'd be interested to hear whether the situation
[
https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940086#comment-14940086
]
Tilman Hausherr commented on TIKA-1759:
---
But you already have the author from /Info and from the XMP
[
https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903537#comment-14903537
]
Tilman Hausherr commented on TIKA-1737:
---
No, PDFBOX-2987 is another one I fixed for you. The NPE in
[
https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901502#comment-14901502
]
Tilman Hausherr commented on TIKA-1737:
---
We will definitely not be able to find the cause of memory
[
https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901042#comment-14901042
]
Tilman Hausherr commented on TIKA-1737:
---
Some of the exceptions (the classcastexceptions in the
[
https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901042#comment-14901042
]
Tilman Hausherr edited comment on TIKA-1737 at 9/21/15 8:49 PM:
Some of the
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637232#comment-14637232
]
Tilman Hausherr commented on TIKA-1678:
---
API has changed again. This code works:
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633722#comment-14633722
]
Tilman Hausherr commented on TIKA-1678:
---
Yes, such a string check would be useful. Or
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633687#comment-14633687
]
Tilman Hausherr commented on TIKA-1678:
---
sure:
{code}
public class Tika1678 extends
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634065#comment-14634065
]
Tilman Hausherr commented on TIKA-1678:
---
Yes please do and attach the file. It's late
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634045#comment-14634045
]
Tilman Hausherr commented on TIKA-1678:
---
Likely a bug. I tried calling getTitle
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634045#comment-14634045
]
Tilman Hausherr edited comment on TIKA-1678 at 7/20/15 8:41 PM:
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632429#comment-14632429
]
Tilman Hausherr edited comment on TIKA-1678 at 7/19/15 11:21 AM:
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632429#comment-14632429
]
Tilman Hausherr edited comment on TIKA-1678 at 7/19/15 11:22 AM:
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632429#comment-14632429
]
Tilman Hausherr commented on TIKA-1678:
---
I think this is two bytes. I.e. a 0x0 and a
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632432#comment-14632432
]
Tilman Hausherr commented on TIKA-1678:
---
I get correct output for the non-XMP stuff
[
https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628890#comment-14628890
]
Tilman Hausherr commented on TIKA-1588:
---
The weird thing is that I can't find any
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1575:
--
Comment: was deleted
(was: With the pure ExtractText, all is identical. Could you attach the
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368686#comment-14368686
]
Tilman Hausherr commented on TIKA-1575:
---
With the pure ExtractText, all is identical.
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368687#comment-14368687
]
Tilman Hausherr commented on TIKA-1575:
---
With the pure ExtractText, all is identical.
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364710#comment-14364710
]
Tilman Hausherr commented on TIKA-1575:
---
Could you attach the TIKA output you get
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365807#comment-14365807
]
Tilman Hausherr commented on TIKA-1575:
---
Can't tell, I don't know much about the
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365829#comment-14365829
]
Tilman Hausherr commented on TIKA-1575:
---
Thanks. Re: OCR, you should know that there
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365524#comment-14365524
]
Tilman Hausherr commented on TIKA-1575:
---
I can't understand how you get the extracted
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362365#comment-14362365
]
Tilman Hausherr commented on TIKA-1575:
---
{code}
b) might be actual modest regressions
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362406#comment-14362406
]
Tilman Hausherr commented on TIKA-1575:
---
[~talli...@apache.org] please repeat the
[
https://issues.apache.org/jira/browse/TIKA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362552#comment-14362552
]
Tilman Hausherr commented on TIKA-1174:
---
Can't comment, I'm not that good with font
[
https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347377#comment-14347377
]
Tilman Hausherr edited comment on TIKA-1038 at 3/4/15 6:59 PM:
[
https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347377#comment-14347377
]
Tilman Hausherr commented on TIKA-1038:
---
[~talli...@mitre.org] are you watching this
[
https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316723#comment-14316723
]
Tilman Hausherr commented on TIKA-1548:
---
Sorry, no. We're not setting that one. It
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_6VPDFBox_1_8_8-CLASSIC-b162.xlsx
I've now looked at the 1.8.6 vs 1.8.8
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_8-CLASSICVPDFBox_1_8_8-NONSEQ-b162.xlsx
Thanks... one problem in both
[
https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230193#comment-14230193
]
Tilman Hausherr commented on TIKA-1489:
---
[~talli...@mitre.org] I can't tell you what
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230589#comment-14230589
]
Tilman Hausherr commented on TIKA-1442:
---
Weird thing in the 1.8.6 vs 1.8.8 test:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230589#comment-14230589
]
Tilman Hausherr edited comment on TIKA-1442 at 12/1/14 10:44 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230589#comment-14230589
]
Tilman Hausherr edited comment on TIKA-1442 at 12/1/14 10:49 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx
Upgrade to PDFBox 1.8.8
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: (was: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx)
Upgrade to PDFBox 1.8.8
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228968#comment-14228968
]
Tilman Hausherr edited comment on TIKA-1442 at 11/30/14 10:49 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx
Here's my evaluation of the test. I
[
https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226500#comment-14226500
]
Tilman Hausherr commented on TIKA-1489:
---
No, permissions are connected to encryption.
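Background for this point: in a PDF the permission flags live in the encryption dictionary (its `/P` entry), so an unencrypted file carries no permission restrictions at all. `/P` is a 32-bit integer whose bits are numbered from 1 in the PDF specification; bit 5 allows copying or otherwise extracting text and graphics, and bit 10 allows extraction for accessibility. The sketch below decodes those two bits with plain Java. It does not use PDFBox's own permission API, and the class and method names are hypothetical.

```java
/** Hypothetical decoder for the /P permission integer of a PDF encryption dictionary. */
public final class PdfPermissions {
    private PdfPermissions() {}

    /** PDF bit positions are 1-based: bit n corresponds to the mask 1 << (n - 1). */
    static boolean isBitSet(int p, int pdfBit) {
        return (p & (1 << (pdfBit - 1))) != 0;
    }

    /** Bit 5: copy or otherwise extract text and graphics. */
    static boolean canExtractContent(int p) {
        return isBitSet(p, 5);
    }

    /** Bit 10: extract text and graphics for accessibility purposes. */
    static boolean canExtractForAccessibility(int p) {
        return isBitSet(p, 10);
    }

    /** An unencrypted PDF has no /P entry (passed as null here), so nothing is restricted. */
    static boolean extractionAllowed(Integer pOrNull) {
        return pOrNull == null || canExtractContent(pOrNull);
    }

    public static void main(String[] args) {
        int allAllowed = -4;    // 0xFFFFFFFC: every permission bit set
        int noneAllowed = -3904; // 0xFFFFF0C0: permission bits cleared, reserved bits set
        System.out.println(canExtractContent(allAllowed));  // true
        System.out.println(canExtractContent(noneAllowed)); // false
        System.out.println(extractionAllowed(null));        // true (not encrypted)
    }
}
```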
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008
]
Tilman Hausherr commented on TIKA-1442:
---
Thanks Tim!
892848.pdf and 892859.pdf
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008
]
Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 8:38 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: PDFBox_1_8_6VPDFBox_1_8_8-b145.zip
Upgrade to PDFBox 1.8.8
---
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225283#comment-14225283
]
Tilman Hausherr commented on TIKA-1442:
---
[~talli...@apache.org] I'm really wondering
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008
]
Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 10:08 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225008#comment-14225008
]
Tilman Hausherr edited comment on TIKA-1442 at 11/25/14 11:08 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225867#comment-14225867
]
Tilman Hausherr commented on TIKA-1442:
---
Ok, will do.
About the seq vs. nonSeq test:
Tilman Hausherr created TIKA-1489:
-
Summary: PDF Text extraction without permission
Key: TIKA-1489
URL: https://issues.apache.org/jira/browse/TIKA-1489
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202456#comment-14202456
]
Tilman Hausherr commented on TIKA-1467:
---
The old and the new parser have different
[
https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202456#comment-14202456
]
Tilman Hausherr edited comment on TIKA-1467 at 11/7/14 10:22 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173983#comment-14173983
]
Tilman Hausherr edited comment on TIKA-1442 at 10/24/14 11:02 AM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181779#comment-14181779
]
Tilman Hausherr commented on TIKA-1442:
---
Thanks!
I'm slowly starting, and here's the
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181779#comment-14181779
]
Tilman Hausherr edited comment on TIKA-1442 at 10/23/14 7:31 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181813#comment-14181813
]
Tilman Hausherr commented on TIKA-1442:
---
The directory structure isn't a problem for
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
I'm done now; the result is two new issues,
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182047#comment-14182047
]
Tilman Hausherr commented on TIKA-1442:
---
A few files have less metadata than before:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180302#comment-14180302
]
Tilman Hausherr commented on TIKA-1442:
---
{quote}
and recommend other statistics that
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180440#comment-14180440
]
Tilman Hausherr commented on TIKA-1442:
---
What's also missing this time is the token
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180302#comment-14180302
]
Tilman Hausherr edited comment on TIKA-1442 at 10/22/14 8:06 PM:
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180446#comment-14180446
]
Tilman Hausherr commented on TIKA-1442:
---
Sorry, ignore my text re: 1st line only.
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180469#comment-14180469
]
Tilman Hausherr commented on TIKA-1442:
---
{quote}
Should I add token count?
{quote}
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180636#comment-14180636
]
Tilman Hausherr commented on TIKA-1442:
---
Which are the top10words? I ask because
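The two statistics discussed in this thread, token count and the ten most frequent words, can be approximated in a few lines when comparing text extracted by two PDFBox versions. The sketch below is an assumption-laden stand-in, not the evaluation code used in this issue: it lower-cases the text and splits on anything that is not a Unicode letter or digit, whereas the real comparison may use a different tokenizer. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical token statistics for comparing text extracted by two parser versions. */
public final class TokenStats {
    private TokenStats() {}

    /** Naive tokenizer (assumption): lower-case, split on runs of non-letter/non-digit chars. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // split() yields a leading "" when the text starts with a separator; drop it.
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** The n most frequent tokens; ties are broken alphabetically so the output is deterministic. */
    static List<String> topWords(List<String> tokens, int n) {
        Map<String, Long> counts = tokens.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> before = tokenize("The quick brown fox. The lazy dog!");
        List<String> after = tokenize("The quick brown fox. The dog");
        System.out.println("tokens: " + before.size() + " vs " + after.size()); // tokens: 7 vs 6
        System.out.println("top words: " + topWords(before, 10) + " vs " + topWords(after, 10));
    }
}
```

A drop in token count or a change in the top words between versions flags a file for a closer manual look; it does not by itself say which version is right.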
[
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-1442:
--
Attachment: (was: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx)
Upgrade to PDFBox 1.8.8