[
https://issues.apache.org/jira/browse/TIKA-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pascal Essiembre updated TIKA-1286:
---
Attachment: TIKA-1286.zip
Here you go. One for each type. They do not hold real/significant c
Pascal Essiembre created TIKA-1620:
--
Summary: OUTPUT_FILE_TOKEN not being replaced in ExternalParser
Key: TIKA-1620
URL: https://issues.apache.org/jira/browse/TIKA-1620
Project: Tika
Issue T
Pascal Essiembre created TIKA-1837:
--
Summary: HtmlEncodingDetector wrongly detects charset from
commented meta
Key: TIKA-1837
URL: https://issues.apache.org/jira/browse/TIKA-1837
Project: Tika
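The snippet gives only the issue title. The sketch below illustrates the failure mode it names: it strips HTML comments before looking for a meta charset declaration, so a commented-out meta tag cannot be picked up. MetaCharsetSniffer is a hypothetical name and does not reflect how HtmlEncodingDetector is implemented. The regex-based comment removal is a simplification that ignores edge cases such as "<!--" inside script blocks.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not HtmlEncodingDetector's real implementation.
final class MetaCharsetSniffer {
    // Non-greedy match of one HTML comment; DOTALL lets a comment span lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Simplified: captures the value of charset=... inside a <meta ...> tag,
    // covering both <meta charset="x"> and the http-equiv content="...; charset=x" form.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    private MetaCharsetSniffer() {}

    // Returns the declared charset name, ignoring anything inside comments.
    static Optional<String> sniff(String head) {
        String withoutComments = COMMENT.matcher(head).replaceAll(" ");
        Matcher m = META_CHARSET.matcher(withoutComments);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Removing comments before matching means only declarations that a browser would actually act on are considered.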
[
https://issues.apache.org/jira/browse/TIKA-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108892#comment-15108892
]
Pascal Essiembre commented on TIKA-1837:
How often? It was the first and only time
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136128#comment-15136128
]
Pascal Essiembre commented on TIKA-741:
---
It looks like maxDepth 100 is not enough. I
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137497#comment-15137497
]
Pascal Essiembre commented on TIKA-741:
---
What? That easy? Those two simple lines did i
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140092#comment-15140092
]
Pascal Essiembre commented on TIKA-741:
---
Is your own dev branch found as snapshots in
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140212#comment-15140212
]
Pascal Essiembre commented on TIKA-741:
---
Awesome, thanks!
> "Zip bomb" (XML nesting)
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140217#comment-15140217
]
Pascal Essiembre commented on TIKA-741:
---
So the best way to submit PDFBox 2.0.0 relate
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140376#comment-15140376
]
Pascal Essiembre commented on TIKA-741:
---
Got it, thanks!
> "Zip bomb" (XML nesting) d
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pascal Essiembre updated TIKA-741:
--
Comment: was deleted
(was: Got it. Thanks!)
> "Zip bomb" (XML nesting) detection is too strict
>
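The thread notes that a nesting limit of 100 was not enough. The following minimal sketch shows a configurable depth check, assuming a plain JDK SAX parse. DepthLimitingHandler is a hypothetical name, not Tika's class. The handler counts open elements and aborts once the count passes the limit, so the threshold can be raised for legitimately deep documents instead of being hardcoded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that rejects documents nested deeper
// than a configurable limit. Not Tika's actual implementation.
final class DepthLimitingHandler extends DefaultHandler {
    private final int maxDepth;
    private int depth;
    private boolean exceeded;

    DepthLimitingHandler(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (++depth > maxDepth) {
            exceeded = true; // remember that the failure came from our limit
            throw new SAXException("Nesting depth " + depth + " exceeds limit " + maxDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // True if the document parses without exceeding maxDepth, false if the
    // limit trips; malformed XML is reported as IllegalArgumentException.
    static boolean isWithinLimit(String xml, int maxDepth) {
        DepthLimitingHandler handler = new DepthLimitingHandler(maxDepth);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXException e) {
            if (handler.exceeded) {
                return false;
            }
            throw new IllegalArgumentException("Malformed XML", e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```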
[
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140375#comment-15140375
]
Pascal Essiembre commented on TIKA-741:
---
Got it. Thanks!
> "Zip bomb" (XML nesting) d
Pascal Essiembre created TIKA-1857:
--
Summary: Enhance PDFParser to extract text from XFA forms
Key: TIKA-1857
URL: https://issues.apache.org/jira/browse/TIKA-1857
Project: Tika
Issue Type: I
[
https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148914#comment-15148914
]
Pascal Essiembre commented on TIKA-1607:
In the case of XFA forms, the form IS the
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235547#comment-15235547
]
Pascal Essiembre commented on TIKA-1946:
I certainly can, but they ain't perfect as
[
https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235558#comment-15235558
]
Pascal Essiembre commented on TIKA-1857:
Yes, it looks like the changes will do jus
Pascal Essiembre created TIKA-1286:
--
Summary: Adding MS Visio VSDX to mime-types detection
Key: TIKA-1286
URL: https://issues.apache.org/jira/browse/TIKA-1286
Project: Tika
Issue Type: Impro
Pascal Essiembre created TIKA-2219:
--
Summary: CharsetDetector no longer detects windows-1252 charset
Key: TIKA-2219
URL: https://issues.apache.org/jira/browse/TIKA-2219
Project: Tika
Issue T
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pascal Essiembre updated TIKA-2219:
---
Description:
Starting with Tika 1.14, windows-1252 is no longer detected, as ISO-8859-1 is
alw
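The two charsets differ only in bytes 0x80 to 0x9F. windows-1252 maps most of them to printable characters, such as curly quotes and the euro sign. ISO-8859-1 maps them to C1 control codes. The sketch below shows that difference and a naive tie-breaking heuristic. It illustrates this one assumption and is not how Tika's CharsetDetector scores candidates. Cp1252VsLatin1 is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows why the two charsets are distinguishable,
// not how Tika's CharsetDetector ranks them.
final class Cp1252VsLatin1 {
    private Cp1252VsLatin1() {}

    // True if any byte falls in 0x80-0x9F: mostly printable in windows-1252,
    // but C1 control codes (rare in real text) in ISO-8859-1.
    static boolean hasC1RangeBytes(byte[] data) {
        for (byte b : data) {
            int v = b & 0xFF;
            if (v >= 0x80 && v <= 0x9F) {
                return true;
            }
        }
        return false;
    }

    // Naive heuristic (assumption): prefer windows-1252 when such bytes occur.
    static Charset guess(byte[] data) {
        return hasC1RangeBytes(data)
                ? Charset.forName("windows-1252")
                : StandardCharsets.ISO_8859_1;
    }
}
```

For example, byte 0x93 decodes to a left curly quote (U+201C) under windows-1252 but to the control character U+0093 under ISO-8859-1. This is why always reporting ISO-8859-1 loses information.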
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765034#comment-15765034
]
Pascal Essiembre commented on TIKA-2219:
I am relying on CharsetDetector. Thanks f
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765059#comment-15765059
]
Pascal Essiembre commented on TIKA-2219:
BTW, I tested and can confirm your fix work
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765348#comment-15765348
]
Pascal Essiembre edited comment on TIKA-1946 at 12/20/16 9:51 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765348#comment-15765348
]
Pascal Essiembre commented on TIKA-1946:
I finally had a bit of time to port the Wo
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766083#comment-15766083
]
Pascal Essiembre commented on TIKA-1946:
It now throws a TikaException as you sugge
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767581#comment-15767581
]
Pascal Essiembre commented on TIKA-1946:
I am OK to remove it as you are correct, .
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767599#comment-15767599
]
Pascal Essiembre commented on TIKA-1946:
I noticed you have some corporate copyrigh
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767641#comment-15767641
]
Pascal Essiembre commented on TIKA-1946:
You are welcome! I am glad to contribute t
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767676#comment-15767676
]
Pascal Essiembre commented on TIKA-1946:
Thanks!
> Add mime detection and parser f
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767695#comment-15767695
]
Pascal Essiembre commented on TIKA-1946:
I am not sure when I may have time to bene
Pascal Essiembre created TIKA-:
--
Summary: Contributing a XFDL Parser
Key: TIKA-
URL: https://issues.apache.org/jira/browse/TIKA-
Project: Tika
Issue Type: Improvement
C
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767725#comment-15767725
]
Pascal Essiembre commented on TIKA-1946:
H2 works for me. I downloaded the files y
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767939#comment-15767939
]
Pascal Essiembre commented on TIKA-1946:
So what would be the percentage that are p
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768100#comment-15768100
]
Pascal Essiembre commented on TIKA-1946:
I also checked. Looks like a version issue
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146
]
Pascal Essiembre commented on TIKA-1946:
WordPerfect extensions vary quite a bit.
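Because the extensions are unreliable, content-based detection matters here. The minimal sketch below assumes the 0xFF 'W' 'P' 'C' prefix that WordPerfect 5.x and later files begin with. WordPerfectSniffer is a hypothetical name. The same prefix is reportedly shared by other WordPerfect Corporation formats, such as WPG graphics, so a real detector would need to inspect further header bytes. Files older than 5.x have no such header.

```java
// Hypothetical helper, not Tika's WordPerfect detector. Checks only the
// "\xFFWPC" prefix assumed for WordPerfect 5.x and later; the file name
// and extension are deliberately ignored.
final class WordPerfectSniffer {
    // 0xFF followed by ASCII "WPC" (0x57 0x50 0x43)
    private static final int[] MAGIC = {0xFF, 0x57, 0x50, 0x43};

    private WordPerfectSniffer() {}

    static boolean hasWpcHeader(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            // Mask to compare the unsigned byte value against the int constant.
            if ((head[i] & 0xFF) != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```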
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146
]
Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 8:57 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146
]
Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 9:02 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146
]
Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 9:03 PM:
---
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pascal Essiembre updated TIKA-1946:
---
Attachment: TIKA-1946-pascal.essiembre-01.patch
I created a patch that will now throw a TikaExc
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pascal Essiembre updated TIKA-1946:
---
Attachment: wordperfect_signatures_by_versions.xlsx
In case you are curious, I am attaching a s
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768959#comment-15768959
]
Pascal Essiembre commented on TIKA-1946:
I also like the idea of an {{UnsupportedFo
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769350#comment-15769350
]
Pascal Essiembre commented on TIKA-1946:
FYI, I found relevant information about 5.
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770744#comment-15770744
]
Pascal Essiembre commented on TIKA-1946:
This is code imported from one of our exis
Pascal Essiembre created TIKA-2228:
--
Summary: WordPerfect parser update to support 5.x
Key: TIKA-2228
URL: https://issues.apache.org/jira/browse/TIKA-2228
Project: Tika
Issue Type: Improveme
[
https://issues.apache.org/jira/browse/TIKA-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800363#comment-15800363
]
Pascal Essiembre commented on TIKA-2230:
Sure!
> Add paragraph markup to WordPerf
Pascal Essiembre created TIKA-2232:
--
Summary: Add JBIG2 image parsing support
Key: TIKA-2232
URL: https://issues.apache.org/jira/browse/TIKA-2232
Project: Tika
Issue Type: New Feature
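The following sketch shows header-based JBIG2 detection. It assumes the standalone file layout from ITU-T T.88 Annex D: an 8-byte ID string, one flags byte, then a 4-byte big-endian page count unless flag bit 1 marks the count as unknown. Jbig2Header is a hypothetical helper, not a Tika or PDFBox class. JBIG2 streams embedded in PDFs omit this file header, so the check only covers standalone JBIG2 files.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Tika. Layout assumed from the JBIG2
// standalone file header (ITU-T T.88, Annex D).
final class Jbig2Header {
    private static final int[] ID_STRING =
            {0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A};

    private Jbig2Header() {}

    // True if the data starts with the 8-byte JBIG2 file ID string.
    static boolean isJbig2(byte[] head) {
        if (head.length < ID_STRING.length) {
            return false;
        }
        for (int i = 0; i < ID_STRING.length; i++) {
            if ((head[i] & 0xFF) != ID_STRING[i]) {
                return false;
            }
        }
        return true;
    }

    // Page count from the header, or empty if the flags byte marks the
    // number of pages as unknown (bit 1 set) or the header is truncated.
    static OptionalInt pageCount(byte[] head) {
        if (!isJbig2(head) || head.length < 9) {
            return OptionalInt.empty();
        }
        int flags = head[8] & 0xFF;
        boolean unknownPages = (flags & 0x02) != 0;
        if (unknownPages || head.length < 13) {
            return OptionalInt.empty();
        }
        int pages = ((head[9] & 0xFF) << 24) | ((head[10] & 0xFF) << 16)
                | ((head[11] & 0xFF) << 8) | (head[12] & 0xFF);
        return OptionalInt.of(pages);
    }
}
```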
[
https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pascal Essiembre updated TIKA-2232:
---
Component/s: (was: detector)
> Add JBIG2 image parsing support
> --
[
https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818741#comment-15818741
]
Pascal Essiembre commented on TIKA-2232:
Either way. I think the most important is
[
https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995451#comment-15995451
]
Pascal Essiembre commented on TIKA-2352:
Found the cause. My assumption was wrong
[
https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995549#comment-15995549
]
Pascal Essiembre commented on TIKA-2352:
Must have got lost in the mail! :-) I ju
[
https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995584#comment-15995584
]
Pascal Essiembre commented on TIKA-2352:
No problem. I'd be curious to know how ma
[
https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997216#comment-15997216
]
Pascal Essiembre commented on TIKA-2352:
I had time to look further at one of the f
[
https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997238#comment-15997238
]
Pascal Essiembre commented on TIKA-2352:
FYI, "commoncrawl2_likely_broken\W4\W4YNRC
[
https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997267#comment-15997267
]
Pascal Essiembre commented on TIKA-2352:
I also checked some of the Quattro Pro ones,
Pascal Essiembre created TIKA-2530:
--
Summary: OutlookExtractor "buffer underrun" when parsing .msg with
embedded .msg
Key: TIKA-2530
URL: https://issues.apache.org/jira/browse/TIKA-2530
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351487#comment-16351487
]
Pascal Essiembre commented on TIKA-2490:
I still believe the warnings should be the
Pascal Essiembre created TIKA-2922:
--
Summary: Regression issue with detecting .dotx and .xlam MS Office
mime-types
Key: TIKA-2922
URL: https://issues.apache.org/jira/browse/TIKA-2922
Project: Tika