[
https://issues.apache.org/jira/browse/TIKA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706400#comment-17706400
]
Andrew Jackson commented on TIKA-3992:
--
Sounds interesting! Just wanted to note that
[
https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441077#comment-16441077
]
Andrew Jackson commented on TIKA-2632:
--
It would be great to see the old PowerPoint si
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635032#comment-14635032
]
Andrew Jackson commented on TIKA-1678:
--
Sorry for the delay. Here are the results:
*
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627960#comment-14627960
]
Andrew Jackson commented on TIKA-1678:
--
As far as I can tell, the PDF spec seems to im
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627913#comment-14627913
]
Andrew Jackson commented on TIKA-1678:
--
I'm seeing this in about 220,000 out of 21,204
[
https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1678:
-
Summary: PDF metadata extraction fails to spot UTF-16 encoded title (was:
PDF metadata extraction
Andrew Jackson created TIKA-1678:
Summary: PDF metadata extraction fails to spot UTF-16 encoded data
Key: TIKA-1678
URL: https://issues.apache.org/jira/browse/TIKA-1678
Project: Tika
Issue Ty
[
https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368858#comment-14368858
]
Andrew Jackson commented on TIKA-1154:
--
Yes, thanks - that's the behaviour I'd hoped f
[
https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1486:
-
Attachment: tika-mime-info-extensions-namespace.patch
The attached patch adds a namespace declarati
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226415#comment-14226415
]
Andrew Jackson commented on TIKA-1302:
--
We have two more sets of data. One is the same
[
https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224745#comment-14224745
]
Andrew Jackson commented on TIKA-1486:
--
A-ha! I didn't notice the {{isregex="true"}} a
[
https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224714#comment-14224714
]
Andrew Jackson commented on TIKA-1486:
--
There's no problem with adding an XML namespac
Andrew Jackson created TIKA-1486:
Summary: Minor issues with the Tika MIME type magic file
Key: TIKA-1486
URL: https://issues.apache.org/jira/browse/TIKA-1486
Project: Tika
Issue Type: Improv
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209757#comment-14209757
]
Andrew Jackson edited comment on TIKA-1302 at 11/13/14 1:42 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209757#comment-14209757
]
Andrew Jackson commented on TIKA-1302:
--
[~talli...@apache.org] I've created a download
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186718#comment-14186718
]
Andrew Jackson commented on TIKA-1302:
--
Shall I go ahead and extract the XML errors? O
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178361#comment-14178361
]
Andrew Jackson edited comment on TIKA-1302 at 10/21/14 12:59 PM:
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178361#comment-14178361
]
Andrew Jackson commented on TIKA-1302:
--
Okay, so the c.300,000 exceptions are here:
h
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176934#comment-14176934
]
Andrew Jackson commented on TIKA-1302:
--
I have 2,358,167 errors from one collection (2
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176892#comment-14176892
]
Andrew Jackson commented on TIKA-1302:
--
At the UK Web Archive we run Apache Tika over
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125384#comment-14125384
]
Andrew Jackson commented on TIKA-1232:
--
Looks like this is fixed and in the 1.6 releas
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920698#comment-13920698
]
Andrew Jackson commented on TIKA-1232:
--
Does anyone have a copy of Acrobat 9.1? That v
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908402#comment-13908402
]
Andrew Jackson commented on TIKA-1232:
--
Going by my original intention, then I would p
[
https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900156#comment-13900156
]
Andrew Jackson commented on TIKA-1154:
--
I've had no response on the metadata-extractor
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896697#comment-13896697
]
Andrew Jackson commented on TIKA-1232:
--
Multiple dc:formats appears to be a reasonable
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894376#comment-13894376
]
Andrew Jackson commented on TIKA-1232:
--
Great!
For (1), very happy for that code to g
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892210#comment-13892210
]
Andrew Jackson commented on TIKA-1232:
--
Yes, you can't identify > 1.7 PDF or the PDF/A
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757042#comment-13757042
]
Andrew Jackson commented on TIKA-1170:
--
Fair point! Thanks for accepting the changes.
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756981#comment-13756981
]
Andrew Jackson commented on TIKA-1170:
--
Thanks, that's great. If you prefer, you shoul
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1170:
-
Attachment: 0002-Added-example-malformed-HTML-file-that-was-being-mis.patch
This additional patch
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756577#comment-13756577
]
Andrew Jackson commented on TIKA-1170:
--
I'm not sure that commit is right. I see this
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1170:
-
Attachment: 0001-Added-CGM-test-file-test-and-improved-magic.patch
Patch containing test file, tes
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756064#comment-13756064
]
Andrew Jackson commented on TIKA-1170:
--
I was able to create an example file, using [G
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1170:
-
Attachment: plotutils-example.cgm
This is an example version 3 binary CGM file, generated using GN
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756051#comment-13756051
]
Andrew Jackson commented on TIKA-1170:
--
My corpus is a chunk of the Internet Archive,
[
https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1170:
-
Summary: Insufficiently specific magic for binary image/cgm files (was:
Possibly erroneous magic
Andrew Jackson created TIKA-1170:
Summary: Possibly erroneous magic for image/cgm files
Key: TIKA-1170
URL: https://issues.apache.org/jira/browse/TIKA-1170
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719631#comment-13719631
]
Andrew Jackson commented on TIKA-1154:
--
Okay, I submitted an issue here:
https://code
[
https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719594#comment-13719594
]
Andrew Jackson commented on TIKA-1154:
--
We could exclude the package from coming in vi
[
https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719513#comment-13719513
]
Andrew Jackson commented on TIKA-1154:
--
Thanks for the stacktrace, which led me to th
[
https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-1154:
-
Attachment: tika-breaker.html
This file makes Tika hang. If you remove both of the binary characte
Andrew Jackson created TIKA-1154:
Summary: Tika hangs on format detection of malformed HTML file.
Key: TIKA-1154
URL: https://issues.apache.org/jira/browse/TIKA-1154
Project: Tika
Issue Type:
Andrew Jackson created TIKA-1117:
Summary: IWorkPackageParser should not close the InputStream
Key: TIKA-1117
URL: https://issues.apache.org/jira/browse/TIKA-1117
Project: Tika
Issue Type: Bu
[
https://issues.apache.org/jira/browse/TIKA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429426#comment-13429426
]
Andrew Jackson commented on TIKA-970:
-
Hi, I noticed the updated version includes a bit
[
https://issues.apache.org/jira/browse/TIKA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428116#comment-13428116
]
Andrew Jackson commented on TIKA-970:
-
He's added the Apache licence here:
https://gith
[
https://issues.apache.org/jira/browse/TIKA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428108#comment-13428108
]
Andrew Jackson commented on TIKA-970:
-
I assume I'll need him to confirm an Apache 2 lic
[
https://issues.apache.org/jira/browse/TIKA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428096#comment-13428096
]
Andrew Jackson commented on TIKA-970:
-
I should be able to sort that out. I know the aut
[
https://issues.apache.org/jira/browse/TIKA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428085#comment-13428085
]
Andrew Jackson commented on TIKA-970:
-
BTW, this set of signatures rather clumsily repea
Andrew Jackson created TIKA-970:
---
Summary: Full identification of the JPEG 2000 family of formats
Key: TIKA-970
URL: https://issues.apache.org/jira/browse/TIKA-970
Project: Tika
Issue Type: New
[
https://issues.apache.org/jira/browse/TIKA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-970:
Attachment: custom-mimetype.xml
> Full identification of the JPEG 2000 family of formats
> --
[
https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-900:
Description: I have been testing Tika's ability to identify ISO9660 disk
image file systems, and disc
[
https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259615#comment-13259615
]
Andrew Jackson commented on TIKA-900:
-
I re-uploaded the patch as it had an extra format
[
https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-900:
Description: I have been testing Tika's ability to identify ISO9660 disk
image file systems, and disc
[
https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-900:
Attachment: iso-image-detection.patch
Patch to fix ISO image magic, and extended the buffer size so t
[
https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-900:
Attachment: (was: iso-image-detection.patch)
> Tika fails to detect ISO9660 disk images
> ---
[
https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jackson updated TIKA-900:
Attachment: iso-image-detection.patch
Patch to increase buffer size and fix ISO image detection.
Andrew Jackson created TIKA-900:
---
Summary: Tika fails to detect ISO9660 disk images
Key: TIKA-900
URL: https://issues.apache.org/jira/browse/TIKA-900
Project: Tika
Issue Type: Bug
Com
57 matches