[
https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2632:
Description:
I recently started to analyze randomly govdocs1 files that could not be
recognized by T
[
https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442059#comment-16442059
]
Andreas Meier edited comment on TIKA-2632 at 4/18/18 7:46 AM:
--
[
https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442059#comment-16442059
]
Andreas Meier commented on TIKA-2632:
-
Thanks for the link [~talli...@mitre.org]
Glad
Andreas Meier created TIKA-2632:
---
Summary: Analyze unknown govdocs files
Key: TIKA-2632
URL: https://issues.apache.org/jira/browse/TIKA-2632
Project: Tika
Issue Type: Improvement
Re
Andreas Meier created TIKA-2629:
---
Summary: Add image/x-dpx media-type detection
Key: TIKA-2629
URL: https://issues.apache.org/jira/browse/TIKA-2629
Project: Tika
Issue Type: Improvement
Andreas Meier created TIKA-2628:
---
Summary: Add image/aces media-type detection
Key: TIKA-2628
URL: https://issues.apache.org/jira/browse/TIKA-2628
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418926#comment-16418926
]
Andreas Meier commented on TIKA-2619:
-
Can confirm this OutOfMemoryError in Version 1.1
[
https://issues.apache.org/jira/browse/TIKA-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409192#comment-16409192
]
Andreas Meier commented on TIKA-2609:
-
Emacs 18 and earlier testfiles can be found unde
[
https://issues.apache.org/jira/browse/TIKA-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409181#comment-16409181
]
Andreas Meier commented on TIKA-2611:
-
As [~gagravarr] already mentioned you should try
Andreas Meier created TIKA-2609:
---
Summary: Refine Emacs Lisp file recognition (.elc)
Key: TIKA-2609
URL: https://issues.apache.org/jira/browse/TIKA-2609
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400138#comment-16400138
]
Andreas Meier commented on TIKA-2574:
-
Link to the original published specification tak
[
https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400071#comment-16400071
]
Andreas Meier edited comment on TIKA-2602 at 3/15/18 8:28 AM:
--
[
https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2602:
Attachment: VERSION_Test
> iCalendar not properly recognized as text/calendar
> -
[
https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400071#comment-16400071
]
Andreas Meier commented on TIKA-2602:
-
Unfortunately the above mentioned mime-type brok
[
https://issues.apache.org/jira/browse/TIKA-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398247#comment-16398247
]
Andreas Meier commented on TIKA-2607:
-
[~talli...@mitre.org] I hope you don't mind that
Andreas Meier created TIKA-2607:
---
Summary: Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0
Key: TIKA-2607
URL: https://issues.apache.org/jira/browse/TIKA-2607
Project: Tika
Issue
Andreas Meier created TIKA-2603:
---
Summary: application/x-iso9660-image extraction
Key: TIKA-2603
URL: https://issues.apache.org/jira/browse/TIKA-2603
Project: Tika
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391054#comment-16391054
]
Andreas Meier commented on TIKA-2602:
-
The following mime-type will recognize all testf
[
https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390988#comment-16390988
]
Andreas Meier commented on TIKA-2602:
-
Thanks for the response, Nick.
On my search for
Andreas Meier created TIKA-2602:
---
Summary: iCalendar not properly recognized as text/calendar
Key: TIKA-2602
URL: https://issues.apache.org/jira/browse/TIKA-2602
Project: Tika
Issue Type: Impro
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389180#comment-16389180
]
Andreas Meier commented on TIKA-2576:
-
I'm glad I could help.
> Add application/zstd d
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386017#comment-16386017
]
Andreas Meier edited comment on TIKA-2592 at 3/5/18 12:46 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386017#comment-16386017
]
Andreas Meier edited comment on TIKA-2592 at 3/5/18 12:44 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386017#comment-16386017
]
Andreas Meier commented on TIKA-2592:
-
Thanks Tim, but I think I will just download the
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2592:
Attachment: StandardCharsets_unsupported_by_IANA.txt
> HTML with charset unicode handled as utf-16 in
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383350#comment-16383350
]
Andreas Meier edited comment on TIKA-2592 at 3/2/18 10:56 AM:
--
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2592:
Attachment: TestHTMLCharsetCP1256.html
TestHTMLCharsetArabicCP1256.html
> HTML with c
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383350#comment-16383350
]
Andreas Meier commented on TIKA-2592:
-
{quote}
Before making this kind of change (defau
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381667#comment-16381667
]
Andreas Meier commented on TIKA-2592:
-
Thanks for your response [~kkrugler]
You are ri
Andreas Meier created TIKA-2594:
---
Summary: Mail detected as application/xhtml+xml
Key: TIKA-2594
URL: https://issues.apache.org/jira/browse/TIKA-2594
Project: Tika
Issue Type: Bug
Affects V
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380106#comment-16380106
]
Andreas Meier commented on TIKA-2592:
-
Attached a sample patch to set UTF-8 as default
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2592:
Attachment: fix-for-TIKA2592-contributed-by-Andreas-Meier.patch
> HTML with charset unicode handled a
Andreas Meier created TIKA-2592:
---
Summary: HTML with charset unicode handled as utf-16 instead utf-8
Key: TIKA-2592
URL: https://issues.apache.org/jira/browse/TIKA-2592
Project: Tika
Issue Type
Andreas Meier created TIKA-2587:
---
Summary: DKIM signed mails recognized as text/plain
Key: TIKA-2587
URL: https://issues.apache.org/jira/browse/TIKA-2587
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2578:
Component/s: detector
> Mails not recognized when unknown X-headers are present
> ---
Andreas Meier created TIKA-2578:
---
Summary: Mails not recognized when unknown X-headers are present
Key: TIKA-2578
URL: https://issues.apache.org/jira/browse/TIKA-2578
Project: Tika
Issue Type:
Andreas Meier created TIKA-2576:
---
Summary: Add application/zstd detection and parser
Key: TIKA-2576
URL: https://issues.apache.org/jira/browse/TIKA-2576
Project: Tika
Issue Type: Improvement
Andreas Meier created TIKA-2574:
---
Summary: Extend PCX detection in tika-mimetypes.xml
Key: TIKA-2574
URL: https://issues.apache.org/jira/browse/TIKA-2574
Project: Tika
Issue Type: Sub-task
Andreas Meier created TIKA-2557:
---
Summary: .mbox detected as text/html
Key: TIKA-2557
URL: https://issues.apache.org/jira/browse/TIKA-2557
Project: Tika
Issue Type: Bug
Components: co
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340729#comment-16340729
]
Andreas Meier commented on TIKA-2527:
-
Added a patch ([^fix-for-binhexmatch-TIKA2527-co
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Attachment: fix-for-binhexmatch-TIKA2527-contributed-by-AMeier.patch
> Typos in tika-mimetypes.xml
>
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337643#comment-16337643
]
Andreas Meier commented on TIKA-2527:
-
Added another patch (enhancement-for-TIKA2527-co
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Attachment: enhancement-for-TIKA2527-contributed-by-AMeier.patch
> Typos in tika-mimetypes.xml
>
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337531#comment-16337531
]
Andreas Meier commented on TIKA-2527:
-
I attached a patch to address the mentioned prob
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Attachment: fix-for-TIKA2527-contributed-by-AMeier-Fixed-adpcmmi.patch
> Typos in tika-mimetypes.xml
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Flags: Patch
Affects Version/s: 1.18
2.0
> Typos in tika-mimet
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Affects Version/s: 1.17
> Typos in tika-mimetypes.xml
> ---
>
>
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306071#comment-16306071
]
Andreas Meier commented on TIKA-2527:
-
I don't know whether I shall open another ticket
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304468#comment-16304468
]
Andreas Meier commented on TIKA-2527:
-
Found another suspect:
{code:xml}
ESRI Shap
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Description:
Are these mimetypes in tika-mimetypes.xml
audio/x-adbcm instead audio/x-adpcm
{code:xm
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Description:
Are these typos in tika-mimetypes.xml
audio/x-dec-adbcm instead audio/x-dec-adpcm
{co
[
https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2527:
Description:
Are these mimetypes in tika-mimetypes.xml
audio/x-dec-adbcm instead audio/x-dec-adpcm
Andreas Meier created TIKA-2527:
---
Summary: Typos in tika-mimetypes.xml
Key: TIKA-2527
URL: https://issues.apache.org/jira/browse/TIKA-2527
Project: Tika
Issue Type: Bug
Components: co
[
https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16241633#comment-16241633
]
Andreas Meier edited comment on TIKA-2484 at 11/7/17 8:03 AM:
--
[
https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16241633#comment-16241633
]
Andreas Meier commented on TIKA-2484:
-
Thanks for the info [~gagravarr]
I think I unde
[
https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240338#comment-16240338
]
Andreas Meier commented on TIKA-2484:
-
Would be great if you could try to get the Chars
[
https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2484:
Description:
I would like to help to improve the recognition accuracy of the CharsetDetector.
Theref
[
https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2484:
Attachment: IUC10-ar.UTF-7.with-BOM
IUC10-ar.UTF-7.without-BOM
IUC10-a
Andreas Meier created TIKA-2484:
---
Summary: Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly
Key: TIKA-2484
URL: https://issues.apache.org/jira/browse/TIKA-2484
67 matches