[
https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194435#comment-16194435
]
Matthew Caruana Galizia edited comment on TIKA-2473 at 10/6/17 10:42 AM:
[
https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194435#comment-16194435
]
Matthew Caruana Galizia commented on TIKA-2473:
---
Magic:
byte 0: x0A
byte 1:
Matthew Caruana Galizia created TIKA-2473:
-
Summary: PCX and DCX image support
Key: TIKA-2473
URL: https://issues.apache.org/jira/browse/TIKA-2473
Project: Tika
Issue Type: Improvemen
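The magic-byte comment above lists only byte 0 (0x0A) before the preview cuts off. The minimal Java sketch below shows what a PCX/DCX signature check might look like. It is not Tika's detector. The class and method names are hypothetical. The PCX version-byte values (0, 2, 3, 4, 5) and the DCX magic number 987654321 (0x3ADE68B1, stored little-endian) are assumptions taken from the commonly documented ZSoft formats, not from the issue.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: cheap signature checks for PCX and DCX headers. */
final class PcxDcxSniffer {

    // Byte 0 of a PCX header: the ZSoft manufacturer marker, 0x0A.
    private static final int PCX_MANUFACTURER = 0x0A;

    // Assumption: DCX files open with the 32-bit magic 987654321 (0x3ADE68B1), little-endian.
    private static final int DCX_MAGIC = 0x3ADE68B1;

    private PcxDcxSniffer() {}

    /** True if byte 0 is 0x0A and byte 1 is a known PCX version (assumed: 0, 2, 3, 4, 5). */
    static boolean looksLikePcx(byte[] header) {
        if (header.length < 2) {
            return false;
        }
        int manufacturer = header[0] & 0xFF;
        int version = header[1] & 0xFF;
        boolean knownVersion = version == 0 || (version >= 2 && version <= 5);
        return manufacturer == PCX_MANUFACTURER && knownVersion;
    }

    /** True if the first four bytes hold the DCX magic number in little-endian order. */
    static boolean looksLikeDcx(byte[] header) {
        if (header.length < 4) {
            return false;
        }
        int magic = ByteBuffer.wrap(header, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return magic == DCX_MAGIC;
    }
}
```

Checking byte 0 alone would match any file that starts with a line feed (also 0x0A). That is why the sketch also inspects the version byte.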
[
https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2471:
--
Attachment: mbox
Reduced test case attached. The result of parsing this file will
Matthew Caruana Galizia created TIKA-2471:
-
Summary: Tab-prefixed message body lines in Mbox interpreted as
headers
Key: TIKA-2471
URL: https://issues.apache.org/jira/browse/TIKA-2471
Project:
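TIKA-2471 reports that tab-prefixed body lines are read as headers. Under RFC 5322, a line beginning with a space or tab is a folded continuation of the previous header field. That rule applies only inside the header section, which ends at the first empty line. The Java sketch below illustrates the rule for a single message whose mbox "From " separator line has already been stripped. The class and method names are hypothetical, and this is not Tika's MboxParser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split one message's lines into unfolded headers and body. */
final class MessageSplitter {

    /** Holder for the two parts of a message. */
    static final class Split {
        final List<String> headers;
        final List<String> body;

        Split(List<String> headers, List<String> body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private MessageSplitter() {}

    static Split split(List<String> lines) {
        List<String> headers = new ArrayList<>();
        List<String> body = new ArrayList<>();
        boolean inHeaders = true;
        for (String line : lines) {
            if (!inHeaders) {
                // Past the blank separator line: everything is body, tab-prefixed lines included.
                body.add(line);
            } else if (line.isEmpty()) {
                // The first empty line ends the header section.
                inHeaders = false;
            } else if ((line.startsWith(" ") || line.startsWith("\t")) && !headers.isEmpty()) {
                // Folded header: unfold by appending to the previous field (RFC 5322 section 2.2.3).
                int last = headers.size() - 1;
                headers.set(last, headers.get(last) + line);
            } else {
                headers.add(line);
            }
        }
        return new Split(headers, body);
    }
}
```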
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150718#comment-16150718
]
Matthew Caruana Galizia commented on TIKA-2219:
---
Thanks for getting back. Sho
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2219:
--
Attachment: test.txt
This file contains x92 characters which should force detecti
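The TIKA-2219 attachment note is cut off, but the byte it mentions is easy to illustrate. In windows-1252, 0x92 is the right single quotation mark (U+2019). ISO-8859-1 maps the same byte to an invisible C1 control character (U+0092). The Java sketch below shows that difference and one simple, hypothetical heuristic: prefer windows-1252 when any byte falls in 0x80 to 0x9F. It is not Tika's charset detector, and the names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of why 0x80-0x9F bytes point towards windows-1252. */
final class C1RangeHeuristic {

    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    private C1RangeHeuristic() {}

    /** True if any byte lies in 0x80-0x9F, which ISO-8859-1 reserves for C1 controls. */
    static boolean hasC1RangeBytes(byte[] bytes) {
        for (byte b : bytes) {
            int value = b & 0xFF;
            if (value >= 0x80 && value <= 0x9F) {
                return true;
            }
        }
        return false;
    }

    /** Chooses between the two single-byte charsets using the heuristic above. */
    static Charset guess(byte[] bytes) {
        return hasC1RangeBytes(bytes) ? WINDOWS_1252 : StandardCharsets.ISO_8859_1;
    }

    /** Decodes the bytes with whichever charset the heuristic picks. */
    static String decode(byte[] bytes) {
        return new String(bytes, guess(bytes));
    }
}
```

Real detectors weigh many more signals. This sketch only shows why a single 0x92 byte can tip the decision.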
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149673#comment-16149673
]
Matthew Caruana Galizia commented on TIKA-2219:
---
[~talli...@mitre.org] I thin
Matthew Caruana Galizia created TIKA-2455:
-
Summary: Flag in metadata for alternative email bodies
Key: TIKA-2455
URL: https://issues.apache.org/jira/browse/TIKA-2455
Project: Tika
Is
[
https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148117#comment-16148117
]
Matthew Caruana Galizia commented on TIKA-2454:
---
I don't know if the same thi
[
https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147907#comment-16147907
]
Matthew Caruana Galizia commented on TIKA-2454:
---
I agree with you. The fact t
[
https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147657#comment-16147657
]
Matthew Caruana Galizia commented on TIKA-2444:
---
I have no idea. I'm trying t
Matthew Caruana Galizia created TIKA-2454:
-
Summary: Emails extracted from PSTs detected as unexpected file
types
Key: TIKA-2454
URL: https://issues.apache.org/jira/browse/TIKA-2454
Project: T
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147632#comment-16147632
]
Matthew Caruana Galizia commented on TIKA-2450:
---
Thank you, that looks like a
Matthew Caruana Galizia created TIKA-2453:
-
Summary: Corrupt MBOX file detected as text/plain
Key: TIKA-2453
URL: https://issues.apache.org/jira/browse/TIKA-2453
Project: Tika
Issue T
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147347#comment-16147347
]
Matthew Caruana Galizia commented on TIKA-2450:
---
When you put it that way, th
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147331#comment-16147331
]
Matthew Caruana Galizia commented on TIKA-2450:
---
OK, with that in mind then I
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147303#comment-16147303
]
Matthew Caruana Galizia commented on TIKA-2450:
---
I would argue that the raiso
Matthew Caruana Galizia created TIKA-2450:
-
Summary: OfficeParser.parse called for zero-byte file with .doc
extension
Key: TIKA-2450
URL: https://issues.apache.org/jira/browse/TIKA-2450
Projec
[
https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2444:
--
Attachment: balloon.j2c
Example JP2K codestream file attached.
> JP2 codestream
Matthew Caruana Galizia created TIKA-2444:
-
Summary: JP2 codestream files not parsed
Key: TIKA-2444
URL: https://issues.apache.org/jira/browse/TIKA-2444
Project: Tika
Issue Type: Bug
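TIKA-2444 concerns raw JPEG 2000 codestreams (the attached balloon.j2c), as opposed to JP2 files. The two can be told apart by their first bytes:

- A JP2 (or JPX) file begins with the 12-byte JP2 signature box.
- A bare codestream begins with the SOC marker 0xFF4F, followed by the SIZ marker 0xFF51.

The Java sketch below encodes those two checks. The class and enum names are hypothetical, and this is not how Tika's detector is written.

```java
/** Hypothetical sketch: distinguish a JP2/JPX file from a raw JPEG 2000 codestream. */
final class Jpeg2000Sniffer {

    enum Kind { JP2_FAMILY_FILE, RAW_CODESTREAM, UNKNOWN }

    // JP2 signature box: length 12, box type "jP  ", contents 0x0D0A870A.
    private static final byte[] JP2_SIGNATURE_BOX = {
        0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, (byte) 0x87, 0x0A
    };

    // Raw codestream: SOC marker (FF 4F) immediately followed by SIZ marker (FF 51).
    private static final byte[] SOC_THEN_SIZ = {(byte) 0xFF, 0x4F, (byte) 0xFF, 0x51};

    private Jpeg2000Sniffer() {}

    static Kind sniff(byte[] header) {
        if (startsWith(header, JP2_SIGNATURE_BOX)) {
            return Kind.JP2_FAMILY_FILE;
        }
        if (startsWith(header, SOC_THEN_SIZ)) {
            return Kind.RAW_CODESTREAM;
        }
        return Kind.UNKNOWN;
    }

    /** Byte-by-byte prefix comparison. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

JP2 and JPX share the same signature box. Telling them apart requires reading the brand in the "ftyp" box that follows it.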
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106136#comment-16106136
]
Matthew Caruana Galizia commented on TIKA-2436:
---
To give you an example of wh
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106135#comment-16106135
]
Matthew Caruana Galizia commented on TIKA-2436:
---
The difference is that the f
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2436:
--
Attachment: image004.emz
Example EMZ file attached. Commons Compress will yield an
Matthew Caruana Galizia created TIKA-2436:
-
Summary: Support for GZIP-compressed EMF files
Key: TIKA-2436
URL: https://issues.apache.org/jira/browse/TIKA-2436
Project: Tika
Issue Type
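TIKA-2436 describes EMZ files, which are EMF metafiles wrapped in GZIP. The minimal Java sketch below shows one way such a file could be recognised:

1. Check for the GZIP magic bytes (0x1F 0x8B).
2. Decompress the first 44 bytes with java.util.zip.GZIPInputStream.
3. Test for an EMF header record: record type 1, with the signature field at offset 40 reading " EMF" (0x464D4520 as a little-endian int).

The class and method names are hypothetical. This is not how Tika or Commons Compress implement it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: recognise a GZIP-compressed EMF (EMZ) by peeking inside. */
final class EmzSniffer {

    private static final int EMR_HEADER = 1;                 // record type of the first EMF record
    private static final int ENHMETA_SIGNATURE = 0x464D4520; // " EMF" read as a little-endian int
    private static final int EMF_HEADER_PREFIX = 44;         // bytes needed to reach the signature

    private EmzSniffer() {}

    static boolean isGzip(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0x1F && (data[1] & 0xFF) == 0x8B;
    }

    static boolean isEmfHeader(byte[] data) {
        if (data.length < EMF_HEADER_PREFIX) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(0) == EMR_HEADER && buf.getInt(40) == ENHMETA_SIGNATURE;
    }

    /** True if data is GZIP whose decompressed prefix is an EMF header. */
    static boolean isEmz(byte[] data) {
        if (!isGzip(data)) {
            return false;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return isEmfHeader(in.readNBytes(EMF_HEADER_PREFIX));
        } catch (IOException e) {
            return false; // corrupt or truncated GZIP stream
        }
    }

    /** Helper for round-trip testing: GZIP-compress a byte array. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```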
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-879:
-
Attachment: mbox_email_section.txt
As described in TIKA-2042, the attached file [^mb
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085847#comment-16085847
]
Matthew Caruana Galizia edited comment on TIKA-2042 at 7/13/17 3:13 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2042:
--
Attachment: mbox_email_section.txt
Sample of one of the message sections from the
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085842#comment-16085842
]
Matthew Caruana Galizia commented on TIKA-2042:
---
[~gagravarr] thank you - tha
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085709#comment-16085709
]
Matthew Caruana Galizia edited comment on TIKA-2042 at 7/13/17 2:22 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2042:
--
Attachment: mbox_header.txt
Header attached with identifying information stripped
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085709#comment-16085709
]
Matthew Caruana Galizia commented on TIKA-2042:
---
I'd like to ask for this iss
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078012#comment-16078012
]
Matthew Caruana Galizia commented on TIKA-2399:
---
OK. I can't think of any oth
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076092#comment-16076092
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Their response:
bq. I would
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074918#comment-16074918
]
Matthew Caruana Galizia commented on TIKA-2399:
---
I've emailed Unidata to ask
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074881#comment-16074881
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Tim, see https://github.com/
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073660#comment-16073660
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Wouldn't it be better to war
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056159#comment-16056159
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Their response:
bq. Thanks
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055541#comment-16055541
]
Matthew Caruana Galizia commented on TIKA-2399:
---
I had emailed Unidata in Feb
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050226#comment-16050226
]
Matthew Caruana Galizia commented on TIKA-2394:
---
I remember seeing how to ove
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050226#comment-16050226
]
Matthew Caruana Galizia edited comment on TIKA-2394 at 6/15/17 9:28 AM:
-
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2394:
--
Affects Version/s: 1.15
Labels: container email pst (was: )
Matthew Caruana Galizia created TIKA-2394:
-
Summary: "Unknown message type"
Key: TIKA-2394
URL: https://issues.apache.org/jira/browse/TIKA-2394
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044297#comment-16044297
]
Matthew Caruana Galizia commented on TIKA-2389:
---
Please don't move this to in
[
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929105#comment-15929105
]
Matthew Caruana Galizia commented on TIKA-1195:
---
[~talli...@mitre.org] d'you
[
https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia closed TIKA-2280.
-
Resolution: Duplicate
> message_from not extracted from Outlook emails
> --
[
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890800#comment-15890800
]
Matthew Caruana Galizia commented on TIKA-1865:
---
Thank you, this is a big imp
[
https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890324#comment-15890324
]
Matthew Caruana Galizia commented on TIKA-2235:
---
Ah, good catch. OCR'ing inli
[
https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890106#comment-15890106
]
Matthew Caruana Galizia commented on TIKA-2235:
---
In the majority of cases, JP
[
https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889192#comment-15889192
]
Matthew Caruana Galizia commented on TIKA-2280:
---
OK, so this is a duplicate t
[
https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2280:
--
Description:
While the MESSAGE_FROM metadata field is extracted for both RFC and
Matthew Caruana Galizia created TIKA-2280:
-
Summary: message_from not extracted from Outlook emails
Key: TIKA-2280
URL: https://issues.apache.org/jira/browse/TIKA-2280
Project: Tika
I
[
https://issues.apache.org/jira/browse/TIKA-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887559#comment-15887559
]
Matthew Caruana Galizia commented on TIKA-2274:
---
Thanks for checking up on th
Matthew Caruana Galizia created TIKA-2274:
-
Summary: and metadata collision
Key: TIKA-2274
URL: https://issues.apache.org/jira/browse/TIKA-2274
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829856#comment-15829856
]
Matthew Caruana Galizia commented on TIKA-2245:
---
So should we agree that pars
[
https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2245:
--
Description:
Tika parsers sometimes use Log4j's Logger, sometimes the JUL
(java.
Matthew Caruana Galizia created TIKA-2245:
-
Summary: Standardise on java.util.Logging
Key: TIKA-2245
URL: https://issues.apache.org/jira/browse/TIKA-2245
Project: Tika
Issue Type: Imp
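The TIKA-2245 description notes that parsers mix Log4j and JUL loggers. The minimal sketch below shows the java.util.logging pattern a parser class might follow: one static Logger named after the class, with warnings logged together with their cause. The class and method names are hypothetical. The issue preview is truncated, so this does not reproduce whatever convention was finally agreed.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical parser-side helper showing the java.util.logging idiom. */
final class ExampleParserLogging {

    // One JUL logger per class, named after the class so it can be configured per package.
    private static final Logger LOG = Logger.getLogger(ExampleParserLogging.class.getName());

    private ExampleParserLogging() {}

    /** Exposes the logger so callers (or tests) can attach handlers. */
    static Logger logger() {
        return LOG;
    }

    /** Logs a recoverable problem at WARNING, keeping the exception for the stack trace. */
    static void warnSkippedResource(String resourceName, Exception cause) {
        LOG.log(Level.WARNING, "Skipping embedded resource: " + resourceName, cause);
    }
}
```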
[
https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823691#comment-15823691
]
Matthew Caruana Galizia commented on TIKA-2232:
---
Could we at least log a warn
[
https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818176#comment-15818176
]
Matthew Caruana Galizia commented on TIKA-2235:
---
Yes, I am already! Thanks fo
Matthew Caruana Galizia created TIKA-2235:
-
Summary: Use Tesseract's recommended DPI for PDF images
Key: TIKA-2235
URL: https://issues.apache.org/jira/browse/TIKA-2235
Project: Tika
I
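TIKA-2235 asks for images in PDFs to be handled at Tesseract's recommended DPI. Tesseract's own documentation suggests at least 300 DPI. PDF page dimensions are given in points, 72 to the inch, so the pixel size at a target DPI is points * dpi / 72. The small Java sketch below makes that arithmetic explicit. The class and method names are hypothetical, and 300 DPI is used only as an illustrative default.

```java
/** Hypothetical helper: convert PDF point dimensions to pixel dimensions at a given DPI. */
final class OcrRenderSize {

    // PDF user space defaults to 72 units (points) per inch.
    static final int PDF_POINTS_PER_INCH = 72;

    // Illustrative default: 300 DPI, the minimum Tesseract's documentation recommends.
    static final int DEFAULT_OCR_DPI = 300;

    private OcrRenderSize() {}

    /** Scale factor to apply to point coordinates when rendering at the given DPI. */
    static double scaleForDpi(int dpi) {
        return dpi / (double) PDF_POINTS_PER_INCH;
    }

    /** Pixel length for a length in points, rounded to the nearest pixel. */
    static int pixelsForPoints(double points, int dpi) {
        return (int) Math.round(points * dpi / PDF_POINTS_PER_INCH);
    }
}
```

For example, a US Letter page (612 x 792 points) needs a 2550 x 3300 pixel raster at 300 DPI.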
Matthew Caruana Galizia created TIKA-2221:
-
Summary: poi.EncryptedDocumentException not wrapped in
tika.exception.EncryptedDocumentException
Key: TIKA-2221
URL: https://issues.apache.org/jira/browse/TIKA-2
[
https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701919#comment-15701919
]
Matthew Caruana Galizia commented on TIKA-2175:
---
The problem was OpenCL suppo
[
https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696377#comment-15696377
]
Matthew Caruana Galizia commented on TIKA-2175:
---
Still no joy, both with my b
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663830#comment-15663830
]
Matthew Caruana Galizia commented on TIKA-1896:
---
Perhaps we should push ahead
[
https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653441#comment-15653441
]
Matthew Caruana Galizia commented on TIKA-2175:
---
I've filed [an
issue|https:
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653430#comment-15653430
]
Matthew Caruana Galizia commented on TIKA-2174:
---
Thank you! I've also confirm
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651347#comment-15651347
]
Matthew Caruana Galizia commented on TIKA-2174:
---
That issue went away once I
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650892#comment-15650892
]
Matthew Caruana Galizia commented on TIKA-2174:
---
Both on inline and independe
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2174:
--
Description:
A complete install of Leptonica with Tesseract will add support for
Matthew Caruana Galizia created TIKA-2174:
-
Summary: JP2 and JPX (JPEG 2000) support not declared by
TesseractOCRParser
Key: TIKA-2174
URL: https://issues.apache.org/jira/browse/TIKA-2174
Proj
[
https://issues.apache.org/jira/browse/TIKA-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644455#comment-15644455
]
Matthew Caruana Galizia commented on TIKA-2167:
---
[~talli...@mitre.org] to rep
[
https://issues.apache.org/jira/browse/TIKA-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2167:
--
Attachment: simple.tiff
> Image processing causes OCR to fail
> -
Matthew Caruana Galizia created TIKA-2167:
-
Summary: Image processing causes OCR to fail
Key: TIKA-2167
URL: https://issues.apache.org/jira/browse/TIKA-2167
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638978#comment-15638978
]
Matthew Caruana Galizia commented on TIKA-1896:
---
[~talli...@mitre.org] did yo
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184969#comment-15184969
]
Matthew Caruana Galizia edited comment on TIKA-1896 at 3/8/16 2:33 PM:
--
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184969#comment-15184969
]
Matthew Caruana Galizia commented on TIKA-1896:
---
TagSoup handles this well in
Matthew Caruana Galizia created TIKA-1896:
-
Summary: Invalid closing script tag not handled gracefully by
HtmlParser
Key: TIKA-1896
URL: https://issues.apache.org/jira/browse/TIKA-1896
Project
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-1896:
--
Attachment: test.html
> Invalid closing script tag not handled gracefully by Html