[
https://issues.apache.org/jira/browse/TIKA-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed TIKA-2269.
---
Thanks for committing, [~talli...@mitre.org].
> NPE with FeedParser
> ---
Julien Nioche created TIKA-2269:
---
Summary: NPE with FeedParser
Key: TIKA-2269
URL: https://issues.apache.org/jira/browse/TIKA-2269
Project: Tika
Issue Type: Bug
Components: parser
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049239#comment-15049239
]
Julien Nioche commented on TIKA-1599:
-
Hi [~talli...@mitre.org]
Haven't kept a log of specific
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049248#comment-15049248
]
Julien Nioche commented on TIKA-1599:
-
Don't think that this is the version they use now.
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487012#comment-14487012
]
Julien Nioche commented on TIKA-1599:
-
FWIW, we've just added a JSoup-based parser to
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228305#comment-14228305
]
Julien Nioche commented on TIKA-1302:
-
FYI, I have extracted data from the CommonCrawl
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226336#comment-14226336
]
Julien Nioche commented on TIKA-1302:
-
Hi [~talli...@apache.org]
It would be easy to do
[
https://issues.apache.org/jira/browse/TIKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217749#comment-14217749
]
Julien Nioche commented on TIKA-595:
Thanks, Dave!
HtmlHandler does not support
[
https://issues.apache.org/jira/browse/TIKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-595:
---
Attachment: TIKA-595.patch
Any reason why we wouldn't want to have multiple values in the metadata if
[
https://issues.apache.org/jira/browse/TIKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-595:
---
Fix Version/s: 1.7
HtmlHandler does not support multivalue metadata
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001612#comment-14001612
]
Julien Nioche commented on TIKA-1302:
-
How large do you want that batch to be? If we
Extract watermarks from Word documents
--
Key: TIKA-696
URL: https://issues.apache.org/jira/browse/TIKA-696
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 0.9
[
https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-696:
---
Attachment: Demo with watermark.doc
Attached a .doc file containing a watermark.
Extract watermarks from
[
https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089480#comment-13089480
]
Julien Nioche commented on TIKA-696:
Can't see the watermark when saving and reopening
[
https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-696:
---
Attachment: Demo+with+watermark.docx
.docx version generated with MS Office
Can't see the watermark
[
https://issues.apache.org/jira/browse/TIKA-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned TIKA-657:
--
Assignee: Julien Nioche
Email parser gets into trouble on malformed html in enron corpus
[
https://issues.apache.org/jira/browse/TIKA-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030467#comment-13030467
]
Julien Nioche commented on TIKA-657:
Good idea. We need more tutorials and examples for
[
https://issues.apache.org/jira/browse/TIKA-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026266#comment-13026266
]
Julien Nioche commented on TIKA-649:
Sorry, should have tested on the trunk as well.
NPE while parsing a .docx
---
Key: TIKA-649
URL: https://issues.apache.org/jira/browse/TIKA-649
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.9
Reporter: Julien
Specify PDFBox options via ParseContext
Key: TIKA-612
URL: https://issues.apache.org/jira/browse/TIKA-612
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 0.9
[
https://issues.apache.org/jira/browse/TIKA-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed TIKA-611.
--
PDFParser mixes the text from separate columns
--
[
https://issues.apache.org/jira/browse/TIKA-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned TIKA-597:
--
Assignee: Julien Nioche (was: Chris A. Mattmann)
Bogus exception handler in
[
https://issues.apache.org/jira/browse/TIKA-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved TIKA-597.
Resolution: Fixed
Fix Version/s: 1.0
Committed revision 1076300
Thanks, Benson.
Bogus
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965271#action_12965271
]
Julien Nioche commented on TIKA-461:
Benjamin, thanks for your patch. Could you generate
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965286#action_12965286
]
Julien Nioche commented on TIKA-461:
patch -p1 failed
peb...@lucid-vostro:/data/tika$
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930180#action_12930180
]
Julien Nioche commented on TIKA-461:
Nope. I was planning to refactor the parser first
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915708#action_12915708
]
Julien Nioche commented on TIKA-461:
Nick,
Thanks for taking the time to review my
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-461:
---
Attachment: TIKA-461.patch
This patch contains an initial version of the RFC822Parser which uses
[
https://issues.apache.org/jira/browse/TIKA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed TIKA-466.
--
Feed Parser
---
Key: TIKA-466
URL:
[
https://issues.apache.org/jira/browse/TIKA-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889883#action_12889883
]
Julien Nioche commented on TIKA-147:
There is http://www.jswiff.com/licensing/ which
Feed Parser
---
Key: TIKA-466
URL: https://issues.apache.org/jira/browse/TIKA-466
Project: Tika
Issue Type: New Feature
Components: parser
Reporter: Julien Nioche
Priority: Minor
[
https://issues.apache.org/jira/browse/TIKA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-466:
---
Attachment: TIKA-466.patch
Feed Parser
---
Key: TIKA-466
[
https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887718#action_12887718
]
Julien Nioche commented on TIKA-460:
This would work if we had a in the list of safe
[
https://issues.apache.org/jira/browse/TIKA-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed TIKA-454.
--
Resolution: Fixed
Committed revision 960487
Illegal Charset Name crashes HTMLParser
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871544#action_12871544
]
Julien Nioche commented on TIKA-433:
You can do that with
[
https://issues.apache.org/jira/browse/TIKA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871585#action_12871585
]
Julien Nioche commented on TIKA-430:
The method mapSafeAttribute(String elementName,