[jira] [Closed] (TIKA-2269) NPE with FeedParser

2017-02-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed TIKA-2269. --- thanks for committing [~talli...@mitre.org] > NPE with FeedParser

[jira] [Created] (TIKA-2269) NPE with FeedParser

2017-02-20 Thread Julien Nioche (JIRA)
Julien Nioche created TIKA-2269: --- Summary: NPE with FeedParser Key: TIKA-2269 URL: https://issues.apache.org/jira/browse/TIKA-2269 Project: Tika Issue Type: Bug Components: parser

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-12-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049239#comment-15049239 ] Julien Nioche commented on TIKA-1599: - Hi [~talli...@mitre.org] Haven't kept a log of specific

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-12-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049248#comment-15049248 ] Julien Nioche commented on TIKA-1599: - Don't think that this is the version they use now.

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-04-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487012#comment-14487012 ] Julien Nioche commented on TIKA-1599: - FWIW we've just added a JSoup based parser to

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-11-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228305#comment-14228305 ] Julien Nioche commented on TIKA-1302: - FYI have extracted data from the CommonCrawl

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-11-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226336#comment-14226336 ] Julien Nioche commented on TIKA-1302: - Hi [~talli...@apache.org] It would be easy to do

[jira] [Commented] (TIKA-595) HtmlHandler does not support multivalue metadata

2014-11-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217749#comment-14217749 ] Julien Nioche commented on TIKA-595: Thanks Dave! HtmlHandler does not support

[jira] [Updated] (TIKA-595) HtmlHandler does not support multivalue metadata

2014-11-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-595: --- Attachment: TIKA-595.patch Any reason why we wouldn't want to have multiple values in the metadata if

[jira] [Updated] (TIKA-595) HtmlHandler does not support multivalue metadata

2014-11-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-595: --- Fix Version/s: 1.7 HtmlHandler does not support multivalue metadata

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-05-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001612#comment-14001612 ] Julien Nioche commented on TIKA-1302: - How large do you want that batch to be? If we

[jira] [Created] (TIKA-696) Extract watermarks from Word documents

2011-08-23 Thread Julien Nioche (JIRA)
Extract watermarks from Word documents -- Key: TIKA-696 URL: https://issues.apache.org/jira/browse/TIKA-696 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 0.9

[jira] [Updated] (TIKA-696) Extract watermarks from Word documents

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-696: --- Attachment: Demo with watermark.doc Attached doc file containing a watermark Extract watermarks from

[jira] [Commented] (TIKA-696) Extract watermarks from Word documents

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089480#comment-13089480 ] Julien Nioche commented on TIKA-696: Can't see the watermark when saving and reopening

[jira] [Updated] (TIKA-696) Extract watermarks from Word documents

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-696: --- Attachment: Demo+with+watermark.docx .docx version generated with MS Office Can't see the watermark

[jira] [Assigned] (TIKA-657) Email parser gets into trouble on malformed html in enron corpus

2011-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned TIKA-657: -- Assignee: Julien Nioche Email parser gets into trouble on malformed html in enron corpus

[jira] [Commented] (TIKA-657) Email parser gets into trouble on malformed html in enron corpus

2011-05-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030467#comment-13030467 ] Julien Nioche commented on TIKA-657: Good idea. We need more tutorials and examples for

[jira] [Commented] (TIKA-649) NPE while parsing a .docx

2011-04-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026266#comment-13026266 ] Julien Nioche commented on TIKA-649: Sorry, should have tested on the trunk as well.

[jira] [Created] (TIKA-649) NPE while parsing a .docx

2011-04-27 Thread Julien Nioche (JIRA)
NPE while parsing a .docx --- Key: TIKA-649 URL: https://issues.apache.org/jira/browse/TIKA-649 Project: Tika Issue Type: Bug Components: parser Affects Versions: 0.9 Reporter: Julien

[jira] Created: (TIKA-612) Specify PDFBox options via ParseContext

2011-03-09 Thread Julien Nioche (JIRA)
Specify PDFBox options via ParseContext Key: TIKA-612 URL: https://issues.apache.org/jira/browse/TIKA-612 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 0.9

[jira] Closed: (TIKA-611) PDFParser mixes the text from separate columns

2011-03-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed TIKA-611. -- PDFParser mixes the text from separate columns --

[jira] Assigned: (TIKA-597) Bogus exception handler in org.apache.tika.parser.mail.MailContentHandler.body(BodyDescriptor, InputStream)

2011-03-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned TIKA-597: -- Assignee: Julien Nioche (was: Chris A. Mattmann) Bogus exception handler in

[jira] Resolved: (TIKA-597) Bogus exception handler in org.apache.tika.parser.mail.MailContentHandler.body(BodyDescriptor, InputStream)

2011-03-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved TIKA-597. Resolution: Fixed Fix Version/s: 1.0 Committed revision 1076300 Thanks Benson Bogus

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-11-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965271#action_12965271 ] Julien Nioche commented on TIKA-461: Benjamin, thanks for your patch. Could you generate

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-11-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965286#action_12965286 ] Julien Nioche commented on TIKA-461: patch -p1 failed peb...@lucid-vostro:/data/tika$

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-11-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930180#action_12930180 ] Julien Nioche commented on TIKA-461: Nope. I was planning to refactor the parser first

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-09-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915708#action_12915708 ] Julien Nioche commented on TIKA-461: Nick, Thanks for taking the time to review my

[jira] Updated: (TIKA-461) RFC822 messages not parsed

2010-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-461: --- Attachment: TIKA-461.patch This patch contains an initial version of the RFC822Parser which uses

[jira] Closed: (TIKA-466) Feed Parser

2010-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed TIKA-466. -- Feed Parser --- Key: TIKA-466 URL:

[jira] Commented: (TIKA-147) Add Flash parser

2010-07-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889883#action_12889883 ] Julien Nioche commented on TIKA-147: There is http://www.jswiff.com/licensing/ which

[jira] Created: (TIKA-466) Feed Parser

2010-07-16 Thread Julien Nioche (JIRA)
Feed Parser --- Key: TIKA-466 URL: https://issues.apache.org/jira/browse/TIKA-466 Project: Tika Issue Type: New Feature Components: parser Reporter: Julien Nioche Priority: Minor

[jira] Updated: (TIKA-466) Feed Parser

2010-07-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-466: --- Attachment: TIKA-466.patch Feed Parser --- Key: TIKA-466

[jira] Commented: (TIKA-460) HTMLHandler misses treatment of A elements

2010-07-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887718#action_12887718 ] Julien Nioche commented on TIKA-460: this would work if we had a in the list of safe

[jira] Closed: (TIKA-454) Illegal Charset Name crashes HTMLParser

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed TIKA-454. -- Resolution: Fixed Committed revision 960487 Illegal Charset Name crashes HTMLParser

[jira] Commented: (TIKA-433) Tika + Hadoop

2010-05-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871544#action_12871544 ] Julien Nioche commented on TIKA-433: You can do that with

[jira] Commented: (TIKA-430) Automatically let all valid XHTML 1.0 attributes through from HTML documents

2010-05-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871585#action_12871585 ] Julien Nioche commented on TIKA-430: The method mapSafeAttribute(String elementName,