[jira] [Resolved] (TIKA-667) Changes to RFC822Parser to support turning off strict parsing

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-667. Resolution: Fixed Assignee: Jukka Zitting Thanks! Patch committed in revision 1160018. Note th

[jira] [Commented] (TIKA-648) Parsing HTML anchors with embedded div faulty

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088381#comment-13088381 ] Jukka Zitting commented on TIKA-648: This seems to be a result of TagSoup normalizing th

[jira] [Updated] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-692: Attachment: TIKA-692-pretty-print.patch bq. though I'd name the option --pretty-print for con

[jira] [Resolved] (TIKA-677) Installing Tika 0.9 using Maven fails tests

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-677. Resolution: Duplicate Fix Version/s: (was: 1.0) Resolving as a duplicate of TIKA-551. > I

[jira] [Commented] (TIKA-676) Boilerpipe fails

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088377#comment-13088377 ] Jukka Zitting commented on TIKA-676: We can only update the boilerpipe dependency once t

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088375#comment-13088375 ] Jukka Zitting commented on TIKA-692: {quote} -prettyPrint option {quote} Sounds OK to m

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088372#comment-13088372 ] Michael McCandless commented on TIKA-692: - {quote} bq. You mean because the other (

[jira] [Resolved] (TIKA-447) Container aware mimetype detection

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-447. Resolution: Fixed Fix Version/s: 1.0 As suggested above, I moved the detector classes from o.a

[jira] [Resolved] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-692. Resolution: Fixed Assignee: Jukka Zitting I committed all the patches, thus resolving this as f

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088360#comment-13088360 ] Michael McCandless commented on TIKA-692: - bq. The content emitted by many parsers

Re: Preview of Rich Documents

2011-08-21 Jukka Zitting
Hi, On Sat, Aug 20, 2011 at 3:39 PM, nirnaydewan wrote: > Now, further expanding Doc A node, i want to show a preview of the whole > text like it was in document with formatting and all. > > Is it really possible? Because to what i have known till now is that, only > the text is extracted and sto