[jira] [Commented] (TIKA-717) Comment/annotation is sometimes not extracted

2011-10-03 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119237#comment-13119237 ] Michael McCandless commented on TIKA-717: - RTF and PPT are now extracting comments

[jira] [Created] (TIKA-738) Tika fails to extract text from PDF annotations

2011-10-03 Thread Michael McCandless (Created) (JIRA)
Tika fails to extract text from PDF annotations --- Key: TIKA-738 URL: https://issues.apache.org/jira/browse/TIKA-738 Project: Tika Issue Type: Bug Components: parser

[jira] [Resolved] (TIKA-717) Comment/annotation is sometimes not extracted

2011-10-03 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-717. - Resolution: Fixed Fix Version/s: 1.0 Comment/annotation is sometimes not

[jira] [Commented] (TIKA-738) Tika fails to extract text from PDF annotations

2011-10-03 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119269#comment-13119269 ] Michael McCandless commented on TIKA-738: - I moved the failing (but ignored) test

Newb: IDE + Maven?

2011-10-03 Thread Albert Law (Logik)
Hi All, I'm a Tika newb but I have some format expertise with RTF, PDF, DOC, PPT, XLS, OOXML, blah blah blah. I want to contribute to Tika. Unfortunately, I'm finding this Maven thing hard to use in conjunction with Eclipse and m2e. I can do everything from command-line, but I want to do it in

Re: Newb: IDE + Maven?

2011-10-03 Thread Jukka Zitting
Hi, On Mon, Oct 3, 2011 at 4:46 PM, Nick Burch nick.bu...@alfresco.com wrote: What I tend to do is build the project with maven on the command line, then unpack the tika-bundle jar. Then, I add a regular (non maven) project to eclipse, and add the jars from the bundle as dependencies by

[jira] [Commented] (TIKA-722) Arabic PDF doesn't extract correctly

2011-10-03 Thread Robert Muir (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119403#comment-13119403 ] Robert Muir commented on TIKA-722: -- Actually in this case the original TTF font (AxtManal)

[jira] [Resolved] (TIKA-722) Arabic PDF doesn't extract correctly

2011-10-03 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-722. - Resolution: Won't Fix OK resolving as Won't Fix. I don't see how Tika can recover when

[jira] [Issue Comment Edited] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119441#comment-13119441 ] Jeremy Anderson edited comment on TIKA-733 at 10/3/11 6:01 PM: ---

[jira] [Commented] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119441#comment-13119441 ] Jeremy Anderson commented on TIKA-733: -- The problem is also present in the older 0.9

[jira] [Issue Comment Edited] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119441#comment-13119441 ] Jeremy Anderson edited comment on TIKA-733 at 10/3/11 6:02 PM: ---

[jira] [Issue Comment Edited] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119441#comment-13119441 ] Jeremy Anderson edited comment on TIKA-733 at 10/3/11 6:03 PM: ---

[jira] [Issue Comment Edited] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119441#comment-13119441 ] Jeremy Anderson edited comment on TIKA-733 at 10/3/11 6:04 PM: ---

[jira] [Issue Comment Edited] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119441#comment-13119441 ] Jeremy Anderson edited comment on TIKA-733 at 10/3/11 6:06 PM: ---

[jira] [Commented] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119462#comment-13119462 ] Michael McCandless commented on TIKA-733: - Actually, I think we should just commit

[jira] [Resolved] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-733. - Resolution: Fixed Thanks Jeremy! [PATCH] RTF TextExtractor

[jira] [Resolved] (TIKA-711) Word parser doesn't extract optional hyphen correctly

2011-10-03 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-711. - Resolution: Fixed Fix Version/s: 1.0 Word parser doesn't extract optional

[jira] [Created] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Created) (JIRA)
For certain DWG files, the Tika content parser outputs garbage -- Key: TIKA-739 URL: https://issues.apache.org/jira/browse/TIKA-739 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bartak updated TIKA-739: - Attachment: 3D Dacor Modern Kitchen.dwg File that seems to be causing Tika problems For

[jira] [Updated] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bartak updated TIKA-739: - Attachment: SolrErrorMsg.png The error message displayed when getting this file back in search results

[jira] [Commented] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119614#comment-13119614 ] John Bartak commented on TIKA-739: -- Not entirely sure what version I'm using. I'm using

[jira] [Updated] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bartak updated TIKA-739: - Attachment: screenshot-1.jpg For certain DWG files, the Tika content parser outputs garbage

[jira] [Commented] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119730#comment-13119730 ] John Bartak commented on TIKA-739: -- Just downloaded 0.10 and tried extracting the file in

[jira] [Issue Comment Edited] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-03 Thread John Bartak (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119730#comment-13119730 ] John Bartak edited comment on TIKA-739 at 10/3/11 11:24 PM: Just

[jira] [Commented] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

2011-10-03 Thread Jeremy Anderson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119815#comment-13119815 ] Jeremy Anderson commented on TIKA-733: -- Cool beans!! Thanks for your attention to it.

[jira] [Created] (TIKA-740) SAX parser used for HTML

2011-10-03 Thread Erik Hetzner (Created) (JIRA)
SAX parser used for HTML Key: TIKA-740 URL: https://issues.apache.org/jira/browse/TIKA-740 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.0 Reporter: Erik Hetzner

[jira] [Created] (TIKA-741) Make Zip bomb (XML nesting) detection level configurable?

2011-10-03 Thread Erik Hetzner (Created) (JIRA)
Make Zip bomb (XML nesting) detection level configurable? --- Key: TIKA-741 URL: https://issues.apache.org/jira/browse/TIKA-741 Project: Tika Issue Type: New Feature