Re: Preview of Rich Documents

2011-08-22 Thread nirnaydewan
Thanks much for your suggestion. But for the XHTML output, I believe that is a one-time process while extraction is being done. That means I again have to store/index that XHTML output text as well for later use. Is this correct or am I missing something? Regards

Re: Tika 0.9 integration in Solr 3.3.0

2011-08-22 Thread nirnaydewan
Please let me know how I can get rid of this exception. -- View this message in context: http://lucene.472066.n3.nabble.com/Tika-0-9-integration-in-Solr-3-3-0-tp3267799p3274463.html Sent from the Apache Tika - Development mailing list archive at Nabble.com.

Re: Tika 0.9 integration in Solr 3.3.0

2011-08-22 Thread Jukka Zitting
Hi,
On Mon, Aug 22, 2011 at 11:08 AM, nirnaydewan wrote:
> Please let me know how can i get rid of this exception.
It looks like you have a dependency version mismatch. Instead of POI version 3.8-beta3-20110606, use the earlier 3.6 version as listed in the Tika 0.9 getting started page [1]. If

Re: Preview of Rich Documents

2011-08-22 Thread Jukka Zitting
Hi,
On Mon, Aug 22, 2011 at 11:06 AM, nirnaydewan wrote:
> But for the XHTML output, i believe that is one time process while
> extraction is being done. That means again i have to store/index that xhtml
> output text as well for later use. Is this correct or am i missing
> something?
Correct, y

[jira] [Created] (TIKA-693) Incorrent mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
Incorrent mime-type for .pptm, .ppsm and .ppsx in OOXMLParser
Key: TIKA-693
URL: https://issues.apache.org/jira/browse/TIKA-693
Project: Tika
Issue Type: Bug
Components:

[jira] [Resolved] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Valyanskiy resolved TIKA-693.
Resolution: Fixed
Committed revision 1160216.
> Incorrect mime-type for .pptm, .ppsm and .pps

[jira] [Updated] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Valyanskiy updated TIKA-693:
Summary: Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser (was: Incorrent mime-type

http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

2011-08-22 Thread prince shah
Hi Geeks, I am new to the open source community and wanted to start with the Tika project. I checked out the latest version of Tika (1160218), then went to my tika-site and ran mvn install (I am on a Mac). It downloaded a bunch of stuff and in the end spat out the following exceptions. Can anyone help me? Is there any

Re: Appending Mime Types

2011-08-22 Thread Nick Burch
On Thu, 18 Aug 2011, Tom Grant wrote:
> Is there a way to programmatically register new Mime Types?
I think the expectation was that people finding gaps would open a new jira entry, and list the details of these mimetypes and then everyone would benefit from them! There shouldn't be many case

Re: http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

2011-08-22 Thread Oleg Tikhonov
Hey, and welcome to Tika. If you use Eclipse, you had better download an Eclipse plug-in: http://m2eclipse.sonatype.org/sites/m2e Once you have downloaded and installed the plug-in, your next step could be importing the Tika project like this: 'File -> Import -> Existing Maven Project' ... However, i

Re: Appending Mime Types

2011-08-22 Thread Tom Grant
Here's the use case that I'm attempting to solve. I have a customer with many legacy systems, some of which are completely custom. These systems have data files that will never be seen outside of their environment. For example, some are XML files with their own schemas. Some are similar to the

[jira] [Updated] (TIKA-683) RTF Parser issues with non european characters

2011-08-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-683:
Attachment: testWORD_bold_character_runs2.docx testWORD_bold_character_runs.do

[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

2011-08-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089072#comment-13089072 ]
Michael McCandless commented on TIKA-683:
Sorry, wrong issue -- that last patch was