[ https://issues.apache.org/jira/browse/JCR-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved JCR-1894.
--------------------------------

    Resolution: Incomplete

Without an example document there's little we can do about this. See the Tika project (http://tika.apache.org/) for the text extraction functionality Jackrabbit nowadays uses, and file an issue at https://issues.apache.org/jira/browse/TIKA if the problem still occurs with Tika.

> Word doc extraction problem
> ---------------------------
>
>                 Key: JCR-1894
>                 URL: https://issues.apache.org/jira/browse/JCR-1894
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-text-extractors
>    Affects Versions: core 1.4.3
>         Environment: OS: Windows 2003 sp2 My-eclipse6.0 / tomcat 5.5 and Athelon500+
>            Reporter: Rajesh Upadhyay
>
> Hi,
> I have a .doc file which contains data inside a table. Now i want to parse
> the table to get the table values. Normal Parsing is not working for table( I
> mean using String tokenizer) because it is giving some unwanted special
> characters while parsing the table. So I just want to convert that .doc to
> .txt file, then only it is easy to split the values. But i can't make it! Can
> any one please tell me how to parse a MS WORD TABLE Values?
> We need to know the process by which we can index a doc file excluding
> special characters,
> When we will show the excerpt then these special characters make it
> unreadable.
> Thanks in advance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
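
For readers who hit the same symptom, here is a rough sketch of cleaning already-extracted text before it is indexed or shown as an excerpt. It is not Jackrabbit or Tika code. The class ExtractedTextCleaner and its methods are hypothetical names, and it rests on an assumption that was never confirmed for the reporter's document: that the unwanted characters are control characters, in particular U+0007, which raw Word text commonly uses as a table cell mark. It uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jackrabbit or Tika.
final class ExtractedTextCleaner {

    // Assumption: U+0007 ends each table cell in the raw extracted text.
    private static final String CELL_MARK = String.valueOf((char) 7);

    private ExtractedTextCleaner() {
    }

    // Replaces every run of control characters and whitespace with a single
    // space and trims the result, so excerpts stay readable.
    static String clean(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        boolean lastWasSpace = true; // true at start suppresses a leading space
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isISOControl(cp) || Character.isWhitespace(cp)) {
                if (!lastWasSpace) {
                    out.append(' ');
                    lastWasSpace = true;
                }
            } else {
                out.appendCodePoint(cp);
                lastWasSpace = false;
            }
        }
        int len = out.length();
        if (len > 0 && out.charAt(len - 1) == ' ') {
            out.setLength(len - 1); // drop the trailing space
        }
        return out.toString();
    }

    // Splits raw table text on the assumed cell mark and cleans each cell.
    // Trailing empty cells are dropped by String.split's default behaviour.
    static List<String> splitCells(String raw) {
        List<String> cells = new ArrayList<>();
        for (String part : raw.split(Pattern.quote(CELL_MARK))) {
            cells.add(clean(part));
        }
        return cells;
    }
}
```

If the assumption holds, calling ExtractedTextCleaner.splitCells on the raw text of one table row gives the cell values without a tokenizer, and ExtractedTextCleaner.clean gives a single readable line for excerpts. If the special characters turn out to be something else, the filter condition in clean would need to change accordingly.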