On extraction, get properties AND / OR content extraction
---------------------------------------------------------

                 Key: TIKA-694
                 URL: https://issues.apache.org/jira/browse/TIKA-694
             Project: Tika
          Issue Type: Wish
          Components: parser
    Affects Versions: 0.9
         Environment: All OS

            Reporter: Etienne Jouvin
            Priority: Minor


I use Tika to extract only the properties of Office files.
The parser still goes through the document content, which is unnecessary and 
slows down the process.

It would be nice to be able to choose whether to extract only the properties 
or the content as well.

What I did was the following:
I extended AutoDetectParser to override the parse method.
Then, in the ParseContext instance, I set a boolean flag to true to request 
properties-only extraction.

For example, for Office files, I extended the OfficeParser class. In the parse 
method, I check the flag and, if it is true, skip all content extraction.
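
The workaround described above could be sketched roughly as follows. This is a 
stand-alone illustration of the flag-in-context pattern only: Context, 
PropertiesOnly, Result and parse are hypothetical stand-ins written for this 
example, not Tika classes, and the real change would live in subclasses of 
AutoDetectParser and OfficeParser.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertiesOnlySketch {

    /** Hypothetical marker type; its class is the key under which the flag is stored. */
    static final class PropertiesOnly {
        final boolean enabled;
        PropertiesOnly(boolean enabled) { this.enabled = enabled; }
    }

    /** Stand-in for a typed parse context: at most one value per class key. */
    static final class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();
        <T> void set(Class<T> key, T value) { values.put(key, value); }
        <T> T get(Class<T> key) { return key.cast(values.get(key)); }
    }

    /** Stand-in for the outcome of parsing one document. */
    static final class Result {
        final Map<String, String> properties = new LinkedHashMap<>();
        String content; // stays null when content extraction is skipped
    }

    /** Stand-in for the overridden parse method of an Office parser subclass. */
    static Result parse(Map<String, String> docProperties, String docBody, Context context) {
        Result result = new Result();
        // Properties are always extracted.
        result.properties.putAll(docProperties);

        // Check the flag; a missing flag means "extract everything" (assumed default).
        PropertiesOnly flag = context.get(PropertiesOnly.class);
        boolean skipContent = flag != null && flag.enabled;
        if (!skipContent) {
            result.content = docBody; // the expensive step in a real parser
        }
        return result;
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.set(PropertiesOnly.class, new PropertiesOnly(true));

        Result result = parse(Collections.singletonMap("Author", "someone"),
                              "document body", context);
        System.out.println("properties = " + result.properties);
        System.out.println("content extracted = " + (result.content != null));
    }
}
```

Keying the flag by its own class means callers that never set it keep the 
existing behaviour, so the option stays opt-in.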

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
