Extracting metadata only

Sergiy Shyrkov Thu, 22 Apr 2010 01:43:48 -0700

Hello,

I would like to find out whether it is possible to extract only the metadata from a document (without the content). We are using Tika to parse and index the content of files in a JCR repository (in Jackrabbit; we are extending its indexing part) and would like to split the process into extracting the metadata (which will be indexed immediately) and extracting the complete file content (whose indexing will be postponed to a later time, in a dedicated background task). I see in the Parser implementations for the different formats that it is not always possible to extract metadata without completely parsing the document, but e.g. PDFParser is able to do it without parsing the content. I tried to find the answer in the mailing list archives, but have not succeeded so far.
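The split described above could be sketched roughly as follows, using only the JDK. This is an illustration of the intended flow under stated assumptions, not working Tika or Jackrabbit code: `TwoPhaseIndexer`, `metadataFor`, `contentFor` and the two extractor functions are hypothetical names, and the in-memory maps merely stand in for the real index. The extractors would wrap whatever Tika calls turn out to be suitable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Tika or Jackrabbit.
final class TwoPhaseIndexer {
    // Stand-ins for the real index: path -> metadata, path -> full text.
    private final Map<String, Map<String, String>> metadataIndex = new ConcurrentHashMap<>();
    private final Map<String, String> contentIndex = new ConcurrentHashMap<>();

    // Assumed to wrap the (cheap) metadata-only and (expensive) full-text extraction.
    private final Function<byte[], Map<String, String>> metadataExtractor;
    private final Function<byte[], String> contentExtractor;

    // A single worker thread plays the role of the dedicated background task.
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    TwoPhaseIndexer(Function<byte[], Map<String, String>> metadataExtractor,
                    Function<byte[], String> contentExtractor) {
        this.metadataExtractor = metadataExtractor;
        this.contentExtractor = contentExtractor;
    }

    // Phase 1 runs in the calling thread; phase 2 is queued for later.
    Future<?> index(String path, byte[] data) {
        metadataIndex.put(path, metadataExtractor.apply(data));
        return background.submit(() -> {
            contentIndex.put(path, contentExtractor.apply(data));
        });
    }

    Map<String, String> metadataFor(String path) {
        return metadataIndex.get(path);
    }

    // Returns null until the background task for this path has finished.
    String contentFor(String path) {
        return contentIndex.get(path);
    }

    void shutdown() {
        background.shutdown();
    }
}
```

With this shape, the metadata for a path is available as soon as `index` returns, while `contentFor` only yields a result once the returned `Future` has completed.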

Has anyone had similar requirements and been able to solve this (by extending each parser, writing a custom content handler implementation, etc.)?

I would appreciate any help, as I am new to Tika.

Kind regards
Sergiy Shyrkov
