How to implement a ContentExtractor?

ryan Mon, 21 Jun 2004 00:14:52 -0700
I tried to build a content extractor to pull the text from MS Word docs.
 
It looks like the PropertyExtractorTrigger is fired by the event
framework when a node is created or stored. It then asks the
ExtractorManager for all the PropertyExtractors associated with the
changed node and adds the extracted properties to that node.
 
event framework --> PropertyExtractorTrigger --> ExtractorManager --> PropertyExtractor
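To make the property path concrete, here is a minimal Java sketch of that chain. PropertyExtractor, ExtractorManager, Node and PropertyExtractorTrigger below are simplified, hypothetical stand-ins, not Slide's actual classes; the real interfaces, method signatures, and the way extractors are matched to a node may differ. In this sketch extractors are simply looked up by content type.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the Slide classes named above;
// the real Slide interfaces and method signatures may differ.
public class PropertyExtractionFlow {

    /** Hypothetical: pulls name/value properties out of a resource's content. */
    public interface PropertyExtractor {
        Map<String, String> extract(InputStream content) throws IOException;
    }

    /** Hypothetical registry: returns the property extractors for a content type. */
    public static class ExtractorManager {
        private final Map<String, List<PropertyExtractor>> byType = new HashMap<>();

        public void register(String contentType, PropertyExtractor e) {
            byType.computeIfAbsent(contentType, k -> new ArrayList<>()).add(e);
        }

        public List<PropertyExtractor> getPropertyExtractors(String contentType) {
            return byType.getOrDefault(contentType, Collections.emptyList());
        }
    }

    /** Hypothetical node: stored content bytes plus a property map. */
    public static class Node {
        public final String contentType;
        public final byte[] content;
        public final Map<String, String> properties = new HashMap<>();

        public Node(String contentType, byte[] content) {
            this.contentType = contentType;
            this.content = content;
        }
    }

    /** Hypothetical trigger the event framework calls when a node is created or stored. */
    public static class PropertyExtractorTrigger {
        private final ExtractorManager manager;

        public PropertyExtractorTrigger(ExtractorManager manager) {
            this.manager = manager;
        }

        public void onStore(Node node) {
            for (PropertyExtractor e : manager.getPropertyExtractors(node.contentType)) {
                try {
                    // Give each extractor a fresh stream over the stored content,
                    // then copy whatever it extracted onto the node as properties.
                    node.properties.putAll(e.extract(new ByteArrayInputStream(node.content)));
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        }
    }
}
```

With an extractor registered for "text/plain", storing a text node fills in its properties; a node whose content type has no registered extractor is left unchanged.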
 
 
I don't think the ContentExtractor is getting called at all right now. I
suspect it can't be driven by a ContentExtractorTrigger, because there is
nowhere on the node to store the extracted content. I think LuceneIndex
will probably have to call the ExtractorManager itself, something
like:
 
IndexTrigger --> LuceneIndex --> ExtractorManager --> ContentExtractor
 
Does this sound correct?
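The proposed content path could look roughly like the Java sketch below. It assumes a hypothetical ContentExtractor that returns the document text as a Reader, and it replaces LuceneIndex with a toy in-memory inverted index (ToyIndex, also hypothetical) so the example runs without Lucene. What it illustrates is that the extracted text only has to live in the index as terms, so nothing needs to be written back onto the node.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ContentIndexFlow {

    /** Hypothetical: turns raw content (e.g. a Word document) into plain text. */
    public interface ContentExtractor {
        Reader extract(InputStream content) throws IOException;
    }

    /** Hypothetical registry of content extractors, keyed by content type. */
    public static class ExtractorManager {
        private final Map<String, ContentExtractor> byType = new HashMap<>();

        public void register(String contentType, ContentExtractor e) { byType.put(contentType, e); }

        public ContentExtractor getContentExtractor(String contentType) { return byType.get(contentType); }
    }

    /** Toy stand-in for LuceneIndex: an inverted index that keeps terms, not text. */
    public static class ToyIndex {
        private final ExtractorManager manager;
        private final Map<String, Set<String>> postings = new HashMap<>();

        public ToyIndex(ExtractorManager manager) { this.manager = manager; }

        /** What a hypothetical IndexTrigger would call when a resource is created or stored. */
        public void index(String uri, String contentType, byte[] content) {
            ContentExtractor extractor = manager.getContentExtractor(contentType);
            if (extractor == null) return; // no extractor for this type: nothing to index
            try (Reader text = extractor.extract(new ByteArrayInputStream(content))) {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[4096];
                for (int n; (n = text.read(buf)) != -1; ) sb.append(buf, 0, n);
                // Only the terms are kept; the extracted text itself is discarded,
                // much like an indexed-but-not-stored field in Lucene.
                for (String term : sb.toString().toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!term.isEmpty()) postings.computeIfAbsent(term, k -> new TreeSet<>()).add(uri);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        }
    }

    public static void main(String[] args) {
        ExtractorManager manager = new ExtractorManager();
        // Stand-in extractor for plain text; a Word extractor would parse the .doc format here.
        manager.register("text/plain", in -> new InputStreamReader(in, StandardCharsets.UTF_8));
        ToyIndex index = new ToyIndex(manager);
        index.index("/files/notes.txt", "text/plain", "Hello Slide world".getBytes(StandardCharsets.UTF_8));
        System.out.println(index.search("slide"));
    }
}
```

A search for "slide" finds the resource even though its text was never stored on the node or in the index. In Lucene itself, the equivalent would be a field that is indexed and tokenized but not stored.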
 
 
I found this LuceneIndex posted by Christophe, but I don't think it has
been checked into CVS. I believe Lucene can index fields whose content is
not actually stored. I would like to try adding the content extractor
code to the LuceneIndex. Does anyone know the status of the
LuceneIndex?
 
http://www.mail-archive.com/[EMAIL PROTECTED]/msg09091.html
 
Thanks,
 
Ryan Rhodes
