Hi Ryan,
You are exactly right. I haven't implemented the ContentExtractor yet, because it makes no sense to do it the way the property extractors work.
As you said, the content extractor only makes sense in combination with an indexer.
I had planned to build an indexing framework, but have not had time to do it. Christophe's LuceneIndex is not checked in yet because it is not integrated with the DASL code, so it is not possible to search content via WebDAV using this index.
If you only want to perform server-side queries, one option is to use this indexer and integrate the ContentExtractor you have in mind.
In the long term, though, we need the 'big' solution that integrates indexing, extraction and DASL.
Regards,


Daniel


ryan wrote:

I tried to build a content extractor to pull the text from MS Word docs.

It looks like the PropertyExtractorTrigger is fired by the event
framework when a node is created or stored, and then it calls the
ExtractorManager to get all the PropertyExtractors associated with the
node that changed and adds the extracted properties to the node.

event framework --> PropertyExtractorTrigger --> ExtractorManager --> PropertyExtractor
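
To make the chain above concrete, here is a minimal Java sketch of how it might fit together. All types, method names and the "content-length" property are hypothetical stand-ins written for illustration; they are not the actual Slide classes or signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PropertyExtractionSketch {

    /** Hypothetical stand-in for a repository node that carries properties. */
    static final class Node {
        final String uri;
        final byte[] content;
        final Map<String, Object> properties = new HashMap<>();

        Node(String uri, byte[] content) {
            this.uri = uri;
            this.content = content;
        }
    }

    /** Hypothetical extractor contract: derive properties from a node. */
    interface PropertyExtractor {
        boolean accepts(Node node);
        Map<String, Object> extract(Node node);
    }

    /** Hypothetical manager: returns the extractors that apply to a node. */
    static final class ExtractorManager {
        private final List<PropertyExtractor> extractors = new ArrayList<>();

        void register(PropertyExtractor extractor) {
            extractors.add(extractor);
        }

        List<PropertyExtractor> extractorsFor(Node node) {
            List<PropertyExtractor> result = new ArrayList<>();
            for (PropertyExtractor e : extractors) {
                if (e.accepts(node)) {
                    result.add(e);
                }
            }
            return result;
        }
    }

    /** Hypothetical trigger: the event framework calls it on create/store. */
    static final class PropertyExtractorTrigger {
        private final ExtractorManager manager;

        PropertyExtractorTrigger(ExtractorManager manager) {
            this.manager = manager;
        }

        void onNodeStored(Node node) {
            // Extracted properties are written back onto the node itself.
            for (PropertyExtractor e : manager.extractorsFor(node)) {
                node.properties.putAll(e.extract(node));
            }
        }
    }

    /** Toy extractor: records the content length for URIs with a given suffix. */
    static PropertyExtractor contentLengthExtractor(final String suffix) {
        return new PropertyExtractor() {
            public boolean accepts(Node node) {
                return node.uri.endsWith(suffix);
            }

            public Map<String, Object> extract(Node node) {
                return Collections.<String, Object>singletonMap(
                        "content-length", node.content.length);
            }
        };
    }

    public static void main(String[] args) {
        ExtractorManager manager = new ExtractorManager();
        manager.register(contentLengthExtractor(".doc"));
        PropertyExtractorTrigger trigger = new PropertyExtractorTrigger(manager);

        Node node = new Node("/files/report.doc", new byte[128]);
        trigger.onNodeStored(node); // simulates the event framework firing
        System.out.println(node.properties);
    }
}
```

The point of the sketch is that the extracted values end up as properties on the node, which works for small metadata but gives extracted full text nowhere to go.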


I don't think the ContentExtractor is getting called at all now. I was thinking it probably can't be a ContentExtractorTrigger, because there isn't anywhere to store the extracted content on the node. I think it will probably have to call ExtractorManager from LuceneIndex. Something like:

IndexTrigger --> LuceneIndex --> ExtractorManager --> ContentExtractor
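
As a rough illustration of that second chain, the following Java sketch wires an index-side trigger to a content extractor. Everything here is an assumption made for the example: the class and method names are hypothetical, a small in-memory inverted index stands in for the LuceneIndex, and a plain UTF-8 decoder stands in for a real MS Word extractor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ContentIndexSketch {

    /** Hypothetical contract: turn raw bytes into searchable text. */
    interface ContentExtractor {
        String extractText(byte[] content);
    }

    /** Hypothetical manager: picks a content extractor by URI suffix. */
    static final class ExtractorManager {
        private final Map<String, ContentExtractor> bySuffix = new HashMap<>();

        void register(String suffix, ContentExtractor extractor) {
            bySuffix.put(suffix, extractor);
        }

        /** Returns null when no extractor is registered for the URI. */
        ContentExtractor extractorFor(String uri) {
            for (Map.Entry<String, ContentExtractor> e : bySuffix.entrySet()) {
                if (uri.endsWith(e.getKey())) {
                    return e.getValue();
                }
            }
            return null;
        }
    }

    /**
     * Stand-in for the LuceneIndex: maps each term to the URIs containing it.
     * The extracted text is indexed but never stored. For brevity, re-indexing
     * a URI does not remove terms from its previous version.
     */
    static final class InMemoryIndex {
        private final ExtractorManager manager;
        private final Map<String, Set<String>> postings = new HashMap<>();

        InMemoryIndex(ExtractorManager manager) {
            this.manager = manager;
        }

        void index(String uri, byte[] content) {
            ContentExtractor extractor = manager.extractorFor(uri);
            if (extractor == null) {
                return; // nothing knows how to read this content
            }
            String text = extractor.extractText(content).toLowerCase(Locale.ROOT);
            for (String term : text.split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(uri);
                }
            }
        }

        Set<String> search(String term) {
            Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
            return hits == null
                    ? Collections.<String>emptySet()
                    : Collections.unmodifiableSet(hits);
        }
    }

    /** Hypothetical trigger: the event framework calls it on create/store. */
    static final class IndexTrigger {
        private final InMemoryIndex index;

        IndexTrigger(InMemoryIndex index) {
            this.index = index;
        }

        void onNodeStored(String uri, byte[] content) {
            index.index(uri, content);
        }
    }

    public static void main(String[] args) {
        ExtractorManager manager = new ExtractorManager();
        // Placeholder extractor; a real one for .doc files would need a parser.
        manager.register(".txt", bytes -> new String(bytes, StandardCharsets.UTF_8));

        InMemoryIndex index = new InMemoryIndex(manager);
        IndexTrigger trigger = new IndexTrigger(index);
        trigger.onNodeStored("/files/notes.txt",
                "Content extraction for Slide".getBytes(StandardCharsets.UTF_8));

        System.out.println(index.search("slide"));
    }
}
```

Here the node is never modified: the extracted text goes straight into the index, so the question of where to store it on the node does not come up.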

Does this sound correct?


I found this LuceneIndex posted by Christophe, but I don't think it is checked into CVS. I believe you can index fields in Lucene that are not actually stored as content. I would like to try and add the content extractor code to the LuceneIndex. Does anyone know the status of the LuceneIndex?

http://www.mail-archive.com/[EMAIL PROTECTED]/msg09091.html

Thanks,

Ryan Rhodes





