RE: Full Text Search for MS Word and Excel files?

Martin.Wallmer Thu, 26 Feb 2004 22:18:33 -0800

Hi Ryan,



> -----Original Message-----
> From: Ryan Rhodes [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 25 February 2004 17:41
> To: [EMAIL PROTECTED]
> Subject: RE: Full Text Search for MS Word and Excel files?
> 
> 
> Let me see if I understand something.  Search normally finds 
> resources based 
> on their properties.  When you use CONTAINS in your search, it checks 
> against the actual contents of the resource for a match.  
> Does that sound 
> right?

Yes. Content and properties are definitely different things.


> 
> DASL is the protocol level search while 
> org.apache.slide.search is the Java 
> API for doing searches.  Can you do all the same types of 
> searches with 
> either one of these methods?

No. Some computed properties might exist only in the WebDAV context.

> 
> Well, I'm hoping I'll get to do something like that...
> 
> 
> >
> 
> So, with the Lucene Index I get something like?
> 
> webapp --> org.apache.slide.search --> ContentStore --> WordDocIndexer

In the "store driven" indexing framework (as opposed to the "event driven" one;
we still have to work out how to bring the two together :-), it looks like:


                              ==> ContentStore 
                              |
PUT (UPDATE, DELETE) ==> ParentStore 
                              |
                              ==> ContentIndexer


SEARCH ==> org.apache.slide.search ==> WordContentIndexer

So the content store is not affected in this scenario.
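
The flow in the diagram can be sketched roughly as follows. This is only an illustration: StoreDrivenSketch, ParentStore, ContentStore and ContentIndexer are hypothetical stand-ins written for this example, not the actual Slide classes or their signatures, and a plain in-memory word map takes the place of Lucene. The point is that a PUT or DELETE fans out to both children, while a SEARCH only consults the indexer.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins for illustration; not the real Slide store API.
class StoreDrivenSketch {

    // Holds the resource bytes; indexing never changes them.
    static class ContentStore {
        final Map<String, byte[]> content = new HashMap<>();
        int reads = 0; // counts retrievals, to show that search does not read content

        void store(String uri, byte[] data) { content.put(uri, data); }
        void remove(String uri) { content.remove(uri); }
        byte[] retrieve(String uri) { reads++; return content.get(uri); }
    }

    // Keeps a word -> URIs map that is used only for searching.
    static class ContentIndexer {
        final Map<String, Set<String>> index = new HashMap<>();

        void index(String uri, String text) {
            remove(uri); // an update replaces the old entries for this URI
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(uri);
                }
            }
        }

        void remove(String uri) {
            for (Set<String> uris : index.values()) {
                uris.remove(uri);
            }
        }

        Set<String> contains(String word) {
            return new TreeSet<>(index.getOrDefault(word.toLowerCase(), new TreeSet<>()));
        }
    }

    // Fans every write out to both children; search goes to the indexer only.
    static class ParentStore {
        final ContentStore contentStore = new ContentStore();
        final ContentIndexer indexer = new ContentIndexer();

        void put(String uri, String text) {
            contentStore.store(uri, text.getBytes(StandardCharsets.UTF_8));
            indexer.index(uri, text);
        }

        void delete(String uri) {
            contentStore.remove(uri);
            indexer.remove(uri);
        }

        Set<String> search(String word) {
            return indexer.contains(word);
        }
    }

    public static void main(String[] args) {
        ParentStore store = new ParentStore();
        store.put("/files/a.txt", "Hello Slide");
        System.out.println(store.search("slide"));
    }
}
```

In this sketch the search path never calls retrieve() on the content store, which is the sense in which the content store is not affected.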
         

> 
> Now, if I pull the text out of a word doc and create a Lucene 
> index with it. 
>   Does that mean my index is a new piece of content?  
No
> Or is 
> my index just a 
> property of the original word doc?  Or is an Index just 
> something separate 
> that is only related to searches that use CONTAINS?
Yes
> 


                              ==> ContentStore ==> my.doc
                              |
PUT (UPDATE, DELETE) ==> ParentStore 
                              |
                              ==> WordContentExtractor ==> ContentIndexer

The text your extractor produces is the input for Lucene. This is not content
data; it is only used for searching.
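
To make the separation concrete, here is a minimal sketch under stated assumptions. ExtractorSketch, ContentExtractor and ToyExtractor are hypothetical names invented for this example, not Slide or Lucene API, and the toy extractor just decodes UTF-8 where a real Word extractor would have to parse the binary .doc format. It shows the original bytes going into the content store unchanged, while the extracted text ends up only in a separate search index.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical names for illustration; not the real Slide or Lucene API.
class ExtractorSketch {

    // Turns raw resource bytes into plain text for the indexer.
    interface ContentExtractor {
        String extract(byte[] content);
    }

    // Toy extractor: decodes UTF-8 and replaces control characters with spaces.
    // A real Word extractor would have to parse the binary .doc format instead.
    static class ToyExtractor implements ContentExtractor {
        public String extract(byte[] content) {
            String raw = new String(content, StandardCharsets.UTF_8);
            return raw.replaceAll("\\p{Cntrl}", " ");
        }
    }

    final Map<String, byte[]> contentStore = new HashMap<>();     // the resources
    final Map<String, Set<String>> searchIndex = new HashMap<>(); // word -> URIs
    final ContentExtractor extractor = new ToyExtractor();

    // Stores the document as-is and indexes the extracted text separately.
    // (Removing stale index entries on update is omitted to keep this short.)
    void put(String uri, byte[] content) {
        contentStore.put(uri, content.clone());
        String text = extractor.extract(content); // used as search input only
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                searchIndex.computeIfAbsent(word, k -> new TreeSet<>()).add(uri);
            }
        }
    }

    // Rough analogue of a CONTAINS search: looks only at the index.
    Set<String> contains(String word) {
        return searchIndex.getOrDefault(word.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        ExtractorSketch sketch = new ExtractorSketch();
        sketch.put("/files/my.doc", "Quarterly report".getBytes(StandardCharsets.UTF_8));
        System.out.println(sketch.contains("report"));
    }
}
```

The index here has no URI of its own and is not a property of my.doc; it only maps words back to the resources that contain them.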


Best regards,
Martin 
