Hi Ryan,

> -----Original Message-----
> From: Ryan Rhodes [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 25 February 2004 17:41
> To: [EMAIL PROTECTED]
> Subject: RE: Full Text Search for MS Word and Excel files?
>
> Let me see if I understand something. Search normally finds resources
> based on their properties. When you use CONTAINS in your search, it
> checks against the actual contents of the resource for a match.
> Does that sound right?

Yes. Content and properties are definitely different things.

> DASL is the protocol level search while org.apache.slide.search is the
> Java API for doing searches. Can you do all the same types of searches
> with either one of these methods?

No. Some computed properties might only exist in WebDAV context.

> Well, I'm hoping I'll get to do something like that...
>
> So, with the Lucene Index I get something like?
>
> webapp --> org.apache.slide.search --> ContentStore --> WordDocIndexer

In the "store driven" indexing framework (different from the "event
driven" stuff; we still have to look at how to bring them together :-)
it looks like:

                                         ==> ContentStore
                                        |
  PUT (UPDATE, DELETE) ==> ParentStore  |
                                         ==> ContentIndexer

  SEARCH ==> org.apache.slide.search ==> WordContentIndexer

So the content store is not affected in this scenario.

> Now, if I pull the text out of a word doc and create a Lucene index
> with it. Does that mean my index is a new piece of content?

No.

> Or is my index just a property of the original word doc? Or is an Index
> just something separate that is only related to searches that use
> CONTAINS?

Yes.

                                         ==> ContentStore ==> my.doc
                                        |
  PUT (UPDATE, DELETE) ==> ParentStore  |
                                         ==> WordContentExtractor ==> ContentIndexer

The text your extractor produces is the input for Lucene. This is not
content data; it is only used for searching.
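
To make the flow above a bit more concrete, here is a rough sketch in
plain Java. Please treat it as an illustration only: none of these
classes are Slide's actual API. ContentExtractor, ContentIndexer and
ParentStore are hypothetical names borrowed from the diagram, and the
small in-memory map is just a stand-in for a real Lucene index. The
point it tries to show is that a PUT fans out to two places, and that
the extracted text only ever reaches the index, never the content store.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; this is NOT the Slide or Lucene API.

/** Turns raw resource bytes into plain text (a Word extractor would parse the .doc format here). */
interface ContentExtractor {
    String extract(byte[] content);
}

/** Stand-in for a Lucene-backed indexer: a tiny in-memory inverted index (term -> URIs). */
class ContentIndexer {
    private final ContentExtractor extractor;
    private final Map<String, Set<String>> termToUris = new HashMap<>();

    ContentIndexer(ContentExtractor extractor) {
        this.extractor = extractor;
    }

    /** Extracts text from the content and indexes its terms under the given URI. */
    void index(String uri, byte[] content) {
        remove(uri); // an update replaces whatever was indexed before
        String text = extractor.extract(content).toLowerCase(Locale.ROOT);
        for (String term : text.split("\\W+")) {
            if (!term.isEmpty()) {
                termToUris.computeIfAbsent(term, t -> new HashSet<>()).add(uri);
            }
        }
    }

    /** Drops every index entry that points at the given URI. */
    void remove(String uri) {
        for (Set<String> uris : termToUris.values()) {
            uris.remove(uri);
        }
    }

    /** Single-word analogue of a CONTAINS search: URIs whose extracted text has the word. */
    Set<String> contains(String word) {
        Set<String> hits = termToUris.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
        return new HashSet<>(hits);
    }
}

/** Receives PUT/DELETE and forwards to both the content store and the indexer. */
class ParentStore {
    private final Map<String, byte[]> contentStore = new HashMap<>();
    private final ContentIndexer indexer;

    ParentStore(ContentIndexer indexer) {
        this.indexer = indexer;
    }

    void put(String uri, byte[] content) {
        contentStore.put(uri, content); // original bytes, stored unchanged
        indexer.index(uri, content);    // extracted text goes to the index only
    }

    void delete(String uri) {
        contentStore.remove(uri);
        indexer.remove(uri);
    }

    byte[] get(String uri) {
        return contentStore.get(uri);
    }
}
```

With something like this, a search that uses CONTAINS would only ask
the indexer, while a GET still returns the stored bytes untouched. A
real WordContentExtractor would plug in behind the ContentExtractor
interface.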

Best regards,
Martin
