I'm not familiar with the search facilities yet, but I think there is a big difference between searching and sorting/filtering. Lucene seems to be perfect for searching the repository (properties and content). What we also need (and what is somewhat harder to achieve without a DB) is some kind of sorting/filtering of the search results.

I am thinking of the following use cases:

- Search the repository for the last x uploaded documents, sorted by date
- Find all documents since xxx containing yyy
- Find the documents containing xxx in the labeled revision yyy
- Find only documents the user is allowed to read
- What about searching within transactions? Should the search be transaction aware (so that the user finds documents uploaded within a transaction but other users don't)?

I have no idea how to handle this, but I think these are some things to think about...

Daniel
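To make the sorting/filtering use cases a bit more concrete, here is a minimal sketch in plain Java. It assumes the search engine has already returned a list of hits; the `Hit` and `SearchResultFilter` classes, their fields, and the sample data are all hypothetical and are not part of Slide or Lucene. It only covers "last x documents sorted by date", "since xxx containing yyy", and "only documents the user may read"; revisions and transactions are left out.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical search hit; none of these names come from Slide or Lucene. */
final class Hit {
    final String uri;
    final LocalDate uploaded;
    final String content;
    final Set<String> readers;

    Hit(String uri, LocalDate uploaded, String content, String... readers) {
        this.uri = uri;
        this.uploaded = uploaded;
        this.content = content;
        this.readers = new HashSet<>(Arrays.asList(readers));
    }
}

final class SearchResultFilter {
    /** URIs of the last `limit` uploaded documents the user may read, newest first. */
    static List<String> latestReadable(List<Hit> hits, String user, int limit) {
        return hits.stream()
                .filter(h -> h.readers.contains(user))
                .sorted(Comparator.comparing((Hit h) -> h.uploaded).reversed())
                .limit(limit)
                .map(h -> h.uri)
                .collect(Collectors.toList());
    }

    /** URIs of readable documents uploaded on or after `since` whose content contains `term`. */
    static List<String> containingSince(List<Hit> hits, String user, LocalDate since, String term) {
        return hits.stream()
                .filter(h -> h.readers.contains(user))
                .filter(h -> !h.uploaded.isBefore(since))
                .filter(h -> h.content.contains(term))
                .sorted(Comparator.comparing((Hit h) -> h.uploaded).reversed())
                .map(h -> h.uri)
                .collect(Collectors.toList());
    }

    /** Made-up sample data for illustration only. */
    static List<Hit> sample() {
        return Arrays.asList(
                new Hit("/docs/a.txt", LocalDate.of(2004, 1, 10), "fire drill memo", "daniel", "ollie"),
                new Hit("/docs/b.txt", LocalDate.of(2004, 1, 15), "budget report", "ollie"),
                new Hit("/docs/c.txt", LocalDate.of(2004, 1, 18), "fire safety report", "daniel"),
                new Hit("/docs/d.txt", LocalDate.of(2003, 12, 1), "fire alarm test", "daniel"));
    }

    public static void main(String[] args) {
        List<Hit> hits = sample();
        System.out.println(latestReadable(hits, "daniel", 2));
        System.out.println(containingSince(hits, "daniel", LocalDate.of(2004, 1, 1), "fire"));
    }
}
```

Filtering after the search like this is simple, but it reads every hit into memory first; whether that is acceptable, or whether the filter has to be pushed into the index or a DB query, is exactly the open question.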
On Monday, 19 January 2004 15:12, Michael Oliver wrote:
> On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:
> > On 18 Jan 2004, at 22:12, Christophe wrote:
> > > Stefano Mazzocchi wrote:
> > >>> If you store your properties in one store (eg. DB) and used index
> > >>> store engine for content search, I expected to have some performance
> > >>> issues when you search on prop and content.
> > >>
> > >> hmmm, not sure I follow you, can you elaborate on this more? it would
> > >> be very appreciated.
> > >
> > > How do you make a query that used criteria on properties and full text
> > > search?
> >
> > eh, good question :-)
> >
> > > If the properties/metadata are in a DB and content is tokenized into an
> > > index engine like Lucene. First, you need to select rows from DB
> > > tables and make a second query into the index store to query on the
> > > content itself.
> > > For this kind of scenario (search on prop AND full text search), I
> > > expect only one query via Lucene will be faster. Lucene can store
> > > properties that will not be tokenized. Anyway, it is not an ideal
> > > situation because properties have to be duplicated into 2 different
> > > stores. So, I don't know what will be the best solution !
> >
> > I think we are attacking the problem from the wrong angle: first we
> > need to collect usecases, then we need to find a way to make the
> > usecase possible.
> >
> > I personally wouldn't know how to make use of a query against full text
> > *and* properties. This is because such a query looks weird to me:
> > full-text is the least structure possible (get me everything but I
> > don't know where) while properties tend to be very much structured
> > (last modified time, author, and so on).
> >
> > There is a decades long discussion on what is data and what is metadata
> > and I don't want to touch that with a stick, but I think that if you
> > need to do full-text search on your metadata there is something wrong.
>
> Stefano with all due respect, there is nothing wrong with a full-text
> search on metadata because metadata in this case can be any properties
> of any of the resources in the repository and that metadata can be free
> form text.
>
> consider a search query like
>
> doctype="memo" and description contains "Fire Stefano" and contents
> contains "January"
>
> doctype and description are properties with string values that would be
> indexed and matched with the same index as the contents.
>
> Not everybody uses the Database Stores, some actually prefer the XML
> Stores so an index of the XML should be full text, yes?
>
> > But this is my very personal vision, of course, and I would like to see
> > what other usecases or scenarios others can come up with before stating
> > where to go.
> >
> > >>> Anyway, Do you have some idea to optimize the current search service
> > >>> ?
> > >>
> > >> I haven't looked into this yet (I'm still lagging behind on some other
> > >> issues with my project and I haven't attacked this part yet).
> > >>
> > >> The idea is to use an RDBMS as much as possible on all content that
> > >> can be turned relational without major issues (and normally metadata
> > >> fits this category). As for full-text search, I agree that there is
> > >> no way to beat an engine like lucene.
> > >
> > > Agree ! I understand your point of view, the best way to query on
> > > properties is certainly the classic select statement but if you need an
> > > index/search engine for full-text search, I don't know.
> >
> > I personally had this vision before: DASL allows you to select the
> > search language. We already provide the DASL basic-search, nothing
> > stops us from coming up with an entirely new lucene-influenced
> > full-text language that works only on the files contents.
> >
> > So, you do different queries depending on how you want to treat the
> > content.
> >
> > > Furthermore, like Erik explains in a previous mail, you can write some
> > > filter to apply security rules. So, in one query made in only one
> > > store, you can filter on props, content and security rules.
> > > Can you do that without storing properties into the search engine ?
> > > I'm curious :-)
>
> For clarity indexing properties as they go into the store isn't the same
> as storing properties into the search engine/index. In other words the
> index of the properties and content just needs access to the data as it
> is being stored and doesn't impact the stores beyond the call, and that
> can be minimized with an indexing queue that can be done asynchronously.
>
> > You could, I think, but it would be tremendously slow compared.
> >
> > > It should be interesting to compare in more detail both solutions,
> > > make performance tests, ...
> > >
> > >>> Why not to support both situations : either index the prop or not ?
> > >>
> > >> You mean with a global configuration or more granularly?
> > >
> > > Still thinking on that :-) The idea is to use the domain.xml file to
> > > define how to make the query on props and options used for the full
> > > text search.
> >
> > I think we need to attack the store/indexing problem from the scenario
> > angle down... or we'll go around in circles for a long time. of course,
> > I'm not talking about Slide 2.0 but something to do after the release
> > is done.
>
> I completely agree, a few scenarios/stories should be the first step and
> I hope the example I gave above fits in that category.
>
> Ollie
>
> > > Thanks for this mail,
> >
> > You are welcome.
> >
> > --
> > Stefano.
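
As a scenario for the combined query Ollie quotes above (doctype="memo" and description contains "Fire Stefano" and contents contains "January"), here is a rough sketch of how the three criteria compose into one predicate. This is an assumption-laden illustration: `Resource` and `CombinedQuery` are hypothetical names, and the matching is a naive case-insensitive substring scan over in-memory objects, not a Lucene index or a DASL query.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical repository resource: string properties plus text content. */
final class Resource {
    final Map<String, String> properties = new HashMap<>();
    final String content;

    Resource(String content) {
        this.content = content;
    }

    /** Sets a property and returns this resource so calls can be chained. */
    Resource with(String name, String value) {
        properties.put(name, value);
        return this;
    }
}

final class CombinedQuery {
    /** Structured criterion: the property must equal the value exactly. */
    static Predicate<Resource> propertyEquals(String name, String value) {
        return r -> value.equals(r.properties.get(name));
    }

    /** Free-text criterion on a property value. */
    static Predicate<Resource> propertyContains(String name, String phrase) {
        return r -> containsIgnoreCase(r.properties.get(name), phrase);
    }

    /** Free-text criterion on the content. */
    static Predicate<Resource> contentContains(String phrase) {
        return r -> containsIgnoreCase(r.content, phrase);
    }

    private static boolean containsIgnoreCase(String text, String phrase) {
        return text != null
                && text.toLowerCase(Locale.ROOT).contains(phrase.toLowerCase(Locale.ROOT));
    }

    /** The example query from the thread, built from the three criteria. */
    static Predicate<Resource> exampleFromThread() {
        return propertyEquals("doctype", "memo")
                .and(propertyContains("description", "Fire Stefano"))
                .and(contentContains("January"));
    }

    public static void main(String[] args) {
        Resource memo = new Resource("Effective January, see attached.")
                .with("doctype", "memo")
                .with("description", "Proposal: fire Stefano");
        System.out.println(exampleFromThread().test(memo));
    }
}
```

The point of the sketch is only that property criteria and full-text criteria can be combined in a single pass when both are visible to the same engine; with two separate stores, the two halves would have to be evaluated separately and intersected.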
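
The asynchronous indexing queue mentioned in the quoted thread could look roughly like the sketch below. Everything here is an assumption for illustration: `AsyncIndexer`, `onStore` and `drain` are hypothetical names, not Slide APIs, and a tiny in-memory inverted index stands in for a real engine such as Lucene. The store only hands the data over; a single background thread does the indexing.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical indexer fed by the store; the executor's internal queue is the indexing queue. */
final class AsyncIndexer {
    // token -> URIs of resources containing that token
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Called by the store after a write; returns immediately, indexing happens later. */
    void onStore(String uri, String text) {
        worker.execute(() -> indexNow(uri, text));
    }

    private void indexNow(String uri, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> ConcurrentHashMap.newKeySet()).add(uri);
            }
        }
    }

    /** URIs whose indexed text contained the given single term. */
    Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Stops accepting new work and waits until the queued items have been indexed. */
    boolean drain(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncIndexer indexer = new AsyncIndexer();
        indexer.onStore("/memos/1", "Fire Stefano in January");
        indexer.onStore("/memos/2", "Budget for January");
        indexer.drain(5000);
        System.out.println(indexer.search("january"));
    }
}
```

One consequence of this design is that a search issued right after a store may not see the new document yet, which ties back to the question of transaction-aware search.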
