On 18 Jan 2004, at 22:12, Christophe wrote:
Stefano Mazzocchi wrote:
How do you make a query that combines criteria on properties with full-text search?
If you store your properties in one store (e.g. a DB) and use an index engine for content search, I would expect some performance issues when you search on both properties and content.
Hmmm, not sure I follow you. Can you elaborate on this a bit more? It would be much appreciated.
eh, good question :-)
If the properties/metadata are in a DB and the content is tokenized into an index engine like Lucene, you first need to select rows from the DB tables and then make a second query against the index store to search the content itself.
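To make the two-store scenario concrete, here is a minimal sketch, assuming an in-memory SQLite table for the properties and a toy inverted index standing in for Lucene. The sample data and every function name are hypothetical; nothing here is Slide's actual search code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (uri, author, body). Not taken from Slide.
DOCS = [
    ("/files/a.txt", "stefano", "lucene makes full text search fast"),
    ("/files/b.txt", "christophe", "properties live in the database"),
    ("/files/c.txt", "stefano", "the database stores structured metadata"),
]

def build_property_store(docs):
    """Store the structured properties in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE props (uri TEXT PRIMARY KEY, author TEXT)")
    conn.executemany("INSERT INTO props VALUES (?, ?)",
                     [(uri, author) for uri, author, _ in docs])
    return conn

def build_content_index(docs):
    """Tokenize the content into a toy inverted index: term -> set of URIs."""
    index = defaultdict(set)
    for uri, _, body in docs:
        for term in body.lower().split():
            index[term].add(uri)
    return index

def search(conn, index, author, term):
    """Query 1 hits the DB, query 2 hits the index, then the results are intersected."""
    rows = conn.execute("SELECT uri FROM props WHERE author = ?", (author,))
    by_property = {uri for (uri,) in rows}
    by_content = index.get(term.lower(), set())
    return sorted(by_property & by_content)

conn = build_property_store(DOCS)
index = build_content_index(DOCS)
print(search(conn, index, "stefano", "database"))  # ['/files/c.txt']
```

The cost being worried about is visible in `search`: both stores are queried in full and the join happens in application code, so a selective clause in one store cannot narrow the work done in the other.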
For this kind of scenario (search on properties AND full-text search), I expect that a single query via Lucene would be faster. Lucene can store properties that are not tokenized. Still, it is not an ideal situation, because the properties have to be duplicated in two different stores. So I don't know what the best solution would be!
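The single-store alternative can be sketched the same way. This is a toy index, not Lucene's API: content is split into terms, while each property value is kept whole as one untokenized term, so one query can combine both kinds of clause. The class and field names are assumptions made for illustration.

```python
from collections import defaultdict

class SingleIndex:
    """Toy index holding tokenized content and untokenized properties together."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of URIs

    def add(self, uri, content, **props):
        # Content is tokenized: every word becomes its own term.
        for term in content.lower().split():
            self.postings[("content", term)].add(uri)
        # Properties are not tokenized: the whole value is a single term.
        for name, value in props.items():
            self.postings[(name, value)].add(uri)

    def query(self, **clauses):
        """AND all clauses together in one pass over one store."""
        result = None
        for field, term in clauses.items():
            if field == "content":
                term = term.lower()
            hits = self.postings.get((field, term), set())
            result = set(hits) if result is None else result & hits
        return sorted(result or [])

idx = SingleIndex()
idx.add("/files/a.txt", "Lucene makes search fast", author="Stefano Mazzocchi")
idx.add("/files/b.txt", "Search over properties", author="Christophe")

print(idx.query(content="search", author="Stefano Mazzocchi"))  # ['/files/a.txt']
print(idx.query(content="search", author="Stefano"))            # []
```

The second query returns nothing because the author value was indexed as one exact term. The trade-off raised above also shows up here: the properties passed to `add` would still have to be kept in sync with whatever store owns them.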
I think we are attacking the problem from the wrong angle: first we need to collect use cases, then we need to find a way to make each use case possible.
I personally wouldn't know how to make use of a query against full text *and* properties. This is because such a query looks weird to me: full text is the least structured thing possible (get me everything, but I don't know where) while properties tend to be very much structured (last modified time, author, and so on).
There is a decades-long discussion on what is data and what is metadata, and I don't want to touch that with a stick, but I think that if you need to do full-text search on your metadata, there is something wrong.
But this is my very personal vision, of course, and I would like to see what other use cases or scenarios others can come up with before stating where to go.
Agreed! I understand your point of view: the best way to query on properties is certainly the classic select statement, but if you also need an index/search engine for full-text search, I don't know. Anyway, do you have any ideas for optimizing the current search service?
I haven't looked into this yet (I'm still lagging behind on some other issues with my project and I haven't attacked this part yet).
The idea is to use an RDBMS as much as possible for all content that can be turned relational without major issues (and normally metadata fits this category). As for full-text search, I agree that there is no way to beat an engine like Lucene.
I personally had this vision before: DASL allows you to select the search language. We already provide the DASL basic-search; nothing stops us from coming up with an entirely new Lucene-influenced full-text language that works only on the file contents.
So, you do different queries depending on how you want to treat the content.
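As a rough sketch of what selecting the search language could look like: in a DASL request the child element of `DAV:searchrequest` names the query grammar, so a server can dispatch on that element. The `urn:example:fulltext` namespace, its `query` element and both handlers are invented here for illustration; only `DAV:basicsearch` comes from DASL itself.

```python
import xml.etree.ElementTree as ET

def handle_basicsearch(element):
    # Hypothetical handler: structured property queries go to the RDBMS.
    return "property query routed to the RDBMS"

def handle_fulltext(element):
    # Hypothetical handler: free-text queries go to the index engine.
    text = (element.text or "").strip()
    return "full-text query for %r routed to the index" % text

# Grammar element (in ElementTree's {namespace}name form) -> handler.
HANDLERS = {
    "{DAV:}basicsearch": handle_basicsearch,
    "{urn:example:fulltext}query": handle_fulltext,  # assumed grammar, not a standard
}

def dispatch(request_xml):
    """Pick the backend from the grammar element inside DAV:searchrequest."""
    root = ET.fromstring(request_xml)
    if root.tag != "{DAV:}searchrequest" or len(root) != 1:
        raise ValueError("expected a DAV:searchrequest with exactly one grammar element")
    grammar = root[0]
    handler = HANDLERS.get(grammar.tag)
    if handler is None:
        raise ValueError("unsupported search grammar: %s" % grammar.tag)
    return handler(grammar)

request = (
    '<D:searchrequest xmlns:D="DAV:">'
    '<F:query xmlns:F="urn:example:fulltext">lucene</F:query>'
    '</D:searchrequest>'
)
print(dispatch(request))  # full-text query for 'lucene' routed to the index
```

With this shape each grammar stays tied to the store that suits it, and adding another language means registering one more entry in `HANDLERS`.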
Furthermore, as Erik explained in a previous mail, you can write a filter to apply security rules. So, with one query made against only one store, you can filter on properties, content and security rules.
Can you do that without storing properties in the search engine? I'm curious :-)
You could, I think, but it would be tremendously slow by comparison.
It would be interesting to compare both solutions in more detail, run performance tests, ...
Why not support both situations: either index the properties or not?
You mean with a global configuration or more granularly?
Still thinking about that :-) The idea is to use the domain.xml file to define how queries on properties are made and which options are used for the full-text search.
I think we need to attack the store/indexing problem from the scenario angle down... or we'll go around in circles for a long time. Of course, I'm not talking about Slide 2.0 but about something to do after the release is done.
Thanks for this mail,
You are welcome.
-- Stefano.
