On Mon, 2004-01-19 at 08:45, Daniel Florey wrote:

> I'm not familiar with the searching facilities so far, but I think there is a 
> big difference between searching and sorting/filtering.
> So Lucene seems to be perfect for searching the repository (properties and 
> content). What we need (and what is somewhat harder to achieve without a DB) is 
> some kind of sorting/filtering of the search results.
> I think of the following use cases:
> - Search the repository for the last x uploaded documents sorted by date

This should be handled by the client/presentation layer; it is not part of Slide per se.

> - Find all documents since xxx containing yyy

Yes, and this should be possible if both the metadata and the content are indexed.
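
As a rough illustration of that point, here is a minimal sketch in Python
(chosen only for brevity; this is not Slide's or Lucene's API). The names
Doc, Index, tokenize and search are hypothetical, and the whitespace
tokenizer is an assumed stand-in for a real analyzer. It only shows that a
"since xxx containing yyy" query becomes a single lookup once the date
property and the content sit in the same index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    uri: str
    modified: date   # metadata property, kept as-is (not tokenized)
    content: str     # body text, tokenized for full-text matching


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return set(text.lower().split())


class Index:
    """Toy index holding one stored date and one token set per document."""

    def __init__(self):
        self._entries = {}  # uri -> (modified, set of tokens)

    def add(self, doc):
        self._entries[doc.uri] = (doc.modified, tokenize(doc.content))

    def search(self, since, term):
        """Return URIs modified on or after `since` whose content has `term`."""
        term = term.lower()
        return sorted(
            uri
            for uri, (modified, tokens) in self._entries.items()
            if modified >= since and term in tokens
        )


index = Index()
index.add(Doc("/files/a.txt", date(2004, 1, 10), "budget report for January"))
index.add(Doc("/files/b.txt", date(2003, 12, 1), "January planning notes"))
index.add(Doc("/files/c.txt", date(2004, 1, 15), "holiday schedule"))

# "Find all documents since 2004-01-01 containing january"
hits = index.search(since=date(2004, 1, 1), term="january")
```

Only /files/a.txt satisfies both conditions here: b.txt is too old and
c.txt does not contain the term.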

> - Find the documents containing xxx in the labeled revision yyy

Ditto

> - Find only documents the user is allowed to read

Security should filter/mask the results.  The index should likely NOT be
doing this; instead, the result set should pass through a filter.
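
A minimal sketch of such a post-search filter, again in Python and under
stated assumptions: filter_readable, can_read and the ACL table are
hypothetical names, and a real deployment would ask the repository's
security layer rather than a dictionary. The index returns raw hits and
knows nothing about permissions.

```python
def filter_readable(hits, user, can_read):
    """Keep only hits the user may read, preserving the index's ordering."""
    return [uri for uri in hits if can_read(user, uri)]


# Hypothetical permission table: uri -> users allowed to read it.
ACL = {
    "/files/a.txt": {"alice", "bob"},
    "/files/b.txt": {"alice"},
}


def can_read(user, uri):
    """Deny by default when a resource has no ACL entry."""
    return user in ACL.get(uri, set())


# Raw result set as it might come back from the index.
raw_hits = ["/files/a.txt", "/files/b.txt", "/files/c.txt"]

visible_to_alice = filter_readable(raw_hits, "alice", can_read)
visible_to_bob = filter_readable(raw_hits, "bob", can_read)
```

Keeping the check outside the index means a permission change takes
effect on the next search without any re-indexing.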

> - What about searching within transactions? Should the search be transaction 
> aware (so that the user finds documents uploaded within a transaction but 
> other users don't)?

No, I think not. The normal paradigm is that the user owning the
transaction is the one doing the work, and once the transaction is complete
it is committed, indexed, and can be searched.  If the same user were
in different sessions, the sessions would still prevent access to the
transactional info until it is committed, so even if there were some
reason to want this, I don't think it is feasible.
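
The commit-then-index paradigm can be sketched as follows, in Python for
brevity. The Transaction class, its put/commit/rollback methods and the
dictionary used as an index are all hypothetical; this is not how Slide
implements transactions. It only shows that uncommitted work never
reaches the index, so no search can see it.

```python
class Transaction:
    """Buffers writes and hands them to the index only on commit."""

    def __init__(self, index):
        self._index = index    # shared, searchable state: uri -> content
        self._pending = {}     # uncommitted writes, invisible to searches

    def put(self, uri, content):
        self._pending[uri] = content

    def commit(self):
        # Only now does the content become indexed and searchable.
        self._index.update(self._pending)
        self._pending.clear()

    def rollback(self):
        # Discard the buffered writes; the index is never touched.
        self._pending.clear()


def search(index, term):
    """Return URIs whose content contains `term` as a whole word."""
    term = term.lower()
    return sorted(
        uri for uri, content in index.items()
        if term in content.lower().split()
    )


index = {}
tx = Transaction(index)
tx.put("/files/draft.txt", "quarterly draft")

before_commit = search(index, "draft")  # nothing is indexed yet
tx.commit()
after_commit = search(index, "draft")   # the committed document is found
```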

> I've no idea how to handle this, but I think these are some things to think 
> about...
> Daniel
> 
> On Monday, 19 January 2004 15:12, Michael Oliver wrote:
> > On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:
> > > On 18 Jan 2004, at 22:12, Christophe wrote:
> > > > Stefano Mazzocchi wrote:
> > > > >>> If you store your properties in one store (e.g. a DB) and use an index
> > > > >>> store engine for content search, I expect some performance
> > > > >>> issues when you search on props and content.
> > > >>
> > > >> hmmm, not sure I follow you, can you elaborate on this more? it would
> > > >> be very appreciated.
> > > >
> > > > > How do you make a query that uses criteria on properties and full text
> > > > > search?
> > >
> > > eh, good question :-)
> > >
> > > > > If the properties/metadata are in a DB and content is tokenized into an
> > > > > index engine like Lucene, you first need to select rows from the DB
> > > > > tables and then make a second query against the index store to query on the
> > > > > content itself.
> > > > > For this kind of scenario (search on props AND full text search), I
> > > > > expect a single query via Lucene will be faster. Lucene can store
> > > > > properties that will not be tokenized. Anyway, it is not an ideal
> > > > > situation because properties have to be duplicated in 2 different
> > > > > stores. So, I don't know what the best solution will be!
> > >
> > > I think we are attacking the problem from the wrong angle: first we
> > > need to collect usecases, then we need to find a way to make the
> > > usecase possible.
> > >
> > > I personally wouldn't know how to make use of a query against full text
> > > *and* properties. This is because such a query looks weird to me:
> > > full-text is the least structured form possible (get me everything but I
> > > don't know where) while properties tend to be very much structured
> > > (last modified time, author, and so on).
> > >
> > > There is a decades long discussion on what is data and what is metadata
> > > and I don't want to touch that with a stick, but I think that if you
> > > need to do full-text search on your metadata there is something wrong.
> >
> > Stefano, with all due respect, there is nothing wrong with a full-text
> > search on metadata, because metadata in this case can be any properties
> > of any of the resources in the repository, and that metadata can be
> > free-form text.
> >
> > Consider a search query like:
> >
> > doctype="memo" and description contains "Fire Stefano" and contents
> > contains "January"
> >
> > doctype and description are properties with string values that would be
> > indexed and matched with the same index as the contents.
> >
> > Not everybody uses the Database Stores; some actually prefer the XML
> > Stores, so an index of the XML should be full text, yes?
> >
> > > But this is my very personal vision, of course, and I would like to see
> > > what other usecases or scenarios others can come up with before stating
> > > where to go.
> > >
> > > >>> Anyway, Do you have some idea to optimize the current search service
> > > >>> ?
> > > >>
> > > > >> I haven't looked into this yet (I'm still lagging behind on some other
> > > > >> issues with my project and I haven't attacked this part yet).
> > > >>
> > > >> The idea is to use an RDBMS as much as possible on all content that
> > > >> can be turned relational without major issues (and normally metadata
> > > >> fits this category). As for full-text search, I agree that there is
> > > >> no way to beat an engine like lucene.
> > > >
> > > > > Agreed! I understand your point of view: the best way to query on
> > > > > properties is certainly the classic select statement, but if you need an
> > > > > index/search engine for full-text search, I don't know.
> > >
> > > I personally had this vision before: DASL allows you to select the
> > > search language. We already provide the DASL basic-search, nothing
> > > stops us from coming up with an entirely new Lucene-influenced
> > > full-text language that works only on the file contents.
> > >
> > > So, you do different queries depending on how you want to treat the
> > > content.
> > >
> > > > > Furthermore, as Erik explained in a previous mail, you can write some
> > > > > filters to apply security rules. So, with one query made against only one
> > > > > store, you can filter on props, content and security rules.
> > > > > Can you do that without storing properties in the search engine?
> > > > > I'm curious :-)
> >
> > For clarity, indexing properties as they go into the store isn't the same
> > as storing properties in the search engine/index.  In other words, the
> > indexer of the properties and content just needs access to the data as it
> > is being stored and doesn't impact the stores beyond that call, and the
> > cost of the call can be minimized with an indexing queue that is
> > processed asynchronously.
> >
> > > You could, I think, but it would be tremendously slow by comparison.
> > >
> > > > > It would be interesting to compare both solutions in more detail,
> > > > > run performance tests, ...
> > > >
> > > > >>> Why not support both situations: either index the props or not?
> > > >>
> > > > >> You mean with a global configuration or more granularly?
> > > >
> > > > > Still thinking about that :-)  The idea is to use the domain.xml file to
> > > > > define how to make the query on props and which options are used for
> > > > > the full text search.
> > >
> > > I think we need to attack the store/indexing problem from the scenario
> > > > angle down... or we'll go around in circles for a long time. Of course,
> > > I'm not talking about Slide 2.0 but something to do after the release
> > > is done.
> >
> > I completely agree, a few scenarios/stories should be the first step and
> > I hope the example I gave above fits in that category.
> >
> > Ollie
> >
> > > > Thanks for this mail,
> > >
> > > You are welcome.
> > >
> > > --
> > > Stefano.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]