Hi Martin,

On Wednesday, 4 February 2004 13:00, Wallmer, Martin wrote:
> Hi Daniel,
>
> Great that you want to spend effort in an Indexing Framework!
>
> I have the feeling we can create an indexer framework that is independent
> of how the indexer is triggered.
Yes, that would be fine!

>
> Some unstructured thoughts, collected from several postings on this list:
>
> You can plug in extractors into the framework. An extractor is bound to a
> content type, you can have an extractor for jpegs, plain text, word
> documents, ...
>
> An extractor may deliver content data (the text of a pdf document, the text
> of a scanned facsimile, xml data, ...) and properties (author for word
> docs, composer for music files, ...).
>
> The extracted properties may be written to the properties store and are
> thus accessible with the normal DASL queries.

I'd prefer that the extractors not belong to the search/indexing framework at
all. They are very useful even if no DASL queries are used.
Extractors are a great thing and should be plugged in before
content/properties are stored, so that when the indexer indexes the
properties/content, the extracted data is already available.
So my vote is to have an extra package, org.apache.slide.content.extractors.*,
where these extractors will live.
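
A minimal sketch of how such a package might look, just to make the idea concrete. All names here (ContentExtractor, PlainTextExtractor, ExtractorRegistry, the "character-count" property) are hypothetical and nothing like this exists in Slide yet; the sketch also uses plain byte arrays and strings instead of the Slide types, so that it stands on its own:

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: an extractor is bound to exactly one content type and
// turns raw content into properties, independent of any indexer.
interface ContentExtractor {
    String contentType();
    Map<String, String> extractProperties(byte[] content);
}

// Hypothetical example extractor for text/plain; it only reports the
// number of characters, as a stand-in for real metadata extraction.
class PlainTextExtractor implements ContentExtractor {
    public String contentType() {
        return "text/plain";
    }

    public Map<String, String> extractProperties(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        Map<String, String> props = new HashMap<>();
        props.put("character-count", String.valueOf(text.length()));
        return props;
    }
}

// Hypothetical registry that would be consulted before content and
// properties are stored, so extracted data is available to any indexer.
class ExtractorRegistry {
    private final Map<String, ContentExtractor> byType = new HashMap<>();

    void register(ContentExtractor extractor) {
        byType.put(extractor.contentType(), extractor);
    }

    // Returns the extracted properties, or an empty map if no extractor
    // is registered for the given content type.
    Map<String, String> extract(String contentType, byte[] content) {
        ContentExtractor extractor = byType.get(contentType);
        if (extractor == null) {
            return Collections.emptyMap();
        }
        return extractor.extractProperties(content);
    }
}
```

The point of the registry is only that extraction happens once, at store time, and that neither the extractors nor the registry know anything about DASL or the indexers.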

>
> The extracted content data may be indexed by one or more indexers, i.e.
> Lucene for text data, an indexer that might provide natural language
> queries, Tamino for XML data, ...
>
> The different indexers can be addressed by different content operators,
> <DAV:contains> for Lucene, <xsv:xpath> for Tamino,
> <otherNamespace:RDFcontains> for an RDF indexer... (this should be
> discussed on the DASL list as well)
Great!

>
> A lot is possible here!
>
> With the interface you proposed we can come together. To stay close to the
> existing code, I'd like to use the old interface IndexStore (perhaps rename
> it to Indexer, as it is not really a store). It extends Service, so we have
> the Two Phase Commit stuff.
>
> public interface IndexStore extends Service {
>     void index(Uri uri, NodeRevisionDescriptor revisionDescriptor,
>                NodeRevisionContent revisionContent)
>         throws ServiceAccessException;
>
>     void drop(Uri uri, NodeRevisionDescriptor revisionDescriptor)
>         throws ServiceAccessException;
>
>     // the ExpressionFactoryStuff (to be defined)
> }
>
> Could you live with that interface? If so, we can decouple the discussion,
> how to trigger the indexer.

I don't see how PropertyIndexer and ContentIndexer can be handled with this
interface. How do we know whether content or properties have changed? How do
we know whether an index entry should be inserted or an existing one updated?
The information is available, because we know exactly what is going on in a
transaction, but it gets lost if this interface is used, doesn't it?
I did not know until today that IndexStore exists, so I'll have a look at it
soon.
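
To illustrate what I mean by information getting lost, here is a rough sketch of an indexer call that carries the kind of change along with it. This is not a proposal for the final API; every name in it (ChangeKind, IndexChange, ChangeAwareIndexer, RecordingIndexer) is hypothetical, and plain strings stand in for Uri, NodeRevisionDescriptor and NodeRevisionContent:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: what happened to the resource inside the transaction.
enum ChangeKind { INSERT, UPDATE, DELETE }

// Hypothetical value object describing one change, including whether
// properties, content, or both were touched.
class IndexChange {
    final String uri;
    final ChangeKind kind;
    final boolean propertiesChanged;
    final boolean contentChanged;

    IndexChange(String uri, ChangeKind kind,
                boolean propertiesChanged, boolean contentChanged) {
        this.uri = uri;
        this.kind = kind;
        this.propertiesChanged = propertiesChanged;
        this.contentChanged = contentChanged;
    }
}

// Hypothetical single entry point; the indexer decides what to do
// based on the change description instead of guessing.
interface ChangeAwareIndexer {
    void apply(IndexChange change);
}

// Toy implementation that only records which index operations it
// would perform, to show how the dispatch could work.
class RecordingIndexer implements ChangeAwareIndexer {
    final List<String> log = new ArrayList<>();

    public void apply(IndexChange change) {
        if (change.kind == ChangeKind.DELETE) {
            // A delete removes both property and content entries.
            log.add("DELETE " + change.uri);
            return;
        }
        if (change.propertiesChanged) {
            log.add(change.kind + " properties " + change.uri);
        }
        if (change.contentChanged) {
            log.add(change.kind + " content " + change.uri);
        }
    }
}
```

With something like this, a property-only update would not force the content to be re-indexed, which is exactly the distinction I can't see in index(uri, descriptor, content).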

>
> Regards,
> Martin

Best regards,
Daniel

>
> > -----Original Message-----
> > From: Daniel Florey [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, 4 February 2004 11:56
> > To: Slide Developers Mailing List
> > Subject: Re: proposal: integrate indexing - search
> >
> >
> > Hi Martin,
> > sorry... I overlooked this mail, so I replied to another one.
> >
> > On Wednesday, 4 February 2004 10:18, Wallmer, Martin wrote:
> > > Hi Daniel,
> > >
> > > > -----Original Message-----
> > > > From: Daniel Florey [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, 3 February 2004 17:47
> > > > To: Slide Developers Mailing List
> > > > Subject: Re: proposal: integrate indexing - search
> > > >
> > > >
> > > > Hi Martin,
> > > >
> > > > On Tuesday, 3 February 2004 17:07, Wallmer, Martin wrote:
> > > > > Hi Daniel,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Daniel Florey [mailto:[EMAIL PROTECTED]
> > > > > > Sent: Tuesday, 3 February 2004 16:27
> > > > > > To: Slide Developers Mailing List
> > > > > > Subject: Re: proposal: integrate indexing - search
> > > > > >
> > > > > >
> > > > > > Hi,
> > > > > > great work. After reading it several times I understand most of it :-)
> > > > > >
> > > > > > As I'm working on events at the moment I think they could be used to
> > > > > > trigger indexing for Lucene-based search.
> > > > > > They provide synchronous event collections (before commit and vetoable)
> > > > > > as well as asynchronous event collections (fired after commit).
> > > > > > The collection can be filtered, so that only the desired changes are
> > > > > > indexed.
> > > > >
> > > > > That could be a way to trigger the indexer. My current thoughts are to
> > > > > trigger the indexer within the parent store
> > > > > (org.apache.slide.store.AbstractStore.java). Here we have a good chance
> > > > > to let the indexer take part in two phase commit. It's up to the indexer
> > > > > implementation whether two phase commit is used or asynchronous
> > > > > processing is done. I'm just on AbstractStore.
> > > >
> > > > I hope that some day in the future someone will vote for my event stuff
> > > > so that we can see whether it fits the needs of indexing.
> > >
> > > I don't think anyone voted against events :-)
> >
> > But nobody voted for them... :-(
> >
> > > > It should be possible to implement indexers that can take part in a
> > > > transaction so that search queries reflect the content that was changed
> > > > in the transaction (as is possible with an RDBMS). But this should be
> > > > achieved by the use of events as well.
> > >
> > > Two phase commit is very closely coupled to the stores, so I think the
> > > best place for indexing in a two phase commit scenario is in the store
> > > context.
> >
> > I have to think about how this can be achieved by events...
> >
> > > > The advantage of events is that
> > > > different aspects
> > > > of slide could be configured in the same way and can be
> > > > coupled loosely
> > > > without bloating up the core classes.
> > >
> > > It's not too much bloat :-)
> > >
> > > > Another advantage is that events could (in the future) be
> > > > distributed in a
> > > > clustered environment so that the search index could be
> > > > clustered as well
> > > > (just brainstorming).
> > >
> > > Yes! We should stay open for that.
> > > What should the indexer interface look like for the event-driven
> > > solution?
> > >
> > > If we agree on a common interface, why not stay open for both approaches?
> > What about something like this:
> >
> > public interface ContentIndexer extends ExpressionFactory {
> >     public void update(Namespace namespace,
> >                        NodeRevisionDescriptors descriptors,
> >                        NodeRevisionDescriptor descriptor,
> >                        NodeRevisionContent content)
> >         throws IndexException;
> >
> >     public void insert(Namespace namespace,
> >                        NodeRevisionDescriptors descriptors,
> >                        NodeRevisionDescriptor descriptor,
> >                        NodeRevisionContent content)
> >         throws IndexException;
> >
> >     public void delete(Namespace namespace,
> >                        NodeRevisionDescriptors descriptors,
> >                        NodeRevisionDescriptor descriptor,
> >                        NodeRevisionContent content)
> >         throws IndexException;
> > }
> >
> > public interface PropertyIndexer extends ExpressionFactory {
> >     public void update(Namespace namespace,
> >                        NodeRevisionDescriptors descriptors,
> >                        NodeRevisionDescriptor descriptor)
> >         throws IndexException;
> >
> >     public void insert(Namespace namespace,
> >                        NodeRevisionDescriptors descriptors,
> >                        NodeRevisionDescriptor descriptor)
> >         throws IndexException;
> >
> >     public void delete(Namespace namespace,
> >                        NodeRevisionDescriptors descriptors,
> >                        NodeRevisionDescriptor descriptor)
> >         throws IndexException;
> > }
> >
> > The index exception would cause a rollback of the current transaction if
> > the indexer was called within a transaction. If this exception is thrown
> > in an asynchronous scenario, the exception indicates that index and
> > repository might be out of sync.
> > I could implement some kind of configurable IndexTrigger that will call
> > these interfaces if matching events occur (both contentType and
> > URI-mapping).
> > Should I?
> >
> > Daniel
> >
> > > Regards,
> > > Martin
> >
> > ---------------------------------------------------------------------
> >
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]