On 4/3/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On 4/2/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> > > Richard Frovarp wrote:
> > > > Michael Wechner wrote:
> > > >> Andreas Hartmann wrote:
> > > >>> at the moment, we configure the search index per publication. It is
> > > >>> probably possible to use the same search index across multiple
> > > >>> publications, but I wonder if this flexibility is really necessary or
> > > >>> could maybe be even harmful.
> > > >>>
> > > >>> I see the following advantages of using a common index for all
> > > >>> publications:
> > > >> I think a common index is bad, because
> > > >> - What if the index is becoming corrupt for whatever reason?
> > > >> - What if you want to use one single Lenya instance for hosting
> > > >> multiple publications and one needs to protect the content from each
> > > >> other
> > >
> > > > I agree. We have about 60 different domains hosted through one Lenya
> > > > instance. They all are school related, so searching for basketball in
> > > > one should only give you that school's basketball information, not that
> > > > plus 40 other districts.
> > >
> > > BTW, narrowing down the search to languages etc. (and potentially
> > > publications) can be done by extending the query string:
> > > +({http://purl.org/dc/elements/1.1/}title:Hello) +(language:en)
> > > -- Andreas
> >
> > Possibilities:
> > 1. Standard configuration copied to each publication.
> > The current situation.
> >
> > 2. Common configuration using publication-relative directories:
> > The standard configuration is contained in one location. Each
> > publication has separate index. A publication needing custom search
> > configuration can create its own configuration. (In 1.3, this will be
> > overriding the necessary files in a publication Module. Does 2.x have
> > similar abilities?)
> Yes, this could be done with including the config file e.g. via
> fallback://config/search/index.xml
>
> > An advantage is easier maintenance -- changes to
> > the standard configuration are immediately used by all Publications.
> Makes sense.
>
> > 3. Common index for multiple publications.
> > Configure publications to opt-in. This should not exclude
> > publication-specific indexes. Also allow several multiple publication
> > indexes to create sets of Publications. Users or developers may
> > choose to use the publication-specific or a common index depending on
> > needs.
>
> It looks like most people would prefer separate indexes by default. So the
> most useful change would probably be to make the index configuration
> optional. Did I understand this correctly?
> -- Andreas
Sorry, "opt-in" may be American slang. It means everything is excluded
by default, with an option to be included. (The opposite is "opt-out":
everything is included by default, and each one must ask to be
excluded.) The suggestion is for each Publication to specify which
multi-Publication indexes it should be included in.
Using the school district example:
<publication id="school1">
  <search index="work/search/index"/>
  <search index="/commonindexes/town1"/>
  <search index="/commonindexes/county1"/>
</publication>
The first index path is relative and Publication-specific; no
Publication-specific search index is created if this entry is missing.
The other two indexes have absolute paths and may include several
Publications. Each Publication must be configured to be included in
town and county indexes. A developer may provide search screens using
each index, or may provide a single search screen allowing users to
choose whether to search this school, the town, or the county.
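
To make the relative/absolute rule concrete, here is a minimal Python
sketch of how such a configuration might be read. The element and
attribute names come from the example above; the function name
resolve_indexes and the directory /lenya/pubs/school1 are hypothetical,
and none of this is existing Lenya code.

```python
import posixpath
import xml.etree.ElementTree as ET

# The proposed (not yet existing) configuration from the example above.
CONFIG = """<publication id="school1">
  <search index="work/search/index"/>
  <search index="/commonindexes/town1"/>
  <search index="/commonindexes/county1"/>
</publication>"""


def resolve_indexes(config_xml, publication_dir):
    """Return (own_indexes, shared_indexes) as lists of absolute paths.

    Relative paths are resolved against the publication directory and
    treated as Publication-specific. Absolute paths are treated as
    shared indexes that several Publications may opt in to.
    """
    root = ET.fromstring(config_xml)
    own, shared = [], []
    for element in root.findall("search"):
        path = element.get("index")
        if path.startswith("/"):
            shared.append(path)
        else:
            own.append(posixpath.join(publication_dir, path))
    return own, shared


# Hypothetical publication directory, for illustration only.
own, shared = resolve_indexes(CONFIG, "/lenya/pubs/school1")
print(own)
print(shared)
```

If the first <search> entry were removed, own would be empty and no
Publication-specific index would be created, matching the rule above.
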
The configuration could allow additional filtering, useful for
multiple targeted indexes at both the single and multiple Publication
levels:
<search index="work/search/sports" filter="category='sports'"/>
<search index="work/search/music" filter="category='music'"/>
<search index="/commonindexes/countysports" filter="category='sports'"/>
<search index="/commonindexes/countymusic" filter="category='music'"/>
This is just an example and is unlikely to be the exact syntax needed
for filters. Whether a filter is applied during indexing or as part of
the search query is a design decision. Having an index including just
sporting events may be useful to limit the index size (and improve
performance) if county-wide searches are common:
search?index=county&query=basketball
search?index=countysports&query=basketball
The latter should exclude pages that include the word "basketball"
without being in the sports section of each town's Publication.
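
The two places a filter could be applied can be sketched roughly as
follows. This is only an illustration in Python: the field='value'
filter form is the placeholder syntax from the example above, the
function names and the "category" field are assumptions, and the
query-time variant simply mimics the +(field:value) style Andreas
showed earlier.

```python
def parse_filter(expression):
    """Parse a placeholder filter of the form field='value'."""
    field, _, value = expression.partition("=")
    return field.strip(), value.strip().strip("'")


def target_indexes(document, search_entries):
    """Index-time filtering: pick the indexes that receive a document.

    search_entries is a list of (index_path, filter_or_None) pairs.
    An entry without a filter receives every document.
    """
    targets = []
    for index_path, filter_expression in search_entries:
        if filter_expression is None:
            targets.append(index_path)
            continue
        field, value = parse_filter(filter_expression)
        if document.get(field) == value:
            targets.append(index_path)
    return targets


def add_filter_to_query(query, filter_expression):
    """Query-time filtering: append the filter as a required clause."""
    field, value = parse_filter(filter_expression)
    return "+({}) +({}:{})".format(query, field, value)


ENTRIES = [
    ("work/search/index", None),
    ("work/search/sports", "category='sports'"),
    ("work/search/music", "category='music'"),
    ("/commonindexes/countysports", "category='sports'"),
]

# Hypothetical documents; "category" is an assumed metadata field.
sports_page = {"title": "Basketball schedule", "category": "sports"}
news_page = {"title": "Basketball court repainted", "category": "news"}

print(target_indexes(sports_page, ENTRIES))
print(target_indexes(news_page, ENTRIES))
print(add_filter_to_query("basketball", "category='sports'"))
```

With index-time filtering the news page never reaches countysports, so
that index stays small. With query-time filtering one larger index is
kept and every search pays for the extra clause.
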
solprovider