Thanks for your ideas.  XPath and XQuery are still evolving, but they are
standards nonetheless.  So, in the interest of providing a migration path to
Xindice from other XML databases, I think it makes sense to keep XPath.
However, the flexibility to use fuzzy-search algorithms, such as the Lucene
one you describe, can only add to Xindice's repertoire of capabilities, and
hence to its user base.  I'm not familiar with the Xindice architecture,
though, so I'm not sure how easy it is to add modular functionality.
Perhaps someone else can speak to that...
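
For anyone who hasn't seen the proximity search described in the quoted
message, here is a rough sketch of the idea in Python.  It is not Lucene's
code or API: the function name within_distance and the simple word
tokenization are assumptions made purely for illustration, and Lucene's own
"slop" calculation differs in its details.

```python
import re

def within_distance(text, term_a, term_b, max_gap):
    """Return True if term_a and term_b occur no more than
    max_gap word positions apart anywhere in text."""
    # Naive tokenization: lowercase runs of word characters.
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every occurrence of one term against every occurrence of the other.
    return any(abs(a - b) <= max_gap for a in positions_a for b in positions_b)

doc = "The Jakarta project is hosted by Apache"
print(within_distance(doc, "jakarta", "apache", 10))
print(within_distance(doc, "jakarta", "apache", 2))
```

A real index would store word positions ahead of time instead of rescanning
each document, which is presumably where a search engine earns its keep.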

So anyway, are you volunteering to create the Lucene module? :)

Pietro

> -----Original Message-----
> From: sandy pittendrigh [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 06, 2003 12:44 PM
> To: [email protected]
> Subject: Lucene as an alternate Query Mechanism
>
>
> I have an off-the-cuff idea that I wonder if anybody else
> has considered: "does it make any sense to think
> about using Apache Lucene as an alternate, fuzzy-search
> mechanism over collections of XML files, rather than, or
> in addition to, XPath?"
>
>
> Lucene appears to provide a way of indexing words
> and word proximities in otherwise free-form text
> documents. You could, for instance, use a proximity query
> like "jakarta apache"~10 to find all the documents that
> contain the words jakarta and apache, appearing no
> more than ten words apart from each other.
>
> To the extent this query language is useful over
> completely unstructured, free-form text, it seems likely
> that it (the lucene query language) would be even more
> powerful operating over more regularly structured text, like XML files.
>
> Lucene is more of a search-engine technology than a database
> technology: answer sets are expected to have an attractive ratio
> of relevant to irrelevant data, rather than
> the rigid, 100% metadata criteria matches possible with
> XPath queries over XML data.
>
> Does it make sense for projects like Xindice to have alternate,
> plug-in-like ways to search and query the same datasets? Or
> should alternate query technologies exist as disparate, separate
> software entities?
>
>
>
>
>
>
> --
> /* Sandy Pittendrigh  >--oO0>
>  * [EMAIL PROTECTED]
>  * http://cns.montana.edu/~sandy */
>
>
>
