Re: Extending org.apache.xindice.core.indexer.Indexer

Terry Rosenbaum 10 Mar 2004 14:41:45 -0000

Andy Armstrong wrote:

> At the moment I'm not doing XPath search syntax. Searching looks like this:

>  String query = "some text to find";
>  TextQueryService tqs =
>      (TextQueryService) col.getService("TextQueryService", "1.0");
>  resultSet = tqs.query(query);

> I'm open to suggestions for improvements / alternatives :)

Staying outside the XPath syntax is a good thing, since the
current XPath syntax has no way to express many of the
possible Lucene queries.

A good addition would be to implement a substring indexer
and use it to support the standard XPath "contains"
and "ends-with" functions. The same index could serve
both: "ends-with" is a special case of "contains", so the
indexed evaluation for "ends-with" could simply be "contains". The
"ends-with" predicate would then be resolved when the
full XPath evaluation runs after the indexed evaluation.

A substring indexer would index, perhaps, all 3-character
substrings. Indexed searches for 1- or 2-character strings
could either punt to a full collection scan or be implemented as
a range query against the 3-character substring index. Optimal
performance would occur when the searched-for string is exactly
3 characters long. Perhaps the index definition could specify the
substring size for the index.
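
As a rough illustration of the substring-index idea above, here is a
minimal Java sketch. It is not Xindice code: the class
SubstringIndexSketch and all of its methods are hypothetical names, and
an in-memory TreeMap stands in for a real on-disk index. One detail is
an assumption that goes beyond the text above: the sketch also indexes
the shorter substrings at the tail of each value, so that a prefix range
query for a 1- or 2-character string does not miss a match at the very
end of a value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Xindice code: an in-memory substring (n-gram) index.
class SubstringIndexSketch {
    private final int n; // substring size taken from the (assumed) index definition
    private final TreeMap<String, Set<String>> grams = new TreeMap<>(); // gram -> keys
    private final Map<String, String> values = new HashMap<>();         // key -> value

    SubstringIndexSketch(int n) {
        this.n = n;
    }

    // Index every substring of length n, plus the shorter tail substrings
    // (assumption: needed so short prefix range queries see end-of-value matches).
    void add(String key, String value) {
        values.put(key, value);
        for (int i = 0; i < value.length(); i++) {
            String gram = value.substring(i, Math.min(i + n, value.length()));
            grams.computeIfAbsent(gram, g -> new TreeSet<>()).add(key);
        }
    }

    // Indexed evaluation: keys whose values MAY contain s (a superset of the matches).
    Set<String> candidates(String s) {
        if (s.isEmpty()) {
            return new TreeSet<>(values.keySet()); // every value contains ""
        }
        Set<String> out = new TreeSet<>();
        if (s.length() < n) {
            // Range query: walk the sorted grams that start with s.
            for (Map.Entry<String, Set<String>> e : grams.tailMap(s, true).entrySet()) {
                if (!e.getKey().startsWith(s)) {
                    break;
                }
                out.addAll(e.getValue());
            }
            return out;
        }
        // Query of length >= n: intersect the postings of each n-gram of s.
        out.addAll(grams.getOrDefault(s.substring(0, n), Collections.emptySet()));
        for (int i = 1; i + n <= s.length(); i++) {
            out.retainAll(grams.getOrDefault(s.substring(i, i + n), Collections.emptySet()));
        }
        return out;
    }

    // Full evaluation of "contains": verify each candidate against its value.
    List<String> contains(String s) {
        List<String> hits = new ArrayList<>();
        for (String key : candidates(s)) {
            if (values.get(key).contains(s)) {
                hits.add(key);
            }
        }
        return hits;
    }

    // "ends-with" reuses the "contains" candidates, then applies the stricter test.
    List<String> endsWith(String s) {
        List<String> hits = new ArrayList<>();
        for (String key : candidates(s)) {
            if (values.get(key).endsWith(s)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SubstringIndexSketch idx = new SubstringIndexSketch(3);
        idx.add("doc1", "indexer");
        idx.add("doc2", "reindex");
        idx.add("doc3", "xml");
        System.out.println(idx.contains("index")); // [doc1, doc2]
        System.out.println(idx.endsWith("index")); // [doc2]
        System.out.println(idx.candidates("ex"));  // [doc1, doc2]
    }
}
```

The candidate set is deliberately a superset of the true matches; the
final filter in contains() and endsWith() plays the role of the full
XPath evaluation that runs after the indexed evaluation.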

o.a.x.core.query.XPathQueryResolver is where the indexed evaluations occur.

Compare/contrast the code in methods
Object evalValComparison(int op, String owner, int pos)
vs. funcContains(List args).

Unfortunately, job responsibilities prevent me from working on this
right now. :( When we are not close to the end of a product release
cycle, my employer will allow me to donate company-paid time to
the effort (as I have done in the past).

-Terry

Terry Rosenbaum wrote:

> One may want to implement a full text index on several
> different index patterns, e.g. specify an index on some
> element and an index on some attribute. Would your
> implementation handle that case?

Yes, although right now it's concatenating all the text that matches
a particular pattern ([EMAIL PROTECTED]) within a document together, and
it returns the whole document in the case of a match. I'm changing
that behaviour right now :)

> Just out of curiosity, does your implementation support the
> existing standard XPath approach to searching? What sort
> of analyzer are you using? Did you decide to allow users
> to specify the analyzer somehow?

You get to specify the Lucene analyzer when you create the index via
an additional 'analyzer' attribute.

