Andy Armstrong wrote:
> At the moment I'm not doing XPath search syntax. Searching looks like this:
> String query = "some text to find";
> TextQueryService tqs =
> (TextQueryService) col.getService("TextQueryService", "1.0");
> resultSet = tqs.query(query);
> I'm open to suggestions for improvements / alternatives :)
This is a good thing, since the current XPath syntax has no way to express many of the possible Lucene queries.
A good addition would be a substring indexer, used to support the standard XPath "contains" and "ends-with" functions. The same index could serve both: "ends-with" is a special case of "contains", so the indexed evaluation for "ends-with" could simply be "contains". The exact "ends-with" predicate would then be resolved by the full XPath evaluation that runs after the indexed evaluation. A substring indexer might, for example, index all 3-character substrings. Indexed searches for 1- or 2-character strings could either punt to a full collection scan or be implemented as a range query against the 3-character substring index. Performance would be best when the searched-for string is exactly 3 characters long. Perhaps the index definition could specify the substring size for the index.
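A minimal sketch of the 3-character substring (trigram) index idea, in plain Java and independent of Xindice. The class `TrigramIndex` and its methods are hypothetical, not part of o.a.x. It assumes each document's searchable text is a single string. Each text is padded with two NUL characters (NUL is not a legal XML 1.0 character, so it cannot collide with real content), which makes every 1- or 2-character substring the prefix of some indexed gram; short queries therefore become a prefix range query over a `TreeMap`. Longer queries intersect the posting sets of their grams. The candidates are then re-checked with `String.contains` / `String.endsWith`, mirroring the split between indexed evaluation and the full evaluation that follows it.

```java
import java.util.*;

// Hypothetical sketch of a trigram substring index; not Xindice code.
class TrigramIndex {
    private static final int N = 3;
    // NUL is illegal in XML 1.0 text, so it is safe as end-of-text padding.
    private static final String PAD = "\u0000\u0000";

    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();
    private final Map<Integer, String> texts = new HashMap<>();

    void add(int docId, String text) {
        texts.put(docId, text);
        String padded = text + PAD;
        // Padding guarantees every 1- or 2-char substring is a gram prefix.
        for (int i = 0; i + N <= padded.length(); i++) {
            postings.computeIfAbsent(padded.substring(i, i + N), k -> new HashSet<>()).add(docId);
        }
    }

    // Indexed phase: a superset of the documents that really contain q.
    private Set<Integer> candidates(String q) {
        if (q.isEmpty()) return new HashSet<>(texts.keySet());
        if (q.length() < N) {
            // Range query: union of postings for all grams starting with q.
            Set<Integer> result = new HashSet<>();
            for (Set<Integer> ids : postings.subMap(q, true, q + Character.MAX_VALUE, true).values()) {
                result.addAll(ids);
            }
            return result;
        }
        // Intersect the postings of every trigram of the query.
        Set<Integer> result = null;
        for (int i = 0; i + N <= q.length(); i++) {
            Set<Integer> ids = postings.getOrDefault(q.substring(i, i + N), Collections.emptySet());
            if (result == null) result = new HashSet<>(ids);
            else result.retainAll(ids);
            if (result.isEmpty()) break;
        }
        return result;
    }

    // Full evaluation: verify each candidate against the real predicate.
    SortedSet<Integer> contains(String q) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (int id : candidates(q)) if (texts.get(id).contains(q)) hits.add(id);
        return hits;
    }

    // "ends-with" reuses the "contains" candidates, then checks the suffix.
    SortedSet<Integer> endsWith(String q) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (int id : candidates(q)) if (texts.get(id).endsWith(q)) hits.add(id);
        return hits;
    }

    public static void main(String[] args) {
        TrigramIndex idx = new TrigramIndex();
        idx.add(1, "some text to find");
        idx.add(2, "full text index");
        idx.add(3, "xpath");
        System.out.println(idx.contains("text")); // [1, 2]
        System.out.println(idx.endsWith("ind"));  // [1]
        System.out.println(idx.contains("th"));   // [3] (found via the padded gram)
    }
}
```

With the sample data, document 2 is an index candidate for `endsWith("ind")` because "index" contains "ind", but the verification step rejects it, which is the same division of labour as letting the full XPath evaluation resolve "ends-with" after an indexed "contains".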
o.a.x.core.query.XPathQueryResolver is where the indexed evaluations occur.
Compare and contrast the code in the methods Object evalValComparison(int op, String owner, int pos) and funcContains(List args).
Unfortunately, job responsibilities prevent me from working on this right now. :( When we are not close to the end of a product release cycle, my employer will allow me to donate company-paid time to the effort (as I have done in the past).
-Terry
Terry Rosenbaum wrote:
> One may want to implement a full text index on several different index patterns, e.g. an index on some element and another on some attribute. Would your implementation handle that case?
Yes, although right now it concatenates all the text within a document that matches a particular pattern ([EMAIL PROTECTED]) and returns the whole document when there is a match. I'm changing that behaviour right now :)
> Just out of curiosity, does your implementation support the existing standard XPath approach to searching? What sort of analyzer are you using? Did you decide to allow users to specify the analyzer somehow?
You can specify the Lucene analyzer when you create the index, via an additional 'analyzer' attribute.
At the moment I'm not doing XPath search syntax. Searching looks like this:
String query = "some text to find";
TextQueryService tqs =
    (TextQueryService) col.getService("TextQueryService", "1.0");
resultSet = tqs.query(query);
I'm open to suggestions for improvements / alternatives :)
