[jira] Commented: (SOLR-1321) Support for efficient leading wildcards search

Andrzej Bialecki (JIRA) Wed, 12 Aug 2009 13:27:41 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742559#action_12742559
 ]


Andrzej Bialecki  commented on SOLR-1321:
-----------------------------------------

bq. Since this is a new filter, we might as well use the new incrementToken 
capability and reusable stuff as well as avoiding other deprecated analysis 
calls.

Indeed, I'll fix this.

bq. Also no need to do the string round trip in the reverse method, right? See 
the ReverseStringFilter in Lucene contrib/analysis.

Roundtrip ... you mean the allocation of new char[] buffer, or conversion to 
String? I assume the latter - the former is needed because we add the marker 
char in front. Yeah, I can return char[] and convert to String only in QP.
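
For illustration, the char[]-only variant could look roughly like the sketch below. This is not the code from the attached patch: the class and method names (ReversedTokenSketch, reverseWithMarker) are hypothetical, and only the marker value \u0001 comes from the issue description. It also reverses char by char, so it assumes the token contains no surrogate pairs.

```java
// Hypothetical sketch, not the code from wildcards-2.patch.
final class ReversedTokenSketch {
    // Marker character taken from the issue description.
    static final char MARKER = '\u0001';

    /**
     * Reverses the first len chars of buffer and prepends MARKER.
     * A new array is still allocated (the result is one char longer
     * than the input), but no intermediate String is created.
     * Assumption: no surrogate pairs in the input.
     */
    static char[] reverseWithMarker(char[] buffer, int len) {
        char[] out = new char[len + 1];
        out[0] = MARKER;
        for (int i = 0; i < len; i++) {
            out[i + 1] = buffer[len - 1 - i];
        }
        return out;
    }
}
```

The query parser would then be the only place that wraps the result in a String, e.g. new String(ReversedTokenSketch.reverseWithMarker(chars, chars.length)).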

bq. Perhaps we should just patch that and add some config options to it? Then 
all Solr would need is the QP change and the FilterFactory change, no?

Hmm. After adding the marker-related stuff the code in ReverseStringFilter 
won't be so nice as it is now. I'd keep in mind the specific use case of this 
filter ...

> Support for efficient leading wildcards search
> ----------------------------------------------
>
>                 Key: SOLR-1321
>                 URL: https://issues.apache.org/jira/browse/SOLR-1321
>             Project: Solr
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Andrzej Bialecki 
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: wildcards-2.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcard queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that is used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcard(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if a wildcard 
> character is at either the 0th or the 1st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.
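
The query-side decision described above can be sketched as follows. This is only an illustration under stated assumptions, not the modified SolrQueryParser: the class name, the method names, and the plain Set of field names standing in for the schema-derived map are all hypothetical, and escaped wildcards are ignored for brevity.

```java
import java.util.Set;

// Hypothetical sketch, not the actual SolrQueryParser change.
final class WildcardRewriteSketch {
    // Marker character taken from the issue description.
    static final char MARKER = '\u0001';

    /**
     * True if the field indexes reversed tokens and a wildcard
     * ('*' or '?') sits at position 0 or 1 of the term.
     */
    static boolean shouldReverse(String field, String term, Set<String> reversedFields) {
        if (!reversedFields.contains(field)) {
            return false;
        }
        int limit = Math.min(2, term.length());
        for (int i = 0; i < limit; i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    /** Returns the marker plus the reversed term, or the term unchanged. */
    static String rewrite(String field, String term, Set<String> reversedFields) {
        if (!shouldReverse(field, term, reversedFields)) {
            return term;
        }
        return MARKER + new StringBuilder(term).reverse().toString();
    }
}
```

With a field "text" registered as reversed, the term "*ing" would be rewritten to the marker followed by "gni*", which moves the wildcard to the end of the term, while "ab*" is left as a regular wildcard term.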

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
