Re: wild card search and lower-casing

Dmitry Kan Wed, 23 Nov 2011 05:48:39 -0800

Yes, it should be OK, as we are currently on the English side. If it would
help the effort, I could do a field test on 3.4 after you close the
JIRA.


Best,
Dmitry

On Wed, Nov 23, 2011 at 2:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Ah, I see what you're doing, go for it.
>
> I intend to commit it today, but things happen.....
>
> About changing to setLowercaseExpandedTerms(true): yes,
> that'll take care of this issue, although it makes some
> locale-specific assumptions (i.e. String.toLowerCase() uses the
> default locale). That may not matter in your situation, though.
>
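To make the locale caveat concrete, here is a small plain-Java sketch (not Solr code) comparing String.toLowerCase under Locale.ROOT with an explicit Turkish locale, where an uppercase I lowercases to a dotless i (U+0131). The class and method names are made up for illustration; passing an explicit locale is the usual way to stop the result from depending on the JVM default.

```java
import java.util.Locale;

public class LocaleLowercaseDemo {
    // Locale-independent lowercasing: same result whatever the JVM default locale is.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercasing under Turkish rules, i.e. what a bare s.toLowerCase() can
    // produce on a JVM whose default locale is Turkish.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr-TR"));
    }

    public static void main(String[] args) {
        String term = "TITLE*";
        System.out.println(lowerRoot(term));                          // title*
        System.out.println(lowerTurkish(term).equals("title*"));      // false: 'I' became U+0131
        System.out.println(lowerTurkish(term).equals("t\u0131tle*")); // true
    }
}
```

With English-only content and an English default locale the two results agree, which is why the caveat may not matter in this setup.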
> Best
> Erick
>
> On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan <dmitry....@gmail.com> wrote:
> > Thanks, Erick. I was in fact reading the patch (the one attached as a
> > file to the aforementioned JIRA) that you updated yesterday. I'll
> > watch the issue, but as I said, flipping a hard-coded boolean to its
> > opposite worked just fine for me.
> >
> > Best,
> > Dmitry
> >
> >
> > On 11/22/11, Erick Erickson <erickerick...@gmail.com> wrote:
> >> No, no, no.... That's something buried in Lucene, it has nothing to
> >> do with the patch! The patch has NOT yet been applied to any
> >> released code.
> >>
> >> You could pull the patch from the JIRA and apply it to trunk locally if
> >> you wanted. But there's no patch for 3.x yet; I'll probably put that up
> >> over the holiday.
> >>
> >> But things have changed a bit (one of the things I'll have to do is
> >> create some documentation). You *should* be able to specify
> >> just legacyMultiTerm="true" in your <fieldType> if you want to
> >> apply the 3.x patch to pre-3.6 code. It would be a good field test
> >> if that worked for you.
> >>
> >> But you can't do any of this until the JIRA (SOLR-2438) is
> >> marked "Resolution: Fixed".
> >>
> >> Don't be fooled by "Fix Version". "Fix Version" simply says
> >> that those are the earliest versions it *could* go in.
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan <dmitry....@gmail.com> wrote:
> >>> I guess, I have found your comment, thanks.
> >>>
> >>> For our current needs I have just set:
> >>>
> >>> setLowercaseExpandedTerms(true); // changed from default false
> >>>
> >>> in SolrQueryParser's constructor, and that seems to work so far.
> >>>
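A rough simulation of what that flag changes, assuming an index analyzer with a lowercase filter so that only lowercased terms are stored. The term list and the expandPrefix helper are hypothetical; real expansion happens inside Lucene's query classes, and this sketch lowercases with Locale.ROOT rather than the default locale.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class ExpandedTermsDemo {
    // Illustrative index contents: terms as a lowercasing index analyzer would store them.
    static final List<String> INDEXED_TERMS = List.of("football", "foobar", "bar");

    // Hypothetical helper: expands a trailing-wildcard query such as "Foo*" against
    // INDEXED_TERMS. lowercaseExpandedTerms mimics the QueryParser flag of that name.
    static List<String> expandPrefix(String query, boolean lowercaseExpandedTerms) {
        if (!query.endsWith("*")) {
            throw new IllegalArgumentException("expected a trailing '*': " + query);
        }
        String prefix = query.substring(0, query.length() - 1); // drop the '*'
        // Locale.ROOT here; the real flag lowercases with the default locale.
        String effective = lowercaseExpandedTerms ? prefix.toLowerCase(Locale.ROOT) : prefix;
        return INDEXED_TERMS.stream()
                .filter(term -> term.startsWith(effective))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(expandPrefix("Foo*", false)); // []
        System.out.println(expandPrefix("Foo*", true));  // [football, foobar]
    }
}
```

Without lowercasing, the mixed-case prefix never matches the lowercased index terms, which is the symptom that flipping the boolean fixes.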
> >>> So as not to start a separate thread on wildcards: is it the case that
> >>> a trailing wildcard needs a minimum of 2 preceding characters for a
> >>> search to happen?
> >>>
> >>> Dmitry
> >>>
> >>> On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>
> >>>> It may be. The tricky bit is that there is a constant governing this
> >>>> behavior that restricts it to 3.6 and above. You'll have to change it
> >>>> after applying the patch for this to work for you. It should be trivial;
> >>>> I'll leave a note in the code about this. Look for SOLR-2438 in the 3x
> >>>> code line for the place to change.
> >>>>
> >>>> On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan <dmitry....@gmail.com> wrote:
> >>>> > Thanks Erick.
> >>>> >
> >>>> > Do you think the patch you are working on will also be applicable
> >>>> > to 3.4?
> >>>> >
> >>>> > Best,
> >>>> > Dmitry
> >>>> >
> >>>> > On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>> >
> >>>> >> As it happens, I'm working on SOLR-2438, which should address this.
> >>>> >> This patch will provide two things:
> >>>> >>
> >>>> >> The ability to define a new analysis chain in your schema.xml,
> >>>> >> currently called "multiterm", that will be applied to queries of
> >>>> >> various sorts, including wildcard, prefix, and range. This will be
> >>>> >> somewhat of an "expert" thing to set up yourself...
> >>>> >>
> >>>> >> In the absence of an explicit definition, it'll synthesize a multiterm
> >>>> >> analyzer out of the query analyzer, taking any char filters,
> >>>> >> LowerCaseFilter (if present), and ASCIIFoldingFilter (if present) and
> >>>> >> putting them in the multiterm analyzer along with a (hard-coded)
> >>>> >> WhitespaceTokenizer.
> >>>> >>
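The synthesis rule just described can be sketched as a plain filter over a list of analysis components. Component, Kind, and the component names are hypothetical stand-ins, not Solr's API; the sketch assumes char filters keep their order, the original tokenizer is always replaced with a WhitespaceTokenizer, and only LowerCaseFilter and ASCIIFoldingFilter survive from the token filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MultitermSynthesisSketch {
    enum Kind { CHAR_FILTER, TOKENIZER, TOKEN_FILTER }

    // Hypothetical stand-in for one component of a <fieldType> analyzer.
    static final class Component {
        final Kind kind;
        final String name;

        Component(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // The only token filters carried over from the query analyzer, per the description.
    static final Set<String> KEPT_FILTERS = Set.of("LowerCaseFilter", "ASCIIFoldingFilter");

    // Char filters first, then a hard-coded WhitespaceTokenizer, then the kept token
    // filters in their original order; the original tokenizer and other filters are dropped.
    static List<Component> synthesize(List<Component> queryAnalyzer) {
        List<Component> result = new ArrayList<>();
        List<Component> keptFilters = new ArrayList<>();
        for (Component c : queryAnalyzer) {
            if (c.kind == Kind.CHAR_FILTER) {
                result.add(c);
            } else if (c.kind == Kind.TOKEN_FILTER && KEPT_FILTERS.contains(c.name)) {
                keptFilters.add(c);
            }
        }
        result.add(new Component(Kind.TOKENIZER, "WhitespaceTokenizer"));
        result.addAll(keptFilters);
        return result;
    }

    // A made-up query analyzer chain used by main() and for quick checks.
    static List<Component> demoQueryAnalyzer() {
        return List.of(
                new Component(Kind.CHAR_FILTER, "MappingCharFilter"),
                new Component(Kind.TOKENIZER, "StandardTokenizer"),
                new Component(Kind.TOKEN_FILTER, "StopFilter"),
                new Component(Kind.TOKEN_FILTER, "LowerCaseFilter"),
                new Component(Kind.TOKEN_FILTER, "PorterStemFilter"));
    }

    public static void main(String[] args) {
        // prints [MappingCharFilter, WhitespaceTokenizer, LowerCaseFilter]
        System.out.println(synthesize(demoQueryAnalyzer()));
    }
}
```

Dropping stemming and stop-word filters makes sense for wildcard terms, since stemming a partial word like "runn*" would usually produce something that no longer matches the indexed prefix.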
> >>>> >> As of 3.6 and 4.0, this will be the default behavior, although you can
> >>>> >> explicitly define a field type parameter to keep the current behavior.
> >>>> >>
> >>>> >> The reason it is targeted at 3.6 is that I want it to bake for a while
> >>>> >> before getting into the wild, so I have no intention of trying to get
> >>>> >> it into the 3.5 release.
> >>>> >>
> >>>> >> The patch is up for review now; I'd like another set of eyeballs or
> >>>> >> two on it before committing.
> >>>> >>
> >>>> >> The patch that's up there now is against trunk, but I hope to have a
> >>>> >> 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut.
> >>>> >>
> >>>> >> Best
> >>>> >> Erick
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> >>>> >> >
> >>>> >> >> You're right:
> >>>> >> >>
> >>>> >> >> public SolrQueryParser(IndexSchema schema, String defaultField) {
> >>>> >> >> ...
> >>>> >> >> setLowercaseExpandedTerms(false);
> >>>> >> >> ...
> >>>> >> >> }
> >>>> >> >
> >>>> >> > Please note that lowercaseExpandedTerms uses String.toLowerCase()
> >>>> >> > (which uses the default Locale), and that is a Locale-sensitive
> >>>> >> > operation.
> >>>> >> >
> >>>> >> > In Lucene, AnalyzingQueryParser exists for this purpose, but I am not
> >>>> >> > sure whether it has been ported to Solr.
> >>>> >> >
> >>>> >> >
> >>>> >> > http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
> >>>> >> >
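Roughly, the idea behind AnalyzingQueryParser is to analyze the literal pieces of a wildcard term while leaving the wildcard characters alone, since running a tokenizer over the whole term would usually split on or drop the * and ?. The sketch below is plain Java, not the Lucene class: normalizeChunk is a hypothetical stand-in for an Analyzer that lowercases with Locale.ROOT and strips accents via Unicode NFD decomposition, which only approximates ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardChunkNormalizer {
    // A run of literal text between the wildcard characters '*' and '?'.
    private static final Pattern LITERAL = Pattern.compile("[^*?]+");

    // Hypothetical stand-in for an Analyzer: lowercase, then drop combining marks
    // after NFD decomposition (an approximation of ASCII folding, not the real filter).
    static String normalizeChunk(String chunk) {
        String lower = chunk.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Normalizes each literal chunk of a wildcard term and keeps '*' and '?' in place.
    static String normalizeWildcardTerm(String term) {
        Matcher m = LITERAL.matcher(term);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(normalizeChunk(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeWildcardTerm("Caf\u00e9*Cr\u00e8me?")); // cafe*creme?
    }
}
```

Normalizing chunk by chunk keeps the wildcard structure intact, so the resulting term can still be expanded against an index built with the same lowercasing and folding.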
> >>>> >>
> >>>> >
> >>>>
> >>>
> >>
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>
