Re: How to edit / compile the SOLR source code

JavaGuy84 Thu, 11 Mar 2010 17:42:35 -0800

Erick,

That was a wonderful explanation. I hope many folks in this forum will
benefit from the explanation you have given here.


After you mentioned earlier that I could do a leading wildcard search
without hacking the code, I Googled and found the solution.

I found the patch that was already available to resolve this issue
(using ReversedWildcardFilterFactory), and I have started to implement
that idea.


Thanks a lot for your valuable time.

SOLR rocks!!!!

Thanks,
Barani



Erick Erickson wrote:
> 
> Leaving aside some historical reasons, the root of
> the issue is that any search has to identify all the
> terms in a field that satisfy it. Let's take a normal
> non-leading wildcard case first.
> 
> Finding all the terms matching 'some*' means dealing
> with many fewer terms than 's*'. Just dealing with
> that many terms will decrease performance, regardless
> of the underlying mechanisms used. Imagine you're
> walking down an ordered list of all the terms for
> a field, assembling a list of the ones that match, and
> then comparing all the terms in that field against your list.....
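The ordered-list walk described above can be sketched in a few lines of Python. This is only a stand-in under stated assumptions: the field's term dictionary is modeled as a sorted Python list, and `expand_prefix` and the sample terms are hypothetical, not Lucene's API. Binary search jumps to the first candidate and the walk stops at the first non-match, so the cost grows with the number of terms the prefix matches.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Return (matches, examined) for the trailing wildcard query prefix*.

    Hypothetical helper: sorted_terms stands in for a field's term dictionary.
    """
    i = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    examined = 0
    while i < len(sorted_terms):
        term = sorted_terms[i]
        examined += 1  # counts the terminating non-match too
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
        i += 1
    return matches, examined


terms = sorted(["australia", "sail", "salt", "sea", "some",
                "somebody", "something", "sun", "zebra"])
expand_prefix(terms, "some")  # → (['some', 'somebody', 'something'], 4)
expand_prefix(terms, "s")     # 7 matches, 8 terms examined
```

With a real corpus the gap between 'some*' and 's*' is far larger than in this toy list.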
> 
> So, pure wildcard searches, i.e. just *, would have to
> handle all the terms in the index for the field.
> 
> The situation with leading wildcards is worse than
> with trailing ones, since all the terms in the index
> have to be examined. Even something as bad as
> a* will examine only terms starting with a. But looking
> for *a has to examine each and every term in the index
> because australia and zebra both qualify; there aren't
> any good shortcuts if you think of having an ordered
> list of terms in a field.
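By contrast, a leading wildcard gets no help from that ordering. A minimal Python sketch, again assuming the term dictionary is modeled as a sorted list (the helper name `expand_suffix` and the sample terms are hypothetical): terms ending in the suffix can sit anywhere in the list, so every term has to be checked.

```python
def expand_suffix(sorted_terms, suffix):
    """Return (matches, examined) for the leading wildcard query *suffix.

    Hypothetical helper: the list is ordered by leading characters, so
    there is no contiguous range to binary-search for a suffix.
    """
    matches = []
    examined = 0
    for term in sorted_terms:
        examined += 1  # every term in the field is visited
        if term.endswith(suffix):
            matches.append(term)
    return matches, examined


terms = sorted(["australia", "sail", "salt", "sea", "some", "zebra"])
expand_suffix(terms, "a")  # → (['australia', 'sea', 'zebra'], 6)
```

Here australia and zebra sit at opposite ends of the sorted list, which is why no prefix-style shortcut applies.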
> 
> So performance can degrade pretty dramatically when
> you allow this kind of thing, and the original writers
> (my opinion here, I wasn't one of them) decided it was
> much better to disallow it by default and require users
> to dig around for the why, rather than have them
> crash and burn on something that seems innocent
> if you aren't familiar with the issues involved.
> 
> A better approach, and this isn't very obvious,
> is to index your terms reversed and run leading wildcard
> searches against the *reversed* field as trailing wildcards.
> E.g. 'some' gets indexed as 'emos', and the wildcard
> search '*me' is run against the reversed field as
> 'em*'.
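A toy Python model of the reversed-field idea, assuming both the "index" and the query rewrite live in plain Python. The helper names are hypothetical; Solr's ReversedWildcardFilterFactory, mentioned at the top of this thread, does this inside its analysis chain and query parser, with details this sketch does not reproduce.

```python
from bisect import bisect_left


def build_reversed_field(terms):
    """Index every term reversed ('some' -> 'emos') and keep them sorted."""
    return sorted(term[::-1] for term in terms)


def leading_wildcard_via_reversed(reversed_terms, query):
    """Answer a leading-wildcard query like '*me' as the trailing
    wildcard 'em*' on the reversed field, then un-reverse the hits."""
    if not query.startswith("*") or "*" in query[1:]:
        raise ValueError("expected exactly one leading '*', e.g. '*me'")
    prefix = query[1:][::-1]  # '*me' -> 'em'
    i = bisect_left(reversed_terms, prefix)  # cheap prefix lookup again
    hits = []
    while i < len(reversed_terms) and reversed_terms[i].startswith(prefix):
        hits.append(reversed_terms[i][::-1])  # 'emos' -> 'some'
        i += 1
    return hits


reversed_field = build_reversed_field(["some", "time", "same", "sum", "zebra"])
leading_wildcard_via_reversed(reversed_field, "*me")  # → ['same', 'time', 'some']
```

The trade-off is a larger index (every term is stored a second time, reversed) in exchange for turning a full scan into a range lookup.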
> 
> There may still be performance issues if you allow
> wildcards with only a single real letter, e.g. s* or *s,
> although a lot of work has been done in this area in the
> last few years. You'll have to measure in your situation.
> And beware: a really common problem when deciding how many
> real letters to allow is that it all works fine on your test
> data, but then you load your real corpus, SOLR/Lucene suddenly
> has to deal with 100,000 terms that might match rather than
> the 1,000 in your test set, and response time
> changes....for the worse.
> 
> So I'd look around for the reversed idea (see SOLR-1321
> in JIRA); at least one of the schema examples
> has it.
> 
> One hurdle for me was asking the question "does it
> really help the user to allow one or two leading
> characters in a wildcard search?". Surprisingly often,
> that's of no use to real users because so many
> terms match that it's overwhelming. YMMV, but it's
> a good question to ask if you find yourself in a
> quagmire because you allow a* type of queries.
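One way to act on that question is to reject short patterns before they ever reach Solr. A minimal Python sketch of a hypothetical application-side check; the function names and the `min_literal=3` default are illustrative assumptions, and the right threshold is something to measure against your real corpus, not your test set.

```python
def literal_chars(pattern):
    """Count the non-wildcard characters in a pattern such as '*me' or 'a*'."""
    return sum(1 for ch in pattern if ch not in "*?")


def check_wildcard(pattern, min_literal=3):
    """Reject wildcard patterns with too few real letters to be useful.

    min_literal is a hypothetical threshold; tune it by measuring
    match counts and response times on production-sized data.
    """
    if literal_chars(pattern) < min_literal:
        raise ValueError(
            f"wildcard {pattern!r} needs at least {min_literal} literal characters"
        )
    return pattern


check_wildcard("some*")  # accepted
check_wildcard("*ing")   # accepted
```

Calling `check_wildcard("a*")` raises `ValueError`, so the application can ask the user for more letters instead of expanding thousands of terms.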
> 
> There are other strategies too, but that seems easiest....
> 
> Now, all that said, SOLR has done significant work
> to make wildcards work well; these are just general
> things to look out for when thinking about wildcards...
> 
> I really think hacking the parser will come back to bite
> you, both as a maintenance and as a performance issue.
> I wouldn't go there without a pretty exhaustive look at
> other options.
> 
> HTH
> Erick
> 
> On Thu, Mar 11, 2010 at 6:29 PM, JavaGuy84 <bbar...@gmail.com> wrote:
> 
>>
>> Erick,
>>
>> Thanks a lot for your reply.
>>
>> I was able to successfully hack the query parser and enable leading
>> wildcard search.
>>
>> So far I have hacked the code for this reason only: I am not sure how to
>> make leading wildcard search work without hacking the code, and this
>> type of search is the preferred type of search in our organization.
>>
>> I had previously searched all over the web to find out why that feature
>> is disabled by default, but couldn't find any solid answer stating the
>> reason. One of the postings on Nabble mentioned that enabling leading
>> wildcard search might cause a performance hit. Can you please let me
>> know your comments on that?
>>
>> But I am very much interested in contributing some new stuff to the
>> SOLR group, so I consider this a starting point.
>>
>>
>> Thanks,
>> Barani
>>
>> Erick Erickson wrote:
>> >
>> > See Trey's comment, but before you go there.....
>> >
>> > What about SOLR's wildcard searching capabilities isn't
>> > working for you now? There are a couple of tricks for making
>> > leading wildcard searches work quickly, but this is a solved
>> > problem. Although whether the existing solutions work in
>> > your situation may be an open question...
>> >
>> > Or do you have to hack into the parser for other reasons?
>> >
>> > Best
>> > Erick
>> >
>> > On Thu, Mar 11, 2010 at 12:07 PM, JavaGuy84 <bbar...@gmail.com> wrote:
>> >
>> >>
>> >> Hi,
>> >>
>> >> Sorry for asking this very simple question, but I am very new to SOLR
>> >> and I want to play with its source code.
>> >>
>> >> As an initial step, I have a requirement to enable leading wildcard
>> >> search (*text) in SOLR. I am trying to figure out a way to import the
>> >> complete SOLR build into Eclipse and edit the QueryParsing.java file,
>> >> but I am not able to import it (I tried to import it as an Ant project
>> >> in Eclipse, selected the build.xml file, and got an error stating
>> >> javac is not present in the build.xml file).
>> >>
>> >> Can someone help me out with the initial steps on how to import / edit /
>> >> compile / test the SOLR source?
>> >>
>> >> Thanks a lot for your help!!!
>> >>
>> >> Thanks,
>> >> B
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27866410.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27871470.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27872470.html
Sent from the Solr - User mailing list archive at Nabble.com.
