Erick, that was a wonderful explanation. I hope many folks in this forum will benefit from it.
Actually, I Googled and found the solution after you mentioned earlier that I can do a leading wildcard search without hacking the code. I found the patch that is already available to resolve this issue (using ReversedWildcardFilterFactory), and I have started to implement that idea.

Thanks a lot for your valuable time. SOLR rocks!!!!

Thanks,
Barani

Erick Erickson wrote:
>
> Leaving aside some historical reasons, the root of
> the issue is that any search has to identify all the
> terms in a field that satisfy it. Let's take a normal
> non-leading wildcard case first.
>
> Finding all the terms like 'some*' will have to
> deal with many fewer terms than 's*'. Just dealing with
> that many terms will decrease performance, regardless
> of the underlying mechanisms used. Imagine you're
> searching down an ordered list of all the terms for
> a field, assembling a list, and then comparing
> all the terms in that field with your list.....
>
> So, pure wildcard searches, i.e. just *, would have to
> handle all the terms in the index for the field.
>
> The situation with leading wildcards is worse than
> trailing, since all the terms in the index have to be
> examined. Even doing something as bad as
> a* will examine only terms starting in a. But looking
> for *a has to examine each and every term in the index
> because australia and zebra both qualify; there aren't
> any good shortcuts if you think of having an ordered
> list of terms in a field.
>
> So performance can degrade pretty dramatically when
> you allow this kind of thing and the original writers
> (my opinion here, I wasn't one of them) decided it was
> much better to disallow it by default and require users
> to dig around for the why rather than have them
> crash and burn a lot by something that seems innocent
> if you aren't familiar with the issues involved.
>
> A better approach, and this isn't very obvious,
> is to index your terms reversed, and do leading wildcard
> searches on the *reversed* field as trailing wildcards.
> E.g. 'some' gets indexed as 'emos' and the wildcard
> search '*me' gets searched in the reversed field as
> 'em*'.
>
> There may still be performance issues if you allow
> single-letter wildcards, e.g. s* or *s, although a lot of
> work has been done in this area in the last few years.
> You'll have to measure in your situation. And beware
> that a really common problem when deciding how many
> real letters to allow is that it all works fine in your test
> data, but when you load your real corpus and suddenly
> SOLR/Lucene has to deal with 100,000 terms that
> might match rather than the 1,000 in your test set, response
> time changes....for the worse.
>
> So I'd look around for the reversed idea (See SOLR-1321
> in the JIRA), and at least one of the schema examples
> has it.
>
> One hurdle for me was asking the question "does it
> really help the user to allow one or two leading
> characters in a wildcard search?". Surprisingly often,
> that's of no use to real users because so many
> terms match that it's overwhelming. YMMV, but it's
> a good question to ask if you find yourself in a
> quagmire because you allow a* type of queries.
>
> There are other strategies too, but that seems easiest....
>
> Now, all that said, SOLR has done significant work
> to make wildcards work well; these are just general
> things to look out for when thinking about wildcards...
>
> I really think hacking the parser will come back to bite
> you as both a maintenance and a performance issue.
> I wouldn't go there without a pretty exhaustive look at
> other options.
>
> HTH
> Erick
>
> On Thu, Mar 11, 2010 at 6:29 PM, JavaGuy84 <bbar...@gmail.com> wrote:
>
>>
>> Erick,
>>
>> Thanks a lot for your reply.
>>
>> I was able to successfully hack the query parser and enable the leading
>> wildcard search.
>>
>> As of today I have hacked the code for this reason only. I am not sure how
>> to make the leading wildcard search work without hacking the code, and
>> this type of search is the preferred type of search in our organization.
>>
>> I had previously searched all over the web to find out 'why' that feature
>> was disabled by default, but couldn't find any solid answer stating the
>> reason. One of the postings on Nabble mentioned that enabling the leading
>> wildcard search might cause a performance hit. Can you please let me know
>> your comments on that?
>>
>> But I am very much interested in contributing some new stuff to the SOLR
>> group, so I consider this a starting point.
>>
>>
>> Thanks,
>> Barani
>>
>> Erick Erickson wrote:
>> >
>> > See Trey's comment, but before you go there.....
>> >
>> > What about SOLR's wildcard searching capabilities aren't
>> > working for you now? There are a couple of tricks for making
>> > leading wildcard searches work quickly, but this is a solved
>> > problem. Although whether the existing solutions work in
>> > your situation may be an open question...
>> >
>> > Or do you have to hack into the parser for other reasons?
>> >
>> > Best
>> > Erick
>> >
>> > On Thu, Mar 11, 2010 at 12:07 PM, JavaGuy84 <bbar...@gmail.com> wrote:
>> >
>> >>
>> >> Hi,
>> >>
>> >> Sorry for asking this very simple question, but I am very new to SOLR
>> >> and I want to play with its source code.
>> >>
>> >> As an initial step, I have a requirement to enable wildcard search
>> >> (*text) in SOLR. I am trying to figure out a way to import the complete
>> >> SOLR build into Eclipse and edit the QueryParsing.java file, but I am not
>> >> able to import it (I tried to import it as an Ant project in Eclipse,
>> >> selected the build.xml file, and got an error stating javac is not
>> >> present in the build.xml file).
>> >>
>> >> Can someone help me out with the initial steps on how to import / edit /
>> >> compile / test the SOLR source?
>> >>
>> >> Thanks a lot for your help!!!
>> >>
>> >> Thanks,
>> >> B
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27866410.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27871470.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>

--
View this message in context:
http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27872470.html
Sent from the Solr - User mailing list archive at Nabble.com.
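
To make the reversed-field idea from Erick's reply a bit more concrete, here is a minimal sketch in Python. It is only a toy model of "an ordered list of terms in a field": it is not how Lucene or ReversedWildcardFilterFactory is actually implemented, and the function names (prefix_matches, suffix_matches_scan, suffix_matches_reversed) and the sample terms are made up for illustration. It shows that a trailing wildcard such as some* can jump to the right spot in a sorted term list and stop early, that a leading wildcard such as *a has to look at every term, and that *me can be answered as the trailing wildcard em* against reversed terms.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Trailing wildcard (prefix*): binary-search to the first candidate,
    then stop as soon as a term no longer starts with the prefix."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches


def suffix_matches_scan(sorted_terms, suffix):
    """Leading wildcard (*suffix) on the normal term list:
    no shortcut, every term has to be examined."""
    return [term for term in sorted_terms if term.endswith(suffix)]


def suffix_matches_reversed(sorted_reversed_terms, suffix):
    """Leading wildcard (*suffix) answered as a trailing wildcard
    on a second, reversed copy of the terms."""
    hits = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return sorted(hit[::-1] for hit in hits)


# Hypothetical sample terms, not taken from any real index.
terms = sorted(["australia", "awesome", "some", "something", "sometime", "zebra"])
reversed_terms = sorted(term[::-1] for term in terms)

print(prefix_matches(terms, "some"))                  # ['some', 'something', 'sometime']
print(suffix_matches_scan(terms, "a"))                # ['australia', 'zebra']
print(suffix_matches_reversed(reversed_terms, "me"))  # ['awesome', 'some', 'sometime']
```

The trade-off in this toy version is the same one Erick hints at: the reversed copy costs extra index space, and a very short pattern such as *s still matches a large share of the terms, so it is worth measuring against a real corpus.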