Leaving aside some historical reasons, the root of the issue is that any search has to identify all the terms in a field that satisfy it. Let's take a normal non-leading wildcard case first.
Finding all the terms like 'some*' has to deal with far fewer terms than 's*'. Just dealing with that many terms will decrease performance, regardless of the underlying mechanisms used. Imagine you're searching down an ordered list of all the terms for a field, assembling a list of the ones that match, and then searching the field for every term on that list. So pure wildcard searches, i.e. just *, would have to handle all the terms in the index for the field.

The situation with leading wildcards is worse than with trailing ones, since all the terms in the index have to be examined. Even something as bad as a* will examine only terms starting with a. But looking for *a has to examine each and every term in the index, because australia and zebra both qualify; there aren't any good shortcuts if you think of having an ordered list of terms in a field.

So performance can degrade pretty dramatically when you allow this kind of thing, and the original writers (my opinion here, I wasn't one of them) decided it was much better to disallow it by default and require users to dig around for the why than to have them crash and burn on something that seems innocent if you aren't familiar with the issues involved.

A better approach, and this isn't very obvious, is to index your terms reversed and do leading wildcard searches on the *reversed* field as trailing wildcards. E.g. 'some' gets indexed as 'emos', and the wildcard search '*me' gets searched in the reversed field as 'em*'.

There may still be performance issues if you allow single-letter wildcards, e.g. s* or *s, although a lot of work has been done in this area in the last few years. You'll have to measure in your situation.
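To make the ordered-list picture concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Lucene or SOLR actually store or walk their terms: the function names and the tiny term list are made up, and a plain sorted Python list stands in for the term dictionary. It counts how many terms each kind of wildcard has to look at.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Trailing wildcard ('prefix*'): jump to the first candidate, stop at the first miss."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    examined = 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # the list is ordered, so nothing after this can match
        matches.append(term)
    return matches, examined


def suffix_matches_scan(sorted_terms, suffix):
    """Leading wildcard ('*suffix') on the normal list: no shortcut, every term is examined."""
    matches = [term for term in sorted_terms if term.endswith(suffix)]
    return matches, len(sorted_terms)


def suffix_matches_reversed(sorted_reversed_terms, suffix):
    """Leading wildcard answered as a trailing wildcard on the reversed terms."""
    hits, examined = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return sorted(hit[::-1] for hit in hits), examined


# Hypothetical toy "index": one ordered list of terms, plus the same terms reversed.
terms = sorted(["apple", "australia", "some", "someone", "sum", "zebra", "zoo"])
reversed_terms = sorted(term[::-1] for term in terms)

if __name__ == "__main__":
    print(prefix_matches(terms, "some"))               # some*
    print(suffix_matches_scan(terms, "a"))             # *a, full scan
    print(suffix_matches_reversed(reversed_terms, "a"))  # *a via a*, on reversed terms
```

With seven terms the difference is trivial, but the second number returned by each function is the point: the plain scan for *a always examines the whole list, while the reversed lookup examines only the matches plus one extra term.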
Also beware a really common problem when deciding how many literal letters to require: it all works fine on your test data, but when you load your real corpus and SOLR/Lucene suddenly has to deal with 100,000 terms that might match rather than the 1,000 in your test set, response time changes... for the worse.

So I'd look around for the reversed idea (see SOLR-1321 in the JIRA); at least one of the schema examples has it.

One hurdle for me was asking the question "does it really help the user to allow one or two leading characters in a wildcard search?" Surprisingly often, that's of no use to real users because so many terms match that it's overwhelming. YMMV, but it's a good question to ask if you find yourself in a quagmire because you allow a*-type queries. There are other strategies too, but that seems easiest...

Now, all that said, SOLR has done significant work to make wildcards work well; these are just general things to look out for when thinking about wildcards.

I really think hacking the parser will come back to bite you as both a maintenance and a performance issue. I wouldn't go there without a pretty exhaustive look at other options.

HTH
Erick

On Thu, Mar 11, 2010 at 6:29 PM, JavaGuy84 <bbar...@gmail.com> wrote:

>
> Eric,
>
> Thanks a lot for your reply.
>
> I was able to successfully hack the query parser and enabled the leading
> wild card search.
>
> As of today I hacked the code for this reason only, I am not sure how to
> make the leading wild card search to work without hacking the code and this
> type of search is the preferred type of search in our organization.
>
> I had previously searched all over the web to find out 'why' that feature
> was disabled as default but couldn't find any solid answer stating the
> reason. In one of the posting in nabble it was mentioned that it might take
> a performance hit if we enable the leading wild card search, can you please
> let me know your comments on that?
>
> But I am very much interested in contributing some new stuff to SOLR group
> so I consider this as a starting point..
>
>
> Thanks,
> Barani
>
> Erick Erickson wrote:
> >
> > See Trey's comment, but before you go there.....
> >
> > What about SOLR's wildcard searching capabilities aren't
> > working for you now? There are a couple of tricks for making
> > leading wildcard searches work quickly, but this is a solved
> > problem. Although whether the existing solutions work in
> > your situation may be an open question...
> >
> > Or do you have to hack into the parser for other reasons?
> >
> > Best
> > Erick
> >
> > On Thu, Mar 11, 2010 at 12:07 PM, JavaGuy84 <bbar...@gmail.com> wrote:
> >
> >>
> >> Hi,
> >>
> >> Sorry for asking this very simple question but I am very new to SOLR and
> >> I want to play with its source code.
> >>
> >> As a initial step I have a requirement to enable wildcard search (*text)
> >> in SOLR. I am trying to figure out a way to import the complete SOLR build
> >> to Eclipse and edit QueryParsing.java file but I am not able to import (I
> >> tried to import with ant project in Eclipse and selected the build.xml file
> >> and got an error stating javac is not present in the build.xml file).
> >>
> >> Can someone help me out with the initial steps on how to import / edit /
> >> compile / test the SOLR source?
> >>
> >> Thanks a lot for your help!!!
> >>
> >> Thanks,
> >> B
> >> --
> >> View this message in context:
> >> http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27866410.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >
> --
> View this message in context:
> http://old.nabble.com/How-to-edit---compile-the-SOLR-source-code-tp27866410p27871470.html
> Sent from the Solr - User mailing list archive at Nabble.com.