Hi Les,
We ended up modifying the QueryParser to pass prefix and suffix queries
through the Analyzer. For us, it was about stemming. If you decide to use
an analyzer that incorporates stemming, there are cases where wildcard
queries will not return the expected results.
Example: searcher
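
To illustrate the mismatch described above, here is a minimal sketch in plain Java. It does not use Lucene: the stem() method is a toy suffix stripper invented for this example, and the class and method names are hypothetical. It only shows why a raw prefix can miss terms that were stemmed at index time, and why running the prefix through the same analysis helps.

```java
import java.util.ArrayList;
import java.util.List;

final class StemmedPrefixDemo {
    // Toy stemmer (an assumption for illustration, not a real algorithm):
    // lowercases and strips one trailing "ing", "er", or "s".
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "er", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Returns the indexed terms that start with the given prefix.
    static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Index time: every word goes through the stemmer.
        List<String> index = new ArrayList<>();
        for (String word : new String[] {"searcher", "searching", "builds"}) {
            index.add(stem(word));
        }
        // Raw prefix "searcher" is longer than the stored stem, so nothing matches.
        System.out.println(prefixMatches(index, "searcher"));
        // Analyzed prefix is reduced to the same stem and finds the terms.
        System.out.println(prefixMatches(index, stem("searcher")));
    }
}
```

With this toy stemmer the first call finds nothing and the second finds both stemmed terms. A real stemmer will behave differently on other words.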
Perform the Lucene search. If you get few or no hits, send the query term
to a spell checker, like ispell. Echo the alternative spelling(s) to the
user.
DaveB
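
The fallback described above could be sketched roughly as follows. This is only an outline under assumptions: the search and the spell checker are passed in as plain functions (stand-ins for the real Lucene search and for something like ispell), and MIN_HITS is a hypothetical threshold for "few hits".

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class SpellFallback {
    // Hypothetical threshold for "few hits"; tune it for your collection.
    static final int MIN_HITS = 3;

    // Returns alternative spellings to echo to the user, or an empty list
    // when the original term already produced enough hits.
    static List<String> suggestIfFewHits(String term,
                                         Function<String, Integer> hitCounter,
                                         Function<String, List<String>> spellChecker) {
        int hits = hitCounter.apply(term);
        if (hits >= MIN_HITS) {
            return Collections.emptyList();
        }
        return spellChecker.apply(term);
    }

    public static void main(String[] args) {
        // Stand-ins for the real search and the real spell checker.
        Function<String, Integer> hitCounter = t -> t.equals("lucene") ? 42 : 0;
        Function<String, List<String>> spellChecker = t ->
            t.equals("lucen") ? Collections.singletonList("lucene")
                              : Collections.<String>emptyList();

        System.out.println(suggestIfFewHits("lucene", hitCounter, spellChecker));
        System.out.println(suggestIfFewHits("lucen", hitCounter, spellChecker));
    }
}
```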
Dario
I'm sorry, I did not read the complete thread. Do you mean - analyzer ==
stemmer? Does it really work? If I were a stemmer, I would leave searche
intact. ;-)
-g-
[EMAIL PROTECTED] wrote:
Hi Les,
We ended up modifying the QueryParser to pass prefix and suffix queries
through the Analyzer. For
Your analyzers can optionally incorporate stemming, along with the other
things that analyzers do (lowercasing, etc.). The stemming algorithms
are all different. This searcher example was made up, but there are
instances where stemming at index time and not stemming wildcard searches
will
Ah, I got it. THX. In the good old days, the wildcards were used as a
fix for a missing stemming module. I am not sure if you can combine these
two opposite approaches successfully. I see the following drawbacks of
your solution.
Example:
built* (-built) could be changed to build* (no built, but
True enough. We're supporting search of a product database, so, for us, it
made sense to increase coverage and accept the loss of precision. Our
solution is definitely not globally applicable.
DaveB
Thanks for the answer.
I was searching for a solution not based on a dictionary, but on the list of
terms (with relative frequency) contained in the Lucene index.
In this way (I think) I can obtain more significant results,
I can use this method on multiple languages (without relative
Hi,
can you suggest a link to an overview document of this method?
I couldn't find one.
Thanks,
Dario
----- Original Message -----
From: Leo Galambos [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, May 30, 2003 4:25 PM
Subject: Re: Search for similar terms
You need
http://cs.felk.cvut.cz/psc/members.html
http://cs.felk.cvut.cz/psc/event/1998/p13.html
or contact Prof. Melichar for more details:
http://webis.felk.cvut.cz/people/melichar.html
-g-
Dario Dentale wrote:
Hi,
can you suggest a link to an overview document of this method?
I couldn't find one.
Hi,
please have a look at the FuzzyTermEnum class in Lucene.
There is an impressive implementation of Levenshtein distance
there that you can use; simply set the fuzzy distance higher
than 0.5 (0.75 seems to work fine) and modify the
termCompare method such that the last term produced is
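
For readers who want to see the idea without opening the Lucene source, here is a self-contained sketch of Levenshtein-based term suggestion over a list of index terms with frequencies. It is not the FuzzyTermEnum code: the class name, the normalization in similarity() (distance divided by the longer length), and the tie-breaking by frequency are assumptions made for this example, so a threshold such as 0.75 is not necessarily comparable with Lucene's own value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SimilarTerms {
    // Classic dynamic-programming Levenshtein distance using two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; the normalization is an assumption of this sketch.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }

    // Suggests index terms similar to the query: most similar first,
    // ties broken by higher frequency, then alphabetically.
    static List<String> suggest(String query, Map<String, Integer> termFreq,
                                double minSimilarity) {
        List<String> out = new ArrayList<>();
        for (String term : termFreq.keySet()) {
            if (!term.equals(query) && similarity(query, term) >= minSimilarity) {
                out.add(term);
            }
        }
        out.sort(Comparator
            .comparingDouble((String t) -> -similarity(query, t))
            .thenComparingInt(t -> -termFreq.get(t))
            .thenComparing(Comparator.<String>naturalOrder()));
        return out;
    }
}
```

The term-to-frequency map would be filled from the index; how to enumerate terms and their document frequencies depends on the Lucene version in use.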
I have a field, 'PartNumber', that has '-' in its value (e.g.
SG-XRRH-C1M0-A).
After indexing, I can perform certain queries. However, I am unable to
explain the behavior.
- if searching for
PartNumber:SG
it will return multiple hits. I assume the analyzer might take out '-'.
- if
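
To make the poster's assumption concrete, here is a toy tokenizer that lowercases the value and splits it on anything that is not a letter or digit. It does not reproduce any particular Lucene analyzer (the class and method names are hypothetical); it only shows how a hyphenated part number could end up as several separate terms, which would explain why a search for one piece returns hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PartNumberTokens {
    // Splits on runs of characters other than a-z and 0-9 after lowercasing.
    // This is an assumed behavior for illustration, not a Lucene analyzer.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String piece : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [sg, xrrh, c1m0, a]
        System.out.println(tokenize("SG-XRRH-C1M0-A"));
    }
}
```

If the analyzer in use behaves like this, the whole part number is not stored as a single term. Checking the actual tokens your analyzer emits for a sample value is the reliable way to confirm.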
On Friday 30 May 2003 09:55, Leo Galambos wrote:
Ah, I got it. THX. In the good old days, the wildcards were used as a
fix for a missing stemming module. I am not sure if you can combine these
two opposite approaches successfully. I see the following drawbacks of
your solution.
Example:
I found some references to an SQLDirectory class in the mailing list
archives but I was unable to actually locate the package anywhere in the
CVS (I looked in both the primary and the sandbox) nor could I find it
in Google.
Anyhow, I have written my own implementation of a database-backed
Which API do I use to perform full-text search? The APIs in Javadoc also
refer to a field name. In my case, I would like to do a search (e.g. bob)
across all documents, irrespective of the field name.
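
One common way to get field-independent search (offered here as an assumption, not something confirmed in this thread) is to add a catch-all field at index time that holds the text of every other field, and then search only that field. The sketch below shows the idea in plain Java with maps standing in for documents; the field name "all" and the class are hypothetical, and no Lucene API is used.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class CatchAllField {
    // Hypothetical name for the combined field.
    static final String ALL = "all";

    // Returns a copy of the document with an extra field that
    // concatenates the values of all original fields.
    static Map<String, String> withCatchAll(Map<String, String> doc) {
        Map<String, String> copy = new LinkedHashMap<>(doc);
        copy.put(ALL, String.join(" ", doc.values()));
        return copy;
    }

    // True if the word occurs as a whole token in the catch-all field.
    static boolean matches(Map<String, String> doc, String word) {
        String wanted = word.toLowerCase(Locale.ROOT);
        String all = doc.getOrDefault(ALL, "").toLowerCase(Locale.ROOT);
        for (String token : all.split("[^a-z0-9]+")) {
            if (token.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "Bob Smith");
        doc.put("city", "Prague");
        System.out.println(matches(withCatchAll(doc), "bob"));
    }
}
```

The trade-off is a larger index in exchange for a single-field query.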