How would you deal with a query like a*z though?
I suspect, however, that you only care about suffix queries and
stemming those. If that's the case, then you could subclass QueryParser,
override getWildcardQuery and do internal stemming (remove trailing wildcard,
run it through the analyzer directly there and return
How would you deal with a query like a*z though?
Yeah I know, a user submitting that is certainly possible. I have no idea. I
am starting to think that NOT stemming on indexing might be the safest
solution.
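The suffix-query idea mentioned in this thread (strip the trailing wildcard, stem what is left, then put the wildcard back) might look roughly like the sketch below. It is plain Java rather than Lucene code: `toyStem` is a hypothetical stand-in for whatever stemmer the analyzer applies, and `rewrite` is an illustrative name, not a Lucene method. Queries such as a*z are passed through unchanged, since there is no obvious way to stem them.

```java
import java.util.Locale;

class SuffixWildcardStemming {
    // Hypothetical stand-in for a real stemmer (e.g. Porter); it only strips a few suffixes.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Stems the prefix of a query only when its single wildcard is a trailing '*'.
    static String rewrite(String query) {
        int star = query.indexOf('*');
        boolean onlyTrailingStar =
                star >= 0 && star == query.length() - 1 && query.indexOf('?') < 0;
        if (!onlyTrailingStar) {
            return query; // e.g. "a*z": left untouched
        }
        String prefix = query.substring(0, star).toLowerCase(Locale.ROOT);
        return toyStem(prefix) + "*";
    }
}
```

With this stand-in stemmer, `rewrite("united*")` yields `unit*`, which can match a stemmed index term such as `unit`. Stemming a partial word can still misfire, which is one argument for not stemming at index time.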
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List
this then becomes unitedxxwildcardxx, which we can then turn
into a WildcardQuery united*
The problem here is that the term united will never exist in the index: the
indexed text was stemmed, but our escape mechanism kept the query term from
being stemmed properly.
How can I solve this problem?
Do stemming algorithms take into consideration abbreviations too? Some
examples:
mg = milligrams
US = United States
CD = compact disc
vcr = video cassette recorder
And, the next logical question, if stemming does not take care of
abbreviations, are there any solutions that include abbreviations
On Jan 24, 2005, at 7:24 AM, Kevin L. Cobb wrote:
Do stemming algorithms take into consideration abbreviations too?
No, they don't. Adding abbreviations, aliases, synonyms, etc. is not
stemming.
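Because abbreviations are a synonym problem rather than a stemming problem, one possible approach is a lookup table applied to tokens at query (or index) time. The sketch below is only an illustration: the class, the `expand` method and the table are hypothetical, and the entries are taken from the examples in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AbbreviationExpander {
    // Illustrative table; a real application would load its own list.
    static final Map<String, String> ABBREVIATIONS = Map.of(
            "mg", "milligrams",
            "us", "united states",
            "cd", "compact disc",
            "vcr", "video cassette recorder");

    // Returns the original token plus its expansion, if one is known.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);
        String expansion = ABBREVIATIONS.get(token.toLowerCase(Locale.ROOT));
        if (expansion != null) {
            out.add(expansion);
        }
        return out;
    }
}
```

Note that lowercasing makes "US" collide with the pronoun "us", so a case-sensitive table may be the better choice for some entries.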
And, the next logical question, if stemming does not take care of
abbreviations, are there any
Morus Walter wrote:
Owen Densmore writes:
1 - I'm a bit concerned that reasonable stemming (Porter/Snowball)
apparently produces non-word stems .. i.e. not really human readable.
(Example: generate, generates, generated, generating -> generat)
Although in typical queries this is not important
1 - I'm a bit concerned that reasonable stemming (Porter/Snowball)
apparently produces non-word stems .. i.e. not really human readable.
It is possible to derive the human-readable form of a
stemmed term using either re-analysis of indexed
content or TermPositionVector. Either
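As a rough illustration of the re-analysis idea, the sketch below re-reads the original tokens, groups them by stem and keeps the most frequent surface form as the display form for each stem. It is plain Java, not Lucene code: `toyStem` is a hypothetical stand-in for Porter/Snowball, and `displayForms` is an illustrative name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class StemDisplayForms {
    // Hypothetical stand-in for a real stemmer; strips a few common suffixes.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s", "e"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Maps each stem to the surface form seen most often (first seen wins ties).
    static Map<String, String> displayForms(List<String> tokens) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String token : tokens) {
            String word = token.toLowerCase(Locale.ROOT);
            counts.computeIfAbsent(toyStem(word), k -> new LinkedHashMap<>())
                  .merge(word, 1, Integer::sum);
        }
        Map<String, String> best = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            String top = null;
            int topCount = 0;
            for (Map.Entry<String, Integer> form : entry.getValue().entrySet()) {
                if (form.getValue() > topCount) {
                    top = form.getKey();
                    topCount = form.getValue();
                }
            }
            best.put(entry.getKey(), top);
        }
        return best;
    }
}
```

For the tokens generate, generates, generated, generating, generated, the stem `generat` would be displayed as `generated`.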
I want to understand how Lucene uses stemming but can't find any
documentation on the Lucene site. I'll continue to google but hope that
this list can help narrow my search. I have several questions on the
subject currently but hesitate to list them here since finding a good
document
Hi Kevin,
Stemming is an optional operation and is done in the analysis step.
Lucene comes with a Porter stemmer and a Filter that you can use in an
Analyzer:
./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java
You can find more
. Might as well take the final plunge.
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, January 21, 2005 9:12 AM
To: Lucene Users List
Subject: Re: Stemming
to search and discover.
Our initial approach will be vector based, looking at Latent Semantic
Indexing (LSI) as a potential tool, although if that's not needed,
we'll stop at reasonably simple stemming with a weighted document term
matrix (DTM). (Bear in mind I couldn't even pronounce most of these
concepts last week, so go easy if I'm incoherent!)
It looks to me that Lucene has a quite well factored
Sent: Thursday, January 20, 2005 2:10 PM
To: Lucene Users List
Subject: Re: Newbie: Human Readable Stemming, Lucene Architecture,
etc!
Hi,
One thing to point out. I think Lucene is not using LSI as the
underlying retrieval model. It uses the vector space model
Hi,
I'm new to Lucene, so I apologize if this issue has been discussed
before (I'm sure it has), but I had a hard time finding an answer using
Google. (Maybe this would be a good candidate for the FAQ!) :)
Is it possible to enable stem queries on a per-query basis? It doesn't
seem to be possible
From what I've read, if you want to have a choice, the easiest way is
to index the documents twice, once with stemming on and once with it off,
placing the results in two different indexes. Then at query time,
select which index you want to use based on whether you want stemming on
or off.
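The two-index approach can be sketched with a toy in-memory inverted index. This is not Lucene code: `DualIndex`, `add`, `search` and `toyStem` are hypothetical names, and the stemmer is a trivial stand-in. Each document is indexed twice, and the query-time flag picks which index to consult.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DualIndex {
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> stemmed = new HashMap<>();

    // Hypothetical stand-in for a real stemmer.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Indexes every token twice: once as-is, once stemmed.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            stemmed.computeIfAbsent(toyStem(token), k -> new TreeSet<>()).add(docId);
        }
    }

    // The query term is analyzed the same way as the index it is run against.
    Set<Integer> search(String term, boolean useStemming) {
        String word = term.toLowerCase(Locale.ROOT);
        return useStemming
                ? stemmed.getOrDefault(toyStem(word), Collections.emptySet())
                : exact.getOrDefault(word, Collections.emptySet());
    }
}
```

The cost of this design is roughly double the index size and indexing time, in exchange for a simple per-query switch.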
As I understand it, the intended place
Hi, I have used the Lucene demos and I want to know how
stemming can be added to my applications.
--
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277
Miguel Angel wrote:
Hi, I have used the Lucene demos and I want to know how
stemming can be added to my applications.
Have a look at the lucene-sandbox. Under contributions there are
stemmers for many different languages
Hi Yousef
You are not doing anything wrong - it's just how the Porter stemmer works!
The problem with Porter is that it tries to do everything in a purely algorithmic way
- which doesn't cater for irregular conjugations etc.
Don't worry too much though, as long as you do the same stemming
Has anyone on the list implemented a dictionary-based English stemmer
with Lucene? Perhaps based on the freely-available ispell dictionaries
or something like that? The Porter and Snowball stemmers have not
worked that well for our application, but it is a bit daunting to start
from scratch in
Hi,
my application uses a GermanAnalyzer for tokenizing a search string and
constructing Query classes:
Analyzer an = new org.apache.lucene.analysis.de.GermanAnalyzer();
TokenStream ts = an.tokenStream(fieldName, new StringReader(fieldText));
I have noticed a strange problem
For now you could check out the current lucene version from cvs and
just comment out the following line:
uppercase = Character.isUpperCase( term.charAt( 0 ) );
In GermanStemmer.java of course ;))
Regards
Christoph
Hi all
Does Lucene do stemming of words? If so, can anyone
say how to do it in Java using the Lucene API?
Thanks
rgds
srinivas
Hi all
Can anyone tell me where I can get process flow diagrams or something
similar for Lucene? I want to know how Lucene works.
Thanks
rgds
srinivas
Check out PorterStemFilter class and Analyzer class. Then look at some
Analyzer implementations and see how to implement your own
PorterAnalyzer.
Otis
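To illustrate the general shape of such an analyzer (a tokenizer followed by a chain of token filters, the last of which stems), here is a plain-Java sketch. It does not use the Lucene classes named above: `ToyAnalyzer`, `analyze`, `stemming` and `toyStem` are hypothetical names, and the stemmer is a trivial stand-in for PorterStemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class ToyAnalyzer {
    private final List<UnaryOperator<String>> filters;

    ToyAnalyzer(List<UnaryOperator<String>> filters) {
        this.filters = filters;
    }

    // Hypothetical stand-in for a real stemmer.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Tokenizes on non-letters, then passes each token through every filter in order.
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw;
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    // Lowercase first, then stem: the same ordering a stemming analyzer would typically use.
    static ToyAnalyzer stemming() {
        UnaryOperator<String> lower = s -> s.toLowerCase(Locale.ROOT);
        UnaryOperator<String> stem = ToyAnalyzer::toyStem;
        return new ToyAnalyzer(List.of(lower, stem));
    }
}
```

Whatever chain is chosen, the same analysis should be applied to documents at index time and to terms at query time, so that both sides produce the same stems.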
--- M Srinivas Rao [EMAIL PROTECTED] wrote:
Hi all
Does Lucene do stemming of words? If so, can anyone
say how to do it in Java
In our search application the user can turn stemming off and on.
With Lucene, will I have to maintain two sets of indexes to create this functionality,
one stemming and one non-stemming index?
Or is there a way to query a stemming index so that it does not return stems?
Thanks,
Joel