Re: wildcards, stemming and searching

2005-02-10 Thread Erik Hatcher
How would you deal with a query like a*z though? I suspect, however, that you only care about suffix queries and stemming those. If that's the case, then you could subclass getWildcardQuery and do internal stemming (remove trailing wildcard, run it through the analyzer directly there and return
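
The suggestion in this message (strip the trailing wildcard, stem what is left, put the wildcard back) can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not call Lucene at all, and the class name, `toyStem` and `rewrite` are hypothetical names invented for the example. `toyStem` is a crude stand-in for whatever analyzer is used at index time.

```java
import java.util.function.UnaryOperator;

class SuffixWildcardStemming {

    // Hypothetical toy stemmer (NOT Porter/Snowball): lowercases and strips
    // one of a few English suffixes, keeping at least three characters.
    static String toyStem(String term) {
        String t = term.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.endsWith(suffix) && t.length() - suffix.length() >= 3) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    // Rewrites "prefix*" to "stem(prefix)*". Any other pattern (a*z, te?t*,
    // a bare "*") is returned unchanged, because it is unclear how to stem it.
    static String rewrite(String pattern, UnaryOperator<String> stemmer) {
        boolean trailingOnly = pattern.length() > 1
                && pattern.indexOf('*') == pattern.length() - 1
                && pattern.indexOf('?') < 0;
        if (!trailingOnly) {
            return pattern;
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        return stemmer.apply(prefix) + "*";
    }
}
```

With this sketch, `rewrite("united*", SuffixWildcardStemming::toyStem)` gives `unit*`, while `a*z` passes through untouched. Note that the rewritten pattern is broader than what the user typed, since `unit*` also matches terms that `united*` would not.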

Re: wildcards, stemming and searching

2005-02-10 Thread aaz
How would you deal with a query like a*z though? Yeah I know, a user submitting that is certainly possible. I have no idea. I am starting to think that NOT stemming on indexing might be the safest solution.

wildcards, stemming and searching

2005-02-09 Thread aaz
this then becomes unitedxxwildcardxx, which we can then turn into the WildcardQuery united*. The problem here is that the term united will never exist in the index because of stemming, and the query term was not stemmed properly because of our escape mechanism. How can I solve this problem?

RE: Stemming

2005-01-24 Thread Kevin L. Cobb
Do stemming algorithms take into consideration abbreviations too? Some examples: mg = milligrams, US = United States, CD = compact disc, vcr = video cassette recorder. And, the next logical question, if stemming does not take care of abbreviations, are there any solutions that include abbreviations

Re: Stemming

2005-01-24 Thread Erik Hatcher
On Jan 24, 2005, at 7:24 AM, Kevin L. Cobb wrote: Do stemming algorithms take into consideration abbreviations too? No, they don't. Adding abbreviations, aliases, synonyms, etc. is not stemming. And, the next logical question, if stemming does not take care of abbreviations, are there any

Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-21 Thread Andrzej Bialecki
Morus Walter wrote: Owen Densmore writes: 1 - I'm a bit concerned that reasonable stemming (Porter/Snowball) apparently produces non-word stems .. i.e. not really human readable. (Example: generate, generates, generated, generating -> generat) Although in typical queries this is not important

Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-21 Thread mark harwood
1 - I'm a bit concerned that reasonable stemming (Porter/Snowball) apparently produces non-word stems .. i.e. not really human readable. It is possible to derive the human-readable form of a stemmed term using either re-analysis of indexed content or TermPositionVector. Either
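
One way to read the "re-analysis of indexed content" idea is to run the original text through the stemmer again and remember, for each stem, which surface form occurred most often. The sketch below shows only that; it is self-contained Java, does not use TermPositionVector or any other Lucene class, and `ReadableStems`, `toyStem` and `readableForms` are hypothetical names. `toyStem` is a deliberately crude stand-in for Porter/Snowball.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReadableStems {

    // Hypothetical toy stemmer (NOT Porter/Snowball): strips one suffix,
    // keeping at least three characters of the word.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "es", "ed", "e", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Maps each stem to its most frequent surface form in the given tokens.
    // Ties are broken alphabetically so the result is deterministic.
    static Map<String, String> readableForms(List<String> tokens) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String token : tokens) {
            String surface = token.toLowerCase();
            counts.computeIfAbsent(toyStem(surface), k -> new HashMap<>())
                  .merge(surface, 1, Integer::sum);
        }
        Map<String, String> best = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> byStem : counts.entrySet()) {
            String winner = null;
            int max = 0;
            for (Map.Entry<String, Integer> form : byStem.getValue().entrySet()) {
                int n = form.getValue();
                if (n > max || (n == max && form.getKey().compareTo(winner) < 0)) {
                    winner = form.getKey();
                    max = n;
                }
            }
            best.put(byStem.getKey(), winner);
        }
        return best;
    }
}
```

For the tokens generates, generated, generates, generating, the stem generat would be displayed as generates, which is a real word even though the stem itself is not.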

Stemming

2005-01-21 Thread Kevin L. Cobb
I want to understand how Lucene uses stemming but can't find any documentation on the Lucene site. I'll continue to google but hope that this list can help narrow my search. I have several questions on the subject currently but hesitate to list them here since finding a good document

Re: Stemming

2005-01-21 Thread Otis Gospodnetic
Hi Kevin, Stemming is an optional operation and is done in the analysis step. Lucene comes with a Porter stemmer and a Filter that you can use in an Analyzer: ./src/java/org/apache/lucene/analysis/PorterStemFilter.java ./src/java/org/apache/lucene/analysis/PorterStemmer.java You can find more

RE: Stemming

2005-01-21 Thread Kevin L. Cobb
Hi Kevin, Stemming is an optional operation and is done in the analysis step. Lucene comes with a Porter stemmer and a Filter that you can use in an Analyzer: ./src/java/org/apache/lucene/analysis/PorterStemFilter.java ./src/java/org/apache/lucene/analysis/PorterStemmer.java You can find

Re: Stemming

2005-01-21 Thread Chris Lamprecht
. Might as well take the final plunge. -----Original Message----- From: Otis Gospodnetic Sent: Friday, January 21, 2005 9:12 AM To: Lucene Users List Subject: Re: Stemming Hi Kevin, Stemming is an optional operation and is done in the analysis step. Lucene

Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-20 Thread Owen Densmore
to search and discover. Our initial approach will be vector based, looking at Latent Semantic Indexing (LSI) as a potential tool, although if that's not needed, we'll stop at reasonably simple stemming with a weighted document term matrix (DTM). (Bear in mind I couldn't even pronounce most

Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-20 Thread jian chen
) as a potential tool, although if that's not needed, we'll stop at reasonably simple stemming with a weighted document term matrix (DTM). (Bear in mind I couldn't even pronounce most of these concepts last week, so go easy if I'm incoherent!) It looks to me that Lucene has a quite well factored

RE: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-20 Thread Chuck Williams
Sent: Thursday, January 20, 2005 2:10 PM To: Lucene Users List Subject: Re: Newbie: Human Readable Stemming, Lucene Architecture, etc! Hi, One thing to point out: I think Lucene is not using LSI as the underlying retrieval model. It uses the vector space model

Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-20 Thread Morus Walter
Owen Densmore writes: 1 - I'm a bit concerned that reasonable stemming (Porter/Snowball) apparently produces non-word stems .. i.e. not really human readable. (Example: generate, generates, generated, generating -> generat) Although in typical queries this is not important because

Query based stemming

2005-01-07 Thread Peter Kim
Hi, I'm new to Lucene, so I apologize if this issue has been discussed before (I'm sure it has), but I had a hard time finding an answer using google. (Maybe this would be a good candidate for the FAQ!) :) Is it possible to enable stem queries on a per-query basis? It doesn't seem to be possible

Re: Query based stemming

2005-01-07 Thread Jim Lynch
From what I've read, if you want to have a choice, the easiest way is to index the documents twice: once with stemming on and once with it off, placing the results in two different indexes. Then at query time, select which index you want to use based on whether you want stemming on or off
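
The two-index approach described here can be illustrated with a small in-memory sketch. It is an assumption-laden toy, not Lucene code: `DualIndex`, `toyStem`, `add` and `search` are hypothetical names, the "indexes" are plain maps from term to document ids, and the stemmer is a crude stand-in for a real one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DualIndex {
    // Two separate term -> document-id maps, one per analysis mode.
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> stemmed = new HashMap<>();

    // Hypothetical toy stemmer (NOT Porter/Snowball).
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Indexes the same document twice: once unstemmed, once stemmed.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            stemmed.computeIfAbsent(toyStem(token), k -> new TreeSet<>()).add(docId);
        }
    }

    // Picks the index at query time and analyzes the query term the same
    // way that index was built.
    Set<Integer> search(String term, boolean useStemming) {
        String t = term.toLowerCase();
        return useStemming
                ? stemmed.getOrDefault(toyStem(t), Collections.emptySet())
                : exact.getOrDefault(t, Collections.emptySet());
    }
}
```

After adding document 1 ("United Nations") and document 2 ("unit tests"), a search for united with stemming off returns only document 1, while the same search with stemming on returns both documents. The cost of this design is roughly double the index size and indexing time.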

Re: Query based stemming

2005-01-07 Thread Chris Hostetter
, the easiest way is to index the documents twice. Once with stemming on and once with it off placing the results in two different indexes. Then at query time, select which index you want to use based on whether you want stemming on or off. As I understand it, the intended place

about Stemming

2004-11-13 Thread Miguel Angel
Hi, I have used the Lucene demos and I want to know how stemming can be added to my applications. -- Miguel Angel Angeles R., Connectivity and Server Consulting, Tel. 97451277

Re: about Stemming

2004-11-13 Thread Bernhard Messer
Miguel Angel wrote: Hi, I have used the Lucene demos and I want to know how stemming can be added to my applications. Have a look at the lucene-sandbox. Under contributions there are stemmers for many different languages

Re: Stemming Oddness

2004-11-06 Thread Pete Lewis
Hi Yousef, You are not doing anything wrong - it's just how the Porter stemmer works! The problem with Porter is that it tries to do everything in a purely algorithmic way - which doesn't cater for irregular conjugations etc. Don't worry too much though, as long as you do the same stemming

Stemming options

2004-04-11 Thread Boris Goldowsky
Has anyone on the list implemented a dictionary-based English stemmer with Lucene? Perhaps based on the freely-available ispell dictionaries or something like that? The Porter and Snowball stemmers have not worked that well for our application, but it is a bit daunting to start from scratch in

Problem with tokenizing/stemming in GermanAnalyzer

2003-02-17 Thread Volker Luedeling
Hi, my application uses a GermanAnalyzer for tokenizing a search string and constructing Query classes:

    Analyzer an = new org.apache.lucene.analysis.de.GermanAnalyzer();
    TokenStream ts = an.tokenStream(fieldName, new StringReader(fieldText));

I have noticed a strange problem

Re: Problem with tokenizing/stemming in GermanAnalyzer

2003-02-17 Thread Christoph Kiehl
For now you could check out the current Lucene version from CVS and just comment out the following line: uppercase = Character.isUpperCase( term.charAt( 0 ) ); In GermanStemmer.java of course ;)) Regards Christoph

stemming feature

2002-12-10 Thread M Srinivas Rao
Hi all, Does Lucene do stemming of words? If yes, can anyone say how to do it in Java using the Lucene API? Thanks, rgds, srinivas

stemming feature

2002-12-10 Thread M Srinivas Rao
Hi all, Can anyone tell me where I can get process flow diagrams or something similar for Lucene? I want to know how Lucene works. Thanks, rgds, srinivas

Re: stemming feature

2002-12-10 Thread Otis Gospodnetic
Check out the PorterStemFilter class and the Analyzer class. Then look at some Analyzer implementations and see how to implement your own PorterAnalyzer. Otis --- M Srinivas Rao wrote: Hi all, Does Lucene do stemming of words? If yes, can anyone say how to do it in Java

Stemming

2002-05-02 Thread Joel Bernstein
In our search application the user can turn stemming off and on. With Lucene, will I have to maintain two sets of indexes to create this functionality, one stemming and one non-stemming index? Or is there a way to query a stemming index so that it does not return stems? Thanks, Joel

Re: Stemming

2002-05-02 Thread Otis Gospodnetic
... wrote: In our search application the user can turn stemming off and on. With Lucene, will I have to maintain two sets of indexes to create this functionality, one stemming and one non-stemming index? Or is there a way to query a stemming index so that it does not return stems