Expected Behavior from QueryParser and Standard Analyzer with Version.LUCENE_*

2011-05-09 Thread Chris Currens
Hello, I have some questions about what kind of behavior is expected when passing Version.LUCENE_24/29/30 to QueryParser and the StandardAnalyzer when parsing a query. I know that passing the Version to the constructors makes Lucene act like that version, with all features and bugs intact. The

Re: Whitespace/Standard Analyzer and punctuation

2009-09-30 Thread Karl Wettin
You could look into modifying the standard tokenizer lexer code to handle punctuation (there is a patch in the issue tracker for the old JavaCC grammar to handle punctuation), and there is also the GATE NLP project, which has a fairly nice sentence splitter you might find useful. Add a

Whitespace/Standard Analyzer and punctuation

2009-09-29 Thread Max Lynch
I would like my searches to match John Smith when John Smith is in a document, but not separated with punctuation. For example, when I was using StandardAnalyzer, John. Smith was matching, which is wrong for me. Right now I am using WhitespaceAnalyzer but instead searching for John Smith John

Re: Is there a list of special characters for standard analyzer?

2009-07-31 Thread ohaya
Phil Whelan phil...@gmail.com wrote: On Thu, Jul 30, 2009 at 7:12 PM, oh...@cox.net wrote: I was wondering if there is a list of special characters for the standard analyzer? What I mean by special is characters that the analyzer considers break characters. For example, if I

Re: Is there a list of special characters for standard analyzer?

2009-07-31 Thread AHMET ARSLAN
I guess that the obvious question is Which characters are considered 'punctuation characters'?. Punctuation = (_|-|/|.|,) In particular, does the analyzer consider = (equal) and : (colon) to be punctuation characters? : is a special character in QueryParser (if you are using it). If you

Re: Is there a list of special characters for standard analyzer?

2009-07-31 Thread ohaya
Hi Ahmet, Thanks for the clarification and information! That was exactly what I was looking for. Jim AHMET ARSLAN iori...@yahoo.com wrote: I guess that the obvious question is Which characters are considered 'punctuation characters'?. Punctuation = (_|-|/|.|,) In

Re: Is there a list of special characters for standard analyzer?

2009-07-31 Thread Simon Willnauer
On Fri, Jul 31, 2009 at 5:00 PM, oh...@cox.net wrote: Hi Ahmet, Thanks for the clarification and information!  That was exactly what I was looking for. Jim AHMET ARSLAN iori...@yahoo.com wrote: I guess that the obvious question is Which characters are considered 'punctuation

Is there a list of special characters for standard analyzer?

2009-07-30 Thread ohaya
Hi, I was wondering if there is a list of special characters for the standard analyzer? What I mean by special is characters that the analyzer considers break characters. For example, if I have something like foo=something, apparently the analyzer considers this as two terms, foo

Re: Is there a list of special characters for standard analyzer?

2009-07-30 Thread Phil Whelan
On Thu, Jul 30, 2009 at 7:12 PM, oh...@cox.net wrote: I was wondering if there is a list of special characters for the standard analyzer? What I mean by special is characters that the analyzer considers break characters. For example, if I have something like foo=something, apparently

Standard Analyzer

2008-08-25 Thread Kalani Ruwanpathirana
Hi, I am using StandardAnalyzer when creating the Lucene index. It indexes the word work as it is but does not index the word wo*rk in that manner. Can I index such words (including * and ?) as they are? Otherwise I have no way to index and search for words like wo*rk, you?, etc. Thanks -- Kalani

Re: Standard Analyzer

2008-08-25 Thread tom
AUTOMATIC REPLY Tom Roberts is out of the office till 2nd September 2008. LUX reopens on 1st September 2008

Re: Standard Analyzer

2008-08-25 Thread Karl Wettin
25 aug 2008 kl. 09.19 skrev Kalani Ruwanpathirana: Hi, I am using StandardAnalyzer when creating the Lucene index. It indexes the word work as it is but does not index the word wo*rk in that manner. Can I index such words (including * and ?) as it is? Otherwise I have no way to index

Re: Standard Analyzer

2008-08-25 Thread Kalani Ruwanpathirana
Hi, Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive. If I need to search for words like correct?, html (it escapes , and another few characters too) I need to index those kind of words. On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin [EMAIL PROTECTED] wrote: 25 aug 2008 kl.

Re: Standard Analyzer

2008-08-25 Thread Karl Wettin
25 aug 2008 kl. 11.14 skrev Kalani Ruwanpathirana: Hi, Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive. Then you simply add a LowerCaseFilter to the chain in the Analyzer: public final class WhitespaceAnalyzer extends Analyzer { public TokenStream tokenStream(String
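
The idea in the reply above (tokenize on whitespace, then lowercase each token) can be modeled without Lucene. The following is only a stand-alone sketch in plain Java: the class and method names are hypothetical and it does not reproduce Lucene's Analyzer or TokenStream API, which the truncated snippet above only begins to show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-alone model of a "whitespace tokenizer + lowercase filter" chain.
// It does not use Lucene; all names here are hypothetical.
final class LowercaseWhitespaceSketch {

    // Split on runs of whitespace only, so characters such as '?' or '*' stay in the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Lowercase every token; this is the step a lowercase filter adds to the chain.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // The full chain: tokenize first, then filter.
    static List<String> analyze(String text) {
        return lowercase(whitespaceTokens(text));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Is this Correct? wo*rk"));
    }
}
```

Because the same chain would be applied at index time and at query time, a search for correct? and one for Correct? end up as the same term in this model.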

Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Hi, We have been observing the following problem while tokenizing using Lucene's StandardAnalyzer. The tokens that we get are different on different machines. I suspect it has something to do with the Locale settings on the individual machines. For example, the word 'César' is split as

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe
Hi Prashant, On 04/22/2008 at 2:23 PM, Prashant Malik wrote: We have been observing the following problem while tokenizing using Lucene's StandardAnalyzer. The tokens that we get are different on different machines. I suspect it has something to do with the Locale settings on individual

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Yes, the versions of Lucene and Java are exactly the same on the different machines. In fact, we unjarred Lucene, re-jarred it with our jar, and are running from the same NFS mounts on both machines. We have also tried Lucene 2.2.0 and 2.3.1, with the same result. Also, about the actual string u

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe
Hi Prashant, What is the Unicode code point associated with the 3,4,5 character? Steve On 04/22/2008 at 4:45 PM, Prashant Malik wrote: Yes, the versions of Lucene and Java are exactly the same on the different machines. In fact, we unjarred Lucene, re-jarred it with our jar, and are running from

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Chris Hostetter
: Yes, the versions of Lucene and Java are exactly the same on the different : machines. : In fact, we unjarred Lucene, re-jarred it with our jar, and are running from the : same NFS mounts on both machines I didn't do an in-depth code read, but a quick skim of StandardTokenizerImpl didn't turn up

Standard Analyzer Escapes

2007-07-13 Thread Walt Stoneburner
In reading the documentation for escape characters, I'm having a little trouble understanding what it wants me to do for certain special cases. http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters says: Lucene supports escaping special characters that are
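
As a rough illustration of what escaping means for the query syntax referenced above, here is a minimal sketch in plain Java. It assumes the special characters are the ones listed on the query parser syntax page (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) and puts a backslash in front of each one. The class name and character list are assumptions for this sketch; Lucene ships its own static QueryParser.escape method, which should be preferred in real code.

```java
// Stand-alone sketch of backslash-escaping query syntax characters.
// The character list is an assumption taken from the documented query syntax.
final class QueryEscapeSketch {

    // Characters treated as special in this sketch; '&' and '|' cover "&&" and "||".
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash; leave everything else untouched.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("foo:bar"));
        System.out.println(escape("a&&b"));
    }
}
```

Escaping only keeps the query parser from interpreting these characters as operators. The analyzer applied to the field may still split on them or drop them, which is the parser-versus-analyzer distinction raised in the replies to this thread.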

Re: Standard Analyzer Escapes

2007-07-13 Thread Yonik Seeley
I just tried some things fast via the Solr admin interface, and everything seems fine. I think you are probably confusing what the parser does vs what the analyzer does. Try your tests with an un-tokenized field to remove that effect. -Yonik On 7/13/07, Walt Stoneburner [EMAIL PROTECTED] wrote:

Re: Standard Analyzer Escapes

2007-07-13 Thread Mark Miller
This is certainly the case. StandardAnalyzer has a regex matcher that looks for a possible company name involving an & or an @. The QueryParser is escaping the '' -- all of the effects described are standard results of using the StandardAnalyzer. Any double '' will break text, but 'sdfdfdfsdf'

Re: custom stop word list for standard analyzer

2007-04-13 Thread Chris Hostetter
: Apologies and thanks all at the same time, everyone. No apologies necessary, you're not the first person to be confused by this, which is why I asked if you had any ideas on how we can improve the docs -- people who know the APIs inside and out aren't in the best position to understand how to

custom stop word list for standard analyzer

2007-04-12 Thread Michael Barbarelli
I know this is a relatively fundamental thing to arrange, but I'm having trouble. Can I instantiate a standard analyzer with an argument containing my own stop words? If so, how? Will they be appended to or override the built-in stop words? Or, do I have to modify the analyzer class itself

Re: custom stop word list for standard analyzer

2007-04-12 Thread Paul Cowan
Michael Barbarelli wrote: Can I instantiate a standard analyzer with an argument containing my own stop words? If so, how? Will they be appended to or override the built-in stop words? You can do it with one of the alternate constructors, and they'll override the built-in list
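
The difference between overriding and appending can be sketched in plain Java. This is a stand-alone model, not Lucene code: the class name, the helper methods, and the short BUILT_IN set are all hypothetical, and BUILT_IN is only an illustrative stand-in for the analyzer's real default list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-alone model of stop word filtering; all names are hypothetical.
final class StopWordSketch {

    // Illustrative stand-in for a built-in list; NOT Lucene's actual default set.
    static final Set<String> BUILT_IN =
            new HashSet<>(Arrays.asList("a", "an", "the", "and", "of"));

    // Drop every token that appears in the given stop set.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // To append rather than override, build the union explicitly.
    static Set<String> union(Set<String> base, Set<String> extra) {
        Set<String> all = new HashSet<>(base);
        all.addAll(extra);
        return all;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "lucene", "index");
        Set<String> custom = new HashSet<>(Arrays.asList("lucene"));
        // Override: only the custom words are removed, so "the" survives.
        System.out.println(removeStopWords(tokens, custom));
        // Append: built-in and custom words are both removed.
        System.out.println(removeStopWords(tokens, union(BUILT_IN, custom)));
    }
}
```

With override semantics, as described in the reply above, a custom list that should also keep the defaults has to include them itself, which is what the union helper models.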

Re: custom stop word list for standard analyzer

2007-04-12 Thread Chris Hostetter
: Michael Barbarelli wrote: : Can I instantiate a standard analyzer with an argument containing my own : stop words? If so, how? Will they be appended to or override the built-in I'm really surprised how often this question gets asked ... Michael (or anyone else for that matter) do you have

Re: Modifying the standard analyzer

2006-07-07 Thread Chris Hostetter
: But ParseException extends IOException, so I don't see a problem there. : I wish my compiler agreed with you:) Which it seems to do until you : rebuild the files with javacc. I saw at least two other posts about this : problem on the web with no answer given... : This guy also found the same

Re: Modifying the standard analyzer

2006-07-07 Thread Mark Miller
Thank you so much. I apologize for my ignorance. Mark On 7/7/06, Chris Hostetter [EMAIL PROTECTED] wrote: : But ParseException extends IOException, so I don't see a problem there. : I wish my compiler agreed with you:) Which it seems to do until you : rebuild the files with javacc. I saw

RE: Problems in standard Analyzer

2005-09-26 Thread Kunemann Frank
, September 26, 2005 5:46 AM To: java-user@lucene.apache.org Subject: Problems in standard Analyzer Hi Mark and other Gurus, I am indexing one value as a key field (rtf txt indexing) , value is like 12345 or 123-09-34 or it can be like MN12345. Problem is if the value is like 12345 or 123-23-98

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
I thought of not using any Analyzer, but the problem is that I have other queries that I am appending to this value with either OR or AND, so for that part of the query I need Standard Analyzer. I think I should index that value like normal text; then maybe it will work. -Original Message

RE: Problems in standard Analyzer

2005-09-26 Thread Kunemann Frank
@lucene.apache.org Subject: RE: Problems in standard Analyzer I thought of not using any Analyzer, but the problem is I got other queries that I am appending to this value with either OR or AND, so for that part of query I need Standard Analyzer , I think I should index that value like normal text

Re: Problems in standard Analyzer

2005-09-26 Thread Anand Kishore
To: java-user@lucene.apache.org Subject: RE: Problems in standard Analyzer It should be possible to combine queries using different types of analyzers. The only problem I can see is if you're using one single line for the whole query. Frank -Original Message- From: M å n i s h

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
] Sent: Monday, September 26, 2005 3:07 PM To: java-user@lucene.apache.org Subject: RE: Problems in standard Analyzer The problem is that in limo you can only use standard analyzers for your queries. As you've already seen some of them will change the key value to something else or even remove them

Problems in standard Analyzer

2005-09-25 Thread M å n i s h
Hi Mark and other Gurus, I am indexing one value as a key field (rtf txt indexing); the value is like 12345 or 123-09-34, or it can be like MN12345. The problem is that if the value is like 12345 or 123-23-98, Standard Analyzer is able to search it, but if the value is like MN12345 search will not return