Re: StandardAnalyzer question

2006-09-29 Thread Doron Cohen
QueryParser can do that for you. Something like:

    QueryParser qp = new QueryParser("CONTENTS", new StandardAnalyzer());
    qp.setDefaultOperator(Operator.AND);
    Query q = qp.parse("TOOLS FOR TRAILER");

The resulting query should be: +content:tools +content:trailer
"Van Ng
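
To illustrate why the stop word drops out of the parsed query, here is a minimal plain-Java sketch that mimics the two steps (analysis, then joining the surviving terms as required clauses). It does not use Lucene. The class name, the method names, and the stop list are assumptions made for illustration only, and the splitting rule is much simpler than what StandardAnalyzer really does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AndQuerySketch {
    // Hypothetical stop list for illustration; not Lucene's actual list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("for", "the", "a", "and", "of"));

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop stop words. A rough stand-in for an analyzer.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                terms.add(raw);
            }
        }
        return terms;
    }

    // Join the surviving terms as required ("+") clauses on one field,
    // which is what an AND default operator amounts to.
    static String toAndQuery(String field, String text) {
        StringBuilder sb = new StringBuilder();
        for (String term : analyze(text)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "+content:tools +content:trailer"
        System.out.println(toAndQuery("content", "TOOLS FOR TRAILER"));
    }
}
```

Because the same analysis runs on the query text as on the indexed text, "FOR" never becomes a required clause, so it cannot cause the AND query to fail.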

Re: [BULK] StandardAnalyzer question

2006-09-29 Thread Ryan Heinen
Van Nguyen wrote: I have a field in my index that is being tokenized using the StandardAnalyzer. Let’s say that field was: TOOLS FOR TRAILER The word “FOR” is a stop word so it is not being indexed (based on the StandardAnalyzer). When someone types in TOOLS FOR TRAILER, I have a Boole

StandardAnalyzer question

2006-09-29 Thread Van Nguyen
I have a field in my index that is being tokenized using the StandardAnalyzer. Let’s say that field was: TOOLS FOR TRAILER The word “FOR” is a stop word so it is not being indexed (based on the StandardAnalyzer). When someone types in TOOLS FOR TRAILER, I have a BooleanQuery s

RE: StandardAnalyzer question

2006-07-21 Thread Ngo, Anh (ISS Southfield)
It works now. Thank you very much. I forgot to run javacc on StandardTokenizer.jj. Sincerely, Anh Ngo -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Friday, July 21, 2006 5:33 PM To: java-user@lucene.apache.org Subject: Re: StandardAnalyzer question

Re: StandardAnalyzer question

2006-07-21 Thread Mark Miller
ers [ "\u0041"-"\u005a", "\u0061"-"\u007a", "\u00c0"-"\u00d6", "\u00d8"-"\u00f6", "\u00f8"-"\u00ff", "\u0100"-"\u1fff", "\u005f"

RE: StandardAnalyzer question

2006-07-21 Thread Ngo, Anh (ISS Southfield)
"\u005a", "\u0061"-"\u007a", "\u00c0"-"\u00d6", "\u00d8"-"\u00f6", "\u00f8"-"\u00ff", "\u0100"-"\u1fff", "\u005f" ] > Please help.

Re: StandardAnalyzer question

2006-07-21 Thread Doron Cohen
"\u002d" would add "-". The original request was for "_", which is "\u005f". "Mark Miller" <[EMAIL PROTECTED]> wrote on 21/07/2006 13:09:28: > | < #LETTER: // unicode letters > [ >"\u0041"-"\u005a", >"\u0061"-"\u007a", >"\u00c0"-"\u00d6", >"\u00d8"-

Re: StandardAnalyzer question

2006-07-21 Thread Mark Miller
"\u00f8"-"\u00ff", "\u0100"-"\u1fff", "\u002d" ] On 7/21/06, Ngo, Anh (ISS Southfield) <[EMAIL PROTECTED]> wrote: Hello Mark, Please show me how to add "-" to #LETTER definition Thanks, Anh Ngo -Origi

Re: StandardAnalyzer question

2006-07-21 Thread Mark Miller
"."|",") > > | <#HAS_DIGIT:// at least one digit > (|)* > > (|)* > > > > > Should I remove "_" and recompile the source code? > > Sincerely, > > > Anh Ngo > > -Original Message- > From: Daniel Na

RE: StandardAnalyzer question

2006-07-21 Thread Ngo, Anh (ISS Southfield)
Hello Mark, Please show me how to add "-" to the #LETTER definition. Thanks, Anh Ngo -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Friday, July 21, 2006 3:51 PM To: java-user@lucene.apache.org Subject: Re: StandardAnalyzer question I do not believe

Re: StandardAnalyzer question

2006-07-21 Thread Mark Miller
// at least one digit (|)* (|)* > Should I remove "_" and recompile the source code? Sincerely, Anh Ngo -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: Friday, July 21, 2006 2:49 PM To: java-user@lucene.apache.org Subject

RE: StandardAnalyzer question

2006-07-21 Thread Ngo, Anh (ISS Southfield)
recompile the source code? Sincerely, Anh Ngo -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: Friday, July 21, 2006 2:49 PM To: java-user@lucene.apache.org Subject: Re: StandardAnalyzer question On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote: > The luc

Re: StandardAnalyzer question

2006-07-21 Thread Daniel Naber
On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote: > The lucene 2.0.0 StandardAnalyzer does treat the "_"(underscore) as a > token.  Is there a way I can make StandardAnalyzer don't tokenize for > "_" or any given characters? You need to add "_" to the #LETTER definition in StandardT
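
As a rough illustration of what adding "_" to the letter class changes, the plain-Java sketch below splits text into runs of "letter or digit" characters, with the letter ranges copied from the #LETTER definition quoted elsewhere in this thread. It is a simplification under stated assumptions and not the real StandardTokenizer: the class and method names are hypothetical, digits are limited to ASCII 0-9, and none of the grammar's other token rules are modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterClassSketch {
    // Letter ranges as quoted in the thread's #LETTER definition (code points).
    static final int[][] LETTER_RANGES = {
        {0x0041, 0x005a}, {0x0061, 0x007a}, {0x00c0, 0x00d6},
        {0x00d8, 0x00f6}, {0x00f8, 0x00ff}, {0x0100, 0x1fff}
    };

    // True if c may appear inside a token. extraLetters holds characters
    // added to the letter class, for example "_" (0x005f).
    static boolean isTokenChar(char c, String extraLetters) {
        if (c >= '0' && c <= '9') {
            return true; // assumption: ASCII digits only
        }
        if (extraLetters.indexOf(c) >= 0) {
            return true;
        }
        for (int[] range : LETTER_RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Split text into maximal runs of token characters.
    static List<String> tokenize(String text, String extraLetters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c, extraLetters)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("part_no 42", ""));  // prints [part, no, 42]
        System.out.println(tokenize("part_no 42", "_")); // prints [part_no, 42]
    }
}
```

In the real analyzer the change is made in the grammar file and the tokenizer is regenerated with javacc, as a later message in this thread notes; the sketch only shows the effect on token boundaries.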

StandardAnalyzer question

2006-07-21 Thread Ngo, Anh (ISS Southfield)
Hello, The Lucene 2.0.0 StandardAnalyzer treats "_" (underscore) as a token separator. Is there a way to make StandardAnalyzer not split on "_" or any other given character? I'd like to keep all the features that StandardAnalyzer has but modify it a bit for my needs. How do I control what

Re: StandardAnalyzer question ...

2006-02-20 Thread Oskar Berger
Hello, Not yet an expert in the field, but as I understand it, the terms are indexed as you specify them (through the filters), but the contents are stored depending on whether you want that or not (Field.UnStored(), which happens to be on its way to being deprecated). So maybe you search the
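
The distinction between indexed terms and stored content can be sketched in plain Java as follows. This is a toy model and not Lucene code: the class and method names and the sample value "Power Tools" are assumptions for illustration, and the "analysis" is just lowercasing and whitespace splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class IndexedVsStoredSketch {
    // One field of one document: the analyzed terms that are searchable,
    // plus the original text if the caller asked for it to be stored.
    static final class FieldEntry {
        final String storedValue;        // null when the field is unstored
        final List<String> indexedTerms; // what a search actually matches against

        FieldEntry(String storedValue, List<String> indexedTerms) {
            this.storedValue = storedValue;
            this.indexedTerms = indexedTerms;
        }
    }

    // Analyze the value (lowercase + whitespace split) for the index and
    // keep the untouched original only when store is true.
    static FieldEntry index(String value, boolean store) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return new FieldEntry(store ? value : null, terms);
    }

    // Exact term lookup, as a term query would do: no analysis is applied here.
    static boolean matches(FieldEntry entry, String term) {
        return entry.indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        FieldEntry entry = index("Power Tools", true);
        System.out.println(entry.storedValue);       // prints Power Tools
        System.out.println(matches(entry, "power")); // prints true
        System.out.println(matches(entry, "Power")); // prints false
    }
}
```

The stored value keeps its original case while the searchable terms are lowercased, so a term that is looked up without going through the same analysis can miss.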

StandardAnalyzer question ...

2006-02-20 Thread Mufaddal Khumri
Hi, When StandardAnalyzer is used to index documents, aren't the terms, amongst other things, lowercased and stored that way in the index? I have an index field that I index like this: ramWriter = new IndexWriter(ramDir, standardAnalyzer, true); ... ... doc.add(Field.Text("categoryN