RE: Disabling modifiers?

2003-12-16 Thread Iain Young
… www.microfocus.com/devforum -----Original Message----- From: Erik Hatcher Sent: 16 December 2003 12:31 To: Lucene Users List Subject: Re: Disabling modifiers?
On Tuesday, December 16, 2003, at 07:28 AM, Erik Hatcher …

RE: Disabling modifiers?

2003-12-16 Thread Iain Young
Thanks Karl.
-----Original Message----- From: Karl Penney Sent: 16 December 2003 13:58 To: Lucene Users List Subject: Re: Disabling modifiers?
One of the token patterns defined by the StandardTokenizer.jj is this …

Re: Disabling modifiers?

2003-12-16 Thread Karl Penney
… Users List' Sent: Tuesday, December 16, 2003 7:46 AM Subject: RE: Disabling modifiers?
> I think it is a problem with the indexing. I've found another example...
>
> WS-CA-PP00-PROCESS-YYMM
>
> I've looked at the index, and it has been to…

RE: Disabling modifiers?

2003-12-16 Thread Iain Young
… www.microfocus.com/devforum -----Original Message----- From: Erik Hatcher Sent: 16 December 2003 12:31 To: Lucene Users List Subject: Re: Disabling modifiers?
On Tuesday, December 16, 2003, at 07:28 AM, Erik Hatcher wrote:
> And yes, if you are using StandardTokenizer, you ar…

Re: Disabling modifiers?

2003-12-16 Thread Erik Hatcher
On Tuesday, December 16, 2003, at 07:28 AM, Erik Hatcher wrote:
> And yes, if you are using StandardTokenizer, you are probably not tokenizing COBOL quite like you expect. Is there a COBOL parser you could tap into that could give you the tokens you want?

Ummm. never mind that last question...

Re: Disabling modifiers?

2003-12-16 Thread Erik Hatcher
On Tuesday, December 16, 2003, at 05:46 AM, Iain Young wrote:
>> Treating them as two separate words when quoted is indicative of your analyzer not being sufficient for your domain. What Analyzer are you using? Do you have knowledge of what it is tokenizing text into?
> I have created a custom analyz…

RE: Disabling modifiers?

2003-12-16 Thread Iain Young
I think it is a problem with the indexing. I've found another example...

WS-CA-PP00-PROCESS-YYMM

I've looked at the index, and it has been tokenized into 3 words...

WS CA-PP00-PROCESS YYMM

Looks as though I might have to use a custom tokenizer as well as an analyzer then, but any ideas as to wh…
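As a rough illustration of the custom tokenizer Iain mentions, here is a minimal sketch, assuming a COBOL data name is simply a run of letters, digits and hyphens that cannot begin or end with a hyphen. It is plain Java with a hypothetical class name (`CobolTokenizer`), not a Lucene API; in Lucene the same character rule would plug into a `Tokenizer` subclass (for example by overriding `CharTokenizer.isTokenChar`, whose exact signature differs between Lucene versions).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical COBOL-aware tokenizer (plain Java, not a Lucene class).
// A token is a maximal run of letters, digits and hyphens, so a data name
// such as WS-CA-PP00-PROCESS-YYMM survives as a single token.
class CobolTokenizer {

    // Characters treated as part of a COBOL user-defined word.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separators: spaces, periods, parentheses, quotes, ...
            while (i < n && !isTokenChar(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && isTokenChar(text.charAt(i))) {
                i++;
            }
            // COBOL names cannot begin or end with a hyphen, so trim any;
            // this also discards a lone "-" used as a minus sign.
            String word = trimHyphens(text.substring(start, i));
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    static String trimHyphens(String s) {
        int b = 0;
        int e = s.length();
        while (b < e && s.charAt(b) == '-') {
            b++;
        }
        while (e > b && s.charAt(e - 1) == '-') {
            e--;
        }
        return s.substring(b, e);
    }

    public static void main(String[] args) {
        // Prints [MOVE, WS-CA-PP00-PROCESS-YYMM, TO, WS-OUT]
        System.out.println(tokenize("MOVE WS-CA-PP00-PROCESS-YYMM TO WS-OUT."));
    }
}
```

Whatever tokenizer is used at index time has to be used for query text as well; otherwise the query terms will not line up with the indexed ones.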

RE: Disabling modifiers?

2003-12-16 Thread Iain Young
… Gregor Heinrich Sent: 15 December 2003 18:32 To: 'Lucene Users List' Subject: RE: Disabling modifiers?
If you don't want to fiddle with the JavaCC source of QueryParser.jj, you could work with a regular expression that works in front of the actual query parser.

RE: Disabling modifiers?

2003-12-16 Thread Iain Young
> Treating them as two separate words when quoted is indicative of your analyzer not being sufficient for your domain. What Analyzer are you using? Do you have knowledge of what it is tokenizing text into?

I have created a custom analyzer (CobolAnalyzer) which contains some custom stop wor…
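The CobolAnalyzer itself is not shown in the thread. As a hypothetical sketch of the filtering stage such an analyzer might contain, the code below lower-cases each token (COBOL is case-insensitive) and drops stop words. The class name `CobolStopFilter` and the stop list are illustrative assumptions, not Iain's actual code or word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical filtering stage for a COBOL analyzer: lower-case each token
// and drop stop words. The stop list below is illustrative only and is NOT
// the list used by the CobolAnalyzer mentioned in this thread.
class CobolStopFilter {

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "move", "to", "perform", "until", "if", "end-if", "pic", "value"));

    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            // Normalise case so "MOVE", "Move" and "move" are treated alike.
            String norm = tok.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(norm)) {
                out.add(norm);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("MOVE", "WS-CA-PP00-PROCESS-YYMM", "TO", "WS-OUT");
        // Prints [ws-ca-pp00-process-yymm, ws-out]
        System.out.println(filter(tokens));
    }
}
```

A filter like this only helps if the tokens it receives are already whole COBOL names; as the thread notes, a StandardTokenizer in front of it would still split identifiers such as WS-CA-PP00-PROCESS-YYMM.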

RE: Disabling modifiers?

2003-12-15 Thread Gregor Heinrich
If you don't want to fiddle with the JavaCC source of QueryParser.jj, you could work with a regular expression that works in front of the actual query parser. I just did something similar because I input Lucene's query strings into a latent semantic analysis algorithm and remove words with + and ?
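Gregor's actual regular expression is not shown. A minimal sketch of the general idea, assuming the QueryParser in use honours backslash escapes for special characters (worth checking for the Lucene version at hand), is to prefix every `+` and `-` with a backslash before parsing, so they are read as literal characters rather than as required/prohibited modifiers. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing pass applied to the raw query string before it
// is handed to QueryParser. Every '+' and '-' gets a backslash in front of it,
// so the parser should read it as a literal character rather than as a
// required/prohibited modifier.
class ModifierEscaper {

    // Matches a single '+' or '-' anywhere in the query.
    private static final Pattern MODIFIER = Pattern.compile("[+\\-]");

    static String escapeModifiers(String query) {
        // The Java literal "\\\\$0" is the replacement string \\$0:
        // a literal backslash followed by the matched character.
        return MODIFIER.matcher(query).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        // Prints WS\-CA\-PP00\-PROCESS\-YYMM \+foo
        System.out.println(escapeModifiers("WS-CA-PP00-PROCESS-YYMM +foo"));
    }
}
```

This only stops the parser from treating the characters as modifiers; whether the escaped term then matches anything still depends on how the analyzer tokenized the field at index time.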

Re: Disabling modifiers?

2003-12-15 Thread Erik Hatcher
On Monday, December 15, 2003, at 12:12 PM, Iain Young wrote:
> A quick question. Is there any way to disable the - and + modifiers in the QueryParser?

Not currently.

> I've had a bit of success by putting quotes around the offending names, (as suggested on this list), but the results are still less…