Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
I am using StandardAnalyzer for my indexes. I don't want whole email addresses to be indexed as single searchable terms, and want '@' to be treated as punctuation too, because my users would rather search by user id and/or host name to return all the email addresses than search by the whole a

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin
On 20 Dec 2007 at 18:43, [EMAIL PROTECTED] wrote: I am using StandardAnalyzer for my indexes. Now I don't want to be able to be search whole email addresses, and want to consider '@' as a punctuation too. Because my users would rather be able to search for user id and/or the host name to ret

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
Thanks Karl, I would rather modify the lexer grammar, but where exactly is it defined? After a quick look, it seems StandardTokenizerTokenManager.java may be where it is done. Ampersand having a decimal value of '38', I was assuming that the following step is taken when face

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin
On 20 Dec 2007 at 20:21, [EMAIL PROTECTED] wrote: I would rather like to modify the lexer grammar. But exactly where it is defined. After having a quick look, seems like StandardTokenizerTokenManager.java may be where it is being done. http://svn.apache.org/repos/asf/lucene/java/trunk/src/java

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
Karl, I should have mentioned this before: I have Lucene 1.9.1. In fact I had previously located the grammar in StandardTokenizer.jj (I just wasn't sure if that was the one you were talking about) and had commented out the EMAIL entries in all of the following files: StandardTokenizer.java StandardTokenizer.j

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread Karl Wettin
On 20 Dec 2007 at 22:32, [EMAIL PROTECTED] wrote: In fact I had previously located the grammar in StandardTokenizer.jj (just wasn't sure if that was the one u were talking about) and had commented out EMAIL entries from all the following files: StandardTokenizer.java StandardTokenizer.jj Stand

Re: Changing the Punctuation definition for StandardAnalyzer

2007-12-20 Thread tareque
I actually hadn't implemented the TokenFilter solution before deciding against it, so I don't have any benchmark. In the end I took care of the problem with a variation of your quick and dirty solution: I captured the character '@' in FastCharStream.java,
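
The message above is cut off in this archive listing, so the exact change made in FastCharStream.java is not shown. As a rough illustration of the same idea (getting rid of '@' in the character stream before the tokenizer ever sees it), here is a minimal sketch that leaves the Lucene sources untouched. The class AtSignToSpaceReader and its readAll helper are hypothetical names made up for this example, not part of Lucene, and this is not necessarily what the poster did.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical helper (not part of Lucene): replaces every '@' with a
 * space while characters are read, so a downstream tokenizer never sees
 * a complete email address. The replacement is one character for one
 * character, so character offsets stay the same.
 */
public class AtSignToSpaceReader extends FilterReader {

    public AtSignToSpaceReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == '@' ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop does not run.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '@') {
                buf[i] = ' ';
            }
        }
        return n;
    }

    /** Demo helper: drains a reader into a String. */
    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = new AtSignToSpaceReader(
                new StringReader("mail jdoe@example.org today"));
        System.out.println(readAll(r)); // prints "mail jdoe example.org today"
    }
}
```

In an application, a reader wrapped this way would be handed to the analyzer in place of the raw one, for example through Analyzer.tokenStream(fieldName, reader). The same wrapping would have to be applied to query text as well, otherwise a query containing a full address would still be parsed as a single EMAIL token and match nothing.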