StandardTokenizer and e-mail

2004-05-21 Thread Albert Vila
Hi all, I want to achieve the following: when indexing '[EMAIL PROTECTED]', I want to index the '[EMAIL PROTECTED]' token, then the 'xyz' token, the 'company' token and the 'com' token. This way, you'll be able to find the document by searching for '[EMAIL PROTECTED]', for 'xyz' only, or for

Re: StandardTokenizer and e-mail

2004-05-21 Thread Otis Gospodnetic
Yes, yes. Write your own TokenFilter subclass that overrides next(), extracts those other elements/tokens from an email address token, and uses Token's setPositionIncrement(0) to store the extracted tokens at the same position as the original email. Otis
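
The idea of stacking the extracted parts on the original address can be sketched without the Lucene classes. This is only an illustration under stated assumptions: `Tok` is a hypothetical stand-in for Lucene's Token, `expand` is a made-up helper name, and 'xyz@company.com' is an assumed example address (the archive hides the real one). A real filter would do the same work inside next() and call setPositionIncrement(0) on each extracted token.

```java
import java.util.ArrayList;
import java.util.List;

public class EmailSplitSketch {
    /** Hypothetical stand-in for an analysis token: its text plus a position increment. */
    public static final class Tok {
        public final String text;
        public final int positionIncrement;

        public Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Returns the full address followed by its parts. The address advances the
     * position by 1; every part uses increment 0, so it shares that position.
     */
    public static List<Tok> expand(String email) {
        List<Tok> out = new ArrayList<>();
        out.add(new Tok(email, 1));
        // Split on '@' and '.' to get the local part and the domain labels.
        for (String part : email.split("[@.]")) {
            if (!part.isEmpty()) {
                out.add(new Tok(part, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : expand("xyz@company.com")) {
            System.out.println(t.text + " (+" + t.positionIncrement + ")");
        }
    }
}
```

Because the parts share the position of the full address, a search for the whole address or for any single part can match the same document.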

now maybe Mozilla/IMAP URLs - Re: StandardTokenizer and e-mail

2004-05-21 Thread David Spencer
This reminds me - if you have a search engine that indexes a mail store and you present results in a web page to a browser, you want to (I think this is obvious) send back a URL that would cause the user's native mail client to pull up the message. IMAP has a URL format, and I use

Re: StandardTokenizer and e-mail

2004-05-21 Thread Erik Hatcher
Further on this... If you are using StandardTokenizer, the token for an e-mail address has the type value EMAIL. A custom TokenFilter implementation could check for that type to pick out those tokens and split them however you like, passing everything else through unchanged. Take a look at StandardFilter's
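
A minimal sketch of the type check, again with stand-in classes rather than the Lucene API. The names `Tok`, `filter` and the part type "WORD" are hypothetical, and the constant `EMAIL_TYPE` is an assumption: the exact type string the tokenizer assigns should be checked against the library before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

public class EmailTypeFilterSketch {
    /** Assumed type label for e-mail tokens; verify the real value in the library. */
    static final String EMAIL_TYPE = "EMAIL";

    /** Hypothetical stand-in for a typed analysis token. */
    public static final class Tok {
        public final String text;
        public final String type;
        public final int positionIncrement;

        public Tok(String text, String type, int positionIncrement) {
            this.text = text;
            this.type = type;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Passes every token through; e-mail tokens are followed by their parts. */
    public static List<Tok> filter(List<Tok> input) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : input) {
            out.add(t); // the original token is always kept
            if (EMAIL_TYPE.equals(t.type)) {
                for (String part : t.text.split("[@.]")) {
                    if (!part.isEmpty()) {
                        // Increment 0 keeps the part at the e-mail token's position.
                        out.add(new Tok(part, "WORD", 0));
                    }
                }
            }
        }
        return out;
    }
}
```

Checking the type avoids re-parsing every token's text: only tokens the tokenizer has already classified as e-mail addresses are split.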