special handling of certain terms with embedded periods

2007-08-09 Thread Donna L Gresh
Is there a good way to handle the following scenario: I have certain terms with embedded periods for which I want to leave them intact (not split at the periods). For example in my application a particular skill might be SAP.FIN (SAP financial), and it should not be split into SAP and FIN. Is t

Re: special handling of certain terms with embedded periods

2007-08-09 Thread karl wettin
On 9 Aug 2007 at 16:36, Donna L Gresh wrote: Is there a good way to handle the following scenario: I have certain terms with embedded periods for which I want to leave them intact (not split at the periods). For example in my application a particular skill might be SAP.FIN (SAP financial), and

Re: special handling of certain terms with embedded periods

2007-08-09 Thread Erick Erickson
Some possibilities...
> write your own tokenizer and/or filter. If you alter your BNF, you'll have to maintain it in later releases.
> use some simple transformations for the input *before* tokenizing.
> there's been some discussion that StandardAnalyzer (and, I assume, the Standard* beasts
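The second suggestion above (transforming the input *before* tokenizing) could look roughly like the sketch below. It is only an illustration under stated assumptions: `PeriodTermProtector` and the `_DOT_` placeholder are hypothetical names, not part of Lucene, and the list of protected terms is assumed to be known in advance. Whether the placeholder survives as part of a single token depends on the analyzer in use, so that needs to be checked against the actual tokenizer.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Lucene class): shields known dotted terms before tokenizing. */
final class PeriodTermProtector {
    // Assumption: this placeholder never occurs in real input and is kept
    // inside one token by whichever analyzer is applied afterwards.
    static final String PLACEHOLDER = "_DOT_";

    private final Pattern protectedTerms;

    PeriodTermProtector(List<String> terms) {
        // Build an alternation of the literal terms, e.g. \QSAP.FIN\E|\QSAP.HR\E
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        // Match only whole terms: not preceded by a word character or period,
        // and not followed by a word character. Matching is case-sensitive.
        protectedTerms = Pattern.compile("(?<![\\w.])(?:" + alternation + ")(?!\\w)");
    }

    /** Replaces the periods inside protected terms with the placeholder. */
    String protect(String text) {
        Matcher m = protectedTerms.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String shielded = m.group().replace(".", PLACEHOLDER);
            m.appendReplacement(out, Matcher.quoteReplacement(shielded));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Turns a shielded token back into its original dotted form. */
    static String restore(String token) {
        return token.replace(PLACEHOLDER, ".");
    }
}
```

The same `protect` call would have to be applied to query text as well as to indexed text, otherwise a search for SAP.FIN would not line up with the shielded term in the index.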

Re: special handling of certain terms with embedded periods

2007-08-09 Thread Donna L Gresh
Some possibilities...
> write your own tokenizer and/or filter. If you alter your BNF, you'll have to maintain it in later releases.
> use some simple transformations for the input *before* tokenizing

Re: special handling of certain terms with embedded periods

2007-08-09 Thread Mark Miller
Donna L Gresh wrote:
> But your point about the StandardAnalyzer being slow is well-taken, and I'll keep that in mind.
A new StandardAnalyzer that is 6x faster was recently committed on the trunk. It should be in the next release.
- Mark