The standard analyzer/tokenizer should do a decent job of splitting on dot,
hyphen, and underscore, in addition to whitespace and other punctuation.
Can you post some specific test cases you are concerned with? (You should
always run some test cases.)
-- Jack Krupansky
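Following Jack's advice to run concrete test cases, a minimal sketch of such a check (assumes lucene-core and lucene-analyzers-common on the classpath; the class name and sample string are illustrative):

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizationCheck {
    public static void main(String[] args) throws IOException {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            // Feed a sample containing dot, hyphen, and underscore and
            // print each token the analyzer emits, one per line.
            TokenStream ts = analyzer.tokenStream("f", "foo.bar baz-qux one_two");
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
            ts.close();
        }
    }
}
```

Running this against your own problem strings is the quickest way to see exactly where StandardAnalyzer does and does not split, since its UAX#29 word-break rules treat some punctuation between letters or digits differently than plain whitespace splitting would.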
On Tue, Apr 12, 2016 at 10
Hi Chamarty,
Well, there are a lot of options here.
1) Use LetterTokenizer
2) Use WordDelimiterFilter combined with WhitespaceTokenizer
3) Use MappingCharFilter to replace those characters with spaces
...
Ahmet
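Ahmet's option 3 can be sketched as a custom Analyzer that maps "." and "_" to spaces before tokenizing (a minimal sketch, assuming lucene-core and lucene-analyzers-common on the classpath; the class name is illustrative):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;

// Option 3: replace '.' and '_' with spaces in a char filter,
// then let a plain whitespace tokenizer do the splitting.
public class DotUnderscoreAnalyzer extends Analyzer {

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add(".", " ");
        builder.add("_", " ");
        return new MappingCharFilter(builder.build(), reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        return new TokenStreamComponents(tokenizer);
    }
}
```

Because the replacement happens in a CharFilter before tokenization, the same mapping is applied at both index and query time whenever this analyzer is used, which keeps the two sides consistent with little maintenance across releases.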
On Tuesday, April 12, 2016 3:58 PM, PrasannaKumar Chamarty
wrote:
Hi,
What
On the live GitHub version of LUCENE-5317, I'm trying to migrate to 6.0, and
most of it is fairly clear.
However, how do I modify the following code to return spans only from documents
that match the -Filter- Query?
For each LeafReaderContext, I used to get a DocIdSet, call the iterator on
that, a
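Since Filter was folded into Query, one way to get the old per-leaf DocIdSetIterator back (a sketch only, not tested against the LUCENE-5317 branch; the method and class names here are illustrative) is to add the filter as a FILTER clause and pull a Scorer per leaf from the resulting Weight:

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

public class FilteredMatchesSketch {

    // Roughly replaces the old Filter.getDocIdSet(context) pattern:
    // iterate, per leaf, over only those docs matching query AND filter.
    static void collectMatches(IndexSearcher searcher, Query mainQuery,
                               Query filterQuery) throws IOException {
        Query combined = new BooleanQuery.Builder()
                .add(mainQuery, BooleanClause.Occur.MUST)
                .add(filterQuery, BooleanClause.Occur.FILTER) // constrains, never scores
                .build();
        Query rewritten = searcher.rewrite(combined);
        Weight weight = searcher.createWeight(rewritten, false); // false = no scores needed
        for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
            Scorer scorer = weight.scorer(leaf);
            if (scorer == null) {
                continue; // no matching docs in this segment
            }
            DocIdSetIterator it = scorer.iterator();
            for (int doc = it.nextDoc();
                 doc != DocIdSetIterator.NO_MORE_DOCS;
                 doc = it.nextDoc()) {
                // doc is a segment-local id of a document matching both clauses;
                // per-position span work for that document would go here.
            }
        }
    }
}
```

The FILTER clause contributes matching but no score, so this mirrors the old filtered-query behavior; whether the span positions themselves can be recovered from this iterator depends on how the LUCENE-5317 code obtains its Spans, which the truncated message does not show.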
Hi,
What is the best way (in terms of maintenance required with new lucene
releases) to allow splitting of words (into tokens) on "." and "_" for
indexing?
Please note that I am using lucene through Jackrabbit. Jackrabbit's Search
configuration can be found at http://wiki.apache.org/jackrabbit/S