there.
Very appreciative of your thoughts.
Nick
On Friday, October 17, 2014 4:57:52 PM UTC-7, Nick Tackes wrote:
Hello, I am experimenting with word_delimiter and have an example with a
special character that is indexed. The character is in the type table for
the word delimiter. Analysis of the tokenization looks good, but when I
attempt to do a match query it doesn't seem to respect tokenization as
expected.
The example indexes 'HER2+ Breast Cancer'. Tokenization is 'her2+',
'breast', 'cancer', which is good. Searching for 'HER2
. pol, and the matching document
contains U.S. politics, highlighted as follows: <em>U</em><em>U.S</em>. <em>Pol</em>itics (the letter U is highlighted twice). I see how word
delimiter creates the same token for different prefixes (U tokens for U
and U.), but the highlighting seems strange to me because U and U.S
Hi team - I just wanted to share the complete config file in which I am able to
see this problem with the word delimiter (unless I got the config wrong). My
config is below; if I analyze the string 650-454-2343, I get the
following tokens:
1. 650-454-2343 [expected since we have preserve_original
Hi everyone - I have changed the mapping so that it now looks like the version below.
However for a given input say 123-456-8989, the generated tokens are:
a) 123-456-8989 b) 123 c) 456 d) 8989 e) 1234568989
I was expecting just two tokens: a) 123-456-8989 b) 1234568989
Would you know what might be going
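If the goal is only the original token and the catenated number, one possible word_delimiter configuration would suppress the individual parts and keep the catenated form. This is a sketch assuming Elasticsearch's standard word_delimiter options; the filter name is hypothetical:

```
{
  "filter": {
    "my_wd": {
      "type": "word_delimiter",
      "generate_word_parts": false,
      "generate_number_parts": false,
      "catenate_numbers": true,
      "preserve_original": true
    }
  }
}
```

With generate_number_parts disabled, 123, 456, and 8989 should no longer be emitted, leaving just 123-456-8989 (from preserve_original) and 1234568989 (from catenate_numbers).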
Hi all,
I would really appreciate it if anyone could guide me on how I should set my
word_delimiter filter to skip words containing a hyphen. The desired result
is that words with '-' in them will be ignored by the word delimiter filter.
One possible way that I tried to implement was using
Patrick,
If I understand correctly, you just want to preserve the dashes as-is and
not word-delimit on them. You can try something like this (I am just
preserving the - symbol, \\u002D):
"analysis": {
  "analyzer": {
    "wd1": {
      "tokenizer": "whitespace",