Re: word delimiter

2014-10-20 Thread Nick Tackes
there. Very appreciative of your thoughts. Nick On Friday, October 17, 2014 4:57:52 PM UTC-7, Nick Tackes wrote: Hello, I am experimenting with word_delimiter and have an example with a special character that is indexed. The character is in the type table for the word delimiter. Analysis

Re: word delimiter

2014-10-20 Thread Nick Tackes
table for the word delimiter. Analysis of the tokenization looks good, but when I attempt a match query, it doesn't seem to respect the tokenization as expected. The example indexes 'HER2+ Breast Cancer'. Tokenization is 'her2+', 'breast', 'cancer', which is good. Searching for 'HER2
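A minimal sketch of the kind of setup described above, written as Python dicts that could be sent as Elasticsearch request bodies. The names `wd_keep_plus`, `her2_analyzer`, and `title` are hypothetical, and `split_on_numerics: false` is an assumption about how 'her2+' is kept as one token; the poster's actual settings are not shown in the excerpt.

```python
import json

# Index settings: a word_delimiter filter whose type table treats '+' as a letter.
# All names here are illustrative, not taken from the original post.
her2_settings = {
    "analysis": {
        "filter": {
            "wd_keep_plus": {
                "type": "word_delimiter",
                "type_table": ["+ => ALPHA"],   # '+' is no longer a delimiter
                "split_on_numerics": False,     # assumption: keep 'her2' together
            }
        },
        "analyzer": {
            "her2_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase", "wd_keep_plus"],
            }
        },
    }
}


def match_query(field, text):
    """Build a match query body; the query text is analyzed at search time."""
    return {"query": {"match": {field: text}}}


print(json.dumps(her2_settings, indent=2))
print(json.dumps(match_query("title", "HER2+")))
```

A match query analyzes the query text with the field's search analyzer, so comparing the output of the analyze API for the indexed text and for the query text is usually the first thing to check when results look inconsistent with the index-time tokens.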

word delimiter

2014-10-17 Thread Nick Tackes
Hello, I am experimenting with word_delimiter and have an example with a special character that is indexed. The character is in the type table for the word delimiter. Analysis of the tokenization looks good, but when I attempt a match query, it doesn't seem to respect tokenization

edgeNGram tokenizer with the word delimiter filter

2014-04-26 Thread Hieu Nguyen
. pol and the matching document contains U.S. politics, as follows: <em>U</em><em>U.S</em>. <em>Pol</em>itics (the letter U is highlighted twice). I see how word delimiter creates the same token for different prefixes (U tokens for U and U.), but the highlighting seems strange to me because U and U.S

Issue with using word delimiter

2014-04-22 Thread Amit Soni
Hi team - I just wanted to share the complete config file with which I can reproduce this problem with word delimiter (unless I got the config wrong). My config is below, and if I analyze the string 650-454-2343, I get the following tokens: 1. 650-454-2343 [expected since we have preserve_original

Re: Issue with using word delimiter filter

2014-04-21 Thread Amit Soni
Hi everyone - I have changed the mapping so that it now looks like below. However, for a given input, say 123-456-8989, the generated tokens are: a) 123-456-8989 b) 123 c) 456 d) 8989 e) 1234568989. I was expecting just two tokens: a) 123-456-8989 b) 1234568989. Would you know what might be going
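For reference, a hedged sketch of word_delimiter options that are meant to emit only the original token and the concatenated digits. The filter and analyzer names (`phone_wd`, `phone_analyzer`) are hypothetical, and the poster's actual mapping is not visible in the excerpt, so this is not a diagnosis of that config.

```python
import json

# Keep the original token and the catenated number, but do not emit
# the individual number parts (123, 456, 8989). Names are illustrative.
phone_settings = {
    "analysis": {
        "filter": {
            "phone_wd": {
                "type": "word_delimiter",
                "preserve_original": True,       # emit 123-456-8989 as-is
                "catenate_numbers": True,        # emit 1234568989
                "generate_number_parts": False,  # suppress 123, 456, 8989
                "generate_word_parts": False,
            }
        },
        "analyzer": {
            "phone_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["phone_wd"],
            }
        },
    }
}

print(json.dumps(phone_settings, indent=2))
```

The separate part tokens in the list above are what `generate_number_parts` produces when it is left at its default of true, so that option is the one to look at first.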

Word delimiter filter - ignore words with hyphen

2014-01-27 Thread Patrick Norwood
Hi all, I would really appreciate it if anyone could tell me how I should set my word_delimiter_filter to skip words containing a hyphen. The desired result is that words with '-' in them will be ignored by the word delimiter filter. One possible way that I tried to implement was using

Re: Word delimiter filter - ignore words with hyphen

2014-01-27 Thread Binh Ly
Patrick, If I understand correctly, you just want to preserve the dashes as is and not word-delimit on them. You can try something like this (I am just preserving the - symbol: \\u002D): analysis: { analyzer: { wd1: { tokenizer: whitespace,
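The excerpt above is cut off before the filter definition. The following is a sketch of the general approach it describes, not a reconstruction of the missing text: the analyzer name `wd1` and the whitespace tokenizer come from the excerpt, while the filter name `wd_keep_hyphen` and the mapping of the hyphen to `ALPHA` are assumptions.

```python
import json

# Map the hyphen (U+002D) to a letter type so word_delimiter does not split on it.
# The raw string keeps a single backslash; json.dumps escapes it for the request body.
hyphen_settings = {
    "analysis": {
        "analyzer": {
            "wd1": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["wd_keep_hyphen"],  # hypothetical filter name
            }
        },
        "filter": {
            "wd_keep_hyphen": {
                "type": "word_delimiter",
                "type_table": [r"\u002D => ALPHA"],  # assumed target type
            }
        },
    }
}

print(json.dumps(hyphen_settings, indent=2))
```

Other delimiters still split as usual; only characters listed in the type table change behavior.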