Hi all, I was just thinking about phrase queries and punctuation (and, more generally, how to manage position increments when a sentence delimiter occurs).
At the moment, for multi-valued fields we have the "position increment gap" (positionIncrementGap), which prevents phrase queries from spanning different values of the same field. A single-valued text field, however, may contain hundreds of different sentences (separated by punctuation). Generally we don't want phrase queries to span different sentences, so I would expect a similar position increment behaviour there.

A possible solution could be a tokenizer that is able to split sentences (plenty of NLP approaches are already available for this) and that adds a position increment gap between sentences as well (smaller than the multi-value position increment gap).

A very naive solution would be to add a position increment whenever we find a punctuation delimiter (much as already happens when stop words are removed).

I have not analysed the implementations in detail yet. At this stage I was just wondering whether anyone has faced this problem with Lucene and Solr. What kind of side effects could arise if we added a position increment gap at every punctuation delimiter, by default, in the Standard Tokenizer?

Cheers

---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
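
P.S. To make the naive approach a bit more concrete, here is a toy sketch in plain Python. It is not Lucene or Solr code: `tokenize`, `phrase_matches` and the `SENTENCE_GAP` value of 10 are all hypothetical names and numbers made up for illustration. It only simulates what an exact phrase match would see if positions jumped at every `.`, `!` or `?`.

```python
import re

# Hypothetical value: meant to be smaller than a typical multi-value
# positionIncrementGap (often configured as 100).
SENTENCE_GAP = 10

SENTENCE_DELIMITERS = {".", "!", "?"}


def tokenize(text, sentence_gap=SENTENCE_GAP):
    """Return (term, position) pairs.

    After a sentence delimiter, the next term's position is pushed
    forward by `sentence_gap` extra slots.
    """
    tokens = []
    position = -1
    pending_gap = 0
    for match in re.finditer(r"\w+|[.!?]", text.lower()):
        piece = match.group()
        if piece in SENTENCE_DELIMITERS:
            # Only open a gap once at least one term has been emitted;
            # repeated delimiters ("...") do not stack.
            if tokens:
                pending_gap = sentence_gap
            continue
        position += 1 + pending_gap
        pending_gap = 0
        tokens.append((piece, position))
    return tokens


def phrase_matches(tokens, phrase):
    """Exact phrase match (no slop): terms must sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return False
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(terms))
        for start in positions.get(terms[0], ())
    )


text = "Solr is fast. Lucene powers it."
with_gap = tokenize(text)
without_gap = tokenize(text, sentence_gap=0)

print(phrase_matches(without_gap, "fast lucene"))  # True: phrase spans two sentences
print(phrase_matches(with_gap, "fast lucene"))     # False: the gap blocks it
print(phrase_matches(with_gap, "lucene powers"))   # True: same sentence
```

The sketch also hints at one side effect: because it treats every full stop as a sentence end, abbreviations ("e.g."), decimals ("3.14") and host names ("www.sease.io") would all get a gap in the middle, so a phrase query for them would stop matching. A real sentence splitter would need to handle those cases.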