Hi Ahmet! I had already gone ahead with something I thought was not a clean solution, and then when I read your post I found we had thought of the same solution, including European_Parliament with the _ :)
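In case it helps anyone reading along, here is a rough sketch of that underscore trick in plain Python. It is only a simulation of the idea, not Solr code: the phrase is swapped for a single-word placeholder before tokenizing and swapped back afterwards. The names PHRASES, marker and analyze are made up for illustration, and the \w+ regex is only a crude stand-in for what StandardTokenizerFactory really does.

```python
import hashlib
import re

# Hypothetical list of phrases that should survive as one token.
PHRASES = ["European Parliament"]


def marker(phrase):
    # An MD5 prefix makes the placeholder unlikely to collide with real text.
    digest = hashlib.md5(phrase.encode("utf-8")).hexdigest()
    return digest + phrase.replace(" ", "_")


def analyze(text):
    # Step 1 (char filter stage): swap each phrase for its placeholder.
    for phrase in PHRASES:
        text = text.replace(phrase, marker(phrase))
    # Step 2 (tokenizer stage): \w+ is an assumption, a rough stand-in
    # for the real tokenizer; it keeps letters, digits and _ together.
    tokens = re.findall(r"\w+", text)
    # Step 3 (token filter stage): turn placeholders back into the phrase.
    restore = {marker(p): p for p in PHRASES}
    return [restore.get(tok, tok) for tok in tokens]


print(analyze("The European Parliament met in Strasbourg."))
```

With this input the sketch yields "European Parliament" as a single token next to "The", "met", "in" and "Strasbourg". In Solr the first and last steps would be done by the pattern replace factories rather than by Python code.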
So I guess there is no cleaner way to do this, other than maybe implementing my own Tokenizer and Filters, but I honestly couldn't find a tutorial on implementing a custom Solr Tokenizer. If I end up needing to do it, I will write one.

So for now I'm using PatternReplaceCharFilterFactory to replace "European Parliament" with <MD5Hash>European_Parliament (initially I didn't use the MD5 hash, only European_Parliament). Then, after StandardTokenizerFactory has run, I replace it back with "European Parliament". Well, I guess I just found a way to get a two-word token :)

I had seen ShingleFilterFactory, but the problem is that I don't need the whole phrase broken into two-word tokens, which is what I understood it does. Ideally I would have some filter that reads a .txt file listing the tokens to merge, like "European" and "Parliament".

I still have another problem, but maybe I will find a solution after I read the page you linked. Solr is treating #European as both #European and European, meaning it creates two facets for one token. I want it to count only as #European. I ran the analysis debugger in my Solr admin console and I don't see how it could be doing that. Would you know of a reason for this?

Thanks for your reply. The page you linked looks excellent and I'll read it through.

--
View this message in context: http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120361.html
Sent from the Solr - User mailing list archive at Nabble.com.