Hello,

> but honestly I haven't really tried anything like this ... the code for
> parsing the synonyms.txt file probably splits the individual synonyms on
> whitespace to produce multiple tokens, which might screw you up ... you may
> need to get creative (perhaps use a PatternReplaceFilter to encode your
> spaces as "_" before the SynonymFilter, and then another one to convert the
> "_" back to " " after the SynonymFilter ... kludgy, but it might work)

I had to build exactly this recently, but with only Lucene and no Solr. I chose to create a CompressFilter as the last filter, which reduces all tokens to one single token. Since these were facet fields, I knew there were only a couple of tokens per value and not thousands; with thousands, compressing them into a single token might be a problem (not sure).

So for building synonyms on facet fields that can contain multiple tokens, I would add your own SynonymAnalyzer that compresses the tokens and, when the compressed token is found in a synonym map, replaces the token with the synonym.

So, in your SynonymAnalyzer, something like:

    private Map<String, String> synonyms; // initialize it

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = super.tokenStream(fieldName, reader);
        if (fieldName.equals("synonym_field")) {
            // compress the tokens and replace them with the mapped synonym
            result = new CompressFilter(result, synonyms);
        } else if (fieldName.equals("compressed_field")) {
            // compress only, no synonym lookup
            result = new CompressFilter(result);
        }
        return result;
    }

and your CompressFilter:

    private Map<String, String> synonyms; // stays null when no map is given

    public CompressFilter(TokenStream in, Map<String, String> synonyms) {
        super(in);
        this.synonyms = synonyms;
    }

    public CompressFilter(TokenStream in) {
        super(in);
    }

    public Token next() throws IOException {
        Token t = input.next();
        if (t == null) {
            return null; // stream exhausted
        }
        // concatenate all remaining tokens into one string
        StringBuilder sb = new StringBuilder();
        while (t != null) {
            sb.append(t.termText());
            t = input.next();
        }
        String compressed = sb.toString();
        if (synonyms != null) {
            if (synonyms.containsKey(compressed)) {
                compressed = synonyms.get(compressed);
            } else {
                return null; // synonym not found
            }
        }
        return new Token(compressed, 0, compressed.length());
    }

I am not sure how easy it is to put this in Solr, but I suppose it isn't hard. As noted above, I am not sure what happens with the CompressFilter when there are *many* tokens in the "synonym_field" field.

Regards,
Ard

> : Now I want to create a link for each of these values so that the user can filter
> : the results by that title by clicking on the link.
> : For example, if I click
> : on "Software Engineer", the results are now narrowed down to just include
> : records with "Software Engineer" in their title. Since the "title" field can
> : contain special chars like '+', '&' ..., I really can't find a clean way to
> : do this. At the moment, I replace all the spaces with '+' and it seems to work
> : for words like "Software engineer" (converted to "Software+Engineer").
> : However, "C++ Programmer" is converted to "C+++Programmer", and it doesn't
> : seem to work (returns no results). Any ideas?
>
> for starters you need to URL encode *all* of the characters, not just the
> spaces ... space escapes to "+" but only because "+" escapes to %2B.
>
> second, if you are dealing with multi-word values like this in your
> facets, you need to make sure to quote them when doing fq queries
> (before URL encoding) ... so if you have a facet.field "skills" that lists
> "C++ Programmer" as the value, the fq query you want to use would be...
>
>    skills:"C++ Programmer"
>
> when you URL encode that it should become...
>
>    fq=skills%3A%22C%2B%2B+Programmer%22
>
> ...use the echoParams=explicit&debugQuery=true params to see exactly what
> your params look like when they've been URL decoded and what your
> query objects look like once they've been parsed.
>
> -Hoss
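
The quote-first, encode-second order described in the quoted advice can be sketched in plain Java as below. This is only a minimal sketch: the class name FacetLinkEncoder and the method encodeFilterQuery are hypothetical, and the backslash escaping of embedded quotes is an assumption about the query syntax that the thread does not cover. The only library call used is the standard java.net.URLEncoder.encode(String, String).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or Lucene.
public class FacetLinkEncoder {

    // Builds the fq parameter for one facet value:
    // 1) wrap the value in double quotes, 2) URL encode the whole query.
    static String encodeFilterQuery(String field, String value) {
        // Assumption: a backslash escapes '\' and '"' inside a quoted phrase.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        String rawQuery = field + ":\"" + escaped + "\"";
        try {
            // Encodes every reserved character, so '+' becomes %2B and ' ' becomes '+'.
            return "fq=" + URLEncoder.encode(rawQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints fq=skills%3A%22C%2B%2B+Programmer%22
        System.out.println(encodeFilterQuery("skills", "C++ Programmer"));
    }
}
```

Replacing only the spaces by hand, as in the original question, yields "C+++Programmer", which decodes to "C" followed by three spaces and "Programmer"; encoding the complete quoted query avoids that.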