On 2/20/2015 4:24 PM, Rishi Easwaran wrote:
> Also, the tokenizer we use is very similar to the following.
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
> ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex
>
>
> From the looks of it the text is being indexed as a single token and not
> broken across whitespace.

I can't claim to know how analyzer code works. I did manage to see the code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface. On that page, select the field or fieldType, set the "verbose" flag and type the actual field contents into the "index" side of the page. When you click the Analyze Values button, it will show you what Solr does with the input at index time.

Do you still have access to any machines (dev or otherwise) running the old version with the custom component? If so, do the same things on the analysis page for that version that you did on the new version, and see whether it does something different. If it does do something different, then you will need to track down the problem in the code for your custom analyzer.

Thanks,
Shawn
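
The old-versus-new comparison described above can also be scripted rather than done by hand in the admin UI. The following is only a rough sketch, assuming the field analysis request handler is reachable at /analysis/field on both installs and that the JSON response uses the flat [stage name, tokens, stage name, tokens, ...] layout. The host, core, and field type names are hypothetical placeholders, not values from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def analysis_url(base_url, core, field_type, text):
    """Build a field-analysis request URL for one Solr core."""
    params = urlencode({
        "analysis.fieldtype": field_type,   # fieldType to analyze with
        "analysis.fieldvalue": text,        # text treated as index-time input
        "wt": "json",
    })
    return f"{base_url}/{core}/analysis/field?{params}"


def final_tokens(response, field_type):
    """Return the token texts produced by the last index-time stage.

    Assumption: the "index" entry is a flat list alternating between a
    stage class name and that stage's list of token dictionaries, so the
    last element is the output of the final tokenizer or filter.
    """
    stages = response["analysis"]["field_types"][field_type]["index"]
    return [token["text"] for token in stages[-1]]


def fetch_tokens(base_url, core, field_type, text):
    """Ask one Solr instance how it would tokenize `text` at index time."""
    with urlopen(analysis_url(base_url, core, field_type, text)) as resp:
        return final_tokens(json.load(resp), field_type)
```

Calling fetch_tokens with the same field contents against the old machine and the new one (for example "http://old-host:8983/solr" and "http://new-host:8983/solr", both made-up names) and comparing the two lists should show whether the new version emits one token where the old one split on whitespace.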