Try setting generateWordParts="1" in your WordDelimiterFilterFactory (WDF). With it set to 0, the filter does not emit the individual word parts of the URL, so "google" never ends up in the index as a token of its own. Also, a WhitespaceTokenizer makes little sense for URLs: there should be no whitespace in a URL, and the StandardTokenizer can tokenize a URL. Either way, the problem is your WDF.

-----Original message-----
From: Max Lynch <ihas...@gmail.com>
Sent: Thu 23-09-2010 23:00
To: solr-user@lucene.apache.org
Subject: Search a URL

Is there a tokenizer that will allow me to search for parts of a URL? For example, the search "google" would match on the data "http://mail.google.com/dlkjadf".

This tokenizer factory doesn't seem to be sufficient:

<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks.
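
For reference, here is a minimal sketch of the field type with the generateWordParts change suggested at the top of this thread. It is not a tested configuration. It assumes the flag should be flipped on both the index and query analyzers (the reply does not say which), and it deliberately keeps the WhitespaceTokenizer so that the WDF setting is the only difference from the original. Every other attribute is copied from the question as posted.

```xml
<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Only change: generateWordParts is now 1, so the filter should emit
         sub-word tokens (for example mail, google, com) in addition to the
         catenated form produced by catenateWords. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: mirror the index-side setting so both sides split terms the same way. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

Documents indexed under the old analyzer would need to be reindexed before a search for "google" can match, since the new tokens only exist for content analyzed after the change. The Analysis page in the Solr admin interface is a convenient way to check which tokens the field actually produces for a sample URL.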