> OK - I read a bit more and it appears an appropriate analysis
> pipeline (which would extract text from XML using SAX, say) is all
> that's required, and existing highlighting ought to be able to
> accomplish what I'm after. So I guess the only question I have now
> before writing code is where is the existing implementation :) -
> anyone?
solr.HTMLStripCharFilterFactory may remove XML tags too. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
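
A minimal sketch of how that char filter might be wired into a field type in schema.xml, in case it helps. The field type name text_xml_stripped is made up for illustration, and the tokenizer and lowercase filter are just assumed placeholders for whatever analysis chain you already use. How well it copes with arbitrary XML (as opposed to HTML) is something you would need to check against your own documents.

```xml
<!-- Hypothetical field type: the name and the tokenizer/filter choices are assumptions -->
<fieldType name="text_xml_stripped" class="solr.TextField">
  <analyzer>
    <!-- Char filters run before the tokenizer, so the markup is stripped first -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The char filter only affects what gets indexed; the stored value still contains the original markup, so it is worth testing what highlighted snippets look like before relying on it.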