Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-06-01 Thread Damian Bursztyn
Did anybody find a way to fix this other than removing the HTMLStripCharFilter analyzer during indexing?

Thanks

On Sat, Mar 13, 2010 at 7:55 PM, Lance Norskog wrote:
> HTMLStripCharFilter is only in the analyzer: it creates searchable
> terms from the HTML input. The raw HTML is stored and f

Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-13 Thread Lance Norskog
HTMLStripCharFilter is only in the analyzer: it creates searchable terms from the HTML input. The raw HTML is stored and fetched. There are some bugs in term positions and highlighting. An EntityProcessor wrapping the HTMLStripCharFilter would be really useful.

On Tue, Mar 9, 2010 at 5:31 AM, Mar
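
The distinction Lance describes (stripped text feeds the searchable terms, while the stored value stays as raw HTML) can be pictured with a small toy model. This is only a sketch in plain Python, not Solr code: `strip_html` and `index_document` are hypothetical names made up for illustration, and the whitespace tokenizer is a stand-in for whatever the real analyzer chain does.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(raw):
    """Hypothetical stand-in for the char filter: return text without tags."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.chunks)


def index_document(raw):
    """Toy model of one field: analysis sees stripped text, storage keeps the raw input."""
    terms = strip_html(raw).lower().split()  # naive whitespace tokenizer (assumption)
    return {"stored": raw, "indexed_terms": terms}


raw = "<p>Hello <b>Solr</b> users</p>"
doc = index_document(raw)
print(doc["indexed_terms"])  # what a search would match against
print(doc["stored"])         # what a query would return: still HTML
```

Under this model, HTML showing up in returned field values is expected even when the filter is working, because only the indexed terms pass through the stripping step.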

RE: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-09 Thread Mark Roberts
Sounds like "solr.HTMLStripCharFilter" may work... except I'm getting a couple of problems:

1) HTML still seems to be getting into my content field. All I did was add it to the index analyzer for my "text" fieldType.

2) It also seems to have broken my highlighting; I get this error: 'org.a