This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters shows many of the Solr analyzers and filters. Would one of the various HTMLStrip* classes listed there work?
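
As noted further down the thread, HTMLStripCharFilterFactory only affects what gets indexed; the stored value comes back exactly as it was sent. If the stored field itself needs to be free of markup, one option is to strip the tags on the client before the document is added. The following is only a minimal sketch of that idea, not something Solr provides: the class name StoredFieldCleaner and the method stripTags are hypothetical, and the regex is deliberately naive (it does not handle comments, script blocks, or a stray < in ordinary text), so a real HTML parser would be more robust.

```java
import java.util.regex.Pattern;

// Hypothetical client-side helper (not part of Solr or SolrJ):
// removes markup from a value before it is sent to be stored.
final class StoredFieldCleaner {
    // Naive tag matcher: anything from '<' up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private StoredFieldCleaner() {}

    static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        // Replace each tag with a space so adjacent words do not run together.
        String withoutTags = TAG.matcher(html).replaceAll(" ");
        // Collapse the whitespace runs left behind and trim the ends.
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "<p>This article is for blah </p>";
        String clean = stripTags(raw);
        // With SolrJ, the cleaned string would be what you pass to
        // SolrInputDocument.addField(fieldName, clean) instead of the raw HTML.
        System.out.println(clean);
    }
}
```

The trade-off is that the original markup is no longer available from Solr. If you need both, you could keep the raw HTML in one stored field and the cleaned text in another.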

HTH
Erick

On Mon, Jan 11, 2010 at 2:44 PM, darniz <rnizamud...@edmunds.com> wrote:

>
> Thanks, we were having the same issue.
> We are trying to store article content, and we are storing a field like
> <p>This article is for blah </p>.
> When I look at the analysis.jsp page, it does strip out the <p> tags and the
> text is indexed, but when we fetch the document it returns the field with
> the <p> tags.
> From Solr's point of view this is correct, but our issue is that these
> HTML tags are screwing up the display of our page. Is there an easy way to
> ensure the HTML tags are stripped out, or do we have to take care of it
> manually?
>
> Thanks
> Rashid
>
>
> aseem cheema wrote:
> >
> > Alright. It turns out that escapedTags is not for what I thought it was
> > for.
> > The problem that I am having with HTMLStripCharFilterFactory is that
> > it strips the HTML while indexing the field, but not while storing the
> > field. That is why what I see in analysis.jsp, which is index
> > analysis, does not match what gets stored... because, well, HTML is
> > stripped only for indexing. Makes so much sense.
> >
> > Thanks to Ryan McKinley for clarifying this.
> > Aseem
> >
> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemche...@gmail.com>
> > wrote:
> >> I am trying to post a document with the following content using SolrJ:
> >> <center>content</center>
> >> I need the xml/html tags to be ignored. Even though this works fine in
> >> analysis.jsp, this does not work with SolrJ, as the client escapes the
> >> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
> >> strip those escaped tags. How can I achieve this? Any ideas will be
> >> highly appreciated.
> >>
> >> There is escapedTags in the HTMLStripCharFilterFactory constructor. Is
> >> there a way to get that to work?
> >>
> >> Thanks
> >> --
> >> Aseem
> >>
> >
> > --
> > Aseem
> >
>
> --
> View this message in context:
> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>