This page shows many of the Solr analyzers and filters:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Would one of the various *HTMLStrip* components work?
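
As the quoted messages below explain, HTMLStripCharFilterFactory only changes
what gets indexed; the stored value comes back exactly as it was sent. If
nothing on the Solr side fits, one option is to strip the markup on the client
before adding the document. Here is a rough sketch in Java, under some
assumptions: the class name HtmlStripSketch and the method stripTags are made
up for illustration, and the regex is a naive tag matcher (it will not cope
with comments, scripts, or a stray '<' in the text), so a real HTML parser
would be safer for messy input.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or SolrJ.
class HtmlStripSketch {
    // Naive pattern: anything that looks like a tag, e.g. <p>, </p>, <br/>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Replaces tag-like spans with a space, then collapses whitespace. */
    static String stripTags(String html) {
        if (html == null) {
            return null;
        }
        String noTags = TAG.matcher(html).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "<p>This article is for blah </p>";
        // prints: This article is for blah
        System.out.println(stripTags(raw));
    }
}
```

You would then pass the cleaned string to SolrJ as the field value (for
example via SolrInputDocument.addField), or keep the raw and cleaned versions
in separate fields if you still need the original markup.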

HTH
Erick

On Mon, Jan 11, 2010 at 2:44 PM, darniz <rnizamud...@edmunds.com> wrote:

>
> Thanks, we were having the same issue.
> We are trying to store article content, and we are storing a field like
> <p>This article is for blah </p>.
> When I look at the analysis.jsp page, it does strip out the <p> tags and the
> stripped text is indexed, but when we fetch the document it returns the
> field with the <p> tags.
> From Solr's point of view this is correct, but our issue is that these HTML
> tags are screwing up the display of our page. Is there an easy way to
> strip out the HTML tags, or do we have to take care of it manually?
>
> Thanks
> Rashid
>
>
> aseem cheema wrote:
> >
> > Alright. It turns out that escapedTags is not for what I thought it was
> > for.
> > The problem that I am having with HTMLStripCharFilterFactory is that
> > it strips the HTML while indexing the field, but not while storing the
> > field. That is why what I see in analysis.jsp, which is index
> > analysis, does not match what gets stored... because, well, HTML is
> > stripped only for indexing. Makes so much sense.
> >
> > Thanks to Ryan McKinley for clarifying this.
> > Aseem
> >
> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemche...@gmail.com>
> > wrote:
> >> I am trying to post a document with the following content using SolrJ:
> >> <center>content</center>
> >> I need the XML/HTML tags to be ignored. Even though this works fine in
> >> analysis.jsp, it does not work with SolrJ, as the client escapes the
> >> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
> >> strip those escaped tags. How can I achieve this? Any ideas will be
> >> highly appreciated.
> >>
> >> There is escapedTags in the HTMLStripCharFilterFactory constructor. Is
> >> there a way to get that to work?
> >> Thanks
> >> --
> >> Aseem
> >>
> >
> >
> >
> > --
> > Aseem
> >
> >
>
>
>
