Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

darniz Mon, 11 Jan 2010 13:44:04 -0800

no problem

Erick Erickson wrote:
> 
> Ah, I read your post too fast and ignored the title. Sorry 'bout that.
> 
> Erick
> 
> On Mon, Jan 11, 2010 at 2:55 PM, darniz <rnizamud...@edmunds.com> wrote:
> 
>>
>> Well, that's the whole point of this discussion.
>> I had the impression that the HTML tags are filtered out and then the field
>> is stored without tags. But it looks like the HTML tags are removed only
>> when the terms are indexed, and the actual text is stored in raw format.
>>
>> Let's say, for example, I add a field like
>> <field name="body"><p>honda car road review</field>
>> When I run analysis on the body field, the HTML filter removes the <p> tag
>> and indexes the words honda, car, road, review. But when I fetch the body
>> field to display in my document, it returns <p>honda car road review
>>
>> I hope that makes sense.
>> Thanks
>> darniz
>>
>>
>>
>> Erick Erickson wrote:
>> >
>> > This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> > shows you many of the Solr analyzers and filters. Would one of the
>> > various *HTMLStrip* components work?
>> >
>> > HTH
>> > Erick
>> >
>> > On Mon, Jan 11, 2010 at 2:44 PM, darniz <rnizamud...@edmunds.com> wrote:
>> >
>> >>
>> >> Thanks, we were having the same issue.
>> >> We are trying to store article content, and we are storing a field like
>> >> <p>This article is for blah </p>.
>> >> When I look at the analysis.jsp page, it does strip out the <p> tags
>> >> before indexing, but when we fetch the document it returns the field with
>> >> the <p> tags.
>> >> From Solr's point of view that is correct, but our issue is that these
>> >> HTML tags are breaking the display of our page. Is there an easy way to
>> >> make sure the HTML tags are stripped from the stored value, or do we have
>> >> to take care of that manually?
>> >>
>> >> Thanks
>> >> Rashid
>> >>
>> >>
>> >> aseem cheema wrote:
>> >> >
>> >> > Alright. It turns out that escapedTags is not for what I thought it
>> >> > was for.
>> >> > The problem I am having with HTMLStripCharFilterFactory is that it
>> >> > strips the HTML while indexing the field, but not while storing the
>> >> > field. That is why what I see in analysis.jsp, which shows index
>> >> > analysis, does not match what gets stored... because, well, HTML is
>> >> > stripped only for indexing. Makes so much sense.
>> >> >
>> >> > Thanks to Ryan McKinley for clarifying this.
>> >> > Aseem
>> >> >
>> >> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemche...@gmail.com> wrote:
>> >> >> I am trying to post a document with the following content using SolrJ:
>> >> >> <center>content</center>
>> >> >> I need the XML/HTML tags to be ignored. Even though this works fine in
>> >> >> analysis.jsp, it does not work with SolrJ, as the client escapes the
>> >> >> < and > as &lt; and &gt;, and HTMLStripCharFilterFactory does not
>> >> >> strip those escaped tags. How can I achieve this? Any ideas would be
>> >> >> highly appreciated.
>> >> >>
>> >> >> There is escapedTags in the HTMLStripCharFilterFactory constructor. Is
>> >> >> there a way to get that to work?
>> >> >> Thanks
>> >> >> --
>> >> >> Aseem
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Aseem
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
>


-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27118304.html
Sent from the Solr - User mailing list archive at Nabble.com.
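
As the thread concludes, HTMLStripCharFilterFactory only changes the terms that get indexed; the stored value comes back exactly as it was sent. If the stored field should be free of tags, one option is to strip the markup on the client before the document is posted. The following is only a minimal sketch of that step, using the Python standard library. The helper name strip_html is hypothetical and is not part of Solr or SolrJ, and this is not an HTML sanitizer (for example, the contents of script or style elements would be kept as text).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped.
        self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed tags.
        return " ".join("".join(self._chunks).split())


def strip_html(fragment):
    """Return the fragment with markup removed (hypothetical helper)."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()  # flushes any text still buffered by the parser
    return extractor.text()


# The field value from the thread, with its unclosed <p> tag.
stored_value = strip_html("<p>honda car road review")
print(stored_value)  # prints: honda car road review
```

The cleaned string would then be sent as the field value, so that what is stored matches what is indexed. If the original markup is still needed elsewhere, keeping it in a second stored field is another possibility.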
