No problem.

Erick Erickson wrote:
>
> Ah, I read your post too fast and ignored the title. Sorry 'bout that.
>
> Erick
>
> On Mon, Jan 11, 2010 at 2:55 PM, darniz <rnizamud...@edmunds.com> wrote:
>
>> Well, that's what this whole discussion is about.
>> I had the impression that the HTML tags are filtered out and then the field is stored without tags. But it looks like the HTML tags are removed only for indexing, and the actual text is stored in raw format.
>>
>> Let's say, for example, I enter a field like
>> <field name="body"><p>honda car road review</field>
>> When I run analysis on the body field, the HTML filter removes the <p> tag and indexes the words honda, car, road, review. But when I fetch the body field to display in my document, it returns <p>honda car road review
>>
>> I hope I make sense.
>> Thanks,
>> darniz
>>
>> Erick Erickson wrote:
>> >
>> > This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters shows you many of the Solr analyzers and filters. Would one of the various *HTMLStrip* filters work?
>> >
>> > HTH
>> > Erick
>> >
>> > On Mon, Jan 11, 2010 at 2:44 PM, darniz <rnizamud...@edmunds.com> wrote:
>> >
>> >> Thanks, we were having the same issue.
>> >> We are trying to store article content, and we are storing a field like <p>This article is for blah </p>.
>> >> When I look at the analysis.jsp page, it does strip out the <p> tags before indexing, but when we fetch the document, it returns the field with the <p> tags.
>> >> From Solr's point of view this is correct, but our issue is that these HTML tags are screwing up the display of our page. Is there an easy way to strip out the HTML tags, or do we have to take care of it manually?
>> >>
>> >> Thanks
>> >> Rashid
>> >>
>> >> aseem cheema wrote:
>> >> >
>> >> > Alright. It turns out that escapedTags is not for what I thought it is for.
>> >> > The problem that I am having with HTMLStripCharFilterFactory is that it strips the HTML while indexing the field, but not while storing the field. That is why what I see in analysis.jsp, which is index analysis, does not match what gets stored... because... well, HTML is stripped only for indexing. Makes so much sense.
>> >> >
>> >> > Thanks to Ryan McKinley for clarifying this.
>> >> > Aseem
>> >> >
>> >> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemche...@gmail.com> wrote:
>> >> >> I am trying to post a document with the following content using SolrJ:
>> >> >> <center>content</center>
>> >> >> I need the XML/HTML tags to be ignored. Even though this works fine in analysis.jsp, it does not work with SolrJ, as the client escapes the < and > as &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated.
>> >> >>
>> >> >> There is escapedTags in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work?
>> >> >> Thanks
>> >> >> --
>> >> >> Aseem
>> >> >
>> >> > --
>> >> > Aseem
>> >>
>> >> --
>> >> View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>>
>> --
>> View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
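The behavior described in this thread (the HTML-stripping char filter changes only the indexed terms, while the stored value comes back exactly as it was sent) can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Solr or Lucene code: the class name HtmlStripSketch and the stripTags, indexTerms, and storedValue methods are illustrative assumptions, and the regex is far cruder than the real HTMLStripCharFilter. It also shows the "take care of it manually" option Rashid asked about: stripping the markup on the client before the document is sent, so the stored value is already clean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical model of why a field analyzed with an HTML-stripping char
 * filter still returns raw markup when the stored value is fetched.
 * Nothing here calls Solr or Lucene; all names are illustrative.
 */
public class HtmlStripSketch {

    // Naive tag matcher: anything from '<' to the next '>'.
    // Not a real HTML parser (ignores comments, scripts, entities, '>' in attributes).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Replaces each tag with a space, then collapses and trims whitespace. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    /** Index side: tags removed, then a whitespace split plus lowercasing. */
    static List<String> indexTerms(String raw) {
        List<String> terms = new ArrayList<>();
        for (String t : stripTags(raw).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Store side: analysis never touches the stored value, so it is returned as sent. */
    static String storedValue(String raw) {
        return raw;
    }

    public static void main(String[] args) {
        String body = "<p>honda car road review";

        // Index side: the tag is gone, so searches match the plain words.
        System.out.println(indexTerms(body));      // [honda, car, road, review]

        // Store side: fetching the field returns the markup exactly as sent.
        System.out.println(storedValue(body));     // <p>honda car road review

        // Client-side workaround: strip before sending, so index and store agree.
        System.out.println(storedValue(stripTags(body))); // honda car road review
    }
}
```

Stripping on the client means the stored field can no longer be rendered with its original formatting; if you need both, one option is to send the raw HTML in one field and the cleaned text in another. Later Solr releases also include HTMLStripFieldUpdateProcessorFactory, an update processor that strips markup before the value is stored; check whether your version provides it.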