Is it possible to return the HTML field highlighted?

On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> 1. The raw html field (call it "text_html") would be a "string" type
> field that is "stored" but not "indexed". This is the field you direct DIH
> to output to. This is the field you would return in your search results
> with the HTML to be displayed.
>
> 2. The stripped field (call it "text_stripped") would be a "text" type
> field (where "text" is a field type you add that uses the HTML strip char
> filter as shown below) that is not "stored" but is "indexed". Add a
> copyField to your schema that copies from the raw html field to the
> stripped field (say, "text_html" to "text_stripped").
>
> For reference on HTML strip (HTMLStripCharFilterFactory), see:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Which has:
>
> <fieldtype name="text" class="solr.TextField">
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldtype>
>
> Although, you might want to call that field type "text_stripped" to avoid
> confusion with a simple text field.
>
> You can add HTMLStripCharFilterFactory to some other field type that you
> might want to use, but this "charFilter" needs to be before the
> "tokenizer". The "text" field type above is just an example.
>
> -- Jack Krupansky
>
> -----Original Message----- From: okayndc
> Sent: Friday, May 04, 2012 1:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to present html content in browse
>
>
> Hello,
>
> I'm having a hard time understanding this, and I had this same question.
>
> When using DIH should the HTML field be stored in the raw HTML string field
> or the stripped field?
> Also what source field(s) need to be copied and to what destination?
>
> Thanks
>
>
> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog <goks...@gmail.com> wrote:
>
>> Make two fields, one which stores the stripped HTML and another that
>> stores the parsed HTML. You can use <copyField> so that you do not
>> have to submit the html page twice.
>>
>> You would mark the stripped field 'indexed=true stored=false' and the
>> full text field the other way around. The full text field should be a
>> String type.
>>
>> On Thu, May 3, 2012 at 1:04 PM, srini <softtec...@gmail.com> wrote:
>> > I am indexing records from database using DIH. The content of my record
>> > is in html format. When I use browse
>> > I would like to show the content in html format, not in text format. Any
>> > ideas?
>> >
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
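
For reference, the two-field setup described in the quoted messages might look roughly like this in schema.xml. This is only a sketch under assumptions: the field names "text_html" and "text_stripped" are the ones suggested in the thread, and it assumes a field type named "text_stripped" has already been defined with HTMLStripCharFilterFactory placed before the tokenizer, as in the quoted fieldtype example.

```xml
<!-- Raw HTML: stored so it can be returned in results, not indexed.
     DIH writes the database column into this field. -->
<field name="text_html" type="string" indexed="false" stored="true"/>

<!-- Searchable copy: indexed through the HTML-stripping analyzer, not stored.
     Assumes a "text_stripped" fieldType is defined elsewhere in the schema. -->
<field name="text_stripped" type="text_stripped" indexed="true" stored="false"/>

<!-- Source is the raw field, destination is the stripped field. -->
<copyField source="text_html" dest="text_stripped"/>
```

With a layout like this, queries would be run against "text_stripped" and "text_html" would be the field listed in fl for display.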