Is it possible to return the HTML field highlighted?

On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> 1. The raw html field (call it, "text_html") would be a "string" type
> field that is "stored" but not "indexed". This is the field you direct DIH
> to output to. This is the field you would return in your search results
> with the HTML to be displayed.
>
> 2. The stripped field (call it, "text_stripped") would be a "text" type
> field (where "text" is a field type you add that uses the HTML strip char
> filter as shown below) that is not "stored" but is "indexed". Add a
> CopyField to your schema that copies from the raw html field to the
> stripped field (say, "text_html" to "text_stripped".)
>
> For reference on HTML strip (HTMLStripCharFilterFactory), see:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Which has:
>
> <fieldtype name="text" class="solr.TextField">
>  <analyzer>
>   <charFilter class="solr.HTMLStripCharFilterFactory"/>
>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>   <tokenizer class="solr.StandardTokenizerFactory"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.StopFilterFactory"/>
>   <filter class="solr.PorterStemFilterFactory"/>
>  </analyzer>
> </fieldtype>
>
> That said, you might want to call that field type "text_stripped" to avoid
> confusion with a plain text field.
>
> You can add HTMLStripCharFilterFactory to some other field type that you
> might want to use, but this "charFilter" needs to be before the
> "tokenizer". The "text" field type above is just an example.
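Following the naming suggestion and the ordering rule above, a pared-down field type might look like the sketch below. The WhitespaceTokenizerFactory and LowerCaseFilterFactory choices are illustrative assumptions only; a field declared with `type="text_stripped"` would then use this analyzer:

```xml
<fieldtype name="text_stripped" class="solr.TextField">
  <analyzer>
    <!-- Char filters rewrite the raw character stream before tokenization,
         so the HTML strip filter is listed ahead of the tokenizer -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Illustrative choices; swap in whatever tokenizer/filters you need -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```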
>
> -- Jack Krupansky
>
> -----Original Message----- From: okayndc
> Sent: Friday, May 04, 2012 1:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to present html content in browse
>
>
> Hello,
>
> I'm having a hard time understanding this, and I had this same question.
>
> When using DIH, should the HTML field be stored in the raw HTML string field
> or in the stripped field?
> Also, which source field(s) need to be copied, and to what destination?
>
> Thanks
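For the DIH side of this question, a minimal data-config.xml sketch. It assumes a hypothetical database table `records` with columns `id` and `body`, a MySQL JDBC driver, and placeholder credentials; none of these come from the thread. Only the raw field is targeted by DIH, and the schema's copyField fills the stripped field:

```xml
<dataConfig>
  <!-- Hypothetical connection settings; replace with your own -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="db_user"
              password="db_password"/>
  <document>
    <!-- Hypothetical table "records" with columns "id" and "body" -->
    <entity name="record" query="SELECT id, body FROM records">
      <field column="id" name="id"/>
      <!-- Send the raw HTML only to the stored "text_html" field;
           a copyField in schema.xml populates "text_stripped" -->
      <field column="body" name="text_html"/>
    </entity>
  </document>
</dataConfig>
```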
>
>
> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog <goks...@gmail.com> wrote:
>
>> Make two fields, one that indexes the stripped HTML and another that
>> stores the raw HTML. You can use <copyField> so that you do not
>> have to submit the HTML page twice.
>>
>> You would mark the stripped field 'indexed=true stored=false' and the
>> full text field the other way around. The full text field should be a
>> String type.
>>
>> On Thu, May 3, 2012 at 1:04 PM, srini <softtec...@gmail.com> wrote:
>> > I am indexing records from a database using DIH. The content of my records
>> > is in HTML format. When I use browse, I would like to show the content
>> > as HTML, not as plain text. Any ideas?
>> >
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>
