All clear. 

Thanks David,

> On 24 May 2020, at 18:57, David Smiley <> wrote:
> These strategies are not mutually exclusive.  Yes I do suggest having the
> HTML in whole go into one searchable field to satisfy your highlighting
> use-case.  But I can imagine you will also want some document metadata in
> separate fields.  It's up to you to parse that out somehow and add it.  You
> mentioned you are using bin/post but, IMO, that capability is more for
> quick experimentation / tutorials, some POCs, or very simple use-cases.  I
> doubt you can do what I suggest while still using bin/post.  You might be
> able to use "SolrCell" AKA ExtractingRequestHandler directly, which is what
> bin/post does with HTML.
> Good luck!
> ~ David
>> On Sun, May 24, 2020 at 10:52 AM Serkan KAZANCI <>
>> wrote:
>> Hi David,
>> I have many meta-tags in html documents like  <meta name="tarih"
>> content="2019-10-15T23:59:59Z"> which matches the field descriptions in
>> schema file.
>> As I understand, you propose to index the whole html document as one text
>> file and map it to a search field (do you?). That would take care of the
>> html highlight issue, however I would lose the field information coming
>> from meta-tags.
>> So is it possible to index the html document as an html document
>> (preserving the field data coming from meta-tags and not stripping the html
>> tags)?
>> Then I could use solr.HTMLStripCharFilterFactory for analysis.
>> Thank You,
>> Serkan,
>> -----Original Message-----
>> From: David Smiley []
>> Sent: Sunday, May 24, 2020 5:26 PM
>> To: solr-user
>> Subject: Re: highlighting a whole html document using Unified highlighter
>> Instead of stripping the HTML for the stored value, leave it be and remove
>> it during the analysis stage with solr.HTMLStripCharFilterFactory.
>> This means the searchable text will only be the visible text, basically.
>> And the highlighter will only highlight what's searchable.
>> I suggest doing some experimentation, searching for words that you know
>> are directly adjacent (no spaces) to opening and closing tags to make sure
>> that the inserted HTML markup for the highlight balances correctly.  Use a
>> "phrase query" (quoted) as well, and see if you can highlight around markup
>> like "phrase</p>query" to see what happens.  You might need to set
>> hl.weightMatches=false to ensure the words are highlighted separately.  I
>> suspect you will find there is a problem, and the root cause is here:
>> LUCENE-5734.  It's on
>> my long TODO list but hasn't bitten me lately so I've neglected it.
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> On Sun, May 24, 2020 at 7:20 AM Serkan KAZANCI <>
>> wrote:
>>> Thanks Jörn for the answer,
>>> I use the post tool to index html documents, so the html tags are stripped
>>> when indexed and stored. The remaining text is mapped to the field content
>>> by default.
>>> hl.fragsize=0 works perfectly for the indexed document, but I can only
>>> display a highlighted text-only version of the html document because the
>>> html tags are stripped.
>>> So is it possible to index and store the html document without stripping
>>> the html tags, so that when the document is displayed with hl.fragsize=0
>>> parameter, it is displayed as original html document?
>>> Or
>>> Is it possible to give a whole html document as a parameter to the Unified
>>> highlighter so that output is also a highlighted html document?
>>> Or
>>> Do you have a better idea to highlight the keywords of the whole html
>>> document?
>>> Thanks,
>>> Serkan
>>> -----Original Message-----
>>> From: Jörn Franke []
>>> Sent: Sunday, May 24, 2020 1:22 PM
>>> To:
>>> Subject: Re: highlighting a whole html document using Unified highlighter
>>> hl.fragsize=0
>>>> On 24.05.2020 at 11:49, Serkan KAZANCI <> wrote:
>>>> Hi,
>>>> I use solr to search over a million html documents. When a document is
>>>> searched and displayed, I want to highlight the keywords that are used to
>>>> find and access the document.
>>>> Unified highlighter is fast, accurate and supports different languages but
>>>> only highlights passages with given parameters.
>>>> How can I highlight a whole html document using Unified highlighter? I have
>>>> written some PHP code but it cannot do the complex word stemming functions.
>>>> Thanks,
>>>> Serkan
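
For anyone reading this thread later, here is a minimal sketch of the "parse the metadata out yourself" step David describes: keep the raw HTML in one field and copy each <meta name="..." content="..."> pair into its own field. It only builds the JSON document and does not send anything to Solr. The field name content comes from the thread; the class name, function name and the id value are hypothetical, and it assumes the meta names match field names in your schema.

```python
import json
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name] = content


def build_solr_doc(doc_id, html_text, html_field="content"):
    """Build one document dict: raw HTML in html_field, meta tags as extra fields.

    Assumption: html_field is a stored field whose analyzer strips the markup
    (for example with solr.HTMLStripCharFilterFactory), as suggested above.
    """
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()

    doc = {"id": doc_id, html_field: html_text}
    for name, value in collector.meta.items():
        # Do not let a meta tag overwrite the id or the HTML field itself.
        doc.setdefault(name, value)
    return doc


sample_html = (
    '<html><head><meta name="tarih" content="2019-10-15T23:59:59Z"></head>'
    "<body><p>phrase</p>query</body></html>"
)
sample_doc = build_solr_doc("doc-1", sample_html)

# A JSON array of documents is the shape Solr's JSON update handler accepts.
print(json.dumps([sample_doc], indent=2))
```

The resulting list can then be sent to the collection's update handler with whatever HTTP client you prefer. That part is left out here because it depends on your setup.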

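The highlighting parameters mentioned in the thread can be combined into a single request. The sketch below only builds the query string. The helper name is hypothetical, and it assumes the HTML is stored in the field content and that this is also the field you search.

```python
from urllib.parse import parse_qs, urlencode


def build_highlight_query(keywords, html_field="content", weight_matches=False):
    """Return a query string asking the Unified highlighter for the whole field."""
    params = {
        "q": keywords,
        "df": html_field,          # default search field (assumed to hold the HTML)
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": html_field,
        "hl.fragsize": "0",        # 0 = do not fragment, return the whole value
        # Set to false so the words of a phrase are highlighted separately.
        "hl.weightMatches": "true" if weight_matches else "false",
    }
    return urlencode(params)


query_string = build_highlight_query('"phrase query"')
print(query_string)
print(parse_qs(query_string)["hl.fragsize"])
```

Append the result to your collection's select URL. Testing with a quoted phrase that spans markup, such as the "phrase</p>query" case above, is a quick way to see whether the inserted highlight tags stay balanced.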