Hi David

Getting closer now.

First of all, a bit of a mistake on my part. I have two cores set up and I
was changing solrconfig.xml on the wrong core. Doh!! That's why
highlighting wasn't being turned off.

I think I've got the unified highlighter working.
storeOffsetsWithPositions was already configured on my field type
definition, not the field definition, so that was ok.
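For reference, here's a minimal sketch of the request parameters that drive the unified highlighter, written in Java since that's what my indexer uses. The host, the core name (`mycore`), the field name (`content`) and the query text are hypothetical placeholders, and it assumes the plain dismax parser. `hl`, `hl.method`, `hl.fl` and `hl.maxAnalyzedChars` are standard Solr highlighting parameters; the same values could instead be set as defaults on the request handler in solrconfig.xml. The sketch only builds the URL and doesn't send it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class UnifiedHighlightQuery {

    /**
     * Builds a /select URL asking Solr for unified-highlighter snippets.
     * Base URL, core and field names are illustrative placeholders only.
     */
    static String buildSelectUrl(String baseUrl, String userQuery, String field,
                                 int maxAnalyzedChars, boolean highlight) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("defType", "dismax");                // assumption: plain dismax parser
        params.put("qf", field);
        params.put("hl", Boolean.toString(highlight));  // hl=true/false turns highlighting on/off
        params.put("hl.method", "unified");             // pick the unified highlighter per request
        params.put("hl.fl", field);
        params.put("hl.maxAnalyzedChars", Integer.toString(maxAnalyzedChars));

        // URL-encode each key and value; spaces become '+'.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        String url = buildSelectUrl("http://localhost:8983/solr/mycore",
                "journal impact", "content", 1_300_000, true);
        // prints http://localhost:8983/solr/mycore/select?q=journal+impact&defType=dismax&qf=content&hl=true&hl.method=unified&hl.fl=content&hl.maxAnalyzedChars=1300000
        System.out.println(url);
    }
}
```

Passing `false` for the last argument produces `hl=false`, which is the quickest way to check that highlighting really is off for a given request.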

What it boils down to now, I think, is hl.maxAnalyzedChars. I'm getting
highlighting on some records and not others, which makes it hard to see
where my dismax parser found the match. I increased hl.maxAnalyzedChars
to 1300000 and now more records are highlighted.
Two questions:

1. Do you have any guidelines on how large hl.maxAnalyzedChars can be
set before it starts to impact performance or memory?

2. Do you know a way to query the maximum length of text in a field, so
that I can set hl.maxAnalyzedChars accordingly? I'm thinking I could
probably modify my Java indexer to log the maximum content length.
Actually, I probably don't want the maximum, but some value that
highlights 90-95% of records.
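Roughly what I have in mind for the indexer side is a minimal sketch like the one below, assuming the indexer records `content.length()` for each document it sends. A document is analyzed in full when its length is at or below hl.maxAnalyzedChars, so picking the 95th-percentile length means about 95% of records are covered end to end (the rest can still highlight if the match is early in the text). The class and method names are hypothetical, and I'm assuming Java `String.length()` (UTF-16 chars) is close enough to the unit hl.maxAnalyzedChars counts.

```java
import java.util.Arrays;

public class ContentLengthStats {

    /**
     * Nearest-rank percentile: the smallest recorded length such that at
     * least `fraction` of the documents are no longer than it.
     */
    static int percentileLength(int[] lengths, double fraction) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("no lengths recorded");
        }
        if (!(fraction > 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        int[] sorted = Arrays.copyOf(lengths, lengths.length); // leave caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(fraction * sorted.length);  // 1-based rank, always >= 1
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up lengths for illustration; the real ones would be logged
        // as content.length() while indexing.
        int[] lengths = {12_000, 48_000, 95_000, 130_000, 180_000,
                         210_000, 240_000, 250_000, 610_000, 1_250_000};
        System.out.println("max = " + percentileLength(lengths, 1.0));  // 1250000
        System.out.println("p90 = " + percentileLength(lengths, 0.90)); // 610000
    }
}
```

With only a handful of very long documents, the percentile can sit well below the maximum, which is the whole point of not just using the maximum. Adding a little headroom on top would probably be wise, since records indexed later may be longer.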

Thanks
Shaun

On Tue, 12 Jan 2021 at 16:30, David Smiley <dsmi...@apache.org> wrote:

> On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <campbell.sh...@gmail.com>
> wrote:
>
> > Hi David
> >
> > First of all, I wanted to say I'm working from the third edition of your
> > book!! I think it's a bit out of date now, though. I was just going to
> > try following the section on the Postings highlighter, but I see that's
> > been absorbed into the Unified highlighter. I find your book easier to
> > follow than the official documentation, though.
> >
>
> Thanks :-D.  I do maintain the Solr Reference Guide for the parts of code I
> touch, including highlighting, so I hope what's there makes sense too.
>
>
> > I am going to try to configure the unified highlighter, and I will add
> > that storeOffsetsWithPositions setting to the schema (which I saw in your
> > book) and I will try indexing again from scratch. I was getting some
> > funny things going on where I thought I'd turned highlighting off and it
> > was still giving me highlights.
> >
>
> hl=true/false
>
>
> > Actually, re-reading your email, are you saying that you can't configure
> > highlighting in solrconfig.xml? That's where I've always configured the
> > original highlighter, in my dismax search handler. Am I supposed to add
> > the highlighting parameters to each request?
> >
>
> You can set highlighting and other *parameters* in solrconfig.xml for
> request handlers.  But the dedicated <highlighting> plugin info is only for
> the original and Fast Vector Highlighters.
>
> ~ David
>
>
> >
> > Thanks
> > Shaun
> >
> > On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org> wrote:
> >
> > > Hello!
> > >
> > > I worked on the UnifiedHighlighter a lot and want to help you!
> > >
> > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com>
> > > wrote:
> > >
> > > > I've been using highlighting for a while, using the original
> > > > highlighter, and have just come across a problem with fields that
> > > > contain a large amount of text, approx 250k characters. I only have
> > > > about 2,000 records, but each one contains a journal publication to
> > > > search through.
> > > >
> > > > What I noticed is that some records didn't return a highlight even
> > > > though they matched on the content. I noticed the hl.maxAnalyzedChars
> > > > parameter and increased it, which allowed some records to be
> > > > highlighted, but not all, and then it caused memory problems on the
> > > > server. Performance is also very poor.
> > > >
> > >
> > > I've been thinking hl.maxAnalyzedChars should maybe default to no limit
> > > -- it's a performance threshold, but perhaps it's better to opt in to
> > > such a limit than to scratch your head for a long time wondering why a
> > > search result isn't showing highlights.
> > >
> > >
> > > > To try to fix this, I've tried to configure the unified highlighter
> > > > in my solrconfig.xml instead. It seems to be working, but again I'm
> > > > missing highlights on some records.
> > > >
> > >
> > > There is no configuration of that highlighter in solrconfig.xml; it's
> > > entirely parameter driven (runtime).
> > >
> > >
> > > > The other thing is I've tried to adjust my unified highlighting
> > > > settings in solrconfig.xml and they don't seem to be having any
> > > > effect, even after restarting Solr. I was just wondering whether there
> > > > is any highlighting information stored at index time. It's taking over
> > > > 4 hours to index my records, so it's not easy to keep reindexing my
> > > > content.
> > > >
> > > > Any ideas on how to handle highlighting of large content would be
> > > > appreciated.
> > > >
> > > > Shaun
> > > >
> > >
> > > Please read the documentation here thoroughly:
> > >
> > > https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > > (or earlier version as applicable)
> > > Since you have large bodies of text to highlight, you would strongly
> > > benefit from putting offsets into the search index (and re-indexing) --
> > > storeOffsetsWithPositions. That's an option on the field/fieldType in
> > > your schema; it may not be obvious from reading the docs. You have to
> > > opt in to that; Solr doesn't normally store any info in the index for
> > > highlighting.
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> >
>
