Hi David,

First of all, I wanted to say I'm working off your book!! Third edition, and I think it's a bit out of date now. I was just going to try following the section on the Postings highlighter, but I see that's been absorbed into the Unified highlighter. I find your book easier to follow than the official documentation though.
I am going to try to configure the unified highlighter, and I will add that storeOffsetsWithPositions to the schema (which I saw in your book) and I will try indexing again from scratch. I was getting some funny things going on where I thought I'd turned highlighting off and it was still giving me highlights.

Actually, just re-reading your email again, are you saying that you can't configure highlighting in solrconfig.xml? That's where I always configure original highlighting in my dismax search handler. Am I supposed to add highlighting to each request?

Thanks
Shaun

On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org> wrote:

> Hello!
>
> I worked on the UnifiedHighlighter a lot and want to help you!
>
> On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com>
> wrote:
>
> > I've been using highlighting for a while, using the original highlighter,
> > and just come across a problem with fields that contain a large amount of
> > text, approx 250k characters. I only have about 2,000 records but each one
> > contains a journal publication to search through.
> >
> > What I noticed is that some records didn't return a highlight even though
> > they matched on the content. I noticed the hl.maxAnalyzedChars parameter
> > and increased that, but it allowed some records to be highlighted, but not
> > all, and then it caused memory problems on the server. Performance is also
> > very poor.
>
> I've been thinking hl.maxAnalyzedChars should maybe default to no limit --
> it's a performance threshold but perhaps better to opt in to such a limit
> than scratch your head for a long time wondering why a search result isn't
> showing highlights.
>
> > To try to fix this I've tried to configure the unified highlighter in my
> > solrconfig.xml instead. It seems to be working but again I'm missing some
> > highlighted records.
>
> There is no configuration of that highlighter in solrconfig.xml; it's
> entirely parameter driven (runtime).
>
> > The other thing is I've tried to adjust my unified highlighting settings in
> > solrconfig.xml and they don't seem to be having any effect even after
> > restarting Solr. I was just wondering whether there is any highlighting
> > information stored at index time. It's taking over 4 hours to index my
> > records so it's not easy to keep reindexing my content.
> >
> > Any ideas on how to handle highlighting of large content would be
> > appreciated.
> >
> > Shaun
>
> Please read the documentation here thoroughly:
> https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> (or earlier version as applicable)
>
> Since you have large bodies of text to highlight, you would strongly
> benefit from putting offsets into the search index (and re-index) --
> storeOffsetsWithPositions. That's an option on the field/fieldType in your
> schema; it may not be obvious reading the docs. You have to opt in to
> that; Solr doesn't normally store any info in the index for highlighting.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
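
As a rough illustration of the "parameter driven" point in the thread, the sketch below builds a select URL that asks for the unified highlighter through request parameters. The host, the core name `journals`, the field name `content`, the query text, and the helper `build_highlight_url` are all made up for the example; only the `hl`, `hl.method`, `hl.fl` and `hl.maxAnalyzedChars` parameters come from the discussion and the linked Solr guide. It only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, field, max_analyzed_chars=None):
    """Build a Solr select URL that requests unified highlighting.

    base_url, query and field are placeholders supplied by the caller;
    nothing here is specific to any real installation.
    """
    params = {
        "q": query,
        "hl": "true",            # turn highlighting on for this request
        "hl.method": "unified",  # pick the unified highlighter
        "hl.fl": field,          # field(s) to highlight
    }
    if max_analyzed_chars is not None:
        # Raise the analysis limit for very large fields (see the thread
        # for the memory and performance caveats of doing so).
        params["hl.maxAnalyzedChars"] = str(max_analyzed_chars)
    return base_url + "?" + urlencode(params)


# Hypothetical host, core and field names, for illustration only.
url = build_highlight_url(
    "http://localhost:8983/solr/journals/select",
    "protein folding",
    "content",
    max_analyzed_chars=500000,
)
print(url)
```

The offsets option David mentions (storeOffsetsWithPositions) is a separate, index-time setting on the field or fieldType in the schema, so it is not something a request like this can switch on; it needs a re-index.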