Hi David

Getting closer now.

First of all, a bit of a mistake on my part. I have two cores set up and I
was changing the solrconfig.xml on the wrong core, doh!! That's why
highlighting wasn't being turned off.

I think I've got the unified highlighter working.
storeOffsetsWithPositions was already configured on my field type
definition, not the field definition, so that was OK.

What it boils down to now, I think, is hl.maxAnalyzedChars. I'm getting
highlighting on some records and not others, which makes it confusing as to
where the match is with my dismax parser. I increased my
hl.maxAnalyzedChars to 1300000 and now it's highlighting more records.

Two questions:

1. Have you any guidelines as to what could be a maximum
   hl.maxAnalyzedChars without impacting performance or memory?
2. Do you know a way to query the maximum length of text in a field so
   that I can set hl.maxAnalyzedChars accordingly?

Just thinking I can probably modify my Java indexer to log the maximum
content length. Actually, I probably don't want the maximum but some value
that highlights 90-95% of records.

Thanks
Shaun

On Tue, 12 Jan 2021 at 16:30, David Smiley <dsmi...@apache.org> wrote:

> On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <campbell.sh...@gmail.com>
> wrote:
>
> > Hi David
> >
> > First of all I wanted to say I'm working off your book!! Third edition,
> > and I think it's a bit out of date now. I was just going to try following
> > the section on the Postings highlighter, but I see that's been absorbed
> > into the Unified highlighter. I find your book easier to follow than the
> > official documentation though.
>
> Thanks :-D. I do maintain the Solr Reference Guide for the parts of code I
> touch, including highlighting, so I hope what's there makes sense too.
>
> > I am going to try to configure the unified highlighter, and I will add
> > that storeOffsetsWithPositions to the schema (which I saw in your book)
> > and I will try indexing again from scratch.
> > Was getting some funny things going on where I thought I'd turned
> > highlighting off and it was still giving me highlights.
>
> hl=true/false
>
> > Actually just re-reading your email again, are you saying that you can't
> > configure highlighting in solrconfig.xml? That's where I always configure
> > original highlighting in my dismax search handler. Am I supposed to add
> > highlighting to each request?
>
> You can set highlighting and other *parameters* in solrconfig.xml for
> request handlers. But the dedicated <highlighting> plugin info is only for
> the original and Fast Vector Highlighters.
>
> ~ David
>
> > Thanks
> > Shaun
> >
> > On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org> wrote:
> >
> > > Hello!
> > >
> > > I worked on the UnifiedHighlighter a lot and want to help you!
> > >
> > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com>
> > > wrote:
> > >
> > > > I've been using highlighting for a while, using the original
> > > > highlighter, and just come across a problem with fields that contain
> > > > a large amount of text, approx 250k characters. I only have about
> > > > 2,000 records but each one contains a journal publication to search
> > > > through.
> > > >
> > > > What I noticed is that some records didn't return a highlight even
> > > > though they matched on the content. I noticed the hl.maxAnalyzedChars
> > > > parameter and increased that, but it allowed some records to be
> > > > highlighted, but not all, and then it caused memory problems on the
> > > > server. Performance is also very poor.
> > >
> > > I've been thinking hl.maxAnalyzedChars should maybe default to no limit
> > > -- it's a performance threshold but perhaps better to opt-in to such a
> > > limit than scratch your head for a long time wondering why a search
> > > result isn't showing highlights.
> > >
> > > > To try to fix this I've tried to configure the unified highlighter in
> > > > my solrconfig.xml instead. It seems to be working but again I'm
> > > > missing some highlighted records.
> > >
> > > There is no configuration of that highlighter in solrconfig.xml; it's
> > > entirely parameter driven (runtime).
> > >
> > > > The other thing is I've tried to adjust my unified highlighting
> > > > settings in solrconfig.xml and they don't seem to be having any
> > > > effect even after restarting Solr. I was just wondering whether there
> > > > is any highlighting information stored at index time. It's taking
> > > > over 4 hours to index my records so it's not easy to keep reindexing
> > > > my content.
> > > >
> > > > Any ideas on how to handle highlighting of large content would be
> > > > appreciated.
> > > >
> > > > Shaun
> > >
> > > Please read the documentation here thoroughly:
> > > https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > > (or earlier version as applicable)
> > > Since you have large bodies of text to highlight, you would strongly
> > > benefit from putting offsets into the search index (and re-index) --
> > > storeOffsetsWithPositions. That's an option on the field/fieldType in
> > > your schema; it may not be obvious reading the docs. You have to opt-in
> > > to that; Solr doesn't normally store any info in the index for
> > > highlighting.
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
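
On the idea raised at the top of this message (logging content lengths from
the Java indexer and picking a value that covers 90-95% of records rather
than the maximum), here is a minimal sketch. It assumes the indexer can
collect each record's content length, for example from String.length(), and
that this count is a close enough proxy for what hl.maxAnalyzedChars
measures. The class and method names (ContentLengthStats, percentile) are
hypothetical and not part of Solr or SolrJ, and the lengths in main are
made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper for an indexer; not part of Solr or SolrJ. */
class ContentLengthStats {

    /**
     * Nearest-rank percentile: returns the smallest recorded length that is
     * greater than or equal to at least `percent` percent of all lengths.
     */
    static int percentile(int[] lengths, int percent) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("no lengths recorded");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        int[] sorted = lengths.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // rank = ceil(percent / 100 * n), computed in integer arithmetic
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up content lengths; a real indexer would record one per document.
        int[] lengths = {12_000, 250_000, 90_000, 1_300_000, 310_000};
        System.out.println("95th percentile length: " + percentile(lengths, 95));
    }
}
```

The value it reports is only a candidate to try as hl.maxAnalyzedChars; whether
that setting is acceptable for performance and memory still has to be checked
against the real index.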