Hi Walter,

Sorry for the slow response… and to be honest, I don’t have a good answer for this behaviour. I’m really not sure what’s going on.

I did look over the available settings for excerpts:
http://sphinxsearch.com/docs/current.html#api-func-buildexcerpts

… and anything that I feel would influence what you’re seeing (e.g. exact_phrase) defaults to what would be ideal for your site anyway.

I’m not sure if upgrading Sphinx would have any impact, but it may be worthwhile - at least to 2.2.11. That said, there’s nothing in the release notes for 2.2.10/11 that I can spot that suggests any change in behaviour.

If you really want to dig into it, I’d suggest building a test app that can reproduce the problem with a smaller dataset, and potentially sharing that here so I can have a look as well. Of course, it very much sounds like a Sphinx issue rather than anything to do with Thinking Sphinx, so it’s not very likely that I can actually fix things.

Wish I could be more helpful!

Pat

> On 9 Jul 2019, at 8:46 am, Walter Lee Davis <wa...@wdstudio.com> wrote:
>
> I wonder if anyone knows how Sphinx goes about constructing the snippets that
> are returned along with the matches to a search term. This page illustrates a
> wild variety of examples of how one search term can be interpreted:
>
> https://oll.libertyfund.org/search/results?q=power+corrupts
>
> Note the first hit, from Alvis on Shakespeare. The exact phrase exists in the
> third line of the snippet (on a desktop screen, YMMV). It is not highlighted.
> In the third example, the result is from deep in the weeds of the footnotes,
> and hits on the word power, and actually highlights it. The fifth hit gets
> both power (twice) and corrupts, but misses the stem of corrupts in corrupt.
> The second-to-the-last hit on that page, in Liberty, Order, and Justice, goes
> on for several screens (208,135 words), with a single snippet that has grown
> to encompass 725 individual keyword hits in one "paragraph" of source text.
>
> I'm using Thinking Sphinx 3.1.2, and Sphinx is version 2.2.9
>
> Here's the controller method that constructed this page:
>
>   @results = ThinkingSphinx.search "\"#{ThinkingSphinx::Query.escape(params[:q].to_s)}\"",
>     :page => params[:page],
>     :star => true,
>     :excerpts => {
>       :limit => 1000,
>       :around => 40,
>       :force_all_words => true,
>       :chunk_separator => '</li><li>'
>     }.reject{ |r| r.class.to_s == 'NilClass' } rescue Kaminari::paginate_array []
>   @results.context[:panes] << ThinkingSphinx::Panes::ExcerptsPane
>   @hits = @results.total_entries rescue 0
>
> And these results are from mostly titles, but some pages. Here's the
> definition for both:
>
>   # titles_index.rb
>   ThinkingSphinx::Index.define :title, :with => :active_record do
>     set_property :group_concat_max_len => 10.megabytes
>
>     indexes :title, :sortable => true
>     indexes teaser
>     indexes content.plain, :as => :plain_text
>     indexes author_name, :sortable => true
>     has roles(:person_id), :as => :people_ids
>     has :id, :as => :title_id
>     has author_id, created_at, updated_at
>     has set, :as => :title_set
>     where sanitize_sql(["publish", true])
>   end
>
>   # pages_index.rb
>   ThinkingSphinx::Index.define :page, :with => :active_record do
>
>     indexes :title, :sortable => true
>     indexes teaser
>     indexes body
>     has created_at, updated_at
>   end
>
> In the view, I'm using this tortured bit of ERB:
>
>   <%= content_tag( :ol,
>     "<li>#{result.excerpts.plain_contents}</li>".gsub(/<li>\s*<\/li>/,'').html_safe
>   ) if result.respond_to?(:plain_contents) %>
>
> And there's no way to explain why some results are wrapped in the <span
> class="match"> in the output from Sphinx, while others (nearby, in the same
> set of results) are not.
>
> Thanks in advance if anyone can enlighten me or point me toward documentation
> of this feature. This is all very old code, maybe 6 or 8 years since I last
> touched it. I've moved it to a newer server since I wrote all this, but
> nothing much changed when I did that.
> My client would like to know, and I don't have any good answers.
>
> Walter
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to thinking-sphinx+unsubscr...@googlegroups.com.
> To post to this group, send email to thinking-sphinx@googlegroups.com.
> Visit this group at https://groups.google.com/group/thinking-sphinx.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/thinking-sphinx/AAECECFD-619C-49AC-B4E7-63A6C87C2595%40wdstudio.com.
> For more options, visit https://groups.google.com/d/optout.
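
A note for anyone building the reduced test case suggested above: it may help to write down what "correct" highlighting should look like before comparing it with what Sphinx returns. The sketch below is a toy model in plain Ruby, not Sphinx's excerpt algorithm and not part of Thinking Sphinx. The method names highlight_words and highlight_phrase are hypothetical, and it assumes no stemming and plain whitespace-separated words. It contrasts per-keyword highlighting with exact-phrase highlighting on a single sample sentence.

```ruby
# Toy model only: this is NOT how Sphinx builds excerpts. It just makes the
# two highlighting behaviours concrete, so a reduced test case has an
# expected result to compare against. No stemming is applied.

# Wrap every occurrence of any individual query word (case-insensitive).
def highlight_words(text, query, before: '<span class="match">', after: '</span>')
  words = query.split.map { |w| Regexp.escape(w) }
  return text if words.empty?
  text.gsub(/\b(?:#{words.join('|')})\b/i) { |m| "#{before}#{m}#{after}" }
end

# Wrap only occurrences of the whole phrase: all words adjacent and in order.
def highlight_phrase(text, query, before: '<span class="match">', after: '</span>')
  words = query.split.map { |w| Regexp.escape(w) }
  return text if words.empty?
  text.gsub(/\b#{words.join('\s+')}\b/i) { |m| "#{before}#{m}#{after}" }
end

sample = 'Power tends to corrupt, and absolute power corrupts absolutely.'
puts highlight_words(sample, 'power corrupts')
puts highlight_phrase(sample, 'power corrupts')
```

In this model the per-word version marks both occurrences of "power" and the word "corrupts" but leaves "corrupt" alone, while the phrase version marks only the adjacent pair "power corrupts". Which of the two a given excerpt on the real site resembles, if either, might help narrow down which Sphinx setting is in play.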