[ts] Search results and matched term highlighting

Walter Lee Davis Mon, 08 Jul 2019 15:47:00 -0700

I wonder if anyone knows how Sphinx goes about constructing the snippets that 
are returned along with the matches to a search term. This page illustrates a 
wild variety of examples of how one search term can be interpreted:


https://oll.libertyfund.org/search/results?q=power+corrupts

Note the first hit, from Alvis on Shakespeare. The exact phrase appears in the 
third line of the snippet (on a desktop screen, YMMV), but it is not 
highlighted. The third result comes from deep in the weeds of the footnotes; 
it matches only the word power, and does highlight it. The fifth hit 
highlights both power (twice) and corrupts, but misses corrupt, which shares a 
stem with corrupts. The second-to-last hit on that page, in Liberty, Order, 
and Justice, goes on for several screens (208,135 words), with a single 
snippet that has grown to encompass 725 individual keyword hits in one 
"paragraph" of source text.

I'm using Thinking Sphinx 3.1.2, and Sphinx is version 2.2.9.

Here's the controller method that constructed this page:

    @results = ThinkingSphinx.search "\"#{ThinkingSphinx::Query.escape(params[:q].to_s)}\"",
    :page => params[:page],
    :star => true,
    :excerpts => {
      :limit    => 1000,
      :around     => 40,
      :force_all_words => true,
      :chunk_separator => '</li><li>'
    }.reject{ |r| r.class.to_s == 'NilClass' } rescue Kaminari::paginate_array []
    @results.context[:panes] << ThinkingSphinx::Panes::ExcerptsPane
    @hits = @results.total_entries rescue 0
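
For anyone who wants to see exactly what string that first argument turns into, here is a minimal sketch that runs without the gem. The escape_query helper is a hypothetical stand-in for ThinkingSphinx::Query.escape (the real method may escape more characters), and build_query is just a name I made up for illustration. The point is only that the user's input ends up wrapped in literal double quotes, which, as far as I understand Sphinx's extended query syntax, asks for a phrase match rather than a match on the individual words.

```ruby
# Hypothetical stand-in for ThinkingSphinx::Query.escape, so this runs
# without the gem. It only backslash-escapes double quotes and backslashes;
# the real method may well escape more characters than this.
def escape_query(text)
  text.gsub(/(["\\])/) { "\\#{$1}" }
end

# Mirrors the controller: the escaped input is wrapped in literal double
# quotes before it is handed to the search call.
def build_query(raw)
  "\"#{escape_query(raw.to_s)}\""
end

puts build_query('power corrupts')  # prints "power corrupts" (quotes included)
puts build_query(nil)               # prints "" (nil.to_s is the empty string)
```

How the quoted phrase interacts with :star => true and with stemming is exactly the part I don't understand, so the sketch makes no claim about it.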

These results come mostly from titles, with some from pages. Here are the 
index definitions for both:

# titles_index.rb
ThinkingSphinx::Index.define :title, :with => :active_record do
  set_property :group_concat_max_len => 10.megabytes

  indexes :title, :sortable => true
  indexes teaser
  indexes content.plain, :as => :plain_text
  indexes author_name, :sortable => true
  has roles(:person_id), :as => :people_ids 
  has :id, :as => :title_id
  has author_id, created_at, updated_at
  has set, :as => :title_set
  where sanitize_sql(["publish", true])
end

# pages_index.rb
ThinkingSphinx::Index.define :page, :with => :active_record do

  indexes :title, :sortable => true
  indexes teaser
  indexes body
  has created_at, updated_at
end

In the view, I'm using this tortured bit of ERB:

        <%= content_tag( :ol,
              "<li>#{result.excerpts.plain_contents}</li>".gsub(/<li>\s*<\/li>/,'').html_safe
            ) if result.respond_to?(:plain_contents) %>
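
To make the intent of that ERB clearer, here is a small plain-Ruby sketch of the string handling on its own, with no Rails or Sphinx involved. The excerpt_list_items name and the sample strings are made up for illustration; the separator and the regular expression are the ones from the code above. It assumes the excerpt comes back as chunks joined by the :chunk_separator value.

```ruby
# The separator passed to Sphinx as :chunk_separator in the controller.
CHUNK_SEPARATOR = '</li><li>'

# Wraps the excerpt in <li>...</li> so each chunk becomes a list item,
# then strips any list items that are empty or whitespace-only.
def excerpt_list_items(excerpt)
  "<li>#{excerpt}</li>".gsub(/<li>\s*<\/li>/, '')
end

# Made-up sample chunks, joined the way the excerpt is assumed to arrive.
sample = ['first chunk', ' ', 'third chunk'].join(CHUNK_SEPARATOR)

puts excerpt_list_items(sample)
# prints <li>first chunk</li><li>third chunk</li>
```

None of this touches the <span class="match"> markup, which is already inside each chunk by the time the view sees it.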

I can't find any way to explain why some matches are wrapped in <span 
class="match"> in the output from Sphinx, while others (nearby, in the same set 
of results) are not.

Thanks in advance if anyone can enlighten me or point me toward documentation 
of this feature. This is all very old code, maybe 6 or 8 years since I last 
touched it. I've moved it to a newer server since I wrote all this, but nothing 
much changed when I did that. My client would like to know, and I don't have 
any good answers.

Walter
