Hi Steve

I've got a way forward to greatly improve the speed of indexing… unfortunately, 
it's not going to work within Thinking Sphinx easily right now.

Sphinx has the ability to gather attribute and field values from separate 
queries - this existed for TS v1/v2 for attributes, and fields was added in TS 
v3, but the catch is those separate queries don't work for HABTM joins. I'd 
love to change that, it's just painful from an ActiveRecord perspective because 
you're not dealing with a model's table as the base, but the HABTM join table.

Here's the configuration for the relevant source that I modified by hand:
https://gist.github.com/pat/6080031

You'll see that the main query is nice and short - and then there's each of the 
MVA and joined field definitions. If you put this in the generated source 
definition in config/development.sphinx.conf, and then run the indexer manually 
(NOT through the rake task, that'll overwrite this):
  indexer --config config/development.sphinx.conf --all --rotate

(Remove --rotate if Sphinx isn't running.) You'll see it's pretty damn fast.

Now, ways forward? Well, I'd love to write something for TS v3 that can handle 
HABTM - it's just a shame that it might need to be pure ARel rather than 
ActiveRecord-built (which can otherwise help with joins).

But otherwise: switch from HABTM to has_many/has_many :through - make each of 
the joins an actual model. Then, you can add :source => :query to each of the 
appropriate field and attribute definitions, and it should generate something 
pretty much the same.

Hope this provides some clarity at the very least! And also: thanks for the 
test app, really helped with debugging!

-- 
Pat


On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote:

> Hi there,
> 
> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an 
> excellent project. V3 is looking great and represents a lot of hard work and 
> effort.
> 
> I've been using thinking-sphinx to index a document model and it's really 
> slowed down when I add lots of associations in the index. In fact, it never 
> finishes on my machine (8Gig RAM, 8 CPU's) when I add 4 indexes.
> 
> Times:
>       • 4 seconds - when 1 association (images) is indexed
>       • 6 seconds - when 2 associations (images and subscribers) are indexed
>       • 23 seconds - when 2 associations (images and countries) are indexed
>       • 115 seconds - when 3 associations (images, subscribers and tags) are 
> indexed
>       • 113 seconds - when 3 associations (images, subscribers and videos) 
> are indexed (just to prove it's not tags slowing it down)
>       • ꝏ (not finishing) - when 4 associations or more are selected.
> 
> Here's my index file:
> 
> ThinkingSphinx::Index.define :document, with: :active_record, delta: true, 
> sql_range_step: 999999999, group_concat_max_len: 16384 do
> 
>   has countries(:id), as: :country_ids
>   has images(:id), as: :image_ids, facet: true
>   has subscribers(:id), as: :subscriber_ids, facet: true
>   has tags(:id), as: :tag_ids, facet: true
>   has videos(:id), as: :video_ids, facet: true
> 
>   indexes countries.name, as: :countries
>   indexes images.title, as: :images
>   indexes subscribers.title, as: :subscribers
>   indexes tags.name, as: :tags
>   indexes videos.title, as: :videos
> 
>   has updated_at
> 
> end
> 
> The generated sql is a massive group_by query and is not finishing. See it 
> here https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
> 
> I'd really appreciate some advice on how to optimise this so indexing becomes 
> viable again. Do I just have too much going on here? I'm using facets, 
> indexes and attributes. Perhaps there is a better way to optimise? A friend 
> suggested pre-computing with some joins... how would this work?
> 
> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, 
> thinking-sphinx 3.0.4
> 
> For those who'd like to take a look, I've uploaded a sample project here 
> https://github.com/crossroads/rails3-ts-example which can be cloned. If you 
> follow the instructions, it will setup a db with test data and reproduce the 
> problem quickly.
> 
> There's also the sphinx generated SQL and EXPLAIN: 
> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing 
> 
> Thanks in advance for anyone taking the time to read.
> 
> Regards,
> Steve
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  


-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to