Heya Steve

Was just looking into how difficult this would be to implement properly, and 
noticed I have added the ability to take a string as the source query - instead 
of the column references. So, it's possible without hacking around in the index 
definition itself:

https://gist.github.com/pat/6088629

It's worth noting that the document id (Sphinx's equivalent of a primary key) 
involves the normal primary key with an offset and a multiplier. Make sure 
those two integers match what's in your generated index in sql_query. They may 
change when you add other indices to your app (depends on alphabetical order of 
your index files).

Also: there's probably some metaprogramming you could add to simplify things a 
bit more.

Would love to hear if this approach helps with your real app and not just the 
test one :)

-- 
Pat

On 26/07/2013, at 12:14 AM, Pat Allan wrote:

> Hi Steve
> 
> I've got a way forward to greatly improve the speed of indexing… 
> unfortunately, it's not going to work within Thinking Sphinx easily right now.
> 
> Sphinx has the ability to gather attribute and field values from separate 
> queries - this existed for TS v1/v2 for attributes, and fields was added in 
> TS v3, but the catch is those separate queries don't work for HABTM joins. 
> I'd love to change that, it's just painful from an ActiveRecord perspective 
> because you're not dealing with a model's table as the base, but the HABTM 
> join table.
> 
> Here's the configuration for the relevant source that I modified by hand:
> https://gist.github.com/pat/6080031
> 
> You'll see that the main query is nice and short - and then there's each of 
> the MVA and joined field definitions. If you put this in the generated source 
> definition in config/development.sphinx.conf, and then run the indexer 
> manually (NOT through the rake task, that'll overwrite this):
>  indexer --config config/development.sphinx.conf --all --rotate
> 
> (Remove --rotate if Sphinx isn't running.) You'll see it's pretty damn fast.
> 
> Now, ways forward? Well, I'd love to write something for TS v3 that can 
> handle HABTM - it's just a shame that it might need to be pure ARel rather 
> than ActiveRecord-built (which can otherwise help with joins).
> 
> But otherwise: switch from HABTM to has_many/has_many :through - make each of 
> the joins an actual model. Then, you can add :source => :query to each of the 
> appropriate field and attribute definitions, and it should generate something 
> pretty much the same.
> 
> Hope this provides some clarity at the very least! And also: thanks for the 
> test app, really helped with debugging!
> 
> -- 
> Pat
> 
> 
> On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote:
> 
>> Hi there,
>> 
>> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an 
>> excellent project. V3 is looking great and represents a lot of hard work and 
>> effort.
>> 
>> I've been using thinking-sphinx to index a document model and it's really 
>> slowed down when I add lots of associations in the index. In fact, it never 
>> finishes on my machine (8Gig RAM, 8 CPU's) when I add 4 indexes.
>> 
>> Times:
>>      • 4 seconds - when 1 association (images) is indexed
>>      • 6 seconds - when 2 associations (images and subscribers) are indexed
>>      • 23 seconds - when 2 associations (images and countries) are indexed
>>      • 115 seconds - when 3 associations (images, subscribers and tags) are 
>> indexed
>>      • 113 seconds - when 3 associations (images, subscribers and videos) 
>> are indexed (just to prove it's not tags slowing it down)
>>      • ꝏ (not finishing) - when 4 associations or more are selected.
>> 
>> Here's my index file:
>> 
>> ThinkingSphinx::Index.define :document, with: :active_record, delta: true, 
>> sql_range_step: 999999999, group_concat_max_len: 16384 do
>> 
>>  has countries(:id), as: :country_ids
>>  has images(:id), as: :image_ids, facet: true
>>  has subscribers(:id), as: :subscriber_ids, facet: true
>>  has tags(:id), as: :tag_ids, facet: true
>>  has videos(:id), as: :video_ids, facet: true
>> 
>>  indexes countries.name, as: :countries
>>  indexes images.title, as: :images
>>  indexes subscribers.title, as: :subscribers
>>  indexes tags.name, as: :tags
>>  indexes videos.title, as: :videos
>> 
>>  has updated_at
>> 
>> end
>> 
>> The generated sql is a massive group_by query and is not finishing. See it 
>> here https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
>> 
>> I'd really appreciate some advice on how to optimise this so indexing 
>> becomes viable again. Do I just have too much going on here? I'm using 
>> facets, indexes and attributes. Perhaps there is a better way to optimise? A 
>> friend suggested pre-computing with some joins... how would this work?
>> 
>> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, 
>> thinking-sphinx 3.0.4
>> 
>> For those who'd like to take a look, I've uploaded a sample project here 
>> https://github.com/crossroads/rails3-ts-example which can be cloned. If you 
>> follow the instructions, it will setup a db with test data and reproduce the 
>> problem quickly.
>> 
>> There's also the sphinx generated SQL and EXPLAIN: 
>> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing 
>> 
>> Thanks in advance for anyone taking the time to read.
>> 
>> Regards,
>> Steve
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Thinking Sphinx" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>> For more options, visit https://groups.google.com/groups/opt_out.
>> 
>> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 


-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to