Thanks Pat,

Two very useful and insightful replies. I sat down to analyse what you 
wrote just after you sent the second post and it's definitely worked very 
well so far. Indexing is much faster with the string source query, and I've 
worked out the correct multipliers (thanks for the tip).

I'll finish off the indexes and test the app to confirm but so far so good.

Thanks for that!
Regards,
Steve

On Friday, July 26, 2013 8:57:38 PM UTC+8, Pat Allan wrote:

> Heya Steve 
>
> Was just looking into how difficult this would be to implement properly, 
> and noticed I have added the ability to take a string as the source query - 
> instead of the column references. So, it's possible without hacking around 
> in the index definition itself: 
>
> https://gist.github.com/pat/6088629 
>
> It's worth noting that the document id (Sphinx's equivalent of a primary 
> key) involves the normal primary key with an offset and a multiplier. Make 
> sure those two integers match what's in your generated index in sql_query. 
> They may change when you add other indices to your app (depends on 
> alphabetical order of your index files). 
>
> Also: there's probably some metaprogramming you could add to simplify 
> things a bit more. 
>
> Would love to hear if this approach helps with your real app and not just 
> the test one :) 
>
> -- 
> Pat 
>
> On 26/07/2013, at 12:14 AM, Pat Allan wrote: 
>
> > Hi Steve 
> > 
> > I've got a way forward to greatly improve the speed of indexing… 
> unfortunately, it's not going to work within Thinking Sphinx easily right 
> now. 
> > 
> > Sphinx has the ability to gather attribute and field values from 
> separate queries - this existed for TS v1/v2 for attributes, and fields was 
> added in TS v3, but the catch is those separate queries don't work for 
> HABTM joins. I'd love to change that, it's just painful from an 
> ActiveRecord perspective because you're not dealing with a model's table as 
> the base, but the HABTM join table. 
> > 
> > Here's the configuration for the relevant source that I modified by 
> hand: 
> > https://gist.github.com/pat/6080031 
> > 
> > You'll see that the main query is nice and short - and then there's each 
> of the MVA and joined field definitions. If you put this in the generated 
> source definition in config/development.sphinx.conf, and then run the 
> indexer manually (NOT through the rake task, that'll overwrite this): 
> >  indexer --config config/development.sphinx.conf --all --rotate 
> > 
> > (Remove --rotate if Sphinx isn't running.) You'll see it's pretty damn 
> fast. 
> > 
> > Now, ways forward? Well, I'd love to write something for TS v3 that can 
> handle HABTM - it's just a shame that it might need to be pure ARel rather 
> than ActiveRecord-built (which can otherwise help with joins). 
> > 
> > But otherwise: switch from HABTM to has_many/has_many :through - make 
> each of the joins an actual model. Then, you can add :source => :query to 
> each of the appropriate field and attribute definitions, and it should 
> generate something pretty much the same. 
> > 
> > Hope this provides some clarity at the very least! And also: thanks for 
> the test app, really helped with debugging! 
> > 
> > -- 
> > Pat 
> > 
> > 
> > On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote: 
> > 
> >> Hi there, 
> >> 
> >> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an 
> excellent project. V3 is looking great and represents a lot of hard work 
> and effort. 
> >> 
> >> I've been using thinking-sphinx to index a document model and it's 
> really slowed down when I add lots of associations in the index. In fact, 
> it never finishes on my machine (8Gig RAM, 8 CPU's) when I add 4 indexes. 
> >> 
> >> Times: 
> >>         • 4 seconds - when 1 association (images) is indexed 
> >>         • 6 seconds - when 2 associations (images and subscribers) are 
> indexed 
> >>         • 23 seconds - when 2 associations (images and countries) are 
> indexed 
> >>         • 115 seconds - when 3 associations (images, subscribers and 
> tags) are indexed 
> >>         • 113 seconds - when 3 associations (images, subscribers and 
> videos) are indexed (just to prove it's not tags slowing it down) 
> >>         • ꝏ (not finishing) - when 4 associations or more are selected. 
> >> 
> >> Here's my index file: 
> >> 
> >> ThinkingSphinx::Index.define :document, with: :active_record, delta: 
> true, sql_range_step: 999999999, group_concat_max_len: 16384 do 
> >> 
> >>  has countries(:id), as: :country_ids 
> >>  has images(:id), as: :image_ids, facet: true 
> >>  has subscribers(:id), as: :subscriber_ids, facet: true 
> >>  has tags(:id), as: :tag_ids, facet: true 
> >>  has videos(:id), as: :video_ids, facet: true 
> >> 
> >>  indexes countries.name, as: :countries 
> >>  indexes images.title, as: :images 
> >>  indexes subscribers.title, as: :subscribers 
> >>  indexes tags.name, as: :tags 
> >>  indexes videos.title, as: :videos 
> >> 
> >>  has updated_at 
> >> 
> >> end 
> >> 
> >> The generated sql is a massive group_by query and is not finishing. See 
> it here 
> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing 
> >> 
> >> I'd really appreciate some advice on how to optimise this so indexing 
> becomes viable again. Do I just have too much going on here? I'm using 
> facets, indexes and attributes. Perhaps there is a better way to optimise? 
> A friend suggested pre-computing with some joins... how would this work? 
> >> 
> >> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, 
> thinking-sphinx 3.0.4 
> >> 
> >> For those who'd like to take a look, I've uploaded a sample project 
> here https://github.com/crossroads/rails3-ts-example which can be cloned. 
> If you follow the instructions, it will setup a db with test data and 
> reproduce the problem quickly. 
> >> 
> >> There's also the sphinx generated SQL and EXPLAIN: 
> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing 
> >> 
> >> Thanks in advance for anyone taking the time to read. 
> >> 
> >> Regards, 
> >> Steve 
> >> 
> >> -- 
> >> You received this message because you are subscribed to the Google 
> Groups "Thinking Sphinx" group. 
> >> To unsubscribe from this group and stop receiving emails from it, send 
> an email to [email protected] <javascript:>. 
> >> To post to this group, send email to 
> >> [email protected]<javascript:>. 
>
> >> Visit this group at http://groups.google.com/group/thinking-sphinx. 
> >> For more options, visit https://groups.google.com/groups/opt_out. 
> >> 
> >> 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "Thinking Sphinx" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to [email protected] <javascript:>. 
> > To post to this group, send email to 
> > [email protected]<javascript:>. 
>
> > Visit this group at http://groups.google.com/group/thinking-sphinx. 
> > For more options, visit https://groups.google.com/groups/opt_out. 
> > 
> > 
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to