Heya Steve Was just looking into how difficult this would be to implement properly, and noticed I have added the ability to take a string as the source query - instead of the column references. So, it's possible without hacking around in the index definition itself:
https://gist.github.com/pat/6088629 It's worth noting that the document id (Sphinx's equivalent of a primary key) involves the normal primary key with an offset and a multiplier. Make sure those two integers match what's in your generated index in sql_query. They may change when you add other indices to your app (depends on alphabetical order of your index files). Also: there's probably some metaprogramming you could add to simplify things a bit more. Would love to hear if this approach helps with your real app and not just the test one :) -- Pat On 26/07/2013, at 12:14 AM, Pat Allan wrote: > Hi Steve > > I've got a way forward to greatly improve the speed of indexing… > unfortunately, it's not going to work within Thinking Sphinx easily right now. > > Sphinx has the ability to gather attribute and field values from separate > queries - this existed for TS v1/v2 for attributes, and fields was added in > TS v3, but the catch is those separate queries don't work for HABTM joins. > I'd love to change that, it's just painful from an ActiveRecord perspective > because you're not dealing with a model's table as the base, but the HABTM > join table. > > Here's the configuration for the relevant source that I modified by hand: > https://gist.github.com/pat/6080031 > > You'll see that the main query is nice and short - and then there's each of > the MVA and joined field definitions. If you put this in the generated source > definition in config/development.sphinx.conf, and then run the indexer > manually (NOT through the rake task, that'll overwrite this): > indexer --config config/development.sphinx.conf --all --rotate > > (Remove --rotate if Sphinx isn't running.) You'll see it's pretty damn fast. > > Now, ways forward? Well, I'd love to write something for TS v3 that can > handle HABTM - it's just a shame that it might need to be pure ARel rather > than ActiveRecord-built (which can otherwise help with joins). > > But otherwise: switch from HABTM to has_many/has_many :through - make each of > the joins an actual model. Then, you can add :source => :query to each of the > appropriate field and attribute definitions, and it should generate something > pretty much the same. > > Hope this provides some clarity at the very least! And also: thanks for the > test app, really helped with debugging! > > -- > Pat > > > On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote: > >> Hi there, >> >> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an >> excellent project. V3 is looking great and represents a lot of hard work and >> effort. >> >> I've been using thinking-sphinx to index a document model and it's really >> slowed down when I add lots of associations in the index. In fact, it never >> finishes on my machine (8Gig RAM, 8 CPU's) when I add 4 indexes. >> >> Times: >> • 4 seconds - when 1 association (images) is indexed >> • 6 seconds - when 2 associations (images and subscribers) are indexed >> • 23 seconds - when 2 associations (images and countries) are indexed >> • 115 seconds - when 3 associations (images, subscribers and tags) are >> indexed >> • 113 seconds - when 3 associations (images, subscribers and videos) >> are indexed (just to prove it's not tags slowing it down) >> • ꝏ (not finishing) - when 4 associations or more are selected. >> >> Here's my index file: >> >> ThinkingSphinx::Index.define :document, with: :active_record, delta: true, >> sql_range_step: 999999999, group_concat_max_len: 16384 do >> >> has countries(:id), as: :country_ids >> has images(:id), as: :image_ids, facet: true >> has subscribers(:id), as: :subscriber_ids, facet: true >> has tags(:id), as: :tag_ids, facet: true >> has videos(:id), as: :video_ids, facet: true >> >> indexes countries.name, as: :countries >> indexes images.title, as: :images >> indexes subscribers.title, as: :subscribers >> indexes tags.name, as: :tags >> indexes videos.title, as: :videos >> >> has updated_at >> >> end >> >> The generated sql is a massive group_by query and is not finishing. See it >> here https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing >> >> I'd really appreciate some advice on how to optimise this so indexing >> becomes viable again. Do I just have too much going on here? I'm using >> facets, indexes and attributes. Perhaps there is a better way to optimise? A >> friend suggested pre-computing with some joins... how would this work? >> >> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, >> thinking-sphinx 3.0.4 >> >> For those who'd like to take a look, I've uploaded a sample project here >> https://github.com/crossroads/rails3-ts-example which can be cloned. If you >> follow the instructions, it will setup a db with test data and reproduce the >> problem quickly. >> >> There's also the sphinx generated SQL and EXPLAIN: >> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing >> >> Thanks in advance for anyone taking the time to read. >> >> Regards, >> Steve >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Thinking Sphinx" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at http://groups.google.com/group/thinking-sphinx. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> > > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/thinking-sphinx. > For more options, visit https://groups.google.com/groups/opt_out. > > -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/thinking-sphinx. For more options, visit https://groups.google.com/groups/opt_out.
