Awesome, works like a charm! See the SO question that refers to this thread: http://stackoverflow.com/questions/30913789/thinking-sphinx-indexing-performance.
On Tuesday, June 30, 2015 at 10:00:13 AM UTC+3, Pat Allan wrote:
>
> You’d have to end up with a fair bit of duplication, but it’s technically possible.
>
> # creates both core and delta indices
> ThinkingSphinx::Index.define(:article,
>   :with  => :active_record,
>   :delta => ThinkingSphinx::Deltas::ResqueDelta
> ) do
>   # …
> end
>
> Is the equivalent of:
>
> # create core index
> ThinkingSphinx::Index.define(:article,
>   :with            => :active_record,
>   :delta?          => false,
>   :delta_processor => ThinkingSphinx::Deltas.processor_for(ThinkingSphinx::Deltas::ResqueDelta)
> ) do
>   # …
> end
>
> # create delta index
> ThinkingSphinx::Index.define(:article,
>   :with            => :active_record,
>   :delta?          => true,
>   :delta_processor => ThinkingSphinx::Deltas.processor_for(ThinkingSphinx::Deltas::ResqueDelta)
> ) do
>   # …
> end
>
> The first being the core index, the second being the delta, with the same definition block normally being applied to both. If you want to have something slightly different in the delta index definition block, I guess you could try something along these lines?
>
> --
> Pat
>
> > On 30 Jun 2015, at 4:45 pm, [email protected] wrote:
> >
> > Thanks, sharding the joined queries works. I'd also like to improve them for the deltas. Is there any way to add "WHERE delta = 1" to the joined queries in the delta definition?
> >
> > On Monday, June 29, 2015 at 5:12:27 PM UTC+3, Pat Allan wrote:
> > I’m not sure why the sizes are so different, but I think the overall issue is related to the three attributes that have :source => :query.
> >
> > I’d recommend making two changes to each of them:
> >
> > * Add a condition to each query that filters by the appropriate incident ids (like you’re doing for the main query) so the results are sharded in the same way.
> > * Perhaps add a second SQL statement to each of those attributes (separated by a semi-colon), with :source set to :ranged_query, as covered in the Sphinx documentation:
> > http://sphinxsearch.com/docs/current.html#conf-sql-attr-multi
> >
> > The first of those isn’t too complex, so I’d start with that. Certainly the second is far more fiddly, but may be worthwhile.
> >
> > Hope this helps!
> >
> > --
> > Pat
> >
> >> On 29 Jun 2015, at 8:52 pm, [email protected] wrote:
> >>
> >> I even less understand the number of bytes in delta indexes 6 - 10. Why does 1_delta contain 1128 bytes and 6_delta 24M? They're on the same records.
> >>
> >> On Monday, June 29, 2015 at 9:03:04 AM UTC+3, [email protected] wrote:
> >> Rails version: 4.1.7
> >> TS version: 3.0.6
> >>
> >> On Monday, June 29, 2015 at 5:17:37 AM UTC+3, Pat Allan wrote:
> >> Hi Jonathan
> >>
> >> Can you share your index definitions so I can get a better idea of where the problem might be?
> >>
> >> Also: which versions of Rails and Thinking Sphinx are you using?
> >>
> >> --
> >> Pat
> >>
> >>> On 28 Jun 2015, at 11:47 pm, [email protected] wrote:
> >>>
> >>> Hi Pat,
> >>>
> >>> I implemented according to this, and the indexing time went down (5 times faster on development). However, the delta indexing time went up (30 times slower on development).
> >>> See below the indexing stats:
> >>>
> >>> Index                   Total docs     Bytes  Time (sec) | Index                   Total docs     Bytes  Time (sec)
> >>> incident_index_1_core         7331   6531122      39.436 | incident_index_6_core         7331  28239593       8.802
> >>> incident_index_1_delta           6      1128       0.184 | incident_index_6_delta           6  24763425       5.234
> >>> incident_index_2_core         7319   6751189      45.477 | incident_index_7_core         7319  28331726       8.819
> >>> incident_index_2_delta           5       843       0.233 | incident_index_7_delta           5  24763289       5.321
> >>> incident_index_3_core         7390   6803814      42.064 | incident_index_8_core         7390  28310121       7.913
> >>> incident_index_3_delta           8      2143       0.203 | incident_index_8_delta           8  24764366       5.282
> >>> incident_index_4_core         7278   6377664      37.665 | incident_index_9_core         7278  28162260       7.891
> >>> incident_index_4_delta           6      1108       0.436 | incident_index_9_delta           6  24763330       5.456
> >>> incident_index_5_core         7396   6601358      39.704 | incident_index_10_core        7396  28152075       9.562
> >>> incident_index_5_delta           6       944       0.216 | incident_index_10_delta          6  24763308       5.303
> >>>
> >>> Any idea why this is happening?
> >>>
> >>> Thanks,
> >>> Jonathan
> >>>
> >>> On Friday, July 26, 2013 at 3:57:38 PM UTC+3, Pat Allan wrote:
> >>> Heya Steve
> >>>
> >>> Was just looking into how difficult this would be to implement properly, and noticed I have added the ability to take a string as the source query - instead of the column references. So, it's possible without hacking around in the index definition itself:
> >>>
> >>> https://gist.github.com/pat/6088629
> >>>
> >>> It's worth noting that the document id (Sphinx's equivalent of a primary key) involves the normal primary key with an offset and a multiplier. Make sure those two integers match what's in your generated index in sql_query. They may change when you add other indices to your app (depends on alphabetical order of your index files).
> >>>
> >>> Also: there's probably some metaprogramming you could add to simplify things a bit more.
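
To make the offset-and-multiplier note above concrete, here is a minimal plain-Ruby sketch of the arithmetic. It assumes the document id is computed as primary key * multiplier + offset, with the offset smaller than the multiplier; the thread only says the id "involves" those two integers, so treat the exact formula as an assumption and check it against sql_query in your generated configuration. The helper names are hypothetical and are not part of Thinking Sphinx.

```ruby
# Hypothetical helpers illustrating the arithmetic; not Thinking Sphinx API.
# Assumption: document_id = primary_key * multiplier + offset, where
# offset < multiplier so the two parts can be separated again.

# Build a Sphinx document id from a model's primary key.
def sphinx_document_id(primary_key, multiplier:, offset:)
  primary_key * multiplier + offset
end

# Recover the model's primary key from a document id (integer division).
def primary_key_from(document_id, multiplier:)
  document_id / multiplier
end

# Recover the offset identifying which index the document came from.
def offset_from(document_id, multiplier:)
  document_id % multiplier
end

# Example values only: 4 and 1 stand in for whatever integers appear
# in your own generated sql_query.
doc_id = sphinx_document_id(42, multiplier: 4, offset: 1)
puts doc_id                                   # 42 * 4 + 1
puts primary_key_from(doc_id, multiplier: 4)
puts offset_from(doc_id, multiplier: 4)
```

If a hand-written source query uses different integers from the ones in the generated configuration, the ids would map back to the wrong records under this scheme, which is why the two values need to match.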
> >>>
> >>> Would love to hear if this approach helps with your real app and not just the test one :)
> >>>
> >>> --
> >>> Pat
> >>>
> >>> On 26/07/2013, at 12:14 AM, Pat Allan wrote:
> >>>
> >>> > Hi Steve
> >>> >
> >>> > I've got a way forward to greatly improve the speed of indexing… unfortunately, it's not going to work within Thinking Sphinx easily right now.
> >>> >
> >>> > Sphinx has the ability to gather attribute and field values from separate queries - this existed for TS v1/v2 for attributes, and fields was added in TS v3, but the catch is those separate queries don't work for HABTM joins. I'd love to change that, it's just painful from an ActiveRecord perspective because you're not dealing with a model's table as the base, but the HABTM join table.
> >>> >
> >>> > Here's the configuration for the relevant source that I modified by hand:
> >>> > https://gist.github.com/pat/6080031
> >>> >
> >>> > You'll see that the main query is nice and short - and then there's each of the MVA and joined field definitions. If you put this in the generated source definition in config/development.sphinx.conf, and then run the indexer manually (NOT through the rake task, that'll overwrite this):
> >>> >   indexer --config config/development.sphinx.conf --all --rotate
> >>> >
> >>> > (Remove --rotate if Sphinx isn't running.) You'll see it's pretty damn fast.
> >>> >
> >>> > Now, ways forward? Well, I'd love to write something for TS v3 that can handle HABTM - it's just a shame that it might need to be pure ARel rather than ActiveRecord-built (which can otherwise help with joins).
> >>> >
> >>> > But otherwise: switch from HABTM to has_many/has_many :through - make each of the joins an actual model. Then, you can add :source => :query to each of the appropriate field and attribute definitions, and it should generate something pretty much the same.
> >>> >
> >>> > Hope this provides some clarity at the very least!
> >>> > And also: thanks for the test app, really helped with debugging!
> >>> >
> >>> > --
> >>> > Pat
> >>> >
> >>> >
> >>> > On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote:
> >>> >
> >>> >> Hi there,
> >>> >>
> >>> >> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an excellent project. V3 is looking great and represents a lot of hard work and effort.
> >>> >>
> >>> >> I've been using thinking-sphinx to index a document model and it's really slowed down when I add lots of associations in the index. In fact, it never finishes on my machine (8Gig RAM, 8 CPU's) when I add 4 indexes.
> >>> >>
> >>> >> Times:
> >>> >>   • 4 seconds - when 1 association (images) is indexed
> >>> >>   • 6 seconds - when 2 associations (images and subscribers) are indexed
> >>> >>   • 23 seconds - when 2 associations (images and countries) are indexed
> >>> >>   • 115 seconds - when 3 associations (images, subscribers and tags) are indexed
> >>> >>   • 113 seconds - when 3 associations (images, subscribers and videos) are indexed (just to prove it's not tags slowing it down)
> >>> >>   • ∞ (not finishing) - when 4 associations or more are selected.
> >>> >>
> >>> >> Here's my index file:
> >>> >>
> >>> >> ThinkingSphinx::Index.define :document, with: :active_record, delta: true, sql_range_step: 999999999, group_concat_max_len: 16384 do
> >>> >>
> >>> >>   has countries(:id), as: :country_ids
> >>> >>   has images(:id), as: :image_ids, facet: true
> >>> >>   has subscribers(:id), as: :subscriber_ids, facet: true
> >>> >>   has tags(:id), as: :tag_ids, facet: true
> >>> >>   has videos(:id), as: :video_ids, facet: true
> >>> >>
> >>> >>   indexes countries.name, as: :countries
> >>> >>   indexes images.title, as: :images
> >>> >>   indexes subscribers.title, as: :subscribers
> >>> >>   indexes tags.name, as: :tags
> >>> >>   indexes videos.title, as: :videos
> >>> >>
> >>> >>   has updated_at
> >>> >>
> >>> >> end
> >>> >>
> >>> >> The generated sql is a massive group_by query and is not finishing. See it here https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
> >>> >>
> >>> >> I'd really appreciate some advice on how to optimise this so indexing becomes viable again. Do I just have too much going on here? I'm using facets, indexes and attributes. Perhaps there is a better way to optimise? A friend suggested pre-computing with some joins... how would this work?
> >>> >>
> >>> >> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, thinking-sphinx 3.0.4
> >>> >>
> >>> >> For those who'd like to take a look, I've uploaded a sample project here https://github.com/crossroads/rails3-ts-example which can be cloned. If you follow the instructions, it will setup a db with test data and reproduce the problem quickly.
> >>> >>
> >>> >> There's also the sphinx generated SQL and EXPLAIN: https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
> >>> >>
> >>> >> Thanks in advance for anyone taking the time to read.
> >>> >>
> >>> >> Regards,
> >>> >> Steve
> >>> >>
> >>> >> --
> >>> >> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> >>> >> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> >>> >> To post to this group, send email to [email protected].
> >>> >> Visit this group at http://groups.google.com/group/thinking-sphinx.
> >>> >> For more options, visit https://groups.google.com/groups/opt_out.
