Thanks, sharding the joined queries works. I'd also like to speed them up for the delta indexes. Is there any way to add "WHERE delta = 1" to the joined queries in the delta index definition?
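To make the question concrete, here is a rough sketch of the two query strings I have in mind for one attribute. It only builds SQL strings; it does not call Thinking Sphinx, and whether a delta index can be given a different string from the core index is exactly what I'm asking. The helper name, the incident_tags/incidents table names, and the multiplier 2 and offset 0 are all placeholders (the real multiplier and offset have to match the generated sql_query).

```ruby
# Hypothetical helper, not part of Thinking Sphinx: builds the SQL string for
# a multi-value attribute defined with :source => :query.
# All table and column names passed in below are illustrative assumptions.
def joined_attribute_query(join_table:, foreign_key:, value_column:,
                           parent_table:, multiplier:, offset:,
                           delta_only: false)
  # Document id = primary key * multiplier + offset; both integers must match
  # the ones in the generated sql_query for the index.
  sql = "SELECT #{join_table}.#{foreign_key} * #{multiplier} + #{offset} AS id, " \
        "#{join_table}.#{value_column} AS #{value_column} " \
        "FROM #{join_table}"

  # The delta flag lives on the parent table, so the delta variant joins back
  # to it before filtering.
  if delta_only
    sql += " INNER JOIN #{parent_table} " \
           "ON #{parent_table}.id = #{join_table}.#{foreign_key} " \
           "WHERE #{parent_table}.delta = 1"
  end

  sql
end

# Placeholder values: adjust names, multiplier and offset to the real schema.
core_query = joined_attribute_query(
  join_table: "incident_tags", foreign_key: "incident_id",
  value_column: "tag_id", parent_table: "incidents",
  multiplier: 2, offset: 0
)
delta_query = joined_attribute_query(
  join_table: "incident_tags", foreign_key: "incident_id",
  value_column: "tag_id", parent_table: "incidents",
  multiplier: 2, offset: 0, delta_only: true
)

puts core_query
puts delta_query
```

The core string stays as it is today; the delta string is the same query restricted to rows whose parent record has delta = 1, which is what I would hope keeps the delta run small.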
On Monday, June 29, 2015 at 5:12:27 PM UTC+3, Pat Allan wrote:
>
> I’m not sure why the sizes are so different, but I think the overall issue
> is related to the three attributes that have :source => :query.
>
> I’d recommend making two changes to each of them:
>
> * Add a condition to each query that filters by the appropriate incident
>   ids (like you’re doing for the main query) so the results are sharded in
>   the same way.
> * Perhaps add a second SQL statement to each of those attributes
>   (separated by a semi-colon), with :source set to :ranged_query, as covered
>   in the Sphinx documentation:
>   http://sphinxsearch.com/docs/current.html#conf-sql-attr-multi
>
> The first of those isn’t too complex, so I’d start with that. Certainly
> the second is far more fiddly, but may be worthwhile.
>
> Hope this helps!
>
> --
> Pat
>
> On 29 Jun 2015, at 8:52 pm, [email protected] wrote:
>
> I even less understand the number of bytes in delta indexes 6 - 10. Why
> does 1_delta contain 1128 bytes and 6_delta 24M? They're on the same
> records.
>
> On Monday, June 29, 2015 at 9:03:04 AM UTC+3, [email protected] wrote:
>>
>> Rails version: 4.1.7
>> TS version: 3.0.6
>>
>> On Monday, June 29, 2015 at 5:17:37 AM UTC+3, Pat Allan wrote:
>>>
>>> Hi Jonathan
>>>
>>> Can you share your index definitions so I can get a better idea of where
>>> the problem might be?
>>>
>>> Also: which versions of Rails and Thinking Sphinx are you using?
>>>
>>> --
>>> Pat
>>>
>>> On 28 Jun 2015, at 11:47 pm, [email protected] wrote:
>>>
>>> Hi Pat,
>>>
>>> I implemented according to this, and the indexing time went down (5
>>> times faster on development). However, the delta indexing time went up (30
>>> times slower on development).
>>> See below the indexing stats:
>>>
>>> Index                    Total docs     Bytes  Time (sec)
>>> incident_index_1_core          7331   6531122      39.436
>>> incident_index_6_core          7331  28239593       8.802
>>> incident_index_1_delta            6      1128       0.184
>>> incident_index_6_delta            6  24763425       5.234
>>> incident_index_2_core          7319   6751189      45.477
>>> incident_index_7_core          7319  28331726       8.819
>>> incident_index_2_delta            5       843       0.233
>>> incident_index_7_delta            5  24763289       5.321
>>> incident_index_3_core          7390   6803814      42.064
>>> incident_index_8_core          7390  28310121       7.913
>>> incident_index_3_delta            8      2143       0.203
>>> incident_index_8_delta            8  24764366       5.282
>>> incident_index_4_core          7278   6377664      37.665
>>> incident_index_9_core          7278  28162260       7.891
>>> incident_index_4_delta            6      1108       0.436
>>> incident_index_9_delta            6  24763330       5.456
>>> incident_index_5_core          7396   6601358      39.704
>>> incident_index_10_core         7396  28152075       9.562
>>> incident_index_5_delta            6       944       0.216
>>> incident_index_10_delta           6  24763308       5.303
>>>
>>> Any idea why this is happening?
>>>
>>> Thanks,
>>> Jonathan
>>>
>>> On Friday, July 26, 2013 at 3:57:38 PM UTC+3, Pat Allan wrote:
>>>>
>>>> Heya Steve
>>>>
>>>> Was just looking into how difficult this would be to implement
>>>> properly, and noticed I have added the ability to take a string as the
>>>> source query - instead of the column references. So, it's possible without
>>>> hacking around in the index definition itself:
>>>>
>>>> https://gist.github.com/pat/6088629
>>>>
>>>> It's worth noting that the document id (Sphinx's equivalent of a
>>>> primary key) involves the normal primary key with an offset and a
>>>> multiplier. Make sure those two integers match what's in your generated
>>>> index in sql_query. They may change when you add other indices to your app
>>>> (depends on alphabetical order of your index files).
>>>>
>>>> Also: there's probably some metaprogramming you could add to simplify
>>>> things a bit more.
>>>>
>>>> Would love to hear if this approach helps with your real app and not
>>>> just the test one :)
>>>>
>>>> --
>>>> Pat
>>>>
>>>> On 26/07/2013, at 12:14 AM, Pat Allan wrote:
>>>>
>>>> > Hi Steve
>>>> >
>>>> > I've got a way forward to greatly improve the speed of indexing…
>>>> > unfortunately, it's not going to work within Thinking Sphinx easily
>>>> > right now.
>>>> >
>>>> > Sphinx has the ability to gather attribute and field values from
>>>> > separate queries - this existed for TS v1/v2 for attributes, and fields
>>>> > was added in TS v3, but the catch is those separate queries don't work
>>>> > for HABTM joins. I'd love to change that, it's just painful from an
>>>> > ActiveRecord perspective because you're not dealing with a model's
>>>> > table as the base, but the HABTM join table.
>>>> >
>>>> > Here's the configuration for the relevant source that I modified by
>>>> > hand:
>>>> > https://gist.github.com/pat/6080031
>>>> >
>>>> > You'll see that the main query is nice and short - and then there's
>>>> > each of the MVA and joined field definitions. If you put this in the
>>>> > generated source definition in config/development.sphinx.conf, and then
>>>> > run the indexer manually (NOT through the rake task, that'll overwrite
>>>> > this):
>>>> >
>>>> > indexer --config config/development.sphinx.conf --all --rotate
>>>> >
>>>> > (Remove --rotate if Sphinx isn't running.) You'll see it's pretty
>>>> > damn fast.
>>>> >
>>>> > Now, ways forward? Well, I'd love to write something for TS v3 that
>>>> > can handle HABTM - it's just a shame that it might need to be pure ARel
>>>> > rather than ActiveRecord-built (which can otherwise help with joins).
>>>> >
>>>> > But otherwise: switch from HABTM to has_many/has_many :through - make
>>>> > each of the joins an actual model. Then, you can add :source => :query to
>>>> > each of the appropriate field and attribute definitions, and it should
>>>> > generate something pretty much the same.
>>>> >
>>>> > Hope this provides some clarity at the very least! And also: thanks
>>>> > for the test app, really helped with debugging!
>>>> >
>>>> > --
>>>> > Pat
>>>> >
>>>> >
>>>> > On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote:
>>>> >
>>>> >> Hi there,
>>>> >>
>>>> >> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an
>>>> >> excellent project. V3 is looking great and represents a lot of hard
>>>> >> work and effort.
>>>> >>
>>>> >> I've been using thinking-sphinx to index a document model and it's
>>>> >> really slowed down when I add lots of associations in the index. In
>>>> >> fact, it never finishes on my machine (8Gig RAM, 8 CPU's) when I add 4
>>>> >> indexes.
>>>> >>
>>>> >> Times:
>>>> >> • 4 seconds - when 1 association (images) is indexed
>>>> >> • 6 seconds - when 2 associations (images and subscribers) are indexed
>>>> >> • 23 seconds - when 2 associations (images and countries) are indexed
>>>> >> • 115 seconds - when 3 associations (images, subscribers and tags)
>>>> >>   are indexed
>>>> >> • 113 seconds - when 3 associations (images, subscribers and videos)
>>>> >>   are indexed (just to prove it's not tags slowing it down)
>>>> >> • ∞ (not finishing) - when 4 associations or more are selected.
>>>> >>
>>>> >> Here's my index file:
>>>> >>
>>>> >> ThinkingSphinx::Index.define :document, with: :active_record,
>>>> >>   delta: true, sql_range_step: 999999999, group_concat_max_len: 16384 do
>>>> >>
>>>> >>   has countries(:id), as: :country_ids
>>>> >>   has images(:id), as: :image_ids, facet: true
>>>> >>   has subscribers(:id), as: :subscriber_ids, facet: true
>>>> >>   has tags(:id), as: :tag_ids, facet: true
>>>> >>   has videos(:id), as: :video_ids, facet: true
>>>> >>
>>>> >>   indexes countries.name, as: :countries
>>>> >>   indexes images.title, as: :images
>>>> >>   indexes subscribers.title, as: :subscribers
>>>> >>   indexes tags.name, as: :tags
>>>> >>   indexes videos.title, as: :videos
>>>> >>
>>>> >>   has updated_at
>>>> >>
>>>> >> end
>>>> >>
>>>> >> The generated sql is a massive group_by query and is not finishing.
>>>> >> See it here
>>>> >> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
>>>> >>
>>>> >> I'd really appreciate some advice on how to optimise this so
>>>> >> indexing becomes viable again. Do I just have too much going on here?
>>>> >> I'm using facets, indexes and attributes. Perhaps there is a better way
>>>> >> to optimise? A friend suggested pre-computing with some joins... how
>>>> >> would this work?
>>>> >>
>>>> >> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13,
>>>> >> thinking-sphinx 3.0.4
>>>> >>
>>>> >> For those who'd like to take a look, I've uploaded a sample project
>>>> >> here https://github.com/crossroads/rails3-ts-example which can be
>>>> >> cloned. If you follow the instructions, it will setup a db with test
>>>> >> data and reproduce the problem quickly.
>>>> >>
>>>> >> There's also the sphinx generated SQL and EXPLAIN:
>>>> >> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
>>>> >>
>>>> >> Thanks in advance for anyone taking the time to read.
>>>> >>
>>>> >> Regards,
>>>> >> Steve
>>>> >>
>>>> >> --
>>>> >> You received this message because you are subscribed to the Google
>>>> >> Groups "Thinking Sphinx" group.
>>>> >> To unsubscribe from this group and stop receiving emails from it,
>>>> >> send an email to [email protected].
>>>> >> To post to this group, send email to thinkin...@googlegroups.com.
>>>> >> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>> >> For more options, visit https://groups.google.com/groups/opt_out.
