Re: [ts] Slow indexing (never finishes) when indexing 4 or more associations

jonsgold Mon, 29 Jun 2015 23:46:01 -0700

Thanks, sharding the joined queries works. I'd also like to improve them 
for the deltas. Is there any way to add "WHERE delta = 1" to the joined 
queries in the delta definition?


On Monday, June 29, 2015 at 5:12:27 PM UTC+3, Pat Allan wrote:
>
> I’m not sure why the sizes are so different, but I think the overall issue 
> is related to the three attributes that have :source => :query.
>
> I’d recommend making two changes to each of them:
>
> * Add a condition to each query that filters by the appropriate incident 
> ids (like you’re doing for the main query) so the results are sharded in 
> the same way.
> * Perhaps add a second SQL statement to each of those attributes 
> (separated by a semi-colon), with :source set to :ranged_query, as covered 
> in the Sphinx documentation:
> http://sphinxsearch.com/docs/current.html#conf-sql-attr-multi
>
> The first of those isn’t too complex, so I’d start with that. Certainly 
> the second is far more fiddly, but may be worthwhile.
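A minimal sketch of how the two suggestions above might combine into one source string: the first statement fetches the values for a range of ids, and the second (after the semicolon) supplies the range bounds, using the `$start` and `$end` macros from the linked Sphinx documentation. The helper name, the `incidents_tags` table, its columns, and the shard condition are all hypothetical. The first selected column is simplified to the raw foreign key, whereas a real query would need the same document id arithmetic as the main sql_query.

```ruby
# Hypothetical helper: builds the "fetch; range" pair of SQL statements that
# a ranged query expects. $start and $end are left as literal text so that
# Sphinx can substitute them for each range step.
def ranged_query_source(table:, doc_column:, value_column:, shard_condition: nil)
  conditions = ["#{doc_column} >= $start", "#{doc_column} <= $end"]
  # Optional extra filter, so the joined query is sharded like the main query.
  conditions << shard_condition if shard_condition

  fetch = "SELECT #{doc_column}, #{value_column} FROM #{table} " \
          "WHERE #{conditions.join(' AND ')}"
  range = "SELECT MIN(#{doc_column}), MAX(#{doc_column}) FROM #{table}"

  "#{fetch}; #{range}"
end

puts ranged_query_source(
  table: "incidents_tags",            # assumed join table
  doc_column: "incident_id",          # assumed foreign key
  value_column: "tag_id",             # assumed attribute value
  shard_condition: "incident_id % 5 = 0" # assumed sharding rule
)
```

The resulting string is what would be passed to the attribute definition with :source set to :ranged_query, in place of the single-statement query used with :source => :query.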
>
> Hope this helps!
>
> — 
> Pat
>
> On 29 Jun 2015, at 8:52 pm, [email protected] wrote:
>
> I understand the number of bytes in delta indexes 6-10 even less. Why 
> does 1_delta contain 1128 bytes and 6_delta 24M? They cover the same 
> records.
>
> On Monday, June 29, 2015 at 9:03:04 AM UTC+3, [email protected] wrote:
>>
>> Rails version: 4.1.7
>> TS version: 3.0.6
>>
>> On Monday, June 29, 2015 at 5:17:37 AM UTC+3, Pat Allan wrote:
>>>
>>> Hi Jonathan
>>>
>>> Can you share your index definitions so I can get a better idea of where 
>>> the problem might be?
>>>
>>> Also: which versions of Rails and Thinking Sphinx are you using?
>>>
>>> — 
>>> Pat
>>>
>>> On 28 Jun 2015, at 11:47 pm, [email protected] wrote:
>>>
>>> Hi Pat,
>>>
>>> I implemented according to this, and the indexing time went down (5 
>>> times faster on development). However, the delta indexing time went up (30 
>>> times slower on development). See below the indexing stats:
>>>
>>> Index                    Total docs     Bytes  Time (sec)
>>> incident_index_1_core          7331   6531122      39.436
>>> incident_index_1_delta            6      1128       0.184
>>> incident_index_2_core          7319   6751189      45.477
>>> incident_index_2_delta            5       843       0.233
>>> incident_index_3_core          7390   6803814      42.064
>>> incident_index_3_delta            8      2143       0.203
>>> incident_index_4_core          7278   6377664      37.665
>>> incident_index_4_delta            6      1108       0.436
>>> incident_index_5_core          7396   6601358      39.704
>>> incident_index_5_delta            6       944       0.216
>>> incident_index_6_core          7331  28239593       8.802
>>> incident_index_6_delta            6  24763425       5.234
>>> incident_index_7_core          7319  28331726       8.819
>>> incident_index_7_delta            5  24763289       5.321
>>> incident_index_8_core          7390  28310121       7.913
>>> incident_index_8_delta            8  24764366       5.282
>>> incident_index_9_core          7278  28162260       7.891
>>> incident_index_9_delta            6  24763330       5.456
>>> incident_index_10_core         7396  28152075       9.562
>>> incident_index_10_delta           6  24763308       5.303
>>>
>>> Any idea why this is happening?
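As a quick check of the "5 times faster" and "30 times slower" figures, the ratios can be worked out from the timings for indexes 1 and 6 in the stats above. Treating index 1 as the earlier definition and index 6 as the new one is an assumption, and the method name is made up for this sketch.

```ruby
# Ratio between two timings in seconds: how many times larger `slower` is
# than `faster`.
def times_ratio(slower, faster)
  slower / faster
end

core_speedup   = times_ratio(39.436, 8.802) # index 1 core vs index 6 core
delta_slowdown = times_ratio(5.234, 0.184)  # index 6 delta vs index 1 delta

puts format("core: %.1fx faster, delta: %.1fx slower", core_speedup, delta_slowdown)
```

For this pair the core index comes out roughly 4.5 times faster and the delta index roughly 28 times slower, in line with the rounded figures in the message.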
>>>
>>> Thanks,
>>> Jonathan
>>>
>>> On Friday, July 26, 2013 at 3:57:38 PM UTC+3, Pat Allan wrote:
>>>>
>>>> Heya Steve 
>>>>
>>>> Was just looking into how difficult this would be to implement 
>>>> properly, and noticed I have added the ability to take a string as the 
>>>> source query - instead of the column references. So, it's possible without 
>>>> hacking around in the index definition itself: 
>>>>
>>>> https://gist.github.com/pat/6088629 
>>>>
>>>> It's worth noting that the document id (Sphinx's equivalent of a 
>>>> primary key) involves the normal primary key with an offset and a 
>>>> multiplier. Make sure those two integers match what's in your generated 
>>>> index in sql_query. They may change when you add other indices to your app 
>>>> (depends on alphabetical order of your index files). 
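To illustrate the document id arithmetic described above, here is a minimal sketch. The exact formula (primary key times multiplier, plus offset) and the example values are assumptions; the real multiplier and offset should be read from the generated sql_query.

```ruby
# Assumed formula: document id = primary key * multiplier + offset.
def sphinx_document_id(primary_key, multiplier:, offset:)
  primary_key * multiplier + offset
end

# Inverse mapping, from a Sphinx document id back to the primary key.
def primary_key_for(document_id, multiplier:, offset:)
  (document_id - offset) / multiplier
end

# Example values only: record 42, multiplier 2, offset 1.
doc_id = sphinx_document_id(42, multiplier: 2, offset: 1)
puts doc_id
puts primary_key_for(doc_id, multiplier: 2, offset: 1)
```

Any hand-written source query has to produce ids with the same two integers, otherwise the attribute values will not line up with the documents from the main query.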
>>>>
>>>> Also: there's probably some metaprogramming you could add to simplify 
>>>> things a bit more. 
>>>>
>>>> Would love to hear if this approach helps with your real app and not 
>>>> just the test one :) 
>>>>
>>>> -- 
>>>> Pat 
>>>>
>>>> On 26/07/2013, at 12:14 AM, Pat Allan wrote: 
>>>>
>>>> > Hi Steve 
>>>> > 
>>>> > I've got a way forward to greatly improve the speed of indexing… 
>>>> > unfortunately, it's not going to work easily within Thinking Sphinx 
>>>> > right now. 
>>>> > 
>>>> > Sphinx has the ability to gather attribute and field values from 
>>>> > separate queries. This existed in TS v1/v2 for attributes, and support 
>>>> > for fields was added in TS v3, but the catch is that those separate 
>>>> > queries don't work for HABTM joins. I'd love to change that; it's just 
>>>> > painful from an ActiveRecord perspective because you're not dealing 
>>>> > with a model's table as the base, but the HABTM join table. 
>>>> > 
>>>> > Here's the configuration for the relevant source that I modified by 
>>>> > hand: 
>>>> > https://gist.github.com/pat/6080031 
>>>> > 
>>>> > You'll see that the main query is nice and short, and then there's 
>>>> > each of the MVA and joined field definitions. If you put this in the 
>>>> > generated source definition in config/development.sphinx.conf, and 
>>>> > then run the indexer manually (NOT through the rake task, that'll 
>>>> > overwrite this): 
>>>> > 
>>>> >  indexer --config config/development.sphinx.conf --all --rotate 
>>>> > 
>>>> > (Remove --rotate if Sphinx isn't running.) You'll see it's pretty 
>>>> > damn fast. 
>>>> > 
>>>> > Now, ways forward? Well, I'd love to write something for TS v3 that 
>>>> > can handle HABTM. It's just a shame that it might need to be pure ARel 
>>>> > rather than ActiveRecord-built (which can otherwise help with joins). 
>>>> > 
>>>> > But otherwise: switch from HABTM to has_many/has_many :through, making 
>>>> > each of the joins an actual model. Then you can add :source => :query 
>>>> > to each of the appropriate field and attribute definitions, and it 
>>>> > should generate something pretty much the same. 
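A rough, untested sketch of what that might look like for the index definition quoted further down, once each join is a real model reached via has_many :through. It reuses the association names from that definition and only adds the :source option (written as `source: :query`). The facet and range options are left out for brevity, and the `defined?` guard is there only so the snippet can be loaded outside a Rails app.

```ruby
# Sketch only: assumes images, subscribers and tags are has_many :through
# associations on Document, each backed by an actual join model.
if defined?(ThinkingSphinx)
  ThinkingSphinx::Index.define :document, with: :active_record, delta: true do
    # Multi-value attributes gathered by separate queries rather than by
    # joins in the main sql_query.
    has images(:id),      as: :image_ids,      source: :query
    has subscribers(:id), as: :subscriber_ids, source: :query
    has tags(:id),        as: :tag_ids,        source: :query

    # Joined fields gathered the same way.
    indexes images.title,      as: :images,      source: :query
    indexes subscribers.title, as: :subscribers, source: :query
    indexes tags.name,         as: :tags,        source: :query

    has updated_at
  end
end
```

The generated source in config/development.sphinx.conf is the place to confirm that each of these ends up as its own query rather than as part of the main join.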
>>>> > 
>>>> > Hope this provides some clarity at the very least! And also: thanks 
>>>> > for the test app, it really helped with debugging! 
>>>> > 
>>>> > -- 
>>>> > Pat 
>>>> > 
>>>> > 
>>>> > On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote: 
>>>> > 
>>>> >> Hi there, 
>>>> >> 
>>>> >> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an 
>>>> >> excellent project. V3 is looking great and represents a lot of hard 
>>>> >> work and effort. 
>>>> >> 
>>>> >> I've been using thinking-sphinx to index a document model and it has 
>>>> >> really slowed down now that I've added lots of associations to the 
>>>> >> index. In fact, it never finishes on my machine (8 GB RAM, 8 CPUs) 
>>>> >> when I add 4 associations. 
>>>> >> 
>>>> >> Times: 
>>>> >>         • 4 seconds - when 1 association (images) is indexed 
>>>> >>         • 6 seconds - when 2 associations (images and subscribers) 
>>>> >>           are indexed 
>>>> >>         • 23 seconds - when 2 associations (images and countries) 
>>>> >>           are indexed 
>>>> >>         • 115 seconds - when 3 associations (images, subscribers and 
>>>> >>           tags) are indexed 
>>>> >>         • 113 seconds - when 3 associations (images, subscribers and 
>>>> >>           videos) are indexed (just to prove it's not tags slowing it 
>>>> >>           down) 
>>>> >>         • ∞ (not finishing) - when 4 associations or more are 
>>>> >>           selected. 
>>>> >> 
>>>> >> Here's my index file: 
>>>> >> 
>>>> >> ThinkingSphinx::Index.define :document, with: :active_record, 
>>>> >>     delta: true, sql_range_step: 999999999, 
>>>> >>     group_concat_max_len: 16384 do 
>>>> >> 
>>>> >>  has countries(:id), as: :country_ids 
>>>> >>  has images(:id), as: :image_ids, facet: true 
>>>> >>  has subscribers(:id), as: :subscriber_ids, facet: true 
>>>> >>  has tags(:id), as: :tag_ids, facet: true 
>>>> >>  has videos(:id), as: :video_ids, facet: true 
>>>> >> 
>>>> >>  indexes countries.name, as: :countries 
>>>> >>  indexes images.title, as: :images 
>>>> >>  indexes subscribers.title, as: :subscribers 
>>>> >>  indexes tags.name, as: :tags 
>>>> >>  indexes videos.title, as: :videos 
>>>> >> 
>>>> >>  has updated_at 
>>>> >> 
>>>> >> end 
>>>> >> 
>>>> >> The generated SQL is a massive GROUP BY query that never finishes. 
>>>> >> See it here: 
>>>> >> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing 
>>>> >> 
>>>> >> I'd really appreciate some advice on how to optimise this so that 
>>>> >> indexing becomes viable again. Do I just have too much going on here? 
>>>> >> I'm using facets, indexes and attributes. Perhaps there is a better 
>>>> >> way to optimise? A friend suggested pre-computing with some joins... 
>>>> >> how would this work? 
>>>> >> 
>>>> >> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, 
>>>> >> thinking-sphinx 3.0.4 
>>>> >> 
>>>> >> For those who'd like to take a look, I've uploaded a sample project 
>>>> >> here: https://github.com/crossroads/rails3-ts-example which can be 
>>>> >> cloned. If you follow the instructions, it will set up a db with test 
>>>> >> data and reproduce the problem quickly. 
>>>> >> 
>>>> >> There's also the Sphinx-generated SQL and EXPLAIN: 
>>>> >> https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing 
>>>> >> 
>>>> >> Thanks in advance to anyone taking the time to read. 
>>>> >> 
>>>> >> Regards, 
>>>> >> Steve 
>>>> >> 
>>>> >> -- 
>>>> >> You received this message because you are subscribed to the Google 
>>>> >> Groups "Thinking Sphinx" group. 
>>>> >> To unsubscribe from this group and stop receiving emails from it, 
>>>> >> send an email to [email protected]. 
>>>> >> To post to this group, send email to thinkin...@googlegroups.com. 
>>>> >> Visit this group at http://groups.google.com/group/thinking-sphinx. 
>>>> >> For more options, visit https://groups.google.com/groups/opt_out. 
>>>> >> 
>>>> >> 
