Re: [ts] Suddenly slow index

Daniel Vandersluis Thu, 18 Jul 2013 12:15:53 -0700

Seems to be because GROUP_CONCAT(job_applications_users.`job_id` SEPARATOR 
',') AS `job_ids` was still being included in the sql_query line... is 
there another option I need to remove it?



On Thursday, July 18, 2013 3:03:01 PM UTC-4, Daniel Vandersluis wrote:
>
> Hmm... when I try that I get this error:
>
> indexing index 'user_core'...
> ERROR: index 'user_core': multi-valued attribute 'job_ids' of wrong 
> source-type found in query; must be 'field'.
>
> The line in the generated config file looks like this:
>
> sql_attr_multi = uint job_ids from query; SELECT 
> `job_applications`.`user_id` * 9 + 1 AS `id`, `job_applications`.`job_id` 
> AS `job_ids` FROM `job_applications`
>
> On Thursday, July 18, 2013 2:53:29 PM UTC-4, Daniel Vandersluis wrote:
>>
>> I will try out source: :query, thanks.
>>
>> In terms of last_job_application, it doesn't use an ORDER, there is a 
>> last_job_application_id foreign key on the model.
>>
>> On Thursday, July 18, 2013 2:16:03 PM UTC-4, Pat Allan wrote:
>>>
>>> Hi Daniel 
>>>
>>> As for this issue: the short answer is I'm not entirely sure the index 
>>> definition will always give you the results you're after. Although you do 
>>> seem to be using MySQL, and that can be a little carefree with its approach 
>>> to SQL results. 
>>>
>>> You're asking for the 'last job application' - but ORDER clauses can't 
>>> apply to joins, and so there's no guarantee that the job application in 
>>> question for the three attributes is the 'last' one. 
>>>
>>> That said, a potential work-around: change the new attribute definition 
>>> so it's in a separate query: 
>>>
>>>   has job_applications.job_id, as: :job_ids, facet: true, source: :query 
>>>
>>> That should remove the second join - and while it will involve a 
>>> separate query, it may perform much better that way. 
>>>
>>> Give it a shot. 
>>>
>>> -- 
>>> Pat 
>>>
>>> On 18/07/2013, at 12:59 AM, Daniel Vandersluis wrote: 
>>>
>>> > Index: 
>>> > 
>>> > define_index do 
>>> >     indexes 'TRIM(LOWER(first_name))', as: :first_name, :sortable => 
>>> true 
>>> >     indexes 'TRIM(LOWER(last_name))', as: :last_name, :sortable => 
>>> true 
>>> >     indexes email, :sortable => true 
>>> >     indexes resumes.document, :as => "document", :sortable => true 
>>> > 
>>> >     has :id, :as => :user_id 
>>> >     has client_id 
>>> >     has updated_at 
>>> >     has is_internal 
>>> > 
>>> >     has location.country_id, as: "country_id", facet: true 
>>> >     has location.state_id, as: "state_id" 
>>> >     has location.city_id, as: "city_id" 
>>> >     has 'RADIANS(locations.latitude)', as: :lat, type: :float 
>>> >     has 'RADIANS(locations.longitude)', as: :lng, type: :float 
>>> > 
>>> >     has last_job_application.source.source_type_id, as: 
>>> "source_type_id" 
>>> >     has last_job_application.source_id, as: "source_id", facet: true 
>>> >     has last_job_application.application_status_id, as: 
>>> "application_status_id" 
>>> > 
>>> >     has tags(:id), as: "tag_ids", facet: true 
>>> >     has profiles(:profile_type_id), as: "profile_type_ids", facet: 
>>> true 
>>> >     has job_applications(:job_id), as: "job_ids", facet: true # THIS 
>>> IS THE NEW ATTRIBUTE !!! 
>>> > 
>>> >     has candidate_answers(:answer_id), as: "candidate_answer_ids" 
>>> > 
>>> >     set_property latitude_attr: "lat" 
>>> >     set_property longitude_attr: "lng" 
>>> >     
>>> >     # By default due to our has many relationship with resumes 
>>> documents, MySQL only returns 
>>> >     # the first 1024 characters of the document. This enforces the 
>>> size during indexing. 
>>> >     set_property group_concat_max_len: 500000 
>>> >   end 
>>> > 
>>> > Models: 
>>> > 
>>> > class User < ActiveRecord::Base 
>>> >   belongs_to              :last_job_application, class_name: 
>>> 'JobApplication' 
>>> >   has_many                :resumes 
>>> >   has_many                :job_applications, dependent: :destroy, 
>>> autosave: true 
>>> >   has_many                :candidate_answers, through: 
>>> :job_applications 
>>> >   has_many                :candidate_profiles, dependent: :delete_all 
>>> >   has_many                :candidate_tags_candidate_users 
>>> >   has_many                :candidate_tags, through: 
>>> :candidate_tags_candidate_users, source: :candidate_tag, dependent: 
>>> :delete_all 
>>> > end 
>>> > 
>>> > class JobApplication < ActiveRecord::Base 
>>> >   belongs_to :user 
>>> >   belongs_to :application_status 
>>> >   belongs_to :job, counter_cache: true 
>>> >   belongs_to :source 
>>> > end 
>>> > 
>>> > Please let me know if there's any other model you'd like to see, 
>>> thanks! 
>>> > 
>>> > On Tuesday, July 16, 2013 8:49:04 PM UTC-4, Pat Allan wrote: 
>>> > Can you show us the index definition and the related associations? 
>>> > 
>>> > On 17/07/2013, at 5:02 AM, Daniel Vandersluis wrote: 
>>> > 
>>> > > I did an explain query, like I said, all the joins are indexed 
>>> (either as primary keys or actual indexes). In both cases (before and after 
>>> adding the attribute), it takes milliseconds to execute the query, but a 
>>> while before all data has been collected. 
>>> > > 
>>> > > 
>>> +----+-------------+-----------------------------+--------+--------------------------------------------+--------------------------------------------+---------+------------------------------------------------+-------+-------------+
>>>  
>>>
>>> > > | id | select_type | table                       | type   | 
>>> possible_keys                              | key                           
>>>              | key_len | ref                                            | 
>>> rows  | Extra       | 
>>> > > 
>>> +----+-------------+-----------------------------+--------+--------------------------------------------+--------------------------------------------+---------+------------------------------------------------+-------+-------------+
>>>  
>>>
>>> > > |  1 | SIMPLE      | users                       | index  | NULL     
>>>                                   | PRIMARY                                 
>>>    | 4       | NULL                                           | 23754 |     
>>>         | 
>>> > > |  1 | SIMPLE      | documents                   | ref    | 
>>> index_documents_on_user_id                 | index_documents_on_user_id     
>>>             | 5       | prod-2013-05-28.users.id                       
>>> |     1 |             | 
>>> > > |  1 | SIMPLE      | locations                   | eq_ref | PRIMARY 
>>>                                    | PRIMARY                               
>>>      | 4       | prod-2013-05-28.users.location_id              |     1 |   
>>>           | 
>>> > > |  1 | SIMPLE      | job_applications            | eq_ref | PRIMARY 
>>>                                    | PRIMARY                               
>>>      | 4       | prod-2013-05-28.users.last_job_application_id  |     1 |   
>>>           | 
>>> > > |  1 | SIMPLE      | sources                     | eq_ref | PRIMARY 
>>>                                    | PRIMARY                               
>>>      | 4       | prod-2013-05-28.job_applications.source_id     |     1 |   
>>>           | 
>>> > > |  1 | SIMPLE      | tags_users                  | ref    | 
>>> index_tags_on_user_id                      | index_tags_on_user_id         
>>>              | 5       | prod-2013-05-28.users.id                       
>>> |     1 |             | 
>>> > > |  1 | SIMPLE      | tags                        | eq_ref | PRIMARY 
>>>                                    | PRIMARY                               
>>>      | 4       | prod-2013-05-28.tags_users.tag_id              |     1 | 
>>> Using index | 
>>> > > |  1 | SIMPLE      | profiles                    | ref    | 
>>> index_profiles_on_user_id_and_profile_type | 
>>> index_profiles_on_user_id_and_profile_type | 5       | 
>>> prod-2013-05-28.users.id                       |     1 | Using index | 
>>> > > |  1 | SIMPLE      | job_applications_users      | ref    | 
>>> index_job_applications_on_user_id          | 
>>> index_job_applications_on_user_id          | 5       | 
>>> prod-2013-05-28.users.id                       |     1 |             | 
>>> > > |  1 | SIMPLE      | job_applications_users_join | ref    | 
>>> index_job_applications_on_user_id          | 
>>> index_job_applications_on_user_id          | 5       | 
>>> prod-2013-05-28.users.id                       |     1 | Using index | 
>>> > > |  1 | SIMPLE      | candidate_answers           | ref    | 
>>> uidx_on_candidate_answers                  | uidx_on_candidate_answers     
>>>              | 5       | prod-2013-05-28.job_applications_users_join.id|    
>>>  8 | Using index | 
>>> > > 
>>> +----+-------------+-----------------------------+--------+--------------------------------------------+--------------------------------------------+---------+------------------------------------------------+-------+-------------+
>>>  
>>>
>>> > > 
>>> > > The join is exactly the same except for the alias: 
>>> > > 
>>> > > LEFT OUTER JOIN `job_applications` `job_applications_users` 
>>> > >     ON `job_applications_users`.`user_id` = `users`.`id` 
>>> > > LEFT OUTER JOIN `job_applications` `job_applications_users_join` 
>>> > >     ON `job_applications_users_join`.`user_id` = `users`.`id` 
>>> > > 
>>> > > 
>>> > > 
>>> > > On Tuesday, July 16, 2013 2:12:02 PM UTC-4, Pat Allan wrote: 
>>> > > Hi Daniel 
>>> > > Any slowness in indexing is going to be related to the generated SQL 
>>> query - when you say the query itself seems fast, how are you comparing it? 
>>> I'd recommend running it through EXPLAIN to get some idea of what could be 
>>> causing it to be slow. 
>>> > > 
>>> > > Is there any way in which the join is different beyond the aliased 
>>> name? 
>>> > > 
>>> > > As for upgrading - I'm not sure if there's been any changes related 
>>> to query generation, but using the latest releases is always recommended 
>>> (in this case, 2.1.0). 
>>> > > 
>>> > > Cheers 
>>> > > 
>>> > > -- 
>>> > > Pat 
>>> > > 
>>> > > 
>>> > > On 17/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
>>> > > 
>>> > > > Is there any reason an index would suddenly take 3x as long to 
>>> index after adding an extra has_many attribute to the index definition? The 
>>> query itself is completely indexed, and takes about 1.5ms to run (plus data 
>>> collection time, there are about 200k records in the main table that is 
>>> being indexed, plus a bunch of attributes - sphinx reports it as a 466MB 
>>> index). Prior to adding the extra attribute, indexing took about 9 minutes, 
>>> and now it takes 29. The new attribute averages just over 1 record per row, 
>>> with a maximum of 78. 
>>> > > > 
>>> > > > Possibly related is that adding the new attribute causes the query 
>>> TS generates to add a duplicate join (with a different alias) to a join 
>>> that's added by a different attribute (however as mentioned the query 
>>> itself seems to be fast). 
>>> > > > 
>>> > > > I'm using ThinkingSphinx 2.0.11 currently - would updating to TS3 
>>> help? 
>>> > > > 
>>> > > > -- 
>>> > > > You received this message because you are subscribed to the Google 
>>> Groups "Thinking Sphinx" group. 
>>> > > > To unsubscribe from this group and stop receiving emails from it, 
>>> send an email to [email protected]. 
>>> > > > To post to this group, send email to [email protected]. 
>>> > > > Visit this group at http://groups.google.com/group/thinking-sphinx. 
>>>
>>> > > > For more options, visit https://groups.google.com/groups/opt_out. 
>>> > > >   
>>> > > >   
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > 
>>> > > -- 
>>> > > You received this message because you are subscribed to the Google 
>>> Groups "Thinking Sphinx" group. 
>>> > > To unsubscribe from this group and stop receiving emails from it, 
>>> send an email to [email protected]. 
>>> > > To post to this group, send email to [email protected]. 
>>> > > Visit this group at http://groups.google.com/group/thinking-sphinx. 
>>> > > For more options, visit https://groups.google.com/groups/opt_out. 
>>> > >   
>>> > >   
>>> > 
>>> > 
>>> > 
>>> > -- 
>>> > You received this message because you are subscribed to the Google 
>>> Groups "Thinking Sphinx" group. 
>>> > To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected]. 
>>> > To post to this group, send email to [email protected]. 
>>> > Visit this group at http://groups.google.com/group/thinking-sphinx. 
>>> > For more options, visit https://groups.google.com/groups/opt_out. 
>>> >   
>>> >   
>>>
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [ts] Suddenly slow index

Reply via email to