It does make sense, although it's not particularly obvious.

It's down to how database joins work: joining multiple has_many associations 
means data is repeated when concatenated, but the :source => :query option can 
mean that the joins aren't required in the main query, so the duplicates are 
avoided (and indexing gets faster). It's great to know that it's made such a clear 
difference - I'll be sure to recommend this approach more in the future.
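
To make the duplication concrete, here's a rough sketch in plain Ruby (no database or Thinking Sphinx involved). The data is made up: one user with three resumes and four jobs. It assumes the main query joins both has_many tables and concatenates the resume column without any de-duplication, which may not match exactly what the generated SQL does.

```ruby
# Hypothetical data: one user with 3 resumes and 4 jobs.
resumes = ["resume A", "resume B", "resume C"]
job_ids = [10, 11, 12, 13]

# Joining both has_many tables in the main query yields one row per
# (resume, job) pair, so every resume shows up once per job.
joined_rows = resumes.product(job_ids)

# Concatenating the resume column across those rows (assumption: no
# de-duplication is applied) repeats each document.
joined_field = joined_rows.map { |resume, _job_id| resume }.join(" ")

# Fetching resumes through their own query sees each document once.
separate_field = resumes.join(" ")

puts "joined rows:    #{joined_rows.size}"
puts "joined field:   #{joined_field.length} chars"
puts "separate field: #{separate_field.length} chars"
```

The more has_many joins there are, the more the row count multiplies, which would fit with the index ballooning once job_ids was added.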

Cheers

-- 
Pat

On 03/08/2013, at 3:18 AM, Daniel Vandersluis wrote:

> Does this make any sense: I added source: :query to each of my has_many 
> attributes, and suddenly indexing is fast again and back down to < 500MB... 
> 
> On Monday, July 22, 2013 8:04:45 PM UTC-4, Pat Allan wrote:
> That is surprising - removing min_prefix_len should certainly drop index file 
> sizes down. 
> 
> It's worth noting the fix I just mentioned in the other thread should remove 
> the extra join, and this should reduce the amount of data your database 
> passes through to Sphinx. So: it may help return things to what you're 
> expecting. Give it a shot, let me know. 
> 
> -- 
> Pat 
> 
> On 23/07/2013, at 6:12 AM, Daniel Vandersluis wrote: 
> 
> > Does it make any sense for the size to not change even with enable_star and 
> > min_prefix_len being disabled? 
> > 
> > On Monday, July 22, 2013 11:20:03 AM UTC-4, Daniel Vandersluis wrote: 
> > There must be something weird going on here - when I added job_ids to the 
> > index (as per the other thread) with the latest master from github, the 
> > index size grows even more, up to 11GB now... 
> > 
> > On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote: 
> > On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote: 
> > 
> > > Sorry, it's the same index, I was just simplifying the names for the 
> > > purpose of this post and missed one. Sorry for the confusion :) 
> > 
> > Ah rightio. The change in size is pretty crazy then! 
> > 
> > > If the change was made prior to 2.0.11, wouldn't that mean that the 
> > > indexes previously would have been huge too? 
> > 
> > I would have thought so, yes. 
> > 
> > > I'm not sure I understand what you mean about sql_field_string - do 
> > > sql_field_strings take up significantly more space than sql_attr_strings 
> > > do? 
> > 
> > There's no reason for them to at all. I don't know the dark arts behind the 
> > Sphinx source code though (it's C and C++, neither of which I'm confident 
> > with). 
> > 
> > -- 
> > Pat 
> > 
> > > 
> > >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote: 
> > >> Further to this: I guess I was wrong about 2.0.11 using ordinal 
> > >> attribute types instead of string attribute types - that change must 
> > >> have come in earlier. 
> > >> 
> > >> sql_attr_string is a standard string attribute (not ordinal), and 
> > >> sql_field_string stores the field value as a string attribute of the 
> > >> same name *as well as* the field. The latter removes the need for the 
> > >> _sort suffix you'll spot in sortable attributes in 2.x releases. 
> > >> 
> > >> I wouldn't expect there to be any difference between these two in terms 
> > >> of file size though. But just to compare apples with apples - you had 
> > >> user_core file sizes previously, but now it's candidate_user_core. Are 
> > >> there other large and unnecessary string attributes in the CandidateUser 
> > >> index? 
> > >> 
> > >> -- 
> > >> Pat 
> > >> 
> > >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote: 
> > >> 
> > >> > Yeah, I was using 2.0.11 previously. There does not seem to be any 
> > >> > difference with removing sortable: true from the index definition (for 
> > >> > resumes.document), except that this line disappears from the generated 
> > >> > configuration file: sql_field_string = document. This seems to at 
> > >> > least let indexer complete properly, but the index size is still huge: 
> > >> > 
> > >> > indexing index 'candidate_user_core'... 
> > >> > collected 199704 docs, 8478.8 MB 
> > >> > 
> > >> > It also takes a long time to go through the sorting "Mhits" step now. 
> > >> > I see how TS2 added sql_attr_string for the sort columns whereas TS3 
> > >> > adds sql_field_string - that's what you're talking about right? Is 
> > >> > there any way to either a) get around this issue, or b) force TS to 
> > >> > use the ordinal type? (everything should still work that way, 
> > >> > correct?) 
> > >> > 
> > >> > Here's the options I set in thinking_sphinx.yml: 
> > >> > 
> > >> > development: 
> > >> >   address: localhost 
> > >> >   version: 2.0.8-release 
> > >> >   mem_limit: 256M   
> > >> >   
> > >> >   enable_star: true 
> > >> >   min_prefix_len: 2 
> > >> >   blend_chars: "@, -, &" 
> > >> >   html_strip: true 
> > >> >   max_matches: 25000 
> > >> > 
> > >> > Is there any way I can speed this up / reduce the size? 
> > >> > 
> > >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote: 
> > >> > I think with 2.0.11 (what you were using previously, right?) TS uses 
> > >> > the ordinal attribute type, which stores an integer for each string 
> > >> > (calculated by grabbing all known values, putting them in order, 
> > >> > returning the index of each value). 
> > >> > 
> > >> > With TS v3 (and later 2.x releases if I remember correctly) it'll use 
> > >> > the native string attribute type (a relatively recent addition to 
> > >> > Sphinx), which means Sphinx is storing the real string value - which 
> > >> > is much better if you're sorting across more than one index (say, if 
> > >> > you're using deltas, or searching across multiple models). In this 
> > >> > case, it would mean Sphinx is now storing potentially a ton of data, 
> > >> > instead of a 32-bit integer per record. 
> > >> > 
> > >> > -- 
> > >> > Pat 
> > >> > 
> > >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
> > >> > 
> > >> > > Thanks for the response, Pat - yes, it's the same index as the other 
> > >> > > thread. Good point about sorting resumes, that shouldn't be there. 
> > >> > > However, why would that make such a difference between TS2 and TS3 
> > >> > > (see my other post which I added at the same time as your response)? 
> > >> > > 
> > >> > > I will try removing the sortable on resumes and see what difference 
> > >> > > it makes! 
> > >> > > 
> > >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
> > >> > > Hi Daniel 
> > >> > > 
> > >> > > If this is the same index as in the other thread, I'm guessing it's 
> > >> > > the fact that you've got resumes.document sortable. A record with 
> > >> > > many resumes and/or large document values could end up with massive 
> > >> > > values for the underlying string attribute (that you'd sort by) - 
> > >> > > are you actually sorting by this? Generally I'd be surprised if 
> > >> > > there's much point sorting by large amounts of text. 
> > >> > > 
> > >> > > -- 
> > >> > > Pat 
> > >> > > 
> > >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
> > >> > > 
> > >> > > > Is there any reason that an index would grow in size when 
> > >> > > > upgrading from Thinking Sphinx 2 to 3? The only differences in the 
> > >> > > > configuration file are changing port to mysql41 and changing 
> > >> > > > version to 2.0.8-release, but an index that used to be around 
> > >> > > > 500MB is now resulting in this error: 
> > >> > > > 
> > >> > > > ERROR: index 'user_core': too many string attributes (current 
> > >> > > > index format allows up to 4 GB). 
> > >> > > > 
> > >> > > > Anyone have any idea why this would be? 
> > >> > > > 
> > >> > > > -- 
> > >> > > > You received this message because you are subscribed to the Google 
> > >> > > > Groups "Thinking Sphinx" group. 
> > >> > > > To unsubscribe from this group and stop receiving emails from it, 
> > >> > > > send an email to [email protected]. 
> > >> > > > To post to this group, send email to [email protected]. 
> > >> > > > Visit this group at 
> > >> > > > http://groups.google.com/group/thinking-sphinx. 
> > >> > > > For more options, visit https://groups.google.com/groups/opt_out. 
> > >> > > >   
> > >> > > >   
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > 
> > >> > 
> > >> > 
> > >> 
> > >> 
> > >> 
> > > 
> > 
> > 
> 
> 
> 


