There must be something weird going on here - when I added job_ids to the 
index (as per the other thread) with the latest master from GitHub, the 
index size grew even more, up to 11 GB now... 

On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote:
>
> On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote: 
>
> > Sorry, it's the same index, I was just simplifying the names for the 
> purpose of this post and missed one. Sorry for the confusion :) 
>
> Ah rightio. The change in size is pretty crazy then! 
>
> > If the change was made prior to 2.0.11, wouldn't that mean that the 
> indexes previously would have been huge too? 
>
> I would have thought so, yes. 
>
> > I'm not sure I understand what you mean about sql_field_string - do 
> sql_field_strings take up significantly more space than sql_attr_strings 
> do? 
>
> There's no reason for them to at all. I don't know the dark arts behind 
> the Sphinx source code though (it's C and C++, neither of which I'm 
> confident with). 
>
> -- 
> Pat 
>
> > 
> >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote: 
> >> Further to this: I guess I was wrong about 2.0.11 using ordinal 
> attribute types instead of string attribute types - that change must have 
> come in earlier. 
> >> 
> >> sql_attr_string is a standard string attribute (not ordinal), and 
> sql_field_string stores the field value as a string attribute of the same 
> name *as well as* the field. The latter removes the need for the _sort 
> suffix you'll spot in sortable attributes in 2.x releases. 
> >> 
> >> I wouldn't expect there to be any difference between these two in terms 
> of file size though. But just to compare apples with apples - you had 
> user_core file sizes previously, but now it's candidate_user_core. Are 
> there other large and unnecessary string attributes in the CandidateUser 
> index? 
> >> 
> >> -- 
> >> Pat 
> >> 
> >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote: 
> >> 
> >> > Yeah, I was using 2.0.11 previously. There does not seem to be any 
> difference with removing sortable: true from the index definition (for 
> resumes.document), except that this line disappears from the generated 
> configuration file: sql_field_string = document. This seems to at least let 
> indexer complete properly, but the index size is still huge: 
> >> > 
> >> > indexing index 'candidate_user_core'... 
> >> > collected 199704 docs, 8478.8 MB 
> >> > 
> >> > It also takes a long time to go through the sorting "Mhits" step now. 
> I see how TS2 added sql_attr_string for the sort columns whereas TS3 adds 
> sql_field_string - that's what you're talking about right? Is there any way 
> to either a) get around this issue, or b) force TS to use the ordinal type? 
> (everything should still work that way, correct?) 
> >> > 
> >> > Here are the options I set in thinking_sphinx.yml: 
> >> > 
> >> > development: 
> >> >   address: localhost 
> >> >   version: 2.0.8-release 
> >> >   mem_limit: 256M   
> >> >   
> >> >   enable_star: true 
> >> >   min_prefix_len: 2 
> >> >   blend_chars: "@, -, &" 
> >> >   html_strip: true 
> >> >   max_matches: 25000 
> >> > 
> >> > Is there any way I can speed this up / reduce the size? 
> >> > 
> >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote: 
> >> > I think with 2.0.11 (what you were using previously, right?) TS uses 
> the ordinal attribute type, which stores an integer for each string 
> (calculated by grabbing all known values, putting them in order, returning 
> the index of each value). 
> >> > 
> >> > With TS v3 (and later 2.x releases if I remember correctly) it'll use 
> the native string attribute type (a relatively recent addition to Sphinx), 
> which means Sphinx is storing the real string value - which is much better 
> if you're sorting across more than one index (say, if you're using deltas, 
> or searching across multiple models). In this case, it would mean Sphinx is 
> now storing potentially a ton of data, instead of a 32-bit integer per 
> record. 
> >> > 
> >> > -- 
> >> > Pat 
> >> > 
> >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
> >> > 
> >> > > Thanks for the response, Pat - yes, it's the same index as the 
> other thread. Good point about sorting resumes, that shouldn't be there. 
> However, why would that make such a difference between TS2 and TS3 (see my 
> other post which I added at the same time as your response)? 
> >> > > 
> >> > > I will try removing the sortable on resumes and see what difference 
> it makes! 
> >> > > 
> >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
> >> > > Hi Daniel 
> >> > > 
> >> > > If this is the same index as in the other thread, I'm guessing it's 
> the fact that you've got resumes.document sortable. A record with many 
> resumes and/or large document values could end up with massive values for 
> the underlying string attribute (that you'd sort by) - are you actually 
> sorting by this? Generally I'd be surprised if there's much point sorting 
> by large amounts of text. 
> >> > > 
> >> > > -- 
> >> > > Pat 
> >> > > 
> >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
> >> > > 
> >> > > > Is there any reason that an index would grow in size when 
> upgrading from Thinking Sphinx 2 to 3? The only differences in the 
> configuration file are changing port to mysql41, and changing version to 
> 2.0.8-release, but an index that used to be around 500MB is now resulting 
> in this error: 
> >> > > > 
> >> > > > ERROR: index 'user_core': too many string attributes (current 
> index format allows up to 4 GB). 
> >> > > > 
> >> > > > Anyone have any idea why this would be? 
> >> > > > 
> >> > > > -- 
> >> > > > You received this message because you are subscribed to the 
> Google Groups "Thinking Sphinx" group. 
> >> > > > To unsubscribe from this group and stop receiving emails from it, 
> send an email to [email protected]. 
> >> > > > To post to this group, send email to [email protected]. 
>
> >> > > > Visit this group at 
> http://groups.google.com/group/thinking-sphinx. 
> >> > > > For more options, visit https://groups.google.com/groups/opt_out. 
>
> >> > > >   
> >> > > >   
> >> > > 
> >> > > 
> >> > > 
> >> > 
> >> > 
> >> > 
> >> 
> >> 
> >> 
> > 
>
>
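For anyone following along, here is a rough sketch of the difference Pat describes in the quoted thread between ordinal attributes (one integer per record, taken from each value's position among the sorted known values) and native string attributes (the full text stored per record). This is only an illustration in Python, not Sphinx's or Thinking Sphinx's actual code: the function names are made up, the 0-based ranking is an assumption, and the byte counts ignore any per-value overhead Sphinx may add.

```python
def ordinal_values(strings):
    """Replace each string with its position among the sorted distinct values.

    Assumption: positions are 0-based; the real numbering may differ.
    """
    ranks = {value: rank for rank, value in enumerate(sorted(set(strings)))}
    return [ranks[s] for s in strings]


def ordinal_attribute_bytes(strings, bytes_per_ordinal=4):
    """Storage for ordinals: one fixed-width (32-bit) integer per record."""
    return len(strings) * bytes_per_ordinal


def string_attribute_bytes(strings):
    """Rough storage for string attributes: the UTF-8 text of every record."""
    return sum(len(s.encode("utf-8")) for s in strings)


if __name__ == "__main__":
    # Hypothetical sample values standing in for resumes.document.
    docs = ["resume text " * 1000, "short note", "another resume " * 500]
    print(ordinal_values(docs))
    print(ordinal_attribute_bytes(docs), "bytes as ordinals")
    print(string_attribute_bytes(docs), "bytes as strings")
```

With the 199704 docs reported by indexer, one ordinal per record would be 199704 x 4 = 798816 bytes, which is why a sortable column holding large text makes such a difference once the real strings are stored. The flip side is that ranks computed separately for each index are not comparable with each other, which fits Pat's point that real string values are better when sorting across more than one index.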


