Further to this: I guess I was wrong about 2.0.11 using ordinal attribute types 
instead of string attribute types - that change must have come in earlier.
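
To make the ordinal idea concrete, here's a rough sketch in plain Ruby. It is 
not Sphinx's or Thinking Sphinx's actual code: the sample values and variable 
names are made up, and the 4 bytes per ordinal is just the 32-bit integer 
mentioned further down the thread.

```ruby
# Hypothetical sample data: one string value per document.
values = ["delta", "alpha", "charlie", "alpha"]

# Ordinal approach: collect the known values, put them in order, and
# store each value's position in that ordering instead of the string.
ranks = values.uniq.sort.each_with_index.map { |value, index| [value, index] }.to_h
ordinals = values.map { |value| ranks.fetch(value) }

# Rough storage comparison: a fixed 4 bytes per document for ordinals
# (assuming a 32-bit integer) versus the full byte length of every
# string for a native string attribute.
ordinal_bytes = ordinals.size * 4
string_bytes  = values.sum(&:bytesize)

puts "ordinals: #{ordinals.inspect} (#{ordinal_bytes} bytes)"
puts "strings:  #{string_bytes} bytes"
```

With short values the two totals are close, but the string total grows with 
the length of each value, whereas the ordinal total only grows with the number 
of documents. The ordinals are also only meaningful within the set of values 
they were computed from, which is why they don't sort reliably across more 
than one index.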

sql_attr_string is a standard string attribute (not ordinal), and 
sql_field_string stores the field value as a string attribute of the same name 
*as well as* the field. The latter removes the need for the _sort suffix you'll 
spot in sortable attributes in 2.x releases.

I wouldn't expect there to be any difference between these two in terms of file 
size though. But just to compare apples with apples - you had user_core file 
sizes previously, but now it's candidate_user_core. Are there other large and 
unnecessary string attributes in the CandidateUser index?

-- 
Pat

On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote:

> Yeah, I was using 2.0.11 previously. There does not seem to be any difference 
> with removing sortable: true from the index definition (for 
> resumes.document), except that this line disappears from the generated 
> configuration file: sql_field_string = document. This seems to at least let 
> indexer complete properly, but the index size is still huge:
> 
> indexing index 'candidate_user_core'...
> collected 199704 docs, 8478.8 MB
> 
> It also takes a long time to go through the sorting "Mhits" step now. I see 
> how TS2 added sql_attr_string for the sort columns whereas TS3 adds 
> sql_field_string - that's what you're talking about right? Is there any way 
> to either a) get around this issue, or b) force TS to use the ordinal type? 
> (everything should still work that way, correct?)
> 
> Here are the options I set in thinking_sphinx.yml:
> 
> development:
>   address: localhost
>   version: 2.0.8-release
>   mem_limit: 256M
>   enable_star: true
>   min_prefix_len: 2
>   blend_chars: "@, -, &"
>   html_strip: true
>   max_matches: 25000
> 
> Is there any way I can speed this up / reduce the size?
> 
> On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote:
> I think with 2.0.11 (what you were using previously, right?) TS uses the 
> ordinal attribute type, which stores an integer for each string (calculated 
> by grabbing all known values, putting them in order, returning the index of 
> each value). 
> 
> With TS v3 (and later 2.x releases if I remember correctly) it'll use the 
> native string attribute type (a relatively recent addition to Sphinx), which 
> means Sphinx is storing the real string value - which is much better if 
> you're sorting across more than one index (say, if you're using deltas, or 
> searching across multiple models). In this case, it would mean Sphinx is now 
> storing potentially a ton of data, instead of a 32-bit integer per record. 
> 
> -- 
> Pat 
> 
> On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
> 
> > Thanks for the response, Pat - yes, it's the same index as the other 
> > thread. Good point about sorting resumes, that shouldn't be there. However, 
> > why would that make such a difference between TS2 and TS3 (see my other 
> > post which I added at the same time as your response)? 
> > 
> > I will try removing the sortable on resumes and see what difference it 
> > makes! 
> > 
> > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
> > Hi Daniel 
> > 
> > If this is the same index as in the other thread, I'm guessing it's the 
> > fact that you've got resumes.document sortable. A record with many resumes 
> > and/or large document values could end up with massive values for the 
> > underlying string attribute (that you'd sort by) - are you actually sorting 
> > by this? Generally I'd be surprised if there's much point sorting by large 
> > amounts of text. 
> > 
> > -- 
> > Pat 
> > 
> > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
> > 
> > > Is there any reason that an index would grow in size when upgrading from 
> > > Thinking Sphinx 2 to 3? The only differences in the configuration file are 
> > > changing port to mysql41 and changing version to 2.0.8-release, but an 
> > > index that used to be around 500MB is now resulting in this error: 
> > > 
> > > ERROR: index 'user_core': too many string attributes (current index 
> > > format allows up to 4 GB). 
> > > 
> > > Anyone have any idea why this would be? 
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google Groups 
> > > "Thinking Sphinx" group. 
> > > To unsubscribe from this group and stop receiving emails from it, send an 
> > > email to [email protected]. 
> > > To post to this group, send email to [email protected]. 
> > > Visit this group at http://groups.google.com/group/thinking-sphinx. 
> > > For more options, visit https://groups.google.com/groups/opt_out. 
> > >   
> > >   

