Does it make sense that the size doesn't change even with enable_star and min_prefix_len disabled?
On Monday, July 22, 2013 11:20:03 AM UTC-4, Daniel Vandersluis wrote:
>
> There must be something weird going on here - when I added job_ids to the index (as per the other thread) with the latest master from GitHub, the index size grew even more, up to 11GB now...
>
> On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote:
>>
>> On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote:
>>
>> > Sorry, it's the same index; I was just simplifying the names for the purpose of this post and missed one. Sorry for the confusion :)
>>
>> Ah rightio. The change in size is pretty crazy then!
>>
>> > If the change was made prior to 2.0.11, wouldn't that mean that the indexes previously would have been huge too?
>>
>> I would have thought so, yes.
>>
>> > I'm not sure I understand what you mean about sql_field_string - do sql_field_strings take up significantly more space than sql_attr_strings do?
>>
>> There's no reason for them to at all. I don't know the dark arts behind the Sphinx source code though (it's C and C++, neither of which I'm confident with).
>>
>> --
>> Pat
>>
>> >
>> >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote:
>> >> Further to this: I guess I was wrong about 2.0.11 using ordinal attribute types instead of string attribute types - that change must have come in earlier.
>> >>
>> >> sql_attr_string is a standard string attribute (not ordinal), and sql_field_string stores the field value as a string attribute of the same name *as well as* the field. The latter removes the need for the _sort suffix you'll spot in sortable attributes in 2.x releases.
>> >>
>> >> I wouldn't expect there to be any difference between these two in terms of file size though. But just to compare apples with apples - you had user_core file sizes previously, but now it's candidate_user_core. Are there other large and unnecessary string attributes in the CandidateUser index?
>> >>
>> >> --
>> >> Pat
>> >>
>> >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote:
>> >>
>> >> > Yeah, I was using 2.0.11 previously. There does not seem to be any difference from removing sortable: true from the index definition (for resumes.document), except that this line disappears from the generated configuration file: sql_field_string = document. This at least seems to let the indexer complete properly, but the index size is still huge:
>> >> >
>> >> > indexing index 'candidate_user_core'...
>> >> > collected 199704 docs, 8478.8 MB
>> >> >
>> >> > It also takes a long time to go through the sorting "Mhits" step now. I see how TS2 added sql_attr_string for the sort columns whereas TS3 adds sql_field_string - that's what you're talking about, right? Is there any way to either a) get around this issue, or b) force TS to use the ordinal type? (Everything should still work that way, correct?)
>> >> >
>> >> > Here are the options I set in thinking_sphinx.yml:
>> >> >
>> >> > development:
>> >> >   address: localhost
>> >> >   version: 2.0.8-release
>> >> >   mem_limit: 256M
>> >> >
>> >> >   enable_star: true
>> >> >   min_prefix_len: 2
>> >> >   blend_chars: "@, -, &"
>> >> >   html_strip: true
>> >> >   max_matches: 25000
>> >> >
>> >> > Is there any way I can speed this up / reduce the size?
>> >> >
>> >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote:
>> >> > I think with 2.0.11 (what you were using previously, right?) TS uses the ordinal attribute type, which stores an integer for each string (calculated by grabbing all known values, putting them in order, and returning the index of each value).
>> >> >
>> >> > With TS v3 (and later 2.x releases, if I remember correctly) it'll use the native string attribute type (a relatively recent addition to Sphinx), which means Sphinx is storing the real string value. That's much better if you're sorting across more than one index (say, if you're using deltas, or searching across multiple models). In this case, though, it means Sphinx is now storing potentially a ton of data instead of a 32-bit integer per record.
>> >> >
>> >> > --
>> >> > Pat
>> >> >
>> >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote:
>> >> >
>> >> > > Thanks for the response, Pat - yes, it's the same index as in the other thread. Good point about sorting resumes; that shouldn't be there. However, why would that make such a difference between TS2 and TS3 (see my other post, which I added at the same time as your response)?
>> >> > >
>> >> > > I will try removing the sortable on resumes and see what difference it makes!
>> >> > >
>> >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote:
>> >> > > Hi Daniel
>> >> > >
>> >> > > If this is the same index as in the other thread, I'm guessing it's the fact that you've got resumes.document sortable. A record with many resumes and/or large document values could end up with massive values for the underlying string attribute (the one you'd sort by) - are you actually sorting by this? Generally I'd be surprised if there's much point sorting by large amounts of text.
>> >> > >
>> >> > > --
>> >> > > Pat
>> >> > >
>> >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote:
>> >> > >
>> >> > > > Is there any reason that an index would grow in size when upgrading from Thinking Sphinx 2 to 3?
>> >> > > > The only differences in the configuration file are changing port to mysql41 and changing version to 2.0.8-release, but an index that used to be around 500MB now results in this error:
>> >> > > >
>> >> > > > ERROR: index 'user_core': too many string attributes (current index format allows up to 4 GB).
>> >> > > >
>> >> > > > Anyone have any idea why this would be?
>> >> > > >
>> >> > > > --
>> >> > > > You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
>> >> > > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> >> > > > To post to this group, send email to [email protected].
>> >> > > > Visit this group at http://groups.google.com/group/thinking-sphinx.
>> >> > > > For more options, visit https://groups.google.com/groups/opt_out.
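Pat's July 18 description of the ordinal attribute type (grab all known values, put them in order, store each record's position) can be sketched in Python as below, next to a rough byte count for the native string type that TS v3 uses instead. The function names are hypothetical, the ordering uses Python's code-point comparison rather than whatever collation Sphinx applies, and the byte estimate ignores Sphinx's real file layout; it only illustrates why a 32-bit integer per record stays small while stored strings grow with the text.

```python
def ordinal_attribute(values):
    """Replace each string with its position among the sorted distinct values.

    Mirrors the ordinal type as described in the thread. Python's default
    code-point ordering stands in for whatever collation Sphinx uses.
    """
    ranks = {value: i for i, value in enumerate(sorted(set(values)))}
    return [ranks[value] for value in values]


def storage_estimate(values):
    """Rough bytes for (ordinal, string) storage of one attribute.

    Ordinal: one 32-bit integer per record. String: the raw UTF-8 text.
    Sphinx's real on-disk layout adds overhead that this ignores.
    """
    ordinal_bytes = 4 * len(values)
    string_bytes = sum(len(value.encode("utf-8")) for value in values)
    return ordinal_bytes, string_bytes


if __name__ == "__main__":
    documents = ["zebra resume ...", "apple resume ...", "mango resume ..."]
    print(ordinal_attribute(documents))  # [2, 0, 1]
    print(storage_estimate(documents))   # (12, 48)
```

With real resume text, the string side grows with every character of every document, while the ordinal side stays at four bytes per record.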
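Pat also notes that storing the real string value is much better when sorting across more than one index (deltas, or several models). A minimal sketch, assuming each index (say a core and a delta index) computes its ordinals independently from its own values: ranks from different indexes then collide, so merging results by ordinal misorders them, while merging by the strings themselves does not. All names here are illustrative, not Thinking Sphinx or Sphinx APIs.

```python
def ordinal_ranks(values):
    """Rank each distinct value within ONE index, as an ordinal attribute would."""
    return {value: i for i, value in enumerate(sorted(set(values)))}


def merge_by_ordinal(*indexes):
    """Sort results from several indexes by their index-local ordinals."""
    rows = []
    for values in indexes:
        ranks = ordinal_ranks(values)
        rows.extend((ranks[value], value) for value in values)
    # Python's sort is stable, so equal ordinals keep their index order.
    return [value for _, value in sorted(rows, key=lambda row: row[0])]


def merge_by_string(*indexes):
    """Sort the same results by the stored string values themselves."""
    return sorted(value for values in indexes for value in values)


if __name__ == "__main__":
    core = ["apple", "mango", "zebra"]
    delta = ["yak"]
    print(merge_by_ordinal(core, delta))  # ['apple', 'yak', 'mango', 'zebra']
    print(merge_by_string(core, delta))   # ['apple', 'mango', 'yak', 'zebra']
```

"yak" gets ordinal 0 in the delta index, the same as "apple" in the core index, which is why it jumps ahead of "mango" when sorting by ordinal.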
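Pat's point about a sortable resumes.document can be illustrated the same way. The sketch assumes the has_many association is flattened into one string per user, the way an SQL GROUP_CONCAT would combine rows; the helper below is a hypothetical stand-in, not the SQL that Thinking Sphinx actually generates. A candidate with several long resumes ends up with one attribute value as large as all of them combined.

```python
from collections import defaultdict


def concatenated_attribute(resume_rows, separator=" "):
    """Group (user_id, document) rows into one string per user.

    Hypothetical stand-in for the SQL aggregation that turns a has_many
    association into a single attribute value per indexed record.
    """
    per_user = defaultdict(list)
    for user_id, document in resume_rows:
        per_user[user_id].append(document)
    return {user_id: separator.join(docs) for user_id, docs in per_user.items()}


if __name__ == "__main__":
    rows = [(1, "a" * 50_000), (1, "b" * 50_000), (1, "c" * 50_000), (2, "d" * 2_000)]
    values = concatenated_attribute(rows)
    for user_id, value in sorted(values.items()):
        print(user_id, len(value))  # 1 150002, then 2 2000
```

Sorting by such a value only ever compares the leading characters, so storing the full concatenation buys little, which matches Pat's doubt about sorting by large amounts of text.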
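A back-of-the-envelope check on the original "too many string attributes (current index format allows up to 4 GB)" error. It assumes the "4 GB" means 2**32 bytes; the thread does not say which megabyte the indexer's "collected 199704 docs, 8478.8 MB" line uses, so both are tried, and the collected text exceeds the limit either way. That figure covers all fields, not just resumes.document, so this only shows the limit is easy to hit if most of that text is also stored as a string attribute (as sql_field_string = document does), not that it explains the whole index size.

```python
# Assumption: the "4 GB" in Sphinx's error message is taken to mean 2**32 bytes.
FOUR_GB = 2 ** 32


def collected_mb_to_bytes(mb, binary=False):
    """Convert the indexer's 'collected ... MB' figure to bytes.

    The unit is not stated in the thread, so both decimal (10**6) and
    binary (2**20) megabytes are supported.
    """
    return mb * (2 ** 20 if binary else 10 ** 6)


if __name__ == "__main__":
    collected_mb = 8478.8  # from the candidate_user_core indexer output above
    for binary in (False, True):
        total = collected_mb_to_bytes(collected_mb, binary)
        print(f"binary={binary}: {total:,.0f} bytes, over limit: {total > FOUR_GB}")
```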
