Hi Russ,

The requirement that document ids be integers is actually a limitation of Sphinx, not Thinking Sphinx. Using CRC'd versions of your string ids should be possible, but you could hit collisions - CRC32 gives no guarantee of uniqueness.
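To put a rough number on the collision risk, here is a minimal sketch using the birthday-bound approximation. It assumes CRC32 values are spread roughly uniformly over the 32-bit space, and the sample ids are made-up stand-ins for egms_id values, not real data.

```ruby
require 'zlib'

# Birthday-bound approximation: chance that at least two of n values
# collide when drawn uniformly from `space` possible values.
# Uniformity of CRC32 output is an assumption, not a guarantee.
def collision_probability(n, space = 2**32)
  1.0 - Math.exp(-n.to_f * (n - 1) / (2.0 * space))
end

# Hypothetical string ids standing in for a column like egms_id.
sample_ids = %w[egms-0001 egms-0002 egms-0003]
sample_ids.each do |id|
  # Zlib.crc32 returns an unsigned 32-bit integer (0..2**32 - 1).
  puts "#{id} -> #{Zlib.crc32(id)}"
end

[10_000, 100_000, 1_000_000].each do |n|
  puts format('%9d rows: ~%.4f chance of at least one collision',
              n, collision_probability(n))
end
```

Under that assumption the risk is on the order of 1% at 10,000 rows and well over half at 100,000 rows, so it grows much faster than the row count suggests.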
The reason it's slow is probably the way Sphinx pages through its indexing queries. You can change this:
http://freelancing-god.github.com/ts/en/common_issues.html#slow_indexing

With the multiplication, you may need to compile Sphinx with 64-bit document id support, but I'd certainly be hesitant to go down this path - as soon as you hit a collision in the CRC'd values, it becomes a waste of time.

Cheers

--
Pat

On 20/02/2011, at 1:53 AM, Russell Garner wrote:

> Hi,
>
> I'm using Sphinx with a crawler. As the crawler does its business, it
> replaces all articles with updated versions. Consequently, all articles.id
> change and as the crawler's working, the number of results search returns
> slowly dwindles until a rebuild can be done at the end. I could delta, but
> didn't want to as everything would end up in the delta index and it seems the
> TS docs advise against this.
>
> I found
> http://stackoverflow.com/questions/965656/specifying-different-column-as-doc-id-using-thinking-sphinx
> and trawled through the commits until I found set_sphinx_primary_key. I
> already have a unique string column egms_id, so tried it, but it looks like
> TS wants an integer field as sphinx_document_id performs a multiplication on
> it. Ok, I thought, I'll just .to_crc32 the egms_id, but this looks like it's
> going to produce integers which are way too large - in any case, it hangs the
> indexer :)
>
> Is there any way for me to just use the egms_id string here? Or can I use a
> CRC32 by perhaps patching Sphinx to accept set_primary_key :egms_id_crc32,
> :behaves_like_hash => true and not perform the multiplication when it sees
> this option? Or have I missed the purpose of the multiplication (it seems
> like it's to guarantee uniqueness, and I can already do that with a CRC of a
> unique string)?
>
> Apologies if this comes across confused and tired - that's because I am ;)
>
> Cheers,
>
> Russ
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
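
On the 64-bit point in the reply, a small sketch of the arithmetic may help. It assumes the document id is computed as the primary key multiplied by the number of indexed models plus a per-model offset; that formula, the `document_id` helper, and the model count of 3 are illustrative assumptions rather than the library's confirmed internals.

```ruby
require 'zlib'

MAX_UINT32 = 2**32 - 1
MAX_UINT64 = 2**64 - 1

# Assumed formula (hypothetical helper, not a Thinking Sphinx method):
# key * number_of_indexed_models + model_offset
def document_id(key, model_count, model_offset)
  key * model_count + model_offset
end

# A CRC32 of a made-up egms_id value; always within 0..MAX_UINT32.
key = Zlib.crc32('egms-0001')
puts "crc32 key: #{key}"

# Worst case: the largest possible CRC32 with three indexed models.
worst = document_id(MAX_UINT32, 3, 1)
puts "worst-case id:       #{worst}"
puts "fits in 32-bit ids?  #{worst <= MAX_UINT32}"
puts "fits in 64-bit ids?  #{worst <= MAX_UINT64}"
```

With more than one indexed model, a key near the top of the CRC32 range no longer fits in a 32-bit document id once multiplied, which is why 64-bit id support comes into play. It does nothing about collisions between the CRC values themselves.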
