Hi Alex

A quick work-around could be to add a custom field to the mix, to ensure 
there's always data:

  indexes "'account'", :as => :placeholder_field
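
To illustrate the idea of "always having data", here is a minimal plain-Ruby sketch. It does not touch Thinking Sphinx or Sphinx. The `Row` struct, the sample values, and both helper names are hypothetical stand-ins, and whether blank field data is really what triggers the warning is still only a guess at this point in the thread.

```ruby
# Hypothetical stand-in for one joined accounts / account_names /
# email_addresses row; this is not the real schema.
Row = Struct.new(:id, :first_name, :last_name, :email_address)

# Builds the combined field text for a row, skipping NULL (nil) and blank
# values. `placeholder` mimics a constant field such as the 'account' string.
def field_text(row, placeholder = nil)
  [row.first_name, row.last_name, row.email_address, placeholder]
    .compact
    .map(&:strip)
    .reject(&:empty?)
    .join(' ')
end

# Returns the ids of rows whose combined field text would be empty.
def ids_without_field_data(rows, placeholder = nil)
  rows.select { |row| field_text(row, placeholder).empty? }.map(&:id)
end

rows = [
  Row.new(1, 'Ada', 'Lovelace', 'ada@example.com'),
  Row.new(2, nil, '', 'blank-name@example.com'),
  Row.new(3, nil, '', nil)
]

puts ids_without_field_data(rows).inspect            # prints [3]
puts ids_without_field_data(rows, 'account').inspect # prints []
```

With the constant value added, no row ends up with empty field text, which is all the placeholder field is meant to guarantee.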

Does that help?

-- 
Pat

On 02/11/2011, at 10:03 PM, Alex Kahn wrote:

> Hi Pat,
>
> Thanks for your response. We have 7 other models indexed by Sphinx.
> They are all much smaller tables, the largest containing around 5,000
> records. Their indices are far more complex, though.
>
> There is no missing data for email addresses. However, there are many
> account_names records that contain NULL or "" values for the
> first_name and/or last_name columns. Indeed, removing the two
> account_names lines from the define_index block and re-running the
> index task removes the duplicate document ID warning. However, due to
> an unrelated configuration issue, I'm not able to get the
> total_entries count from a console. Using the `search` command line
> tool, it seems that the account_core index now has a total of 602083
> documents (all of them!).
>
> So it looks like the blank data is the cause for the duplicate
> document id warning and the seemingly-missing records. What would you
> suggest as a way to work around this issue? I can try casting the NULL
> values to empty strings, or adding other data to the index that would
> help Sphinx distinguish between records (but wouldn't the timestamp
> and email address fields do that?). Anything you'd suggest?
>
> And yes, the Account model is listed in the indexed_models setting.
>
> Cheers,
> Alex
> 
> On Nov 2, 9:32 am, Pat Allan <[email protected]> wrote:
>> Hi Alex
>> 
>> How many other Sphinx indices do you have in your app? Just wondering if 
>> there's some conflict somehow, though that surely would crop up in dev as 
>> well.
>> 
>> As for the missing records - do you have many accounts with no first name, 
>> last name or email addresses? I remember reading somewhere that Sphinx 
>> ignores records that have no data in their fields. Not sure if these two 
>> problems are related to each other.
>> 
>> Also, going by an issue you logged on Github - is this the app you're using 
>> the indexed_models setting with? Can you confirm that all relevant models 
>> are in that setting?
>> 
>> Cheers
>> 
>> --
>> Pat
>> 
>> On 02/11/2011, at 12:18 AM, Alex Kahn wrote:
>> 
>>> Hi,
>> 
>>> I'm adding a new index to my application. It looks like this:
>> 
>>> class Account < ActiveRecord::Base
>>>  define_index do
>>>    indexes account_name.first_name
>>>    indexes account_name.last_name
>>>    indexes email_addresses.email_address
>> 
>>>    has created_at
>> 
>>>    set_property :delta => :datetime, :threshold => 2.minutes
>>>  end
>>> end
>> 
>>> I'm testing how long the full index takes to generate on a staging
>>> server where we typically have only sanitized data from production.
>>> But for this task, I'm working with our entire accounts,
>>> account_names, and email_addresses tables from production.
>> 
>>> When I generate the index, I get the following warning during the
>>> accounts index phase:
>> 
>>>  WARNING: duplicate document ids found
>> 
>>> In the Rails console, I observe the following:
>>> >> Account.search.total_entries
>>> => 260793
>>> >> Account.count
>>> => 602083
>> 
>>> Locally, with a much smaller subset of the data, I also get a
>>> different count from each data source, but I don't receive the
>>> "duplicate document ids" warning when generating the index.
>> 
>>> My research so far has indicated that this is an issue with merging
>>> indexes. But here I'm generating a full index, not generating a
>>> delta index and then merging it into a full index.
>> 
>>> My questions are:
>> 
>>> 1. The warning and the discrepancy in count, are they related?
>>> 2. What does the warning mean?
>>> 3. Is all of my data accessible via searching, despite the different
>>> counts?
>>> 4. How can I fix this?
>> 
>>> Thanks in advance for any assistance,
>>> Alex Kahn
>> 
>>> P.S. I'm using Rails 2.3.14, Sphinx 0.9.9, thinking-sphinx 1.4.7,
>>> ts-datetime-delta 1.0.2
>> 
>>> --
>>> You received this message because you are subscribed to the Google Groups 
>>> "Thinking Sphinx" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to 
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/thinking-sphinx?hl=en.

