Hi Pat,

Thanks for your response.

We have 7 other models indexed by Sphinx. They are all much smaller tables, the largest containing around 5,000 records. Their indices are far more complex, though.

There is no missing data for email addresses. However, many account_names records contain NULL or "" values in the first_name and/or last_name columns. Indeed, removing the two account_names lines from the define_index block and re-running the index task removes the duplicate document ID warning. However, due to an unrelated configuration issue, I'm not able to get the total_entries count from a console. Using the `search` command line tool, it seems that the account_core index now has a total of 602083 documents (all of them!).

So it looks like the blank data is the cause of both the duplicate document ID warning and the seemingly missing records. What would you suggest as a way to work around this issue? I can try casting the NULL values to empty strings, or adding other data to the index that would help Sphinx distinguish between records (but wouldn't the timestamp and email address fields already do that?). Anything you'd suggest?

And yes, the Account model is listed in the indexed_models setting.

Cheers,
Alex
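To make the two symptoms above concrete, here is a minimal sketch in plain Ruby of the kind of check that could be run against rows exported from the indexing query. It is not part of Thinking Sphinx or Sphinx: the sample rows, the row layout (document id, first name, last name) and the helper names `blank_value?`, `duplicate_ids` and `all_blank_ids` are all assumptions made up for illustration.

```ruby
# Hypothetical sample rows standing in for what an indexing query
# might return: [document_id, first_name, last_name]. Not real data.
SAMPLE_ROWS = [
  [1, "Ann", "Lee"],
  [2, nil,   ""],
  [2, nil,   nil],    # the same document id appearing a second time
  [3, "",    "Kim"],
  [4, "Bo",  "Diaz"],
  [5, nil,   nil]
]

# True when a field value is NULL (nil) or contains only whitespace.
def blank_value?(value)
  value.nil? || value.strip.empty?
end

# Document ids that occur more than once in the rows, sorted.
def duplicate_ids(rows)
  counts = Hash.new(0)
  rows.each { |id, _first, _last| counts[id] += 1 }
  counts.select { |_id, n| n > 1 }.map { |id, _n| id }.sort
end

# Document ids of rows where both name fields are blank, sorted.
def all_blank_ids(rows)
  rows.select { |_id, first, last| blank_value?(first) && blank_value?(last) }
      .map { |id, _first, _last| id }
      .uniq
      .sort
end

puts "duplicate ids: #{duplicate_ids(SAMPLE_ROWS).inspect}"
puts "all-blank ids: #{all_blank_ids(SAMPLE_ROWS).inspect}"
```

Comparing the two lists on real data would show whether the repeated ids and the blank name fields actually involve the same records, which is only a suspicion at this point.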

On Nov 2, 9:32 am, Pat Allan <[email protected]> wrote:
> Hi Alex
>
> How many other Sphinx indices do you have in your app? Just wondering if
> there's some conflict somehow, though that surely would crop up in dev as
> well.
>
> As for the missing records - do you have many accounts with no first name,
> last name or email addresses? I remember reading somewhere that Sphinx
> ignores records that have no data in their fields. Not sure if these two
> problems are related to each other.
>
> Also, going by an issue you logged on Github - is this the app you're using
> the indexed_models setting with? Can you confirm that all relevant models are
> in that setting?
>
> Cheers
>
> --
> Pat
>
> On 02/11/2011, at 12:18 AM, Alex Kahn wrote:
>
> > Hi,
> >
> > I'm adding a new index to my application. It looks like this:
> >
> > class Account < ActiveRecord::Base
> >   define_index do
> >     indexes account_name.first_name
> >     indexes account_name.last_name
> >     indexes email_addresses.email_address
> >
> >     has created_at
> >
> >     set_property :delta => :datetime, :threshold => 2.minutes
> >   end
> > end
> >
> > I'm testing how long the full index takes to generate on a staging
> > server where we typically have only sanitized data from production.
> > But for this task, I'm working with our entire accounts,
> > account_names, and email_addresses tables from production.
> >
> > When I generate the index, I get the following warning during the
> > accounts index phase:
> >
> > WARNING: duplicate document ids found
> >
> > In the Rails console, I observe the following:
> > >> Account.search.total_entries
> > => 260793
> > >> Account.count
> > => 602083
> >
> > Locally, with a much smaller subset of the data, I also get a
> > different count from each data source, but I don't receive the
> > "duplicate document ids" warning when generating the index.
> >
> > My research so far has indicated that this is an issue with merging
> > indexes. But here I'm generating a full index, not generating a
> > delta index and then merging it into a full index.
> >
> > My questions are:
> >
> > 1. The warning and the discrepancy in count, are they related?
> > 2. What does the warning mean?
> > 3. Is all of my data accessible via searching, despite the different
> > counts?
> > 4. How can I fix this?
> >
> > Thanks in advance for any assistance,
> > Alex Kahn
> >
> > P.S. I'm using Rails 2.3.14, Sphinx 0.9.9, thinking-sphinx 1.4.7,
> > ts-datetime-delta 1.0.2
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Thinking Sphinx" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group
> > at http://groups.google.com/group/thinking-sphinx?hl=en.
