Sorry. Looks like Google Groups ate my newlines. :(
On Nov 2, 4:03 pm, Alex Kahn <[email protected]> wrote:
> Hi Pat,
>
> Thanks for your response. We have 7 other models indexed by Sphinx.
> They are all much smaller tables, the largest containing around 5,000
> records. Their indices are far more complex, though.
>
> There is no missing data for email addresses. However, there are many
> account_names records that contain NULL or "" values for the
> first_name and/or last_name columns. Indeed, removing the two
> account_names lines from the define_index block and re-running the
> index task removes the duplicate document ID warning. However, due to
> an unrelated configuration issue, I'm not able to get the
> total_entries count from a console. Using the `search` command line
> tool, it seems that the account_core index now has a total of 602083
> documents (all of them!).
>
> So it looks like the blank data is the cause of the duplicate
> document id warning and the seemingly missing records. What would you
> suggest as a way to work around this issue? I can try casting the NULL
> values to empty strings, or adding other data to the index that would
> help Sphinx distinguish between records (but wouldn't the timestamp
> and email address fields do that?). Anything you'd suggest?
>
> And yes, the Account model is listed in the indexed_models setting.
>
> Cheers,
> Alex
>
> On Nov 2, 9:32 am, Pat Allan <[email protected]> wrote:
> > Hi Alex
> >
> > How many other Sphinx indices do you have in your app? Just wondering if
> > there's some conflict somehow, though that surely would crop up in dev as
> > well.
> >
> > As for the missing records - do you have many accounts with no first name,
> > last name or email addresses? I remember reading somewhere that Sphinx
> > ignores records that have no data in their fields. Not sure if these two
> > problems are related to each other.
> >
> > Also, going by an issue you logged on Github - is this the app you're using
> > the indexed_models setting with? Can you confirm that all relevant models
> > are in that setting?
> >
> > Cheers
> >
> > --
> > Pat
> >
> > On 02/11/2011, at 12:18 AM, Alex Kahn wrote:
> >
> > > Hi,
> > >
> > > I'm adding a new index to my application. It looks like this:
> > >
> > > class Account < ActiveRecord::Base
> > >   define_index do
> > >     indexes account_name.first_name
> > >     indexes account_name.last_name
> > >     indexes email_addresses.email_address
> > >
> > >     has created_at
> > >
> > >     set_property :delta => :datetime, :threshold => 2.minutes
> > >   end
> > > end
> > >
> > > I'm testing how long the full index takes to generate on a staging
> > > server where we typically have only sanitized data from production.
> > > But for this task, I'm working with our entire accounts,
> > > account_names, and email_addresses tables from production.
> > >
> > > When I generate the index, I get the following warning during the
> > > accounts index phase:
> > >
> > > WARNING: duplicate document ids found
> > >
> > > In the Rails console, I observe the following:
> > >
> > > >> Account.search.total_entries
> > > => 260793
> > > >> Account.count
> > > => 602083
> > >
> > > Locally, with a much smaller subset of the data, I also get a
> > > different count from each data source, but I don't receive the
> > > "duplicate document ids" warning when generating the index.
> > >
> > > My research so far has indicated that this is an issue with merging
> > > indexes. But here I'm generating a full index, not generating a
> > > delta index and then merging it into a full index.
> > >
> > > My questions are:
> > >
> > > 1. The warning and the discrepancy in count, are they related?
> > > 2. What does the warning mean?
> > > 3. Is all of my data accessible via searching, despite the different
> > >    counts?
> > > 4. How can I fix this?
> > >
> > > Thanks in advance for any assistance,
> > > Alex Kahn
> > >
> > > P.S. I'm using Rails 2.3.14, Sphinx 0.9.9, thinking-sphinx 1.4.7,
> > > ts-datetime-delta 1.0.2
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "Thinking Sphinx" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group
> > > at http://groups.google.com/group/thinking-sphinx?hl=en.
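
A note on Pat's indexed_models question and Alex's question 2 ("What does the warning mean?"): the Ruby sketch below assumes that Thinking Sphinx 1.x builds each Sphinx document id as primary key × number of indexed models + the model's position in that list. That formula is an assumption, not something confirmed in this thread, so check it against the sql_query lines of the generated sphinx.conf. Under that assumption, ids from different models cannot collide, so a "duplicate document ids found" warning inside a single index would mean the same primary key came back on more than one row of the source query. The model list (INDEXED_MODELS), the helpers document_id and duplicate_ids, and the sample rows are all hypothetical.

```ruby
# Hypothetical sketch. ASSUMPTION: document ids follow a Thinking Sphinx 1.x
# style scheme: document_id = primary_key * model_count + model_offset.
# Check the real formula in your generated sphinx.conf before relying on this.

# Hypothetical stand-in for the indexed_models setting.
INDEXED_MODELS = %w[Account Article Comment].freeze

# Map a model's primary key to a document id. Ids from different models
# cannot collide because each offset is distinct and smaller than the multiplier.
def document_id(model, primary_key, models = INDEXED_MODELS)
  offset = models.index(model)
  raise ArgumentError, "#{model} is not in indexed_models" if offset.nil?
  primary_key * models.size + offset
end

# Return every document id that occurs on more than one row.
def duplicate_ids(ids)
  ids.group_by(&:itself).select { |_id, copies| copies.size > 1 }.keys
end

puts document_id("Account", 7) # prints 21 (7 * 3 + 0)
puts document_id("Comment", 7) # prints 23 (7 * 3 + 2)

# Hypothetical source rows in which account 7 comes back twice.
rows = [[7, "Ann"], [8, "Bob"], [7, "Ann"]]
ids = rows.map { |primary_key, _name| document_id("Account", primary_key) }
p duplicate_ids(ids) # prints [21]
```

Raising on a model that is missing from the list is a choice made for this sketch only; it is not a claim about what Thinking Sphinx does in that case.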

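On the blank-name theory in Alex's follow-up: one way to see whether NULL or "" names line up with the gap between Account.count and Account.search.total_entries (602083 - 260793 = 341290 records) is to count blank values per indexed field. The Ruby sketch below does that over plain hashes standing in for joined accounts, account_names and email_addresses rows. The field names mirror the define_index block; the sample records and the helper names (normalize, blank_counts, fully_blank_count) are hypothetical, and normalize only approximates what COALESCE(TRIM(column), '') would return in SQL. It says nothing about how Sphinx itself handles empty documents.

```ruby
# Hypothetical sketch: count NULL/blank values per indexed field.
# Field names mirror the define_index block; the records are made up.

FIELDS = %i[first_name last_name email_address].freeze

# Treat nil and whitespace-only strings as blank, roughly like
# COALESCE(TRIM(column), '') in SQL.
def normalize(value)
  value.nil? ? "" : value.to_s.strip
end

# Per field, how many records have no usable text.
def blank_counts(records)
  FIELDS.map { |field| [field, records.count { |r| normalize(r[field]).empty? }] }.to_h
end

# How many records have no text in any indexed field.
def fully_blank_count(records)
  records.count { |r| FIELDS.all? { |field| normalize(r[field]).empty? } }
end

# Every sample record has an email address, as Alex reports for his data.
records = [
  { first_name: "Ann", last_name: nil,   email_address: "ann@example.com" },
  { first_name: "",    last_name: nil,   email_address: "bob@example.com" },
  { first_name: nil,   last_name: "Lee", email_address: "lee@example.com" }
]

blank_counts(records).each { |field, n| puts "#{field}: #{n} blank" }
puts "fully blank: #{fully_blank_count(records)}"
# prints:
#   first_name: 2 blank
#   last_name: 2 blank
#   email_address: 0 blank
#   fully blank: 0
```

Run against real data, a non-zero fully blank count would support Pat's empty-document theory; a zero count, with every account having an email address, would suggest looking elsewhere.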