Sorry. Looks like Google Groups ate my newlines. :(

On Nov 2, 4:03 pm, Alex Kahn <[email protected]> wrote:
> Hi Pat,
> Thanks for your response. We have 7 other models indexed by Sphinx.
> They are all much smaller tables, the largest containing around 5,000
> records. Their indices are far more complex, though.
> There is no missing data for email addresses. However, there are many
> account_names records that contain NULL or "" values for the
> first_name and/or last_name columns. Indeed, removing the two
> account_names lines from the define_index block and re-running the
> index task removes the duplicate document ID warning. However, due to
> an unrelated configuration issue, I'm not able to get the
> total_entries count from a console. Using the `search` command line
> tool, it seems that the account_core index now has a total of 602083
> documents (all of them!).
> So it looks like the blank data is the cause for the duplicate
> document id warning and the seemingly-missing records. What would you
> suggest as a way to work around this issue? I can try casting the NULL
> values to empty strings, or adding other data to the index that would
> help Sphinx distinguish between records (but wouldn't the timestamp
> and email address fields do that?). Anything you'd suggest?
> And yes, the Account model is listed in the indexed_models setting.
> Cheers,
> Alex
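Alex's first idea, casting NULLs to empty strings, would normally be done in the indexing SQL, for example with `COALESCE(column, '')`. The Ruby below is only a sketch of that transformation on made-up rows, useful for inspecting exported data before touching the index definition. The row hashes, `TEXT_FIELDS` and `coalesce_fields` are hypothetical and not part of Thinking Sphinx, and whether coalescing actually clears the duplicate-ID warning is untested here.

```ruby
# Hypothetical sketch: each hash stands in for one row of the accounts
# source query. Field names mirror the define_index block; data is made up.
TEXT_FIELDS = %i[first_name last_name email_address].freeze

# Replace nil (SQL NULL) with an empty string, the Ruby analogue of
# wrapping each column in COALESCE(column, '') in the indexing SQL.
# Works on a copy, so the original row is left untouched.
def coalesce_fields(row)
  TEXT_FIELDS.each_with_object(row.dup) do |field, out|
    out[field] = out[field].nil? ? "" : out[field].to_s
  end
end

row = { id: 7, first_name: nil, last_name: "Smith", email_address: "smith@example.com" }
p coalesce_fields(row)
```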
>
> On Nov 2, 9:32 am, Pat Allan <[email protected]> wrote:
>
> > Hi Alex
>
> > How many other Sphinx indices do you have in your app? Just wondering if 
> > there's some conflict somehow, though that surely would crop up in dev as 
> > well.
>
> > As for the missing records - do you have many accounts with no first name, 
> > last name or email addresses? I remember reading somewhere that Sphinx 
> > ignores records that have no data in their fields. Not sure if these two 
> > problems are related to each other.
>
> > Also, going by an issue you logged on Github - is this the app you're using 
> > the indexed_models setting with? Can you confirm that all relevant models 
> > are in that setting?
>
> > Cheers
>
> > --
> > Pat
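Pat's question about accounts with no name or email can be checked directly against exported rows. This is a minimal sketch, assuming the rows have already been pulled into Ruby hashes (the hashes and `all_fields_blank?` are hypothetical). It lists accounts whose indexed text fields are all empty. If that list is far shorter than the gap between the two totals, blank data alone would not explain the missing records.

```ruby
# Hypothetical sketch for testing the hunch that Sphinx skips documents
# whose text fields are all empty. Rows are made-up stand-ins for the
# accounts source query; nothing here calls Sphinx.
TEXT_FIELDS = %i[first_name last_name email_address].freeze

# True when every indexed text field is nil or whitespace-only.
def all_fields_blank?(row)
  TEXT_FIELDS.all? { |field| row[field].to_s.strip.empty? }
end

rows = [
  { id: 1, first_name: "Ada", last_name: nil, email_address: "ada@example.com" },
  { id: 2, first_name: nil, last_name: "", email_address: nil },
]

blank_ids = rows.select { |row| all_fields_blank?(row) }.map { |row| row[:id] }
p blank_ids # => [2]
```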
>
> > On 02/11/2011, at 12:18 AM, Alex Kahn wrote:
>
> > > Hi,
>
> > > I'm adding a new index to my application. It looks like this:
>
> > > class Account < ActiveRecord::Base
> > >  define_index do
> > >    indexes account_name.first_name
> > >    indexes account_name.last_name
> > >    indexes email_addresses.email_address
>
> > >    has created_at
>
> > >    set_property :delta => :datetime, :threshold => 2.minutes
> > >  end
> > > end
>
> > > I'm testing how long the full index takes to generate on a staging
> > > server where we typically have only sanitized data from production.
> > > But for this task, I'm working with our entire accounts,
> > > account_names, and email_addresses tables from production.
>
> > > When I generate the index, I get the following warning during the
> > > accounts index phase:
>
> > >  WARNING: duplicate document ids found
>
> > > In the Rails console, I observe the following:
> > >>> Account.search.total_entries
> > > => 260793
> > >>> Account.count
> > > => 602083
>
> > > Locally, with a much smaller subset of the data, I also get a
> > > different count from each data source, but I don't receive the
> > > "duplicate document ids" warning when generating the index.
>
> > > My research so far has indicated that this is an issue with merging
> > > indexes. But here I'm generating a full index, not generating a
> > > delta index and then merging it into a full index.
>
> > > My questions are:
>
> > > 1. The warning and the discrepancy in count, are they related?
> > > 2. What does the warning mean?
> > > 3. Is all of my data accessible via searching, despite the different
> > > counts?
> > > 4. How can I fix this?
>
> > > Thanks in advance for any assistance,
> > > Alex Kahn
>
> > > P.S. I'm using Rails 2.3.14, Sphinx 0.9.9, thinking-sphinx 1.4.7,
> > > ts-datetime-delta 1.0.2
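Sphinx identifies documents by their document ID, not by field contents, so extra fields such as timestamps or email addresses cannot tell two rows apart. The warning most likely means the source query produced more than one row carrying the same ID. The rough sketch below checks both symptoms from question 1, assuming the source rows and the set of searchable IDs have already been exported to Ruby. The variable and method names are hypothetical. Thinking Sphinx may also derive the Sphinx document ID from the primary key rather than use it verbatim, so compare IDs on the same basis.

```ruby
# Hypothetical sketch: `source_rows` stands in for the rows the accounts
# source query returns and `indexed_ids` for the IDs a search can reach.
# Neither is a Thinking Sphinx API; both would come from your own export.

# IDs that occur in more than one source row (what the warning reports).
def duplicate_ids(source_rows)
  source_rows.group_by { |row| row[:id] }
             .select { |_id, rows| rows.size > 1 }
             .keys
             .sort
end

# IDs present in the database but not reachable through the index.
def missing_ids(all_ids, indexed_ids)
  (all_ids - indexed_ids).uniq.sort
end

source_rows = [{ id: 1 }, { id: 2 }, { id: 2 }, { id: 3 }]
p duplicate_ids(source_rows)        # => [2]
p missing_ids([1, 2, 3, 4], [1, 3]) # => [2, 4]
```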
>
> > > --
> > > You received this message because you are subscribed to the Google Groups 
> > > "Thinking Sphinx" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to 
> > > [email protected].
> > > For more options, visit this group 
> > > at http://groups.google.com/group/thinking-sphinx?hl=en.
