Re: Duplicate documents

2017-03-29 Thread Wenjie Zhang
Hi Alex, Thanks for your time and please see response inline. Thanks, Jack On Wed, Mar 29, 2017 at 11:36 AM, Alexandre Rafalovitch wrote: > There are too many things here. As far as I understand: > *) You should not need to use Signature chain [Jack] I have the same > feeling as well, but I j

Re: Duplicate documents

2017-03-29 Thread Alexandre Rafalovitch
There are too many things here. As far as I understand: *) You should not need to use Signature chain *) You should have a uniqueID assigned to the child record *) You should not assign parentID to the child record, it will be assigned automatically *) Double check that your unique_key field type i

Re: Duplicate documents

2017-03-29 Thread Wenjie Zhang
BTW, we only have one node and the collection has just one shard. On Wed, Mar 29, 2017 at 10:52 AM, Wenjie Zhang wrote: > Hi there, > > We are in solr 6.0.1, here is our solr schema and config: > > _unique_key > > > > solr.StrField > 32766 > > > > > > [^

Re: Duplicate Documents

2015-09-18 Thread Mr Havercamp
Thanks. Okay, I have done what you suggest, i.e. removed the overwrite=true, so Solr's default value should apply. I've also tried a re-index and left it to run for a few days; so far so good, nothing indicating duplicates, so as you say, it could just be a bug in my code. Will continue to monito

Re: Duplicate Documents

2015-09-12 Thread Shawn Heisey
On 9/12/2015 10:51 AM, Mr Havercamp wrote: > Unfortunately, has never changed. The issue can take some time > to show itself although I think there were logic issues with the way I > update documents in my index. > > I first do a full purge and reindex of all items without issue. > > Over time,

Re: Duplicate Documents

2015-09-12 Thread Mr Havercamp
Unfortunately, has never changed. The issue can take some time to show itself although I think there were logic issues with the way I update documents in my index. I first do a full purge and reindex of all items without issue. Over time, I only index items that have changed/are new since initia

Re: Duplicate Documents

2015-09-11 Thread Erick Erickson
OK, this makes no sense whatsoever, so I'm missing something. commitWithin shouldn't matter at all; there's code to handle multiple updates between commits. I'm _really_ shooting in the dark here, but... > did you perhaps change the definition from the default "id" to "key" without blowing away

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
Thanks for the suggestions. No, not using MERGEINDEXES nor MapReduceIndexerTool. I've pasted the XML in case there is something broken there (cut down for brevity, i.e. the "..."): 123456789/3Test SubmissionTest Submission11Test Collectiontest collection|||Test CollectionTest Collectionyoung, ha

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
I'm wondering if the commitWithin is causing issues. On 11 September 2015 at 18:52, Mr Havercamp wrote: > Thanks for the suggestions. No, not using MERGEINDEXES nor > MapReduceIndexerTool. > > I've pasted the XML in case there is something broken there (cut > down for brevity, i.e. the "..."):

Re: Duplicate Documents

2015-09-11 Thread Erick Erickson
Are you by any chance using the MERGEINDEXES core admin call? Or using MapReduceIndexerTool? Neither of those deletes duplicates. This is a fundamental part of Solr though, so it's virtually certain that there's some innocent-seeming thing you're doing that's causing this... Best, Erick On Fr

Re: Duplicate Documents

2015-09-11 Thread Vivek Pathak
At query time, you could externally roll in the dups when they have the same signature. If you define your use case, it might be easier.. On 09/11/2015 11:55 AM, Shawn Heisey wrote: On 9/11/2015 9:10 AM, Mr Havercamp wrote: fieldType def: It is not SolrCloud. As long
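The query-time suggestion above could be sketched on the client side roughly as follows. This is a minimal illustration, not anything posted in the thread: it assumes each returned document is a plain dict and that a signature value has been stored in a field, here given the hypothetical name `signature`.

```python
def collapse_by_signature(docs, signature_field="signature"):
    """Keep only the first document seen for each signature value.

    `docs` is any iterable of dicts (for example, parsed search results).
    `signature_field` is a hypothetical field name; use whatever field
    actually holds the signature in your index.
    """
    seen = set()
    unique = []
    for doc in docs:
        sig = doc.get(signature_field)
        if sig is None:
            # No signature stored: we cannot tell whether it is a duplicate, so keep it.
            unique.append(doc)
            continue
        if sig in seen:
            # A document with this signature was already kept: drop the duplicate.
            continue
        seen.add(sig)
        unique.append(doc)
    return unique
```

Note that collapsing after the fact changes the number of hits per page, so page sizes have to be handled by the caller.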

Re: Duplicate Documents

2015-09-11 Thread Shawn Heisey
On 9/11/2015 9:10 AM, Mr Havercamp wrote: > fieldType def: > > > sortMissingLast="true" /> > > It is not SolrCloud. As long as it's not a distributed index, I can't think of any problem those field/type definitions might cause. Even if it were distributed and you had the same do

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
Hi Shawn Thanks for your response. fieldType def: It is not SolrCloud. Cheers Hayden On 11 September 2015 at 16:35, Shawn Heisey wrote: > On 9/11/2015 8:25 AM, Mr Havercamp wrote: > > Running 4.8.1. I am experiencing the same problem where I get duplicates > on > > index

Re: Duplicate Documents

2015-09-11 Thread Shawn Heisey
On 9/11/2015 8:25 AM, Mr Havercamp wrote: > Running 4.8.1. I am experiencing the same problem where I get duplicates on > index update despite using overwrite=true when adding existing documents. > My duplicate ratio is a lot higher with maybe 25 - 50% of records having > duplicates (and as the ind

Re: Duplicate Documents

2015-09-11 Thread Mr Havercamp
Running 4.8.1. I am experiencing the same problem where I get duplicates on index update despite using overwrite=true when adding existing documents. My duplicate ratio is a lot higher with maybe 25 - 50% of records having duplicates (and as the index continues to run the duplicates increase from 2

RE: Duplicate Documents

2015-08-05 Thread Tarala, Magesh
I deleted the index and re-indexed. Duplicates went away. Have not identified root cause, but looks like updating documents is causing it sporadically. Going to try deleting the document and then update. -Original Message- From: Tarala, Magesh Sent: Monday, August 03, 2015 8:27 AM To:
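The "delete, then update" workaround mentioned above could be sketched by building the two XML update messages explicitly. This is only an illustration under assumptions: the key field is assumed to be named `id`, the helper name is made up, and sending the messages to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def delete_then_add_messages(doc, key_field="id"):
    """Return (delete_xml, add_xml) for one document.

    `doc` is a dict of field name -> value (or list of values).
    `key_field` is assumed to be the schema's unique key field.
    """
    # <delete><id>...</id></delete> removes any existing copy by unique key.
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc[key_field])

    # <add><doc><field name="...">...</field></doc></add> re-adds the document.
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)

    return (
        ET.tostring(delete, encoding="unicode"),
        ET.tostring(add, encoding="unicode"),
    )
```

The delete message would be sent before the add message, with a commit afterwards.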

Re: Duplicate documents based on attribute

2013-07-25 Thread Aditya
You need to store the color field as a multivalued stored field. You have to do pagination manually. If you are worried about that, then use a database: have a table with Product Name and Color, and you could retrieve data with pagination. Still, if you want to achieve it via Solr, have a separate record for every produ

Re: Duplicate documents based on attribute

2013-07-25 Thread Alexandre Rafalovitch
Look for the presentations online. You are not the first store to use Solr, there are some explanations around. Try one from Gilt, but I think there were more. You will want to store data at the lowest meaningful level of search granularity. So, in your case, it might be ProductVariation (shoes+co

Re: Duplicate documents based on attribute

2013-07-25 Thread Mark
I was hoping to do this from within Solr, that way I don't have to manually mess around with pagination. The number of items on each page would be indeterministic. On Jul 25, 2013, at 9:48 AM, Anshum Gupta wrote: > Have a multivalued stored 'color' field and just iterate on it outside of > so

Re: Duplicate documents based on attribute

2013-07-25 Thread Anshum Gupta
Have a multivalued stored 'color' field and just iterate on it outside of Solr. On Thu, Jul 25, 2013 at 10:12 PM, Mark wrote: > How would I go about doing something like this? Not sure if this is > something that can be accomplished on the index side or it's something that > should be done in ou
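Iterating over the multivalued field outside of Solr, as suggested above, might look something like this. It is a rough sketch under assumptions: products are plain dicts, the field is named `color`, and both function names are hypothetical. It also shows the manual pagination that becomes necessary once one product expands into several rows.

```python
def expand_by_color(products, color_field="color"):
    """Return one row per (product, color) pair.

    A product with no colors still yields a single row with color None.
    """
    rows = []
    for product in products:
        colors = product.get(color_field) or [None]
        for color in colors:
            row = dict(product)       # shallow copy so the input is not modified
            row[color_field] = color  # replace the list with a single value
            rows.append(row)
    return rows


def page(rows, page_number, page_size):
    """Return the rows for a 1-based page number."""
    start = (page_number - 1) * page_size
    return rows[start:start + page_size]
```

Because the expansion happens after retrieval, the caller has to fetch enough products to fill a page, which is the pagination concern raised in the reply.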

Re: Duplicate documents being added even with unique key

2012-05-21 Thread Parmeley, Michael
Changing my field type to string for my uniquekey field solved the problem. Thanks to Jack and Erik for the fix! On May 18, 2012, at 5:33 PM, Jack Krupansky wrote: > Typically the uniqueKey field is a "string" field type (your schema uses > "text_general"), although I don't think it is supposed

Re: Duplicate documents being added even with unique key

2012-05-18 Thread Jack Krupansky
Typically the uniqueKey field is a "string" field type (your schema uses "text_general"), although I don't think it is supposed to be a requirement. Still, it is one thing that stands out. Actually, you may be running into some variation of SOLR-1401: https://issues.apache.org/jira/browse/SOLR

Re: Duplicate documents being added even with unique key

2012-05-18 Thread Erik Hatcher
Your unique key field should be of type "string" not a tokenized type. Erik On May 18, 2012, at 17:50, "Parmeley, Michael" wrote: > I have a uniquekey set in my schema; however, I am still getting duplicated > documents added. Can anyone provide any insight into why this may be > happenin