Field collapsing is what you want - this is a classic problem with
retail store product indexing, and field collapsing is the usual answer.
(That is, for everyone who is willing to apply the patch to their own
copy of the Solr source.)
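
To make the idea concrete, here is a rough plain-Python sketch of what collapsing does to a result list at query time. It is not the SOLR-236 implementation, and the field names (`product_id`, `sku`, `score`) are assumptions made up for the illustration: each SKU is its own document, and only the best-scoring document per product is kept.

```python
def collapse(results, field):
    """Keep the highest-scoring document for each distinct value of `field`.

    `results` is a list of dicts with a "score" key (a hypothetical shape,
    not Solr's actual response format).
    """
    best = {}
    for doc in results:
        key = doc[field]
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    # Return the survivors in descending score order, like a ranked result list.
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


# Three SKU documents belonging to two products (made-up data).
sku_hits = [
    {"sku": "A-1", "product_id": "A", "score": 2.0},
    {"sku": "A-2", "product_id": "A", "score": 3.5},
    {"sku": "B-1", "product_id": "B", "score": 1.2},
]
collapsed = collapse(sku_hits, "product_id")
print([d["sku"] for d in collapsed])
```

All of the SKU documents stay in the index; collapsing only changes which of them are shown for a given query.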

Dedupe is completely the wrong word. Deduping is something else
entirely - it is about trying not to index the same document twice.
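
By contrast, deduping happens at index time. A minimal sketch of the idea, assuming a signature is computed over a chosen set of fields (the helper names `signature` and `add_unique` and the sample fields are hypothetical, and this is not Solr's actual dedupe code):

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen fields so identical content yields the same signature."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def add_unique(index, doc, fields):
    """Add `doc` to `index` unless a document with the same signature exists.

    Returns True if the document was added, False if it was a duplicate.
    """
    sig = signature(doc, fields)
    if sig in index:
        return False
    index[sig] = doc
    return True


index = {}  # signature -> document; stands in for the real index
first = add_unique(index, {"name": "wiper blade", "brand": "Acme"}, ["name", "brand"])
second = add_unique(index, {"name": "wiper blade", "brand": "Acme"}, ["name", "brand"])
print(first, second, len(index))
```

The second copy never reaches the index, so there is nothing left to group at query time. That is why it does not help with showing one product for many SKUs.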

On Tue, Jan 12, 2010 at 11:30 AM, Kelly Taylor <wired...@hotmail.com> wrote:
>
> David,
>
> Thanks, and yes, I decided to travel that path last night (applying SOLR-236
> patch) and plan to have some results by the end of the day; I'll post a
> summary.
>
> I read about field collapsing in your book last night. The book is an
> excellent resource by the way (shameless commendation plug!), and it made me
> laugh to find out that my use case is crazy!
>
> Regarding dedupe, I'm not sure either.  The component is mentioned in an
> article by Amit Nithianandan
> (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics).
> I had concluded from the section entitled, "Comparing the Solr Approach with
> the RDBMS," that the dedupe component was somehow used as a "field
> collapsing" alternative (in my mind anyway) but I couldn't find a real-world
> example.
>
> Amit says, "...I might create an index with multiple documents or records
> for the same exact wiper blade, each document having different location data
> (lat/long, address, etc.) to represent an individual store. Solr has a
> de-duplication component to help show unique documents in case that
> particular wiper blade is available in multiple stores near me..."
>
> In my case, I was attempting to equate Amit's "wiper blade" with my
> "product" entity, and his "individual store" with my "SKU" entity.
>
> Thanks again.
>
> -Kelly
>
>
> David Smiley @MITRE.org wrote:
>>
>> Kelly,
>> This is a good question you have posed and illustrates a challenge with
>> Solr's limited schema.  I don't see how the dedup will help.  I would
>> continue with the SKU based approach and use this patch:
>> https://issues.apache.org/jira/browse/SOLR-236
>> You'll collapse on the product id.  My book, p.192, highlights this
>> component as it existed when I wrote it but it has been updated since
>> then.
>>
>> A recent separate question by you on this list suggests you're going down
>> this path.  I would grab the attached SOLR-236.patch file and attempt to
>> apply it to the 1.4 source.
>>
>> ~ David Smiley
>> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27131969.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com
