Thanks for your advice, Alexandre.
On 3 September 2015 at 20:29, Alexandre Rafalovitch
wrote:
> Probably because your signatureField and your fields are the same! You
> need to point signatureField at a new (not-ID) field.
>
> You will still get duplicates, as you requested that in your other
> e
Probably because your signatureField and your fields are the same! You
need to point signatureField at a new (not-ID) field.
You will still get duplicates, as you requested that in your other
emails, but now you would be able to group on that new signature
field.
If you have any further problems,
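As a rough sketch of the suggestion above: the signature goes into its own field, and queries then group on that field. The field name `signature` and the `/select` handler are assumptions for illustration, not taken from the poster's actual schema.

```xml
<!-- schema.xml: a dedicated, single-valued field to hold the computed hash
     (field name "signature" is an assumption) -->
<field name="signature" type="string" indexed="true" stored="true" multiValued="false" />

<!-- solrconfig.xml: group results on that field by default; the same effect
     can be had per request with group=true&group.field=signature -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="group">true</str>
    <str name="group.field">signature</str>
  </lst>
</requestHandler>
```

With overwriteDupes set to false, every duplicate stays in the index, so the grouping is what collapses them at query time.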
Hi Alexandre,
Thanks for pointing out the error. I'm able to get the documents
indexed after adding the two processors.
However, I'm still seeing all the similar documents returned when searching
the content, without being de-duplicated. My content is currently indexed as
fieldType=text_general.
And that's because you have an incomplete chain. If you look at the
full example in solrconfig.xml, it shows:
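<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>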
Notice, the last two processors. I
Hi Erick,
I couldn't really find anything special in the logs. The indexing process
just went on normally, but after that when I check the index, there is
nothing indexed.
This is what I see from the logs. Looks the same as when the indexing works
fine.
INFO - 2015-09-03 01:24:35.316; [collecti
_How_ does it fail? You must be seeing something in the logs
On Wed, Sep 2, 2015 at 8:29 AM, Zheng Lin Edwin Yeo
wrote:
> Hi Erick,
>
> Yes, I'm trying out the De-Duplication too. But I'm facing a problem with
> that, which is the indexing stops working once I put in the following
> De-Dupl
Hi Erick,
Yes, I'm trying out De-Duplication too. But I'm facing a problem with
that: indexing stops working once I put the following De-Duplication
code in solrconfig.xml. The problem seems to be with this dedupe line.
dedupe
true
signature
false
content
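The bare values above look like the text of a solrconfig.xml fragment whose tags did not survive. The sketch below shows how such a configuration is usually laid out, following the stock De-Duplication example. The mapping of each value to a parameter, the `/update` handler name, and the `signatureClass` are assumptions, not the poster's actual file. Rich-text documents may well go through a different handler.

```xml
<!-- Route updates through the chain (assumed meaning of the "dedupe" line) -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Hash is stored in a separate field, computed from "content" -->
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <!-- Assumed: the signature class is not visible in the message -->
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- RunUpdateProcessorFactory is what actually executes the update;
       a chain that omits it never writes documents to the index -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

A chain that stops after the signature processor would be consistent with the symptom reported in this thread: indexing appears to run normally, yet nothing ends up in the index.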
Yes, that is an intentional limit for the size of a single token,
which strings are.
Why not use deduplication? See:
https://cwiki.apache.org/confluence/display/solr/De-Duplication
You don't have to replace the existing documents, and Solr will
compute a hash that can be used to identify identica
Hi,
I would like to check: is it true that a string field value must be at most
32766 bytes in length?
I'm trying to do a copyField of my rich-text documents' content to a field
with fieldType=string to try getting distinct results for content, as
there are several documents with the exact same content,