Hi Erick,

thanks for your concerns and thoughts.
There is no XY problem, because we decouple input (storing)
from searching, faceting, and so on.
What you see is just the input for storing and the output of the original
text in the results. There is no need to do any analysis on it.
So don't worry, it has worked like a charm for years now ;-)

The upgrade from 4.6.1 to 4.10.4 merely revealed something we had never
noticed: we were missing 3 or 4 documents out of more than
70 million because they were silently dropped. That behavior was changed
by LUCENE-5472.
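
For anyone who wants to spot such values before they reach the indexer,
here is a minimal sketch in Python. It is not part of our setup: the
function name and the dict-shaped document are assumptions made up for
illustration. The only fact it relies on is the 32766 limit, which is
counted in bytes of the UTF-8 encoding, not in characters.

```python
# Lucene's hard limit for a single indexed term, in bytes of UTF-8.
# This is the 32766 mentioned in the error message and in LUCENE-5472.
MAX_TERM_BYTES = 32766


def oversized_values(doc, fields):
    """Return (field, byte_length) pairs for values exceeding the term limit.

    `doc` is a hypothetical dict mapping field names to a string or a list
    of strings; `fields` lists the untokenized (string-type) fields to check.
    """
    hits = []
    for name in fields:
        values = doc.get(name, [])
        if isinstance(values, str):
            values = [values]  # treat a single value like a one-element list
        for value in values:
            size = len(value.encode("utf-8"))  # bytes, not characters
            if size > MAX_TERM_BYTES:
                hits.append((name, size))
    return hits
```

Running this over the source records for the indexed string fields
(f_dcperson in our case) should list the few values that are too long,
so they can be truncated or moved to a text-based field type.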

Regards
Bernd


On 12.05.2015 at 00:29, Erick Erickson wrote:
> I've got to ask _how_ are you intending to search this field? On the
> surface, this feels like an XY problem.
> It's a "string" type. Therefore, if this is the input:
> 
> 102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101,
> 101, 32, 66, 114
> 
> you'll only ever get a match if you search exactly:
> 102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101,
> 101, 32, 66, 114
> 
> None of these will match
> 102
> 102,
> 32
> 32,
> 119, 32, 115
> 
> etc.
> 
> The idea of doing a match on a single _token_ that's over 32K long is
> pretty far out there, thus
> the check.
> 
> The entire multiValued discussion is _probably_ a red herring and
> won't help you. multiValued
> has nothing to do with multiple terms, that's all up to your field type.
> 
> So back up and tell us _how_ you intend to search this field. I'm
> guessing you really want
> to make it a text-based type instead. But that's just a guess.
> 
> Best,
> Erick.
> 
> On Mon, May 11, 2015 at 8:43 AM, Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
>> It turned out that I hadn't realized that dcdescription is not indexed,
>> only stored. So the next in the "chain" is f_dcperson, where dccreator and
>> dcdescription are combined and indexed. And this is why the error
>> shows up on f_dcperson. ("delay of error")
>>
>> Thanks for your help, regards.
>> Bernd
>>
>>
>> On 11.05.2015 at 15:35, Shawn Heisey wrote:
>>> On 5/11/2015 7:19 AM, Bernd Fehling wrote:
>>>> After reading https://issues.apache.org/jira/browse/LUCENE-5472
>>>> one question still remains.
>>>>
>>>> Why is it complaining about f_dcperson, which is a copyField, when the
>>>> original problem field is dcdescription, which definitely is much larger
>>>> than 32766?
>>>>
>>>> I would assume it complains about dcdescription field. Or not?
>>>
>>> If the value resulting in the error does come from a copyField source
>>> that also uses a "string" type, then my guess here is that Solr has some
>>> prioritization that causes the copyField destination to be indexed
>>> before the sources.  This ordering might make things go a little faster,
>>> because if it happens right after copying, all or most of the data for
>>> the destination field would already be sitting in one or more of the CPU
>>> caches.  Cache hits are wonderful things for performance.
>>>
>>> Thanks,
>>> Shawn
>>>

-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
