Agree 100%. The value returned is the stored value, which is not affected by 
the analyzer. 

However, I searched for that token (see my query). I would expect the analyzer 
to remove the large token at index time, so that a search for the large token 
finds nothing. Instead, it returns my record. 

Am I missing something here? 


----- Original Message -----

From: "Jack Krupansky" <jack.krupan...@gmail.com> 
To: solr-user@lucene.apache.org 
Sent: Friday, May 15, 2015 11:56:51 AM 
Subject: Re: Problem with solr.LengthFilterFactory 

The returned value is the stored or original source value - only the 
indexed terms are affected by token filtering. 

You could use an update processor if you want to adjust the actual source 
value, such as the truncate processor to truncate long source values: 

http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/TruncateFieldUpdateProcessorFactory.html
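
A minimal sketch of such a chain in solrconfig.xml, assuming you want to cap 
the portal_package field from the example at 300 characters. The chain name 
"truncate-long-values" is made up for illustration. Note that the processor 
truncates the whole field value, not individual tokens, so it is not a drop-in 
replacement for the length filter: 

```xml
<!-- Hypothetical chain name; fieldName and maxLength are taken from the example in this thread -->
<updateRequestProcessorChain name="truncate-long-values">
  <!-- Truncates the source value of matching fields before it is indexed and stored -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">portal_package</str>
    <int name="maxLength">300</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the document is actually added -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain only runs if it is selected, for example by passing 
update.chain=truncate-long-values on the update request. 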
 


-- Jack Krupansky 

On Fri, May 15, 2015 at 11:38 AM, Charles Sanders <csand...@redhat.com> 
wrote: 

> Yes, that is what I am seeing. Looking in the code myself, I see no reason 
> for this behavior. That is why I assumed I was doing something very wrong. 
> 
> Below I have included an example. I set the max length to 300. I insert a 
> record with a single token of 500 characters. I expect the token to be 
> removed and not included in the index. When I query using the large token, 
> the record is returned. I can see the same result using the analysis page 
> in the solr console. 
> 
> Here is a test example: 
> 
> <field name="portal_package" type="text_std" indexed="true" stored="true" 
> multiValued="true"/> 
> 
> <fieldType name="text_std" class="solr.TextField" 
> positionIncrementGap="100"> 
> <analyzer type="index"> 
> <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
> <filter class="solr.LengthFilterFactory" min="1" max="300" /> 
> </analyzer> 
> </fieldType> 
> 
> 
> A test record: 
> 
> { 
> "documentKind": "test", 
> "uri": "test300", 
> "id": "test300", 
> "portal_package": 
> "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
>  
> } 
> 
> 
> Query result: 
> 
> { 
> "responseHeader": { 
> "status": 0, 
> "QTime": 55, 
> "params": { 
> "indent": "true", 
> "q": 
> "portal_package:12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890",
>  
> "_": "1431704135745", 
> "wt": "json" 
> } 
> }, 
> "response": { 
> "numFound": 1, 
> "start": 0, 
> "docs": [ 
> { 
> "documentKind": "test", 
> "uri": "test300", 
> "id": "test300", 
> "portal_package": [ 
> 
> "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
>  
> ], 
> "_version_": 1501249997589446700, 
> "timestamp": "2015-05-15T15:26:05.205Z", 
> "language": "en" 
> } 
> ] 
> } 
> } 
> 
> 
> 
> 
> 
> ----- Original Message ----- 
> 
> From: "Shawn Heisey" <apa...@elyograg.org> 
> To: solr-user@lucene.apache.org 
> Sent: Friday, May 15, 2015 11:13:14 AM 
> Subject: Re: Problem with solr.LengthFilterFactory 
> 
> On 5/15/2015 8:49 AM, Charles Sanders wrote: 
> > I'm seeing a problem with the LengthFilter. It appears to work fine 
> until I increase the max value above 254. At that point it stops removing 
> the very large token from the stream. As a result I get the error: 
> > java.lang.IllegalArgumentException: Document contains at least one 
> immense term...... UTF8 encoding is longer than the max length 32766 
> > 
> > I'm certain I'm doing this wrong. Can someone please show me the light. 
> :) 
> > 
> > <fieldType name="text_std" class="solr.TextField" 
> positionIncrementGap="100"> 
> > <analyzer type="index"> 
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
> > <filter class="solr.LengthFilterFactory" min="1" max="254" /> 
> > </analyzer> 
> > </fieldType> 
> 
> So with max="254", you don't get the error? Looking at the code for 
> LengthFilter, I can't see any way for it to behave differently with a 
> max of 254 vs. a max of 255 or higher. All of the interfaces and 
> classes involved use "int" for length, which means it should work 
> perfectly with numbers above 254. 
> 
> Thanks, 
> Shawn 
> 
> 
> 
