Re: Problem with solr.LengthFilterFactory

Charles Sanders Fri, 15 May 2015 08:39:07 -0700

Yes, that is what I am seeing. Looking in the code myself, I see no reason for 
this behavior. That is why I assumed I was doing something very wrong.


Below I have included an example. I set the max length to 300 and insert a
record with a single token of 500 characters. I expect the token to be removed
and not included in the index. However, when I query using the large token, the
record is returned. I see the same result on the Analysis page in the Solr
admin console.

Here is a test example:

<field name="portal_package" type="text_std" indexed="true" stored="true" 
multiValued="true"/> 

<fieldType name="text_std" class="solr.TextField" positionIncrementGap="100"> 
<analyzer type="index"> 
<tokenizer class="solr.WhitespaceTokenizerFactory"/> 
<filter class="solr.LengthFilterFactory" min="1" max="300" /> 
</analyzer> 
</fieldType> 


A test record: 

{ 
"documentKind": "test", 
"uri": "test300", 
"id": "test300", 
"portal_package": 
"12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
 
} 
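To reproduce the test without pasting the long literal, the record can be generated. This is a minimal sketch that assumes the value is the digits 1234567890 repeated up to the stated 500 characters. `build_test_record` and `MAX_LEN` are hypothetical names, and the 300 limit mirrors the `max` attribute in the schema above. The sketch only builds the JSON and checks the token length; it does not post anything to Solr.

```python
import json

# Mirrors max="300" on solr.LengthFilterFactory in the schema above.
MAX_LEN = 300


def build_test_record(length: int = 500) -> dict:
    """Build a test document whose portal_package is a single token of `length` chars.

    Hypothetical helper: assumes the value is "1234567890" repeated, as in the example.
    """
    digits = "1234567890"
    token = (digits * (length // len(digits) + 1))[:length]
    return {
        "documentKind": "test",
        "uri": "test300",
        "id": "test300",
        "portal_package": token,
    }


record = build_test_record()
token = record["portal_package"]
payload = json.dumps(record)  # JSON body for the test document

# The token is longer than max, so LengthFilter is expected to drop it.
print(len(token), len(token) > MAX_LEN)  # 500 True
```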


Query result: 

{ 
"responseHeader": { 
"status": 0, 
"QTime": 55, 
"params": { 
"indent": "true", 
"q": 
"portal_package:12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890",
 
"_": "1431704135745", 
"wt": "json" 
} 
}, 
"response": { 
"numFound": 1, 
"start": 0, 
"docs": [ 
{ 
"documentKind": "test", 
"uri": "test300", 
"id": "test300", 
"portal_package": [ 
"12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
 
], 
"_version_": 1501249997589446700, 
"timestamp": "2015-05-15T15:26:05.205Z", 
"language": "en" 
} 
] 
} 
} 





----- Original Message -----

From: "Shawn Heisey" <apa...@elyograg.org> 
To: solr-user@lucene.apache.org 
Sent: Friday, May 15, 2015 11:13:14 AM 
Subject: Re: Problem with solr.LengthFilterFactory 

On 5/15/2015 8:49 AM, Charles Sanders wrote: 
> I'm seeing a problem with the LengthFilter. It appears to work fine until I 
> increase the max value above 254. At that point it stops removing the very
> large token from the stream. As a result I get the error: 
> java.lang.IllegalArgumentException: Document contains at least one immense 
> term...... UTF8 encoding is longer than the max length 32766 
> 
> I'm certain I'm doing this wrong. Can someone please show me the light. :) 
> 
> <fieldType name="text_std" class="solr.TextField" positionIncrementGap="100"> 
> <analyzer type="index"> 
> <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
> <filter class="solr.LengthFilterFactory" min="1" max="254" /> 
> </analyzer> 
> </fieldType> 
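The exception quoted above is about a per-term limit measured in UTF-8 bytes, not characters. Here is a minimal sketch of that check, assuming the 32766-byte figure from the error message. `is_immense_term` is a hypothetical helper, not a Lucene API. It shows that non-ASCII characters use up the limit faster than their character count suggests.

```python
# Byte limit quoted in the IllegalArgumentException above.
MAX_TERM_BYTES = 32766


def is_immense_term(term: str) -> bool:
    """Return True if the term's UTF-8 encoding is longer than the limit.

    The limit applies to encoded bytes, so a character such as U+00E9
    (2 bytes in UTF-8) counts double.
    """
    return len(term.encode("utf-8")) > MAX_TERM_BYTES


print(is_immense_term("1" * 32766))       # False: exactly at the limit
print(is_immense_term("1" * 32767))       # True: one byte over
print(is_immense_term("\u00e9" * 16384))  # True: 16384 chars = 32768 bytes
```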

So with max="254", you don't get the error? Looking at the code for 
LengthFilter, I can't see any way for it to behave differently with a 
max of 254 vs. a max of 255 or higher. All of the interfaces and 
classes involved use "int" for length, which means it should work 
perfectly with numbers above 254. 
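Shawn's point can be illustrated with a model of the filter's keep/drop decision. This is a minimal sketch that assumes the filter keeps a token when min <= length <= max, inclusive at both ends. `length_filter` is a hypothetical Python stand-in, not Lucene's class, and `str.split()` only approximates whitespace tokenization. Python's `len` counts code points, which matches Java's char count only for text without supplementary characters. Because the model tests the filter in isolation, it shows the expected behavior, not what the thread observed.

```python
def length_filter(tokens, min_len, max_len):
    """Keep tokens whose length lies within [min_len, max_len], inclusive.

    The bounds are plain integers; no value such as 254 or 255 is special.
    """
    return [t for t in tokens if min_len <= len(t) <= max_len]


# Rough whitespace tokenization of a 1-, 2-, 255- and 500-char input.
text = "a bb " + "x" * 255 + " " + "y" * 500
tokens = text.split()

for max_len in (254, 255, 300):
    kept = [len(t) for t in length_filter(tokens, 1, max_len)]
    print(max_len, kept)
# 254 [1, 2]
# 255 [1, 2, 255]
# 300 [1, 2, 255]
```

In this model the 500-character token is dropped for every max value shown. Raising max past 254 only lets through tokens that actually fit under the new limit.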

Thanks, 
Shawn
