On 8/28/2018 6:03 AM, kunhu0...@gmail.com wrote:
possible analysis error: Document contains at least one immense term in
field="content" (whose UTF8 encoding is longer than the max length 32766),

It's telling you exactly what is wrong.

The field named "content" is probably using a field class with no analysis, or using the Keyword Tokenizer, so the whole field gets treated as a single term.  The value of that field in at least one of your documents is longer than 32766 bytes once encoded as UTF8. Note that the limit is in bytes, not characters -- a UTF8 character can be more than a single byte.  Lucene has a limit on term length, and your input exceeded that limit.
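
To make the characters-versus-bytes point concrete, here is a minimal Python sketch. It is not Solr or Lucene code: the constant name and both helper functions are made up for illustration, and only the 32766 figure comes from the error message.

```python
# Limit quoted in the error message, measured in bytes of UTF-8.
# MAX_TERM_BYTES is a hypothetical name used only in this sketch.
MAX_TERM_BYTES = 32766


def utf8_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def is_immense_term(text: str) -> bool:
    """True if the text, indexed as one single term, would exceed the limit."""
    return utf8_length(text) > MAX_TERM_BYTES


# 32766 ASCII characters: one byte each, so exactly at the limit.
ascii_value = "a" * 32766

# 20000 accented characters: well under 32766 characters,
# but each one takes two bytes in UTF-8.
accented_value = "\u00e9" * 20000

print(len(ascii_value), utf8_length(ascii_value), is_immense_term(ascii_value))
print(len(accented_value), utf8_length(accented_value), is_immense_term(accented_value))
```

Running something like this over the values you are sending to the "content" field should show which documents are over the limit.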

If you change the field type for content to something that's analyzed (split into words, basically) then this problem will likely go away.
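
A rough Python sketch of why splitting into words helps. This only imitates the idea and is not how Solr analysis is implemented: the function names are hypothetical, and plain whitespace splitting stands in for a real tokenizer.

```python
# Hypothetical name; the 32766 figure is the limit from the error message.
MAX_TERM_BYTES = 32766


def longest_term_bytes(terms):
    """Return the UTF-8 byte length of the longest term in the list."""
    return max((len(term.encode("utf-8")) for term in terms), default=0)


def as_single_term(text):
    """Imitate an unanalyzed field: the whole value is one term."""
    return [text]


def split_on_whitespace(text):
    """Imitate an analyzed field, very roughly: one term per word."""
    return text.split()


# A long field value made of 10000 short words.
content = " ".join(["lucene"] * 10000)

single = longest_term_bytes(as_single_term(content))
split = longest_term_bytes(split_on_whitespace(content))

print(single, single > MAX_TERM_BYTES)  # whole value as one term
print(split, split > MAX_TERM_BYTES)    # longest individual word
```

The same sketch also shows why I said "likely": if the value contains one unbroken run of more than 32766 bytes with no whitespace in it, splitting on whitespace would still leave one term over the limit.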

Thanks,
Shawn
