I just came back to this because I realized you're only trying to store this text, not index it. Now I'm baffled. How big is it? :)
Not sure why an analyzer is running if you're just storing the content.
Maybe you should post your whole schema.xml... there could be a copyField
that's dumping the text into a different field that has the keyword
tokenizer?

Michael Della Bitta
Applications Developer, appinions inc.

On Mon, Sep 15, 2014 at 10:37 AM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:

> If you're using a String fieldtype, you're not indexing it so much as
> dumping the whole content blob in there as a single term for exact
> matching.
>
> You probably want to look at one of the text field types for textual
> content.
>
> That doesn't explain the difference in behavior between Solr versions,
> but my hunch is that you'll be happier in general with the behavior of
> a field type that does tokenizing and stemming for plain-text search
> anyway.
>
> On Mon, Sep 15, 2014 at 10:06 AM, Christopher Gross <cogr...@gmail.com>
> wrote:
>
>> Solr 4.9.0
>> Java 1.7.0_49
>>
>> I'm indexing an internal Wiki site. I was running on an older version
>> of Solr (4.1) and wasn't having any trouble indexing the content, but
>> now I'm getting errors:
>>
>> SCHEMA:
>> <field name="content" type="string" indexed="false" stored="true"
>> required="true"/>
>>
>> LOGS:
>> Caused by: java.lang.IllegalArgumentException: Document contains at
>> least one immense term in field="content" (whose UTF8 encoding is
>> longer than the max length 32766), all of which were skipped. Please
>> correct the analyzer to not produce such terms. The prefix of the
>> first immense term is: '[60, 33, 45, 45, 32, 98, 111, 100, 121, 67,
>> 111, 110, 116, 101, 110, 116, 32, 45, 45, 62, 10, 9, 9, 9, 60, 100,
>> 105, 118, 32, 115]...', original message: bytes can be at most 32766
>> in length; got 183250
>> ....
>> Caused by:
>> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException:
>> bytes can be at most 32766 in length; got 183250
>>
>> I was indexing it, but I switched that off (as you can see above), and
>> it's still failing. Is there a different type I should use, or a
>> different analyzer? I imagine there's a way to index very large
>> documents in Solr. Any recommendations would be helpful. Thanks!
>>
>> -- Chris
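
[Editor's note] For reference, here is the kind of schema.xml pattern the
copyField suggestion above is describing. The field and type names below
are illustrative, not taken from the thread: a stored-only field can still
trigger analysis if a copyField routes its content into an indexed field,
and a keyword-tokenized destination turns the whole document into one term.

  <!-- Hypothetical example: "content" is stored-only, but a copyField
       sends the same text into an indexed field. -->
  <field name="content" type="string" indexed="false" stored="true"
         required="true"/>
  <field name="all_text" type="keyword_text" indexed="true" stored="false"/>
  <copyField source="content" dest="all_text"/>

  <!-- If the destination's type uses the keyword tokenizer, the entire
       document body is emitted as a single token, which is what overflows
       Lucene's 32766-byte term limit. -->
  <fieldType name="keyword_text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>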
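
[Editor's note] And a minimal sketch of the tokenizing/stemming field type
recommended above (again, the names are illustrative; Solr's stock example
schema ships a similar "text_general" type):

  <fieldType name="text_en_basic" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer>
      <!-- Break the content into word-sized tokens instead of one huge term. -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="content" type="text_en_basic" indexed="true" stored="true"
         required="true"/>

With a type like this, each indexed token stays far below the 32766-byte
limit, so even very large documents should index without hitting
MaxBytesLengthExceededException.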