I just came back to this because I realized you're only trying to store
this text. Now I'm baffled. How big is it? :)

Not sure why an analyzer is running if you're just storing the content.
Maybe you should post your whole schema.xml... there could be a copyField
that's dumping the text into a different field that uses the keyword
tokenizer?
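
For example, a fragment like this (field names here are hypothetical)
would reproduce the error even though "content" itself is stored-only:
the copyField target is still indexed, and both the "string" type and a
KeywordTokenizer-based type index the entire value as a single term:

  <field name="content" type="string" indexed="false" stored="true"/>
  <field name="content_exact" type="string" indexed="true" stored="false"/>
  <copyField source="content" dest="content_exact"/>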

Michael Della Bitta
Applications Developer
o: +1 646 532 3062

appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>

On Mon, Sep 15, 2014 at 10:37 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> If you're using a String fieldtype, you're not indexing it so much as
> dumping the whole content blob in there as a single term for exact
> matching.
>
> You probably want to look at one of the text field types for textual
> content.
>
> That doesn't explain the difference in behavior between Solr versions, but
> my hunch is that you'll be happier in general with the behavior of a field
> type that does tokenizing and stemming for plain text search anyway.
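>
> Something along these lines (a minimal sketch with hypothetical names;
> the stock "text_general" type in the example schema is similar, though
> stemming usually lives in the language-specific types) tokenizes,
> lowercases, and stems, so no single indexed term ever approaches the
> 32766-byte limit:
>
>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.PorterStemFilterFactory"/>
>     </analyzer>
>   </fieldType>
>   <field name="content" type="text_general" indexed="true" stored="true"/>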
>
> On Mon, Sep 15, 2014 at 10:06 AM, Christopher Gross <cogr...@gmail.com>
> wrote:
>
>> Solr 4.9.0
>> Java 1.7.0_49
>>
>> I'm indexing an internal Wiki site.  I was running on an older version of
>> Solr (4.1) and wasn't having any trouble indexing the content, but now I'm
>> getting errors:
>>
>> SCHEMA:
>> <field name="content" type="string" indexed="false" stored="true"
>> required="true"/>
>>
>> LOGS:
>> Caused by: java.lang.IllegalArgumentException: Document contains at least
>> one immense term in field="content" (whose UTF8 encoding is longer than
>> the
>> max length 32766), all of which were skipped.  Please correct the analyzer
>> to not produce such terms.  The prefix of the first immense term is: '[60,
>> 33, 45, 45, 32, 98, 111, 100, 121, 67, 111, 110, 116, 101, 110, 116, 32,
>> 45, 45, 62, 10, 9, 9, 9, 60, 100, 105, 118, 32, 115]...', original
>> message:
>> bytes can be at most 32766 in length; got 183250
>> ....
>> Caused by:
>> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes
>> can be at most 32766 in length; got 183250
>>
>> I was indexing it, but I switched that off (as you can see above), and
>> it's still failing. Is there a different type I should use, or a
>> different analyzer? I imagine there's a way to index very large
>> documents in Solr. Any recommendations would be helpful. Thanks!
>>
>> -- Chris
>>
>
>