May not need a script for that: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/TruncateFieldUpdateProcessorFactory.html
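For example, a chain along these lines in solrconfig.xml would clip the field before it is indexed (the field name comes from the schema quoted later in this thread; the maxLength value here is only an illustrative limit under Lucene's 32766-byte term cap, not something prescribed by the thread):

```xml
<updateRequestProcessorChain name="truncate-content">
  <!-- Truncate the named field's values to maxLength characters -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">content</str>
    <int name="maxLength">32000</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be referenced from the update handler (or via the `update.chain` request parameter) so it runs on every indexed document.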
Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 15 September 2014 11:05, Jack Krupansky <j...@basetechnology.com> wrote:
> You can use an update request processor to filter the input for large
> values. You could write a script with the stateless script processor that
> ignores or trims large input values.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Christopher Gross
> Sent: Monday, September 15, 2014 7:58 AM
> To: solr-user
> Subject: Re: Solr Exceptions -- "immense terms"
>
> Yeah -- for this part I'm just trying to store it to show it later.
>
> There was a change in Lucene 4.8.x. Before that, the exception was just
> being eaten; now it gets thrown and that document isn't indexed.
>
> I can't post the whole schema -- but I do copy the content field into a
> "text" field (text_en_splitting) that gets used for a full-text search
> (along with some other fields). In that case, though, I would expect to see
> the error for that field instead of "content." I may experiment with that to
> figure out where the problem is, but I do want to have the content available
> for doing the search...
>
> It's big.
>
> I'm probably going to have to tweak the schema some (probably wise anyway),
> but I'm not sure what to do about this large text. I'm loading the content
> in via some Java code, so I could trim it down, but I'd rather not exclude
> content from the page just because it's large. I was hoping that someone
> would have a better field type to use, or an idea of how to configure it.
>
> Thanks Michael.
>
> -- Chris
>
> On Mon, Sep 15, 2014 at 10:38 AM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
>> I just came back to this because I figured out you're trying to just store
>> this text. Now I'm baffled. How big is it?
:)
>>
>> I'm not sure why an analyzer is running if you're just storing the content.
>> Maybe you should post your whole schema.xml... there could be a copyField
>> that's dumping the text into a different field that has the keyword
>> tokenizer?
>>
>> Michael Della Bitta
>> Applications Developer
>> o: +1 646 532 3062
>>
>> appinions inc.
>> "The Science of Influence Marketing"
>> 18 East 41st Street
>> New York, NY 10017
>> t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions
>> <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
>> w: appinions.com <http://www.appinions.com/>
>>
>> On Mon, Sep 15, 2014 at 10:37 AM, Michael Della Bitta <
>> michael.della.bi...@appinions.com> wrote:
>>
>> > If you're using a string fieldtype, you're not so much indexing the
>> > content as dumping the whole blob in there as a single term for exact
>> > matching.
>> >
>> > You probably want to look at one of the text field types for textual
>> > content.
>> >
>> > That doesn't explain the difference in behavior between Solr versions,
>> > but my hunch is that you'll be happier in general with the behavior of a
>> > field type that does tokenizing and stemming for plain-text search anyway.
>> >
>> > On Mon, Sep 15, 2014 at 10:06 AM, Christopher Gross <cogr...@gmail.com>
>> > wrote:
>> >
>> >> Solr 4.9.0
>> >> Java 1.7.0_49
>> >>
>> >> I'm indexing an internal Wiki site.
>> >> I was running on an older version of
>> >> Solr (4.1) and wasn't having any trouble indexing the content, but now
>> >> I'm getting errors:
>> >>
>> >> SCHEMA:
>> >> <field name="content" type="string" indexed="false" stored="true"
>> >> required="true"/>
>> >>
>> >> LOGS:
>> >> Caused by: java.lang.IllegalArgumentException: Document contains at
>> >> least one immense term in field="content" (whose UTF8 encoding is
>> >> longer than the max length 32766), all of which were skipped. Please
>> >> correct the analyzer to not produce such terms. The prefix of the first
>> >> immense term is: '[60, 33, 45, 45, 32, 98, 111, 100, 121, 67, 111, 110,
>> >> 116, 101, 110, 116, 32, 45, 45, 62, 10, 9, 9, 9, 60, 100, 105, 118, 32,
>> >> 115]...', original message: bytes can be at most 32766 in length; got
>> >> 183250
>> >> ....
>> >> Caused by:
>> >> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException:
>> >> bytes can be at most 32766 in length; got 183250
>> >>
>> >> I was indexing it, but I switched that off (as you can see above), and
>> >> it still has problems. Is there a different type I should use, or a
>> >> different analyzer? I imagine that there is a way to index very large
>> >> documents in Solr. Any recommendations would be helpful. Thanks!
>> >>
>> >> -- Chris
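The schema-side advice in this thread boils down to two pieces: keep the raw content stored-but-not-indexed for display, and copy it into an analyzed text field for search. A minimal schema.xml sketch along those lines (the `content_text` field name is a placeholder of mine; only `content` and `text_en_splitting` appear in the thread):

```xml
<!-- Raw page content, for display only: stored but never analyzed,
     so no single immense term is ever produced from it. -->
<field name="content" type="string" indexed="false" stored="true"
       required="true"/>

<!-- Tokenized copy used for full-text search; analysis breaks the
     text into individual terms well under the 32766-byte limit. -->
<field name="content_text" type="text_en_splitting" indexed="true"
       stored="false"/>
<copyField source="content" dest="content_text"/>
```

Any leftover copyField that routes `content` into a string-typed or keyword-tokenized destination would reproduce the "immense term" error, which is why the thread suggests auditing the full schema.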