Yeah -- for this part I'm just trying to store it so I can show it later.

There was a change in Lucene 4.8.x.  Before that, the exception was just
being swallowed; now it gets thrown and that document doesn't get indexed.
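
From what I've read, if I do move this to an analyzed text type, one way
to get the old "just skip the huge term" behavior back is a length filter
in the chain, so oversized tokens get dropped instead of failing the add.
Untested sketch -- the "text_safe" type name is made up:

<fieldType name="text_safe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- drop tokens that could hit Lucene's 32766-byte term cap:
         8191 chars * 4 bytes/char (worst-case UTF-8) stays under it -->
    <filter class="solr.LengthFilterFactory" min="1" max="8191"/>
  </analyzer>
</fieldType>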

Can't push the whole schema up -- but I do copy the content field into a
"text" field (text_en_splitting) that gets used for full-text search
(along w/ some other fields).  If the copyField were the problem, though,
I'd think I'd see the error for that field instead of "content."  I may
experiment with that to figure out where the problem is, but I do want to
have the content available for doing the search...
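
For reference, the relevant bits look roughly like this (quoting from
memory, so treat the attributes on "text" as approximate):

<field name="content" type="string" indexed="false" stored="true"
required="true"/>
<field name="text" type="text_en_splitting" indexed="true" stored="false"
multiValued="true"/>
<copyField source="content" dest="text"/>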

It's big -- per the log below, the offending value is 183,250 bytes.

I'm probably going to have to tweak the schema some (probably wise
anyway), but I'm not sure what to do about this large text.  I'm loading
the content in via some Java code, so I could trim it down (rough sketch
below), but I'd rather not exclude content from a page just because it's
large.  I was hoping that someone would have a better field type to use,
or an idea of how to configure it.
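
If I do end up trimming it down in the loader, it would be something like
this (sketch only -- the class and constant names are made up, and 32000
is just a value safely under the term cap):

import java.nio.charset.StandardCharsets;

// Sketch: cap the raw content below Lucene's 32766-byte term limit
// before it gets added to the Solr document.
public class ContentCapper {
    private static final int MAX_BYTES = 32000;

    public static String capUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= MAX_BYTES) {
            return s;
        }
        // Crude byte-level cut: new String() replaces a multi-byte
        // character split at the boundary with U+FFFD rather than
        // throwing, which is fine for a sketch.
        return new String(bytes, 0, MAX_BYTES, StandardCharsets.UTF_8);
    }
}

...but that throws page text away, which is exactly what I'd rather avoid.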

Thanks, Michael.


-- Chris

On Mon, Sep 15, 2014 at 10:38 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> I just came back to this because I figured out you're trying to just store
> this text. Now I'm baffled. How big is it? :)
>
> Not sure why an analyzer is running if you're just storing the content.
> Maybe you should post your whole schema.xml... there could be a copyfield
> that's dumping the text into a different field that has the keyword
> tokenizer?
>
> Michael Della Bitta
>
> On Mon, Sep 15, 2014 at 10:37 AM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > If you're using a string field type, you're not indexing it so much as
> > dumping the whole content blob in there as a single term for exact
> > matching.
> >
> > You probably want to look at one of the text field types for textual
> > content.
> >
> > That doesn't explain the difference in behavior between Solr versions,
> > but my hunch is that you'll be happier in general with the behavior of
> > a field type that does tokenizing and stemming for plain text search
> > anyway.
> >
> > Michael Della Bitta
> >
> > On Mon, Sep 15, 2014 at 10:06 AM, Christopher Gross <cogr...@gmail.com>
> > wrote:
> >
> >> Solr 4.9.0
> >> Java 1.7.0_49
> >>
> >> I'm indexing an internal Wiki site.  I was running on an older version
> >> of Solr (4.1) and wasn't having any trouble indexing the content, but
> >> now I'm getting errors:
> >>
> >> SCHEMA:
> >> <field name="content" type="string" indexed="false" stored="true"
> >> required="true"/>
> >>
> >> LOGS:
> >> Caused by: java.lang.IllegalArgumentException: Document contains at
> >> least one immense term in field="content" (whose UTF8 encoding is
> >> longer than the max length 32766), all of which were skipped.  Please
> >> correct the analyzer to not produce such terms.  The prefix of the
> >> first immense term is: '[60, 33, 45, 45, 32, 98, 111, 100, 121, 67,
> >> 111, 110, 116, 101, 110, 116, 32, 45, 45, 62, 10, 9, 9, 9, 60, 100,
> >> 105, 118, 32, 115]...', original message: bytes can be at most 32766
> >> in length; got 183250
> >> ....
> >> Caused by:
> >> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException:
> >> bytes can be at most 32766 in length; got 183250
> >>
> >> I had been indexing it, but I switched that off (as you can see
> >> above), and it's still failing.  Is there a different field type I
> >> should use, or a different analyzer?  I imagine there's a way to index
> >> very large documents in Solr.  Any recommendations would be helpful.
> >> Thanks!
> >>
> >> -- Chris
> >>
> >
> >
>
