Also, is there a way to overcome the long content problem?

I get the following error when I index large rich-text documents and then
try to build the suggester:

{
  "responseHeader":{
    "status":500,
    "QTime":47},
  "error":{
    "msg":"Document contains at least one immense term in
field=\"exacttext\" (whose UTF8 encoding is longer than the max length
32766), all of which were skipped.  Please correct the analyzer to not
produce such terms.  The prefix of the first immense term is: '[32, 10, 32,
10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10,
32, 32, 10, 32, 32, 10, 32, 32]...', original message: bytes can be at most
32766 in length; got 139402",
    "trace":"java.lang.IllegalArgumentException: Document contains at
least one immense term in field=\"exacttext\" (whose UTF8 encoding is
longer than the max length 32766), all of which were skipped.  Please
correct the analyzer to not produce such terms.  The prefix of the first
immense term is: '[32, 10, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32,
32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32]...',
original message: bytes can be at most 32766 in length; got 139402\r\n\tat
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)\r\n\tat
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)\r\n\tat
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)\r\n\tat
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)\r\n\tat
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1138)\r\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.add(AnalyzingInfixSuggester.java:381)\r\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:310)\r\n\tat
org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)\r\n\tat
org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:163)\r\n\tat
org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:179)\r\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)\r\n\tat
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)\r\n\tat
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:368)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\r\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\r\n\tat
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\r\n\tat
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\r\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\r\n\tat
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\r\n\tat
java.lang.Thread.run(Unknown Source)\r\nCaused by:
org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes
can be at most 32766 in length; got 139402\r\n\tat
org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)\r\n\tat
org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)\r\n\t...
48 more\r\n",
    "code":500}}


Regards,
Edwin



On 16 June 2015 at 11:43, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> Thanks Benedetti,
>
> I've changed to the AnalyzingInfixLookup approach, and it is now able to
> match from the middle of the field.
>
> However, is it possible to make the suggester show only part of the
> content of the field (like the next 2 or 3 fields), instead of the entire
> content/sentence, which can be quite long?
>
>
> Regards,
> Edwin
>
>
>
> On 15 June 2015 at 17:33, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>
>> Hehe Edwin, I think you should reread the document I linked some time ago:
>>
>> http://lucidworks.com/blog/solr-suggester/
>>
>> The suggester you used is not meant to provide infix suggestions.
>> The fuzzy suggester matches on a fuzzy basis against the *starting* terms
>> of the field content.
>>
>> What you are looking for is actually one of the Infix Suggesters.
>> For example the AnalyzingInfixLookup approach.
>>
>> When working with suggesters, it is important first to make a distinction:
>>
>> 1) Returning the full content of the field (AnalyzingInfix or Fuzzy)
>>
>> 2) Returning token(s) (Free Text Suggester)
>>
>> Then the second distinction is:
>>
>> 1) Infix suggestions (from the "middle" of the field content)
>> 2) Classic suggester (from the beginning of the field content)
>>
>> With that clarified, it will be quite simple to work with suggesters.
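>>
>> As a minimal sketch of the infix option, assuming the rest of the
>> configuration in the original message stays unchanged and only the
>> lookupImpl value is swapped (an illustration, not a tested setup):
>>
>> ```xml
>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>   <lst name="suggester">
>>     <str name="name">mySuggester</str>
>>     <!-- Assumption: only this line differs from the original config -->
>>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>     <str name="field">Suggestion</str>
>>     <str name="suggestAnalyzerFieldType">suggestType</str>
>>     <str name="buildOnStartup">true</str>
>>     <str name="buildOnCommit">false</str>
>>   </lst>
>> </searchComponent>
>> ```
>>
>> This falls under case 1 of the first distinction, so each suggestion is
>> still the full content of the field.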
>>
>> Cheers
>>
>> 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>>
>> > I've indexed a rich-text document with the following content:
>> >
>> > This is a testing rich text documents to test the uploading of files to
>> > Solr
>> >
>> >
>> > When I use the suggester, it returns the entire content of the field as
>> > soon as I enter suggest?q=t. However, when I search for q='rich', no
>> > results are returned.
>> >
>> > This is my current configuration for the suggester:
>> > <searchComponent name="suggest" class="solr.SuggestComponent">
>> >   <lst name="suggester">
>> >     <str name="name">mySuggester</str>
>> >     <str name="lookupImpl">FuzzyLookupFactory</str>
>> >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>> >     <str name="field">Suggestion</str>
>> >     <str name="suggestAnalyzerFieldType">suggestType</str>
>> >     <str name="buildOnStartup">true</str>
>> >     <str name="buildOnCommit">false</str>
>> >   </lst>
>> > </searchComponent>
>> >
>> > <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>> >   <lst name="defaults">
>> >     <str name="wt">json</str>
>> >     <str name="indent">true</str>
>> >
>> >     <str name="suggest">true</str>
>> >     <str name="suggest.count">10</str>
>> >     <str name="suggest.dictionary">mySuggester</str>
>> >   </lst>
>> >   <arr name="components">
>> >     <str>suggest</str>
>> >   </arr>
>> > </requestHandler>
>> >
>> > Is it possible for the suggester to return matches from the middle of
>> > the sentence, and also not to return the entire sentence if the sentence
>> > is long? Perhaps it should just suggest the next 2 or 3 fields, and
>> > return more fields as the user types.
>> >
>> > For example,
>> > When the user types 'this', it should return 'This is a testing'
>> > When the user types 'this is a testing', it should return 'This is a
>> > testing rich text documents'.
>> >
>> >
>> > Regards,
>> > Edwin
>> >
>>
>>
>>
>> --
>> --------------------------
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
