Hi Erick,

I couldn't find anything unusual in the logs. The indexing process
appeared to run normally, but when I check the index afterwards, nothing
has been indexed.

This is what I see in the logs. It looks the same as when the indexing
works fine.

INFO  - 2015-09-03 01:24:35.316; [collection1 shard1 core_node2
collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content
1 = content
INFO  - 2015-09-03 01:24:35.319; [collection1 shard1 core_node2
collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content
2 = content
INFO  - 2015-09-03 01:24:35.482; [collection1 shard1 core_node1
collection1_shard1_replica2] org.apache.solr.core.SolrCore;
[collection1_shard1_replica2] webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=
http://localhost:8983/solr/collection1/&wt=javabin&version=2
<http://192.168.23.52:8983/edm/collection1/&wt=javabin&version=2>} status=0
QTime=4
INFO  - 2015-09-03 01:24:35.483; [collection1 shard1 core_node2
collection1] org.apache.solr.update.processor.LogUpdateProcessor;
[collection1] webapp=/solr path=/update/extract
params={literal.geolocation=1,103&literal.popularity=0&literal.title=cat&literal.entity=Growhill&
resource.name
=C:\Users\edwin_000\Desktop\edwin\solr-5.2.1\IndexingDocuments\collection1\cat.pdf&
literal.id=collection1_cat&literal.location=Singapore&literal.accessgroup=VIP&literal.content_cat=test3&literal.crossreference=science&literal.accesslevel=8&literal.importance=5&literal.userid=edwin&literal.reference=science&literal.url=C:\Users\edwin_000\Desktop\edwin\solr-5.2.1\IndexingDocuments\collection1\cat.pdf&literal.content_subcat=test3&literal.visibility=Public}
{add=[collection1_cat (1511253318382387200)]} 0 437
INFO  - 2015-09-03 01:24:36.218; [collection1 shard1 core_node2
collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content
1 = content
INFO  - 2015-09-03 01:24:36.225; [collection1 shard1 core_node2
collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content
2 = content
INFO  - 2015-09-03 01:24:36.487; [collection1 shard1 core_node1
collection1_shard1_replica2] org.apache.solr.core.SolrCore;
[collection1_shard1_replica2] webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=
http://localhost:8983/solr/collection1/&wt=javabin&version=2
<http://192.168.23.52:8983/edm/collection1/&wt=javabin&version=2>} status=0
QTime=6
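
One detail that may be relevant: the dedupe chain quoted below contains only the SignatureUpdateProcessorFactory. The Solr reference guide warns that a custom updateRequestProcessorChain must end with RunUpdateProcessorFactory, otherwise the update commands are never actually executed. That would match the symptom here of adds being logged with status=0 while nothing reaches the index. A sketch of a complete chain, reusing the same field names as the quoted config (untested against this setup):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- Optional, but keeps the usual update logging -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Without this, the add/commit commands are never run -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```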


Regards,
Edwin


On 2 September 2015 at 23:34, Erick Erickson <erickerick...@gmail.com>
wrote:

> _How_ does it fail? You must be seeing something in the logs....
>
>
>
> On Wed, Sep 2, 2015 at 8:29 AM, Zheng Lin Edwin Yeo
> <edwinye...@gmail.com> wrote:
> > Hi Erick,
> >
> > Yes, I'm trying out de-duplication too. But I'm facing a problem with
> > it: indexing stops working once I put the following de-duplication
> > configuration into solrconfig.xml. The problem seems to be with the
> > <str name="update.chain">dedupe</str> line.
> >
> >   <requestHandler name="/update" class="solr.UpdateRequestHandler">
> >     <lst name="defaults">
> >       <str name="update.chain">dedupe</str>
> >     </lst>
> >   </requestHandler>
> >
> >   <updateRequestProcessorChain name="dedupe">
> >     <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >       <bool name="enabled">true</bool>
> >       <str name="signatureField">signature</str>
> >       <bool name="overwriteDupes">false</bool>
> >       <str name="fields">content</str>
> >       <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >     </processor>
> >   </updateRequestProcessorChain>
> >
> >
> > Regards,
> > Edwin
> >
> > On 2 September 2015 at 23:10, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Yes, that is an intentional limit on the size of a single token, and a
> >> string field is indexed as a single token.
> >>
> >> Why not use deduplication? See:
> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >>
> >> You don't have to replace the existing documents; Solr will compute a
> >> hash that can be used to identify identical documents, and you can use
> >> _that_.
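
Erick's point, that the computed hash is a short fixed-length value no matter how large the content, can be sketched outside Solr. This is purely illustrative: it uses Python's hashlib MD5 rather than Solr's Lookup3Signature, and the function name is made up for the example.

```python
import hashlib

def content_signature(content: str) -> str:
    # Hash the full content down to a fixed-length hex string.
    # Solr's SignatureUpdateProcessorFactory does the equivalent
    # server-side via its Lookup3Signature/MD5Signature classes.
    return hashlib.md5(content.encode("utf-8")).hexdigest()

# Identical content yields identical signatures, so duplicates can be
# detected by comparing (or grouping on) the signature field, and the
# signature stays 32 hex chars even for content far beyond Lucene's
# 32766-byte single-term limit.
print(len(content_signature("x" * 100_000)))
```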
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Sep 2, 2015 at 2:53 AM, Zheng Lin Edwin Yeo
> >> <edwinye...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I would like to check: must a string field be at most 32766 bytes in
> >> > length?
> >> >
> >> > I'm trying to copyField the content of my rich-text documents to a
> >> > field with fieldType=string, to try to get distinct results on
> >> > content, as several documents have exactly the same content and we
> >> > only want to list one of them when searching.
> >> >
> >> > However, I get the following error for some of the documents when I
> >> > try to index them with the copyField. Some of my documents are quite
> >> > large, and they may well exceed 32766 bytes. Is there any other way
> >> > to overcome this problem?
> >> >
> >> >
> >> > org.apache.solr.common.SolrException: Exception writing document id
> >> > collection1_polymer100 to the index; possible analysis error.
> >> >   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167)
> >> >   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> >> >   at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> >> >   at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
> >> >   at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
> >> >   at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706)
> >> >   at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
> >> >   at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> >> >   at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207)
> >> >   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:122)
> >> >   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:127)
> >> >   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:235)
> >> >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> >> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> >> >   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> >> >   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> >> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> >> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> >> >   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> >> >   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> >> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >> >   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> >> >   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> >> >   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> >> >   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> >> >   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >> >   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> >> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> >> >   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> >> >   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> >> >   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> >> >   at org.eclipse.jetty.server.Server.handle(Server.java:497)
> >> >   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> >> >   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> >> >   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> >> >   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> >> >   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> >> >   at java.lang.Thread.run(Thread.java:745)
> >> > Caused by: java.lang.IllegalArgumentException: Document contains at
> >> > least one immense term in field="signature" (whose UTF8 encoding is
> >> > longer than the max length 32766), all of which were skipped.  Please
> >> > correct the analyzer to not produce such terms.  The prefix of the
> >> > first immense term is: '[32, 60, 112, 62, 60, 98, 114, 62, 32, 32,
> >> > 32, 60, 98, 114, 62, 56, 48, 56, 32, 72, 97, 110, 100, 98, 111, 111,
> >> > 107, 32, 111, 102]...', original message: bytes can be at most 32766
> >> > in length; got 49960
> >> >   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:670)
> >> >   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
> >> >   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
> >> >   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
> >> >   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
> >> >   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1363)
> >> >   at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
> >> >   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
> >> >   ... 38 more
> >> > Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException:
> >> > bytes can be at most 32766 in length; got 49960
> >> >   at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
> >> >   at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
> >> >   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:660)
> >> >   ... 45 more
> >> >
> >> >
> >> > Regards,
> >> > Edwin
> >>
>
