Hi Erick,

I couldn't find anything unusual in the logs. The indexing process runs through normally, but when I check the index afterwards, nothing has been indexed.
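One thing worth ruling out for exactly this symptom (a suggestion, not something visible in these logs): when you define a custom updateRequestProcessorChain in Solr, it replaces the default chain entirely, so a chain that does not end with solr.RunUpdateProcessorFactory never actually executes the add, and nothing reaches the index. A sketch of the dedupe chain quoted further down with the logging and run processors added explicitly:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- A custom chain does not inherit the default processors, so these
       must be listed explicitly; without RunUpdateProcessorFactory the
       documents are never written to the index. -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```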
This is what I see from the logs. It looks the same as when the indexing works fine.

INFO  - 2015-09-03 01:24:35.316; [collection1 shard1 core_node2 collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content 1 = content
INFO  - 2015-09-03 01:24:35.319; [collection1 shard1 core_node2 collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content 2 = content
INFO  - 2015-09-03 01:24:35.482; [collection1 shard1 core_node1 collection1_shard1_replica2] org.apache.solr.core.SolrCore; [collection1_shard1_replica2] webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=http://localhost:8983/solr/collection1/&wt=javabin&version=2} status=0 QTime=4
INFO  - 2015-09-03 01:24:35.483; [collection1 shard1 core_node2 collection1] org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.geolocation=1,103&literal.popularity=0&literal.title=cat&literal.entity=Growhill&resource.name=C:\Users\edwin_000\Desktop\edwin\solr-5.2.1\IndexingDocuments\collection1\cat.pdf&literal.id=collection1_cat&literal.location=Singapore&literal.accessgroup=VIP&literal.content_cat=test3&literal.crossreference=science&literal.accesslevel=8&literal.importance=5&literal.userid=edwin&literal.reference=science&literal.url=C:\Users\edwin_000\Desktop\edwin\solr-5.2.1\IndexingDocuments\collection1\cat.pdf&literal.content_subcat=test3&literal.visibility=Public} {add=[collection1_cat (1511253318382387200)]} 0 437
INFO  - 2015-09-03 01:24:36.218; [collection1 shard1 core_node2 collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content 1 = content
INFO  - 2015-09-03 01:24:36.225; [collection1 shard1 core_node2 collection1] org.apache.solr.handler.extraction.SolrContentHandler; Content 2 = content
INFO  - 2015-09-03 01:24:36.487; [collection1 shard1 core_node1 collection1_shard1_replica2] org.apache.solr.core.SolrCore; [collection1_shard1_replica2] webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=http://localhost:8983/solr/collection1/&wt=javabin&version=2} status=0 QTime=6

Regards,
Edwin

On 2 September 2015 at 23:34, Erick Erickson <erickerick...@gmail.com> wrote:
> _How_ does it fail? You must be seeing something in the logs....
>
> On Wed, Sep 2, 2015 at 8:29 AM, Zheng Lin Edwin Yeo
> <edwinye...@gmail.com> wrote:
> > Hi Erick,
> >
> > Yes, I'm trying out the De-Duplication too, but I'm facing a problem
> > with it: indexing stops working once I put the following De-Duplication
> > configuration into solrconfig.xml. The problem seems to be with the
> > <str name="update.chain">dedupe</str> line.
> >
> > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> >   <lst name="defaults">
> >     <str name="update.chain">dedupe</str>
> >   </lst>
> > </requestHandler>
> >
> > <updateRequestProcessorChain name="dedupe">
> >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">signature</str>
> >     <bool name="overwriteDupes">false</bool>
> >     <str name="fields">content</str>
> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >   </processor>
> > </updateRequestProcessorChain>
> >
> > Regards,
> > Edwin
> >
> > On 2 September 2015 at 23:10, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Yes, that is an intentional limit for the size of a single token,
> >> which strings are.
> >>
> >> Why not use de-duplication? See:
> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >>
> >> You don't have to replace the existing documents; Solr will
> >> compute a hash that can be used to identify identical documents,
> >> and you can use _that_.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Sep 2, 2015 at 2:53 AM, Zheng Lin Edwin Yeo
> >> <edwinye...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I would like to check: must a string field's value be at most
> >> > 32766 bytes in length?
> >> >
> >> > I'm trying to do a copyField of my rich-text documents' content to
> >> > a field with fieldType=string, to get distinct results for content,
> >> > as there are several documents with exactly the same content and we
> >> > only want to list one of them during searching.
> >> >
> >> > However, I get the following error on some of the documents when I
> >> > try to index them with the copyField in place. Some of my documents
> >> > are quite large, so it is possible that they exceed 32766 bytes. Is
> >> > there any other way to overcome this problem?
> >> >
> >> > org.apache.solr.common.SolrException: Exception writing document id
> >> > collection1_polymer100 to the index; possible analysis error.
> >> >   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167)
> >> >   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> >> >   at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> >> >   at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
> >> >   at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
> >> >   at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706)
> >> >   at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
> >> >   at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> >> >   at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207)
> >> >   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:122)
> >> >   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:127)
> >> >   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:235)
> >> >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> >> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> >> >   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> >> >   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> >> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> >> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> >> >   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> >> >   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> >> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >> >   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> >> >   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> >> >   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> >> >   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> >> >   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >> >   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> >> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> >> >   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> >> >   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> >> >   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> >> >   at org.eclipse.jetty.server.Server.handle(Server.java:497)
> >> >   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> >> >   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> >> >   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> >> >   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> >> >   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> >> >   at java.lang.Thread.run(Thread.java:745)
> >> > Caused by: java.lang.IllegalArgumentException: Document contains at
> >> > least one immense term in field="signature" (whose UTF8 encoding is
> >> > longer than the max length 32766), all of which were skipped. Please
> >> > correct the analyzer to not produce such terms. The prefix of the
> >> > first immense term is: '[32, 60, 112, 62, 60, 98, 114, 62, 32, 32,
> >> > 32, 60, 98, 114, 62, 56, 48, 56, 32, 72, 97, 110, 100, 98, 111, 111,
> >> > 107, 32, 111, 102]...', original message: bytes can be at most 32766
> >> > in length; got 49960
> >> >   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:670)
> >> >   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
> >> >   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
> >> >   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
> >> >   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
> >> >   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1363)
> >> >   at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
> >> >   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
> >> >   ... 38 more
> >> > Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException:
> >> > bytes can be at most 32766 in length; got 49960
> >> >   at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
> >> >   at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
> >> >   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:660)
> >> >   ... 45 more
> >> >
> >> > Regards,
> >> > Edwin
> >> >
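For reference, the limit in the trace above is Lucene's hard cap of 32766 bytes for the UTF-8 encoding of a single indexed term, which is why an untokenized string field fails for large content. A small standalone illustration (Python, not part of the Solr setup in this thread; `MAX_TERM_BYTES` is just a name for the value from the error message) of checking a value against the limit and truncating it without splitting a multi-byte character:

```python
# Lucene rejects any single indexed term whose UTF-8 encoding
# exceeds 32766 bytes, per the error message in the stack trace.
MAX_TERM_BYTES = 32766

def fits_in_one_term(value: str) -> bool:
    """True if the whole value could be indexed as a single term."""
    return len(value.encode("utf-8")) <= MAX_TERM_BYTES

def truncate_to_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> str:
    """Truncate so the UTF-8 encoding fits, dropping any trailing partial character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= limit:
        return value
    # errors="ignore" cleanly discards a multi-byte character cut at the boundary.
    return encoded[:limit].decode("utf-8", errors="ignore")

doc_content = "x" * 49960          # same size as the rejected term in the trace
print(fits_in_one_term(doc_content))                 # False
trimmed = truncate_to_term_limit(doc_content)
print(len(trimmed.encode("utf-8")))                  # 32766
```

Truncating before the copyField (or simply letting the SignatureUpdateProcessorFactory hash the content into a short signature, as Erick suggests) avoids indexing the full text as one string term.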