Hello, Vadim. My guess (and it is only a guess) is that a bunch of updates arriving at one shard triggers a heavy merge, which in turn blocks new updates to that shard. This can be verified with the logs or a thread dump from the problematic node. Possible measures: try shuffling the updates so that other shards take the load for a while and a parallel merge can pack that shard, or simply wait a little longer by increasing the idle timeout in Jetty. Let us know what you find.
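To make the "shuffle updates" idea concrete, here is a minimal sketch. It assumes the documents are available as dicts on the client side before they are sent, and that the routing field is `sf_shard` as in the collection definition quoted below. The function name `interleave_by_shard` is hypothetical, and it uses a round-robin interleave rather than a random shuffle, so that every shard keeps receiving documents for as long as it has any left.

```python
from collections import defaultdict, deque


def interleave_by_shard(docs, shard_field="sf_shard"):
    """Yield docs round-robin across shard values (hypothetical helper).

    Documents are grouped by their routing field, then emitted one per
    shard in turn, so no shard sits idle while it still has documents.
    Note: this buffers all documents in memory, so it is only a sketch.
    """
    queues = defaultdict(deque)
    for doc in docs:
        queues[doc[shard_field]].append(doc)

    # dicts preserve insertion order, so shards rotate in first-seen order
    pending = deque(queues.values())
    while pending:
        queue = pending.popleft()
        yield queue.popleft()
        if queue:  # shard still has documents: put it back in the rotation
            pending.append(queue)


if __name__ == "__main__":
    docs = [
        {"id": 1, "sf_shard": "s_0"},
        {"id": 2, "sf_shard": "s_0"},
        {"id": 3, "sf_shard": "s_1"},
        {"id": 4, "sf_shard": "s_1"},
        {"id": 5, "sf_shard": "s_2"},
    ]
    print([d["id"] for d in interleave_by_shard(docs)])
```

With DIH the documents come straight from the data source, so the equivalent would presumably be to order the source query so that shard values alternate. That part is an assumption about your setup, not something I have tested.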

On Thu, Sep 13, 2018 at 3:54 PM Vadim Ivanov <vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi,
> I've run some more tests on the issue and managed to find out more details.
> The timeout occurs when, during a long indexing run, some documents go to
> one shard at the beginning and then for a long time (more than 120 sec) no
> data at all goes to that shard.
> The connection to that core, opened at the beginning of indexing, hits the
> idle timeout :( .
> If no data at all goes to the shard during indexing, no timeout occurs on
> that shard.
> If indexing finishes earlier than 120 sec, no timeout occurs on that shard.
> Unfortunately, in our use case there are a lot of long indexing runs, up to
> 30 minutes, with uneven shard distribution of documents.
> Any suggestions on how to mitigate the issue?
> --
> BR
> Vadim Ivanov
>
>
> -----Original Message-----
> From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Wednesday, September 12, 2018 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4
>
> Hello gurus,
> I am using SolrCloud with DIH for indexing my data.
> Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
> indexing longer than 2 minutes always fails with many timeout records in the
> log coming from all replicas in the collection.
>
> Such as:
> x:Mycol_s_0_replica_t40 RequestHandlerBase java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120001/120000 ms
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>     at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
>     at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
>     at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
>     ...
> Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>     at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
>     at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     ... 1 more
>     Suppressed: java.lang.Throwable: HttpInput failure
>         at org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
>         at org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConnection.java:649)
>         at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)
>
> Resulting indexing status:
> "statusMessages":{
>   "Total Requests made to DataSource":"1",
>   "Total Rows Fetched":"2828323",
>   "Total Documents Processed":"2828323",
>   "Total Documents Skipped":"0",
>   "Full Dump Started":"2018-09-12 14:28:21",
>   "":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0 documents.",
>   "Committed":"2018-09-12 14:33:41",
>   "Time taken":"0:5:19.507",
>   "Full Import failed":"2018-09-12 14:33:41"}}
>
> Nevertheless, all these documents seem to be indexed fine and are searchable.
> If the same collection is not sharded, or is sharded as "compositeId",
> indexing completes without any errors.
> The type of replicas (nrt or tlog) doesn't matter.
> Small indexing runs (taking less than 2 minutes) go smoothly.
>
> Testing environment: 1 node, collection with 6 shards, 1 replica for each
> shard.
> Collection:
> /admin/collections?action=CREATE&name=Mycol
> &numShards=6
> &router.name=implicit
> &shards=s_0,s_1,s_2,s_3,s_4,s_5
> &router.field=sf_shard
> &collection.configName=Mycol
> &maxShardsPerNode=10
> &nrtReplicas=0&tlogReplicas=1
>
>
> I have never noticed such behavior before on my prod configuration (Solr
> 6.3.0).
> It seems like a bug in the new version, but I could not find any JIRA on the
> issue.
>
> Any ideas, please...
>
> --
> BR
> Vadim Ivanov

--
Sincerely yours
Mikhail Khludnev