Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

Mikhail Khludnev Fri, 14 Sep 2018 02:09:56 -0700

Hello, Vadim.
My guess (and it is only a guess) is that a bunch of updates coming into a
shard triggers a heavy merge, which in turn blocks new updates. This can be
verified with logs or a thread dump from the problematic node. The probable
measures are: try to shuffle updates to load other shards for a while and let
the parallel merge pack that shard; or just wait a little longer by increasing
the timeout in Jetty.
Let us know what you find.
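One possible reading of "shuffle updates" is to reorder the source rows so that every shard keeps receiving documents throughout the run instead of in one early burst. Below is a minimal Python sketch of that ordering only, assuming documents are plain dicts carrying the `sf_shard` routing field from the collection definition; the function name `spread_by_shard` is hypothetical, and with DIH the same ordering would have to be produced by the source query rather than by client code. It buffers everything in memory and does not guarantee gaps shorter than the idle timeout when a shard has very few documents.

```python
from collections import defaultdict


def spread_by_shard(docs, shard_field="sf_shard"):
    """Return docs reordered so each shard's docs are evenly spaced over the run.

    Hypothetical helper for illustration; not part of Solr or DIH.
    """
    # Group documents by their target shard, keeping the original order.
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[shard_field]].append(doc)

    keyed = []
    for shard, items in groups.items():
        n = len(items)
        for i, doc in enumerate(items):
            # Fractional position in (0, 1): the i-th of n docs sits at the
            # middle of the i-th of n equal slices of the whole run.
            keyed.append(((i + 0.5) / n, shard, i, doc))

    # Sort by position; shard name and index break ties deterministically.
    keyed.sort(key=lambda item: item[:3])
    return [doc for _, _, _, doc in keyed]


if __name__ == "__main__":
    rows = [{"id": i, "sf_shard": "s_0"} for i in range(1, 5)]
    rows.append({"id": 5, "sf_shard": "s_1"})
    print([d["id"] for d in spread_by_shard(rows)])
```

Plain round-robin across shards was deliberately not used here: it would send all documents of a small shard at the start and then leave that shard idle for the rest of the run.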


On Thu, Sep 13, 2018 at 3:54 PM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi,
> I've run some more tests on the issue and managed to find out more details.
> The timeout occurs when, during a long indexing run, some documents go to
> one shard at the beginning and then for a long time (more than 120 sec) no
> data at all goes to that shard.
> The connection to that core, opened at the beginning of indexing, hits the
> idle timeout :( .
> If no data at all goes to the shard during indexing, no timeout occurs on
> that shard.
> If indexing finishes in less than 120 sec, no timeout occurs on that
> shard.
> Unfortunately, in our use case there are a lot of long indexing runs, up to
> 30 minutes, with uneven distribution of documents across shards.
> Any suggestions on how to mitigate the issue?
> --
> BR
> Vadim Ivanov
>
>
> -----Original Message-----
> From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Wednesday, September 12, 2018 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4
>
> Hello gurus,
> I am using SolrCloud with DIH for indexing my data.
> Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
> indexing run longer than 2 minutes always fails, with many timeout records
> in the log coming from all replicas in the collection.
>
> Such as:
> x:Mycol_s_0_replica_t40 RequestHandlerBase
> java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120001/120000 ms
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>         at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
>         at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
>         at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
> ...
> Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>         at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
>         at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more
>         Suppressed: java.lang.Throwable: HttpInput failure
>                 at org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
>                 at org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConnection.java:649)
>                 at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)
>
> Resulting indexing status:
>   "statusMessages":{
>     "Total Requests made to DataSource":"1",
>     "Total Rows Fetched":"2828323",
>     "Total Documents Processed":"2828323",
>     "Total Documents Skipped":"0",
>     "Full Dump Started":"2018-09-12 14:28:21",
>     "":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0 documents.",
>     "Committed":"2018-09-12 14:33:41",
>     "Time taken":"0:5:19.507",
>     "Full Import failed":"2018-09-12 14:33:41"}}
>
> Nevertheless, all these documents seem to be indexed fine and are searchable.
> If the same collection is not sharded, or is sharded as "compositeId",
> indexing completes without any errors.
> The type of replicas (nrt or tlog) doesn't matter.
> Small indexing runs (taking less than 2 minutes) complete smoothly.
>
> Testing environment - 1 node, Collection with 6 shards, 1 replica for each
> shard
> Collection:
> /admin/collections?action=CREATE&name=Mycol
>         &numShards=6
>         &router.name=implicit
>         &shards=s_0,s_1,s_2,s_3,s_4,s_5
>         &router.field=sf_shard
>         &collection.configName=Mycol
>         &maxShardsPerNode=10
>         &nrtReplicas=0&tlogReplicas=1
>
>
> I have never noticed such behavior before on my prod configuration (Solr
> 6.3.0).
> It seems like a bug in the new version, but I could not find any Jira issue
> for it.
>
> Any ideas, please...
>
> --
> BR
> Vadim Ivanov
>
>
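The condition described in the quoted follow-up (a shard receives documents early and then nothing for more than 120 seconds while indexing continues) can be checked against a planned document order before running the import. This is a rough Python sketch under stated assumptions: the input is a time-sorted list of `(seconds_since_start, shard)` pairs that you would have to estimate yourself, `idle_shards` is a hypothetical helper, and the 120-second default simply mirrors the `120000 ms` value in the log above.

```python
def idle_shards(events, end_time, idle_timeout=120.0):
    """Return {shard: longest_gap} for shards idle longer than idle_timeout.

    events   -- iterable of (time_seconds, shard), sorted by time
    end_time -- time in seconds at which the indexing run finishes
    Shards that never receive a document are not reported, matching the
    observation that they do not time out.
    """
    last_seen = {}   # shard -> time of the most recent document
    worst_gap = {}   # shard -> longest silence observed so far

    for t, shard in events:
        if shard in last_seen:
            gap = t - last_seen[shard]
            worst_gap[shard] = max(worst_gap.get(shard, 0.0), gap)
        last_seen[shard] = t

    # The silence between a shard's last document and the end of the run
    # counts too, since the connection stays open until indexing finishes.
    for shard, t in last_seen.items():
        worst_gap[shard] = max(worst_gap.get(shard, 0.0), end_time - t)

    return {s: g for s, g in worst_gap.items() if g > idle_timeout}


if __name__ == "__main__":
    plan = [(0, "s_0"), (10, "s_1"), (100, "s_1"), (200, "s_1")]
    print(idle_shards(plan, end_time=300))
```

In the example plan, `s_0` gets one document at the start and is then silent until the run ends, so it is the shard that would be flagged.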

-- 
Sincerely yours
Mikhail Khludnev
