We could also see the following in the server-side logs:

Rejected connection from Server connection from [client host address=x.yx.x.x;
client port=abc] because incoming request was rejected by pool possibly due to
thread exhaustion


On Tue, Jun 25, 2019, 7:27 AM aashish choudhary <
[email protected]> wrote:

> As I mentioned earlier, the thread count can go up to 4000, and we have
> seen read timeouts crossing the default 10 seconds. We tried increasing
> the read timeout to 30 seconds, but that didn't work either. The record
> count is not more than 600k.
>
> The job succeeds on the second attempt without changing anything, which
> is a bit weird.
>
> With best regards,
> Ashish
>
> On Tue, Jun 25, 2019, 12:23 AM Anilkumar Gingade <[email protected]>
> wrote:
>
>> Hi Ashish,
>>
>> How many threads are executing putAll jobs at a time in a single client
>> (the Spark job)?
>> Do you see read timeout exceptions in the client logs? If so, can you try
>> increasing the read timeout value, or reducing the putAll size.
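>>
>> As a rough sketch of both of those suggestions, assuming a plain Java
>> client (the locator host, region name, and batch size below are made up
>> for illustration, not taken from your job):
>>
>> import java.util.HashMap;
>> import java.util.Map;
>>
>> import org.apache.geode.cache.Region;
>> import org.apache.geode.cache.client.ClientCache;
>> import org.apache.geode.cache.client.ClientCacheFactory;
>> import org.apache.geode.cache.client.ClientRegionShortcut;
>>
>> public class PutAllTuning {
>>   public static void main(String[] args) {
>>     // Raise the pool read timeout (default is 10000 ms) when creating the cache.
>>     ClientCache cache = new ClientCacheFactory()
>>         .addPoolLocator("locator-host", 10334) // made-up locator address
>>         .setPoolReadTimeout(30000)             // 30s instead of the 10s default
>>         .create();
>>
>>     Region<String, String> region = cache
>>         .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
>>         .create("myRegion");                   // made-up region name
>>
>>     Map<String, String> records = loadRecords(); // placeholder for the data source
>>
>>     // Send putAll in bounded chunks instead of one huge map, so no single
>>     // server request carries the whole payload.
>>     Map<String, String> chunk = new HashMap<>();
>>     for (Map.Entry<String, String> e : records.entrySet()) {
>>       chunk.put(e.getKey(), e.getValue());
>>       if (chunk.size() == 10_000) {            // illustrative batch size
>>         region.putAll(chunk);
>>         chunk.clear();
>>       }
>>     }
>>     if (!chunk.isEmpty()) {
>>       region.putAll(chunk);
>>     }
>>   }
>>
>>   private static Map<String, String> loadRecords() {
>>     return new HashMap<>(); // placeholder
>>   }
>> }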
>>
>> In the case of putAll on a partitioned region, the putAll entries are
>> broken down and sent to the respective servers based on data affinity;
>> that is why it works with the partitioned region.
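>>
>> For reference, a minimal sketch of the two server-side region definitions
>> being compared here (region names are illustrative): with PARTITION the
>> client splits a putAll across servers by bucket ownership, while with
>> REPLICATE the whole batch goes to one server connection, which then
>> distributes it to the other members.
>>
>> import org.apache.geode.cache.Cache;
>> import org.apache.geode.cache.CacheFactory;
>> import org.apache.geode.cache.Region;
>> import org.apache.geode.cache.RegionShortcut;
>>
>> public class RegionSetup {
>>   public static void main(String[] args) {
>>     Cache cache = new CacheFactory().create();
>>
>>     // Partitioned + persistent: putAll entries are hashed into buckets,
>>     // so each server only receives its own share of the batch.
>>     Region<String, String> partitioned = cache
>>         .<String, String>createRegionFactory(RegionShortcut.PARTITION_PERSISTENT)
>>         .create("partitionedExample");
>>
>>     // Replicated + persistent: the whole putAll arrives on one server,
>>     // which then replicates it to the others.
>>     Region<String, String> replicated = cache
>>         .<String, String>createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT)
>>         .create("replicatedExample");
>>   }
>> }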
>>
>> You can find more detail on how client-server connections work at:
>>
>> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
>>
>> -Anil.
>>
>> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> We have been experiencing issues while connecting to Geode with the
>>> putAll API from Spark. The issue is specific to one particular Spark job
>>> that loads data into a replicated region. The exception we see on the
>>> server side is that the default limit of 800 gets maxed out; on the
>>> client side, we see retry attempts against each server that fail, even
>>> though the same job completes without any issue when we re-run it.
>>>
>>> In the code, the problem I can see is that we are connecting to Geode by
>>> creating a client cache inside foreachPartition, which I think could be
>>> the issue: we are making a new connection to Geode for each partition.
>>> In the stats file we can see connections timing out, and there are also
>>> thread bursts, sometimes >4000.
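>>>
>>> (A rough sketch of what reusing one connection could look like: a single
>>> ClientCache per executor JVM, shared across partitions, instead of a new
>>> one per foreachPartition call. The locator host, region name, and batch
>>> size are made up for illustration.)
>>>
>>> import java.util.HashMap;
>>> import java.util.Iterator;
>>> import java.util.Map;
>>>
>>> import org.apache.geode.cache.Region;
>>> import org.apache.geode.cache.client.ClientCache;
>>> import org.apache.geode.cache.client.ClientCacheFactory;
>>> import org.apache.geode.cache.client.ClientRegionShortcut;
>>> import org.apache.spark.api.java.JavaPairRDD;
>>> import scala.Tuple2;
>>>
>>> public final class GeodeWriter {
>>>   // One client cache per executor JVM, shared by all tasks, instead of
>>>   // a new ClientCache (and connection pool) per partition.
>>>   private static volatile Region<String, String> region;
>>>
>>>   private static Region<String, String> region() {
>>>     if (region == null) {
>>>       synchronized (GeodeWriter.class) {
>>>         if (region == null) {
>>>           ClientCache cache = new ClientCacheFactory()
>>>               .addPoolLocator("locator-host", 10334) // made-up locator
>>>               .setPoolReadTimeout(30000)
>>>               .create();
>>>           region = cache
>>>               .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
>>>               .create("myRegion"); // made-up region name
>>>         }
>>>       }
>>>     }
>>>     return region;
>>>   }
>>>
>>>   public static void write(JavaPairRDD<String, String> rdd) {
>>>     rdd.foreachPartition((Iterator<Tuple2<String, String>> it) -> {
>>>       Region<String, String> r = region(); // reuses the executor-wide cache
>>>       Map<String, String> batch = new HashMap<>();
>>>       while (it.hasNext()) {
>>>         Tuple2<String, String> t = it.next();
>>>         batch.put(t._1(), t._2());
>>>         if (batch.size() == 10_000) { // illustrative batch size
>>>           r.putAll(batch);
>>>           batch.clear();
>>>         }
>>>       }
>>>       if (!batch.isEmpty()) {
>>>         r.putAll(batch);
>>>       }
>>>     });
>>>   }
>>> }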
>>>
>>> What is the recommended way to connect to Geode from Spark?
>>>
>>> This one specific job fails most of the time, and it targets a
>>> replicated region. When we change the region type to partitioned, the
>>> job completes. We have enabled disk persistence for both types of
>>> regions.
>>>
>>> Thoughts?
>>>
>>> With best regards,
>>> Ashish
>>>
>>
