We could also see the following in the server-side logs:

Rejected connection from Server connection from [client host address=x.yx.x.x; client port=abc] because incoming request was rejected by pool possibly due to thread exhaustion
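As I understand it, that message means we are hitting the cache server's max-connections limit (the default is 800, matching the limit we see). For reference, this is where the cap lives in a server cache.xml; the port and the 2000 below are illustrative placeholders, not our actual config:

<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns="http://geode.apache.org/schema/cache"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://geode.apache.org/schema/cache
           http://geode.apache.org/schema/cache/cache-1.0.xsd"
       version="1.0">
  <!-- max-connections defaults to 800; 2000 is only an illustrative value -->
  <cache-server port="40404" max-connections="2000"/>
</cache>

That said, raising the cap would presumably only hide the problem if the client is opening a new connection per partition.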
On Tue, Jun 25, 2019, 7:27 AM aashish choudhary <[email protected]> wrote:

> As I mentioned earlier, the thread count could go to 4000 and we have
> seen the read timeout crossing the default 10 seconds. We tried to
> increase the read timeout to 30 seconds but that didn't work either. The
> record count is not more than 600k.
>
> The job succeeds on a second attempt without changing anything, which is
> a bit weird.
>
> With best regards,
> Ashish
>
> On Tue, Jun 25, 2019, 12:23 AM Anilkumar Gingade <[email protected]> wrote:
>
>> Hi Ashish,
>>
>> How many threads at a time are executing putAll jobs in a single
>> client (the Spark job)?
>> Do you see a read timeout exception in the client logs? If so, can you
>> try increasing the read timeout value, or reducing the putAll size?
>>
>> In the case of putAll on a partitioned region, the putAll (entries)
>> payload is broken down and sent to the respective servers based on its
>> data affinity; that is the reason it works with the partitioned region.
>>
>> You can find more detail on how client-server connections work at:
>> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
>>
>> -Anil.
>>
>> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We have been experiencing issues when connecting to Geode with the
>>> putAll API from Spark. The issue is specific to one particular Spark
>>> job, which loads data into a replicated region. The exception we see
>>> on the server side is that the default limit of 800 connections gets
>>> maxed out; on the client side we see a retry attempt against each
>>> server, which also fails. Yet when we re-run the same job, it
>>> completes without any issue.
>>>
>>> In the code, the problem I can see is that we are creating a Geode
>>> client cache inside forEachPartition, which I think could be the
>>> issue: for each partition we are making a new connection to Geode. In
>>> the stats file we can see connections timing out, and sometimes there
>>> is also a thread burst of >4000.
>>>
>>> What is the recommended way to connect to Geode from Spark?
>>>
>>> This one specific job fails most of the time, and it targets a
>>> replicated region. When we change the region type to partitioned, the
>>> job completes. We have enabled disk persistence for both types of
>>> region.
>>>
>>> Thoughts?
>>>
>>> With best regards,
>>> Ashish
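In case it helps anyone searching the archives later: the fix we are trying is to hold a single client cache per executor JVM and to batch the putAll calls, along the lines of the sketch below. The locator address, region name, string key/value types, and the batch size of 1000 are all placeholders/assumptions, not tested values.

import scala.collection.JavaConverters._
import org.apache.spark.rdd.RDD
import org.apache.geode.cache.client.{ClientCache, ClientCacheFactory, ClientRegionShortcut}

// A lazy val in a Scala object is initialized at most once per JVM, so
// every partition handled by a given executor reuses one ClientCache
// (and its connection pool) instead of opening new connections per
// partition.
object GeodeClient {
  lazy val cache: ClientCache = new ClientCacheFactory()
    .addPoolLocator("locator-host", 10334) // placeholder locator address
    .setPoolReadTimeout(30000)             // 30s read timeout, as we tried
    .create()

  lazy val region = cache
    .createClientRegionFactory[String, String](ClientRegionShortcut.PROXY)
    .create("example-region")              // placeholder region name
}

object LoadJob {
  def save(rdd: RDD[(String, String)]): Unit =
    rdd.foreachPartition { rows =>
      // Chunk each partition so no single putAll carries a huge payload;
      // 1000 is an arbitrary starting point, not a tuned value.
      rows.grouped(1000).foreach { batch =>
        GeodeClient.region.putAll(batch.toMap.asJava)
      }
    }
}

Because GeodeClient is initialized lazily on each executor, every partition reuses the same pool, and grouped() keeps each putAll bounded, in line with Anil's suggestion to reduce the putAll size.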
