So I decreased the number of Spark executors to 2, and the problem went
away.
However, what is the general guideline for the number of nodes/clients that
can write to the cluster at the same time?
1) How does one increase write throughput without increasing the number of
clients? (The server nodes are underutilized at the moment.)
2) We have use cases where we may have many clients writing from different
sources.
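
On question 1, the main knob I know of is the data streamer itself: as far
as I understand, the Spark integration writes through IgniteDataStreamer
under the hood, and a single client can push more data by raising its
per-node buffering and parallelism. A minimal sketch (cache name and sizes
are placeholders, not from this thread; it assumes a reachable cluster):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerTuning {
    public static void main(String[] args) {
        // One client node for the whole job, instead of one per executor.
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start();
             IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
            // Larger per-node batches and more in-flight batches per server
            // let one client drive underutilized servers harder.
            // The values below are illustrative, not recommendations.
            streamer.perNodeBufferSize(2048);
            streamer.perNodeParallelOperations(16);
            streamer.autoFlushFrequency(1000); // flush at least once a second

            for (long i = 0; i < 1_000_000; i++)
                streamer.addData(i, "value-" + i);
        } // close() flushes remaining buffered entries
    }
}
```

The idea is to raise per-client parallelism rather than the client count,
since every extra client node adds discovery and communication overhead of
the kind shown in the logs below.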

Cheers,
Eugene

On Fri, Aug 24, 2018 at 11:51 AM, eugene miretsky <[email protected]> wrote:

> Attached is the error I get from ignitevisorcmd.sh after calling the cache
> command (the command just hangs).
> To me it looks like all the Spark executors (10 in my test) start a new
> client node, and some of those nodes get terminated and restarted as the
> executors die. This seems to really confuse Ignite.
>
> [15:45:10,741][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/127.0.0.1:40984,
> rmtAddr=/127.0.0.1:47101]
>
> [15:45:10,741][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/127.0.0.1:49872,
> rmtAddr=/127.0.0.1:47100]
>
> [15:45:10,742][INFO][grid-nio-worker-tcp-comm-3-#26%console%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/127.0.0.1:40988,
> rmtAddr=/127.0.0.1:47101]
>
> [15:45:10,743][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/127.0.0.1:47101,
> rmtAddr=/127.0.0.1:40992]
>
> [15:45:10,745][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi]
> Established outgoing communication connection [locAddr=/127.0.0.1:49876,
> rmtAddr=/127.0.0.1:47100]
>
> [15:45:11,725][SEVERE][grid-nio-worker-tcp-comm-2-#25%console%][TcpCommunicationSpi] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=180, bytesSent=18, bytesRcvd0=18, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=console, finished=false, hashCode=1827979135, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#25%console%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=166400 cap=166400], readBuf=java.nio.DirectByteBuffer[pos=18 lim=18 cap=117948], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.21.85.37:39942, rmtAddr=ip-172-21-85-213.ap-south-1.compute.internal/172.21.85.213:47100, createTime=1535125510724, closeTime=0, bytesSent=0, bytesRcvd=18, bytesSent0=0, bytesRcvd0=18, sndSchedTime=1535125510724, lastSndTime=1535125510724, lastRcvTime=1535125510724, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@7ae6182a, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]]
>
> java.lang.NullPointerException
>         at org.apache.ignite.internal.util.nio.GridNioServer.cancelConnect(GridNioServer.java:885)
>         at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.cancel(TcpCommunicationConnectionCheckFuture.java:338)
>         at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.cancelFutures(TcpCommunicationConnectionCheckFuture.java:475)
>         at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.receivedAddressStatus(TcpCommunicationConnectionCheckFuture.java:494)
>         at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture$1.onStatusReceived(TcpCommunicationConnectionCheckFuture.java:433)
>         at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.finish(TcpCommunicationConnectionCheckFuture.java:362)
>         at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.onConnected(TcpCommunicationConnectionCheckFuture.java:348)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:773)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:383)
>         at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
>         at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>         at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:117)
>         at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>         at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:88)
>         at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>         at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3490)
>
>
> On Fri, Aug 24, 2018 at 11:18 AM, eugene miretsky <[email protected]> wrote:
>
>>  Thanks,
>>
>> So the way I understand it, the thick client will use the affinity key to
>> send data to the right node, and hence will split the traffic between all
>> the nodes, while the thin client will just send the data to one node, and
>> that node will be responsible for forwarding it to the actual node that
>> owns the 'shard'?
>>
>> I keep getting the following error when using the Spark driver; the
>> driver keeps writing, but very slowly. Any idea what is causing the error,
>> or how to fix it?
>>
>> Cheers,
>> Eugene
>>
>> "
>>
>> [15:04:58,030][SEVERE][data-streamer-stripe-10-#43%Server%][DataStreamProcessor] Failed to respond to node [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, res=DataStreamerResponse [reqId=192, forceLocDep=true]]
>>
>> class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=ZookeeperClusterNode [id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377, loc=false, client=true], topic=T1 [topic=TOPIC_DATASTREAM, id=b8d675c6561-78af5d88-cbfa-4529-aaee-ff4982985cdf], msg=DataStreamerResponse [reqId=192, forceLocDep=true], policy=9]
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1703)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1673)
>>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.sendResponse(DataStreamProcessor.java:440)
>>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:402)
>>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:305)
>>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:60)
>>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:90)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
>>         at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: ZookeeperClusterNode [id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377, loc=false, client=true]
>>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2718)
>>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2651)
>>         at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
>>         ... 13 more
>> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[/127.0.0.1:47101]]
>>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3422)
>>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2958)
>>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2841)
>>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2692)
>>         ... 15 more
>>         Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/127.0.0.1:47101, err=Connection refused]
>>                 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
>>                 ... 18 more
>>         Caused by: java.net.ConnectException: Connection refused
>>                 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>                 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>                 at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
>>                 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3262)
>>                 ... 18 more
>>
>> "
>>
>> On Tue, Aug 14, 2018 at 4:39 PM, akurbanov <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Spark integration was implemented before the Java thin client was
>>> released, and the thick client generally performs better than the thin
>>> one. Is your question about the existence of benchmarks for thin vs.
>>> thick clients in the Spark integration, or just a comparison of the two
>>> options?
>>>
>>> The thin client's functionality is limited compared to the thick client,
>>> and it should generally be a bit slower, since it communicates not with
>>> the whole cluster but only with a single node and is not partition-aware.
>>> This introduces additional network costs, which may hurt performance
>>> relative to the thick client in the simplest and most ideal conditions,
>>> where network transfer is the major part of the workload.
>>>
>>> However, this performance decrease may be completely irrelevant depending
>>> on the use case and workload, so you should always measure performance,
>>> run benchmarks for your specific use case, and decide which option suits
>>> your needs better.
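>>>
>>> A minimal sketch of the two modes, assuming the Ignite 2.x Java API
>>> (cache name and addresses are placeholders):
>>>
>>> ```java
>>> import org.apache.ignite.Ignite;
>>> import org.apache.ignite.Ignition;
>>> import org.apache.ignite.client.IgniteClient;
>>> import org.apache.ignite.configuration.ClientConfiguration;
>>>
>>> // Thick client: joins the cluster topology and is partition-aware,
>>> // so each put goes straight to the primary node for its key.
>>> Ignition.setClientMode(true);
>>> Ignite thick = Ignition.start();
>>> thick.cache("myCache").put(1, "v");
>>>
>>> // Thin client: a plain socket to the listed address; that node
>>> // forwards the operation to the actual primary when needed.
>>> IgniteClient thin = Ignition.startClient(
>>>     new ClientConfiguration().setAddresses("127.0.0.1:10800"));
>>> thin.cache("myCache").put(1, "v");
>>> ```
>>>
>>> The extra forwarding hop in the thin-client path is the network cost
>>> described above.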
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
>>
>
