Hi Val,
Thanks for pointing out the errors.

Since I was getting errors related to GC pauses, I had set offHeapMaxMemory to 0 as per the link below:
http://apacheignite.gridgain.org/docs/performance-tips#tune-off-heap-memory
I have now set it to -1.

After making the required changes and exposing the ports when creating the container, I no longer see those errors. I am also able to ping the Ignite ports. However, the servers are still not joining.

I am getting the errors below on the VM1 container, which is started first. Initially the other node joins, but it is then removed from the cluster again.

[19:01:19,827][INFO][disco-event-worker-#44%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=13e5b190-fa8e-4c8c-be62-9f09ae6fadb9, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.18.0.1, 172.20.29.33], sockAddrs=[/172.20.29.33:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.18.0.1:47500], discPort=47500, order=14, intOrder=8, lastExchangeTime=1480618879814, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false]
[19:01:19,828][INFO][disco-event-worker-#44%null%][GridDiscoveryManager] Topology snapshot [ver=15, servers=2, clients=0, CPUs=4, heap=2.0GB]
[19:01:19,829][WARNING][disco-event-worker-#44%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=13e5b190-fa8e-4c8c-be62-9f09ae6fadb9, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.18.0.1, 172.20.29.33], sockAddrs=[/172.20.29.33:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.18.0.1:47500], discPort=47500, order=14, intOrder=8, lastExchangeTime=1480618879814, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false]
[19:01:19,829][INFO][disco-event-worker-#44%null%][GridDiscoveryManager] Topology snapshot [ver=15, servers=1, clients=0, CPUs=2, heap=1.0GB]
[19:01:19,846][INFO][exchange-worker-#47%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=14,
minorTopVer=0], evt=NODE_JOINED, node=13e5b190-fa8e-4c8c-be62-9f09ae6fadb9]
[19:01:19,865][INFO][exchange-worker-#47%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, node=13e5b190-fa8e-4c8c-be62-9f09ae6fadb9]

I am getting the error below on the VM2 container. Although the VM1 log reports that this container has joined, there is no corresponding join message in its own log.

[18:59:27,919][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=20000]

Please find my current configuration below. (Since the issue persisted, I have also included the non-loopback local IPs.) I have retained the loopback address; otherwise the node does not come up.

<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="cacheConfiguration">
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
      <property name="offHeapMaxMemory" value="-1"/>
    </bean>
  </property>
  <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes.
  -->
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="localPort" value="47500"/>
      <property name="networkTimeout" value="20000"/>
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <!-- In distributed environment, replace with actual host IP address. -->
              <value>127.0.0.1:47500..47509</value>
              <value>172.26.116.67:47500..47509</value>
              <value>172.18.0.1:47500..47509</value>
              <value>172.17.0.1:47500..47509</value>
              <value>172.20.29.33:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
      <property name="ackTimeout" value="50"/>
      <property name="socketTimeout" value="200"/>
      <property name="heartbeatFrequency" value="100"/>
    </bean>
  </property>
  <property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
      <!-- Override local port. -->
      <property name="localPort" value="47100"/>
      <property name="sharedMemoryPort" value="-1"/>
    </bean>
  </property>
</bean>

Can you please help? Do we need to use an address resolver?

On Wed, Nov 30, 2016 at 5:12 PM, vkulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi,
>
> This configuration is incorrect:
>
> <value>127.0.0.1:47100..47509</value>
> <value>172.26.116.67:47100..47509</value>
>
> First of all, since you're building a distributed cluster, you should not
> use loopback here. Put real addresses that discovery component binds to.
> Second of all, 47100 is a port for communication, not for discovery, so
> ranges should be 47500..47509 instead.
>
> In addition, please check that you not only can ping containers from each
> other, but that you can telnet to Ignite ports after first node is started.
> Sometimes it happens with Docker, that you can explicitly open ports.
>
> And finally, setting offHeapMaxMemory to zero actually enables and makes
> off-heap storage unlimited.
> To disable it should be set to -1, but this is
> actually a default value, so you don't need to do this either.
>
> -Val
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Unable-to-create-cluster-of-Apache-Ignite-Server-Containers-running-on-individual-VMs-tp9287p9314.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>

--
Thanks & Regards,
Piali Mazumder Nath
+1 415 629 7019
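
For reference, the telnet check suggested in the quoted reply can be scripted, since a plain ping does not show whether a TCP port is actually reachable. The following is only a minimal sketch, assuming Python 3 is available inside the containers. The helper names parse_address_range and port_is_open are hypothetical (they are not part of Ignite), and the entry format simply mirrors the host:first..last values used in the ipFinder list above.

```python
import socket


def parse_address_range(entry):
    """Expand 'host:first..last' (or 'host:port') into (host, port) pairs.

    Hypothetical helper: it only mirrors the textual format of the
    ipFinder <value> entries and does not call any Ignite API.
    """
    host, _, ports = entry.rpartition(":")
    first, _, last = ports.partition("..")
    last = last or first  # a single port has no '..last' part
    return [(host, port) for port in range(int(first), int(last) + 1)]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


def report(entries, timeout=2.0):
    """Print one line per host:port telling whether a TCP connect worked."""
    for entry in entries:
        for host, port in parse_address_range(entry):
            state = "open" if port_is_open(host, port, timeout) else "closed/unreachable"
            print(f"{host}:{port} {state}")
```

Run from inside one container after the first node has started on the other, a call such as report(["172.20.29.33:47500..47509", "172.20.29.33:47100"]) would list which discovery and communication ports accept connections. A port reported as closed there would point at Docker port publishing or a firewall, not at the Ignite configuration itself.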