Subroto - both tools fail. openssl reports errno 111 (connection refused), which made me check the bound ports on the c* node with encryption enabled. Port 9042 is not open (per netstat -ant).

I then compared the logs for a node started with and without encryption. Without encryption, I get a bunch of lines like:

OutboundTcpConnection.java:561 - Handshaking version w/ IP

These appear after a line like:

Gossiper.java - Waiting for gossip to settle...

With encryption toggled to 'dc', I don't see any of those lines, presumably because the gossiper is trying to start but doesn't.

On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua <sbarua...@yahoo.com.invalid> wrote:

> Michael,
>
> Are you able to connect to any c* node via OpenSSL?
>
> openssl s_client -connect <ip address>:9042
>
> cqlsh <ip address> --ssl
>
> Subroto
>
> On Aug 26, 2019, at 2:47 PM, Marc Selwan <marc.sel...@datastax.com> wrote:
>
> Which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 have this
> baked in, so that might not be it.)
>
> Marc Selwan | DataStax | PM, Server Team
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
> mcarl...@salesforce.com.invalid> wrote:
>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
>> ).
>>
>> However, I haven't gotten any responses in over a week. I'm going to
>> post it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node to node encryption (or
>> client encryption). We are trying to enable inter datacenter node to
>> node encryption. However, when we enable encryption we get an exception
>> that nodes are unable to gossip with any peers.
>>
>> It could be that we didn't build our jks keystore/truststores
>> correctly (more on how we built these files below). But we also do
>> not see intra datacenter communication working (which should remain
>> unencrypted). Additionally, cqlsh cannot connect to the node
>> either, even though we have (by default) require_client_auth set to
>> false.
>>
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
>>
>> Note that this error occurs a few minutes after the node starts
>> (i.e. there is a delay between startup and the exception being thrown).
>>
>> *Information about our cassandra setup*
>>
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>>
>> *cassandra.yaml*
>>
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> server_encryption_options:
>>     internode_encryption: dc
>>     keystore: <omitted>
>>     keystore_password: <omitted>
>>     truststore: <omitted>
>>     truststore_password: <omitted>
>>
>> client_encryption_options:
>>     enabled: false
>>
>> *cassandra-rackdc.properties*
>>
>> prefer_local=true
>>
>> *No obvious errors with SSL output*
>>
>> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
>> to cassandra-env.sh we see SSL logs printed to stdout (*Note: Subject
>> and Issuer were omitted on purpose*).
>>
>> found key for : cassy-us-west-2
>> adding as trusted cert:
>>   Subject: ...
>>   Issuer: ...
>>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
>>   Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
>>
>> ...
>>
>> trigger seeding of SecureRandom
>> done seeding SecureRandom
>>
>> Looking at Java SE SSL/TLS connection debugging
>> <https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html>,
>> this looks correct. Note, though, that we see this series of messages (along with
>> the RSA key signature output) repeated several times in rapid fire.
>> We never observe any messages about the trust store being added; however that
>> might be something that occurs only on client initiation (?)
>>
>> Additionally, we do see cassandra report that the Encrypted Messaging
>> service has been started.
>>
>> INFO [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001
>>
>> *Doesn't appear to be a cassandra.yaml configuration problem*
>>
>> We can bring the node back online by simply configuring internode_encryption:
>> none. This seems to rule out a broadcast_address or rpc_address
>> configuration problem.
>>
>> *How we built our keystore/truststores*
>>
>> We followed the basic template in the DataStax docs for preparing SSL
>> certificates
>> <https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/configuration/secureSSLCertWithCA.html>.
>> One minor difference was that our private keys and CSRs were generated using
>> openssl, one per region (we plan to share key/signed certs across
>> nodes in a region). These were created using the following command template:
>>
>> openssl req -new -newkey rsa:2048 -out cassy-<region>.csr -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." -nodes -sha256
>>
>> The generated CSR was then signed by an internal root CA. Because we
>> generated our files using openssl, we had to build our jks files by
>> importing our certs into them.
>>
>> *Commands to generate truststore*
>>
>> We distribute this one file to all nodes.
>>
>> keytool -importcert \
>>     -keystore generic-server-truststore.jks \
>>     -alias rootCa \
>>     -file rootCa.crt \
>>     -noprompt \
>>     -keypass omitted \
>>     -storepass omitted
>>
>> *Commands to generate keystore*
>>
>> This was done once per region. Essentially, we created a keystore with
>> keytool, deleted the key entry, and then imported our own key entry using
>> keytool from a pkcs12 file.
>>
>> keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..."
>>
>> keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted
>>
>> openssl pkcs12 -export -in signed_certs/${region}.pem -inkey keys/cassandra.${region}.key -name cassy-${region} -out ${region}.p12
>>
>> keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 -srcstoretype PKCS12
>>
>> keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt -noprompt -keypass omitted -storepass omitted
>>
>> Looking back at this, I don't remember why we used keytool to generate a
>> keypair/keystore, then deleted and imported. I think it was because the
>> keytool importkeystore command refused to run if the keystore didn't
>> already exist.
>>
>> *ca.crt and pem file*
>>
>> The ca.crt file contains the root certificate and the intermediate
>> certificate that was used to sign the CSR. The pem file contains the signed
>> CSR returned to us, the intermediate cert, and the root CA (in that order).
>>
>> *openssl verify ca.crt and pem*
>>
>> openssl verify -CAfile ca.crt us-west-2.pem
>> signed_certs/us-west-2.pem: OK
>>
>> *Command output after enabling encryption*
>>
>> *nodetool status (output truncated)*
>>
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>> ?N  52.44.11.221    ?           256     25.4%             null     1c
>> ...
>> ?N  52.204.232.195  ?           256     23.2%             null     1d
>> Datacenter: us-west-2
>> =====================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>> ?N  34.209.2.144    ?           256     26.5%             null     2c
>> UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
>> ?N  34.210.109.203  ?           256     24.7%             null     2a
>> ...
>>
>> The node shown as up (UN) is the node with encryption enabled.
>>
>> *cqlsh to localhost*
>>
>> cassy-node6:~$ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
>>
>> *cqlsh to remote node*
>>
>> The remote node is a node with encryption enabled.
>>
>> cassy-node6:~$ cqlsh 10.0.2.7
>> Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
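
A minimal sketch of the port check described at the top of this message, assuming Python 3 is available on the node. It reports whether a plain TCP connect is accepted, refused (the errno 111 that openssl and cqlsh return), or times out. The names probe and probe_ports are illustrative only. Port 7000 is assumed to be the unencrypted storage_port (the Cassandra default); 7001 and 9042 are the ports that appear in the output above.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        # create_connection resolves the host and performs the TCP handshake only;
        # no TLS is negotiated, so this works for encrypted and plain ports alike.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): nothing is listening on that port.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def probe_ports(host, ports, timeout=2.0):
    """Return {port: status} for each port on host."""
    return {port: probe(host, port, timeout) for port in ports}
```

Calling probe_ports("10.0.2.7", [7000, 7001, 9042]) from a peer would show whether only the native transport port is closed or the internode ports as well. A "refused" result means nothing is listening on that port, while "timeout" more often points at a security group or firewall rule.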