Update - the answer was spark.cassandra.input.split.sizeInMB. The
default value is 512 MB. Setting it to 50 produced many more splits,
and the job ran in under 11 minutes with no timeout errors. In this
case the job was a simple count: 10 minutes 48 seconds for over 8.2
billion rows.
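For reference, a minimal Scala sketch of the setup that worked for me. It assumes the spark-cassandra-connector is on the classpath; the contact point, keyspace, and table names are placeholders for your own:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cassandra-count")
      .config("spark.cassandra.connection.host", "10.0.0.1")  // placeholder contact point
      .config("spark.cassandra.input.split.sizeInMB", "50")   // default is 512; smaller value => more splits
      .getOrCreate()

    // Simple count over the full table, read through the connector's DataFrame source.
    val count = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table")) // placeholder names
      .load()
      .count()

    println(s"row count: $count")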
Update - I believe that for large tables,
spark.cassandra.read.timeoutMS needs to be very long, on the order of 4
hours or more. The job now runs much longer but still doesn't complete,
and I'm back to this all-too-familiar error:
com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException:
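For anyone trying the same experiment, this is a sketch of how I raised that timeout when building the session. The 4-hour figure is just my guess above; whether it is long enough will depend on the table and cluster:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cassandra-long-read-timeout")
      // Read timeout in milliseconds; 4 hours = 14,400,000 ms.
      .config("spark.cassandra.read.timeoutMS", (4L * 60 * 60 * 1000).toString)
      .getOrCreate()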