Hi, I'm trying to write ~60 million rows from a DataFrame to a database via JDBC on Spark 1.6.1, using something similar to df.write().jdbc(...)
The write does not seem to be performing well. Profiling the application with a master of local[*], there is not much socket write activity and not much CPU use either; I would expect an almost continuous block of socket writes to show up somewhere in the profile. The top hot method is org.apache.spark.unsafe.Platform.copyMemory, all from calls within JdbcUtils.savePartition(...), but since the CPU doesn't seem particularly stressed, I'm guessing that isn't the cause of the problem. Are there any best practices for this, or has anyone come across a case like this before, where a write to a database performs poorly?
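For reference, this is roughly the shape of the write. It's a minimal sketch, not my exact code: the URL, credentials, table name, and partition count are placeholders, and the "batchsize" property and the MySQL-specific rewriteBatchedStatements flag are things I've seen suggested rather than anything I've confirmed (as I understand it, JdbcUtils in 1.6 buffers rows per partition into PreparedStatement batches).

    import java.util.Properties;

    import org.apache.spark.sql.DataFrame;

    public class JdbcWriteSketch {
        // df is assumed to be the ~60M-row DataFrame; the URL and
        // credentials below are placeholders, not real values.
        static void writeWithTuning(DataFrame df) {
            Properties props = new Properties();
            props.setProperty("user", "dbUser");          // hypothetical credentials
            props.setProperty("password", "dbPassword");
            // "batchsize" is the number of rows buffered into each
            // PreparedStatement batch before executeBatch() is called
            // (the default is reportedly 1000).
            props.setProperty("batchsize", "10000");

            // Each partition opens its own JDBC connection and writes
            // independently, so the partition count bounds write parallelism.
            df.repartition(16)
              .write()
              .jdbc("jdbc:mysql://dbhost:3306/mydb?rewriteBatchedStatements=true",
                    "target_table", props);
        }
    }

Thanks, Jon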