Hi,

I'm trying to write ~60 million rows from a DataFrame to a database using
JDBC using Spark 1.6.1, something similar to df.write().jdbc(...)
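
Roughly, the write looks like the following (a minimal sketch; the JDBC URL, table name, credentials and partition count are placeholders, not the real values):

    import java.util.Properties;
    import org.apache.spark.sql.DataFrame;

    public class JdbcWriteSketch {
        // df holds the ~60 million rows built upstream.
        static void writeToDb(DataFrame df) {
            // Placeholder connection details -- not the real target database.
            Properties props = new Properties();
            props.put("user", "dbuser");
            props.put("password", "dbpass");

            // Repartition only controls how many tasks write concurrently;
            // the count here is arbitrary.
            df.repartition(16)
              .write()
              .jdbc("jdbc:postgresql://dbhost:5432/mydb", "target_table", props);
        }
    }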

The write does not seem to perform well.  Profiling the application with
a master of local[*], I see neither much socket write activity nor much
CPU usage.

I would expect there to be an almost continuous block of socket write
activity showing up somewhere in the profile.

I can see that the top hot method involves
org.apache.spark.unsafe.Platform.copyMemory, all from calls within
JdbcUtils.savePartition(...).  However, the CPU doesn't seem particularly
stressed, so I'm guessing this isn't the cause of the problem.

Are there any best practices for this, or has anyone come across a case
like this before, where a write to a database performs poorly?

Thanks,
Jon
