Sorry for accidentally sending that message mid-way. How about trying to increase the `batchsize` JDBC option to improve performance?
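For reference, a minimal sketch of what that might look like (the URL, table name, and credentials below are placeholders; `batchsize` is documented as a JDBC writer option in later Spark versions, so check whether your Spark version honors it):

```scala
// Sketch only, not from the original thread: connection details are placeholders.
import java.util.Properties

val props = new Properties()
props.setProperty("user", "spark")        // placeholder credentials
props.setProperty("password", "secret")
// "batchsize" controls how many rows are grouped into each JDBC batch insert;
// larger batches can reduce per-round-trip overhead for bulk writes.
props.setProperty("batchsize", "10000")

df.write
  .mode("append")
  .jdbc("jdbc:postgresql://dbhost:5432/mydb", "target_table", props)
```

Whether the driver actually executes these as true multi-row batches also depends on the JDBC driver (e.g. PostgreSQL's `reWriteBatchedInserts=true`), so it is worth profiling again after changing the setting.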
// maropu

On Thu, Apr 21, 2016 at 2:15 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
> Hi,
>
> How about trying to increate 'batchsize
>
> On Wed, Apr 20, 2016 at 7:14 AM, Jonathan Gray <jonny.g...@gmail.com> wrote:
>> Hi,
>>
>> I'm trying to write ~60 million rows from a DataFrame to a database using
>> JDBC using Spark 1.6.1, something similar to df.write().jdbc(...)
>>
>> The write seems to not be performing well. Profiling the application
>> with a master of local[*] it appears there is not much socket write
>> activity and also not much CPU.
>>
>> I would expect there to be an almost continuous block of socket write
>> activity showing up somewhere in the profile.
>>
>> I can see that the top hot method involves
>> apache.spark.unsafe.platform.CopyMemory all from calls within
>> JdbcUtils.savePartition(...). However, the CPU doesn't seem particularly
>> stressed so I'm guessing this isn't the cause of the problem.
>>
>> Are there any best practices, or has anyone come across a case like this
>> before where a write to a database seems to perform poorly?
>>
>> Thanks,
>> Jon
>
> --
> ---
> Takeshi Yamamuro

--
---
Takeshi Yamamuro