I think what Dana is suggesting is that, since Python isn't doing a good job of utilising all the available CPU power, you could run multiple Python processes to share the load: divide the MongoDB collection into, say, 4 parts and process each part with its own Python process on the Kafka producer side. Or use a multi-threaded Java producer that is able to use the machine optimally. A rough sketch of the multi-process approach follows below.
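Something along these lines (just a sketch, not tested -- the Mongo URI, db/collection/topic names, worker count and the skip/limit partitioning are all placeholders you would adapt to your setup; it assumes pymongo and kafka-python are installed):

import multiprocessing

from bson import json_util          # ships with pymongo; handles ObjectId/date serialisation
from kafka import KafkaProducer     # kafka-python
from pymongo import MongoClient     # pymongo

# Placeholder settings -- adjust to your environment.
MONGO_URI = 'mongodb://localhost:27017'
DB_NAME = 'mydb'
COLLECTION = 'mycollection'
KAFKA_BROKERS = 'localhost:9092'
TOPIC = 'mongo-export'
NUM_WORKERS = 4


def export_slice(worker_idx, skip, limit):
    """Read one slice of the collection and produce it to Kafka."""
    coll = MongoClient(MONGO_URI)[DB_NAME][COLLECTION]
    producer = KafkaProducer(
        bootstrap_servers=KAFKA_BROKERS,
        value_serializer=lambda doc: json_util.dumps(doc).encode('utf-8'),
    )
    # skip/limit keeps the sketch simple; for large collections a per-worker
    # _id range filter avoids the server-side cost of skipping documents.
    for doc in coll.find().skip(skip).limit(limit):
        producer.send(TOPIC, doc)
    producer.flush()
    producer.close()
    print('worker %d done' % worker_idx)


if __name__ == '__main__':
    # count_documents() needs pymongo >= 3.7; older versions use coll.count()
    total = MongoClient(MONGO_URI)[DB_NAME][COLLECTION].count_documents({})
    chunk = (total + NUM_WORKERS - 1) // NUM_WORKERS

    workers = [
        multiprocessing.Process(target=export_slice, args=(i, i * chunk, chunk))
        for i in range(NUM_WORKERS)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()

Each process gets its own KafkaProducer and its own MongoDB cursor, so the GIL stops being the bottleneck and throughput can scale with the number of workers until the network or the brokers saturate.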
On Thu, Aug 25, 2016 at 10:21 PM, Dominik Safaric <dominiksafa...@gmail.com> wrote:

> Dear Dana,
>
> > I would recommend
> > other tools for bulk transfers.
>
> What tools/languages would you rather recommend than using Python?
>
> I could for sure accomplish the same by using the native Java Kafka
> Producer API, but should this really affect the performance under the
> assumption that the Kafka configuration stays as is?
>
> > On 25 Aug 2016, at 18:43, Dana Powers <dana.pow...@gmail.com> wrote:
> >
> > python is generally restricted to a single CPU, and kafka-python will max
> > out a single CPU well before it maxes a network card. I would recommend
> > other tools for bulk transfers. Otherwise you may find that partitioning
> > your data set and running separate python processes for each will increase
> > the overall CPU available and therefore the throughput.
> >
> > One day I will spend time improving the CPU performance of kafka-python,
> > but probably not in the near term.
> >
> > -Dana

--
Sharninder