I think what Dana is suggesting is that since Python isn't doing a good job
of utilising all the available CPU power, you could run multiple Python
processes to share the load: divide the MongoDB collection into, say, four
parts and process each part with its own Python process, each running its
own producer on the Kafka side.

Or use a multi-threaded Java producer that can use the machine optimally.
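
A minimal sketch of the multi-process approach, assuming pymongo and
kafka-python are in use; the connection strings, database, collection, and
topic names below are hypothetical, and the modulo split is only
illustrative (a real split would query disjoint _id ranges so each worker
scans only its share):

import json
from multiprocessing import Process

from kafka import KafkaProducer
from pymongo import MongoClient

NUM_WORKERS = 4  # roughly one process per core


def produce_partition(worker_id):
    # Each process opens its own MongoDB and Kafka clients; neither
    # client is safe to share across fork boundaries.
    client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
    coll = client["mydb"]["mycollection"]              # hypothetical names
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda d: json.dumps(d, default=str).encode("utf-8"),
    )
    # Crude split: worker w takes every NUM_WORKERS-th document. All
    # workers still iterate the full cursor, but the expensive part
    # (serialization and producing) is spread across processes.
    for i, doc in enumerate(coll.find()):
        if i % NUM_WORKERS == worker_id:
            producer.send("mytopic", doc)
    producer.flush()


if __name__ == "__main__":
    workers = [Process(target=produce_partition, args=(w,))
               for w in range(NUM_WORKERS)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()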


On Thu, Aug 25, 2016 at 10:21 PM, Dominik Safaric <dominiksafa...@gmail.com>
wrote:

> Dear Dana,
>
> > I would recommend
> > other tools for bulk transfers.
>
>
> What tools/languages would you recommend rather than Python?
>
> I could certainly accomplish the same using the native Java Kafka Producer
> API, but should this really affect performance, assuming the Kafka
> configuration stays as is?
>
> > On 25 Aug 2016, at 18:43, Dana Powers <dana.pow...@gmail.com> wrote:
> >
> > Python is generally restricted to a single CPU, and kafka-python will max
> > out a single CPU well before it saturates a network card. I would recommend
> > other tools for bulk transfers. Otherwise you may find that partitioning
> > your data set and running a separate Python process for each partition will
> > increase the overall CPU available and therefore the throughput.
> >
> > One day I will spend time improving the CPU performance of kafka-python,
> > but probably not in the near term.
> >
> > -Dana
>
>


-- 
Sharninder
