In the current implementation of TorrentBroadcast, the blocks are fetched one by one in a single thread, so it cannot fully utilize the network bandwidth.
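To show why one-at-a-time fetching can leave bandwidth unused, here is a minimal sketch. It is not Spark code. It assumes each block fetch is dominated by per-request latency rather than link capacity. `fetch_block` is a hypothetical stand-in that just sleeps to model that latency. With one request in flight, total time grows as roughly `num_blocks * latency`. With a thread pool keeping several requests in flight, it drops to roughly `ceil(num_blocks / workers) * latency`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_block(block_id: int, latency_s: float = 0.05) -> bytes:
    # Hypothetical stand-in for fetching one broadcast block from a remote
    # node; the sleep models per-request network latency only.
    time.sleep(latency_s)
    return f"block-{block_id}".encode()


def fetch_sequential(num_blocks: int) -> list[bytes]:
    # One request at a time (the single-threaded pattern described above).
    return [fetch_block(i) for i in range(num_blocks)]


def fetch_parallel(num_blocks: int, workers: int = 8) -> list[bytes]:
    # Several requests in flight at once. executor.map yields results in
    # input order, so the blocks can be reassembled directly.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_block, range(num_blocks)))


if __name__ == "__main__":
    n = 16
    t0 = time.perf_counter()
    seq = fetch_sequential(n)
    t1 = time.perf_counter()
    par = fetch_parallel(n)
    t2 = time.perf_counter()
    assert seq == par
    print(f"sequential: {t1 - t0:.2f}s, parallel: {t2 - t1:.2f}s")
```

In practice the gain is capped by the link itself. Once enough requests are in flight to saturate the NIC, adding more threads does not help.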
Davies

On Fri, Jan 9, 2015 at 2:11 AM, Jun Yang <yangjun...@gmail.com> wrote:
> Guys,
>
> I have a question regarding the Spark 1.1 broadcast implementation.
>
> In our pipeline, we have a large multi-class LR model, which is about 1GiB
> in size. To take advantage of Spark parallelism, a natural approach is to
> broadcast this model file to the worker nodes.
>
> However, broadcast performance does not look very good.
>
> While broadcasting the model file, I monitored the network card throughput
> of the worker nodes. Their recv/write throughput is only around 30~40 MiB/s
> (our server boxes are equipped with 100MiB Ethernet cards).
>
> Is this a real limitation of the Spark 1.1 broadcast implementation? Or is
> there some configuration or trick that can make Spark broadcast perform
> better?
>
> Thanks
>
> --
> yangjun...@gmail.com
> http://hi.baidu.com/yjpro