You could distribute the computation across a cluster with Spark and Horovod (and Petastorm), for example:
https://github.com/horovod/horovod
https://github.com/uber/petastorm
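Roughly, a minimal sketch of what that looks like with Horovod's Spark Estimator API (which uses Petastorm under the hood to feed the training processes). The data path, column names, and toy model below are placeholders for your own pipeline:

```python
import tensorflow as tf
import horovod.spark.keras as hvd
from horovod.spark.common.store import Store
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("horovod-sketch").getOrCreate()

# Placeholder: a Spark DataFrame with 'features' and 'label' columns,
# e.g. produced by your existing preprocessing job.
train_df = spark.read.parquet("/data/train.parquet")

# Placeholder toy model; swap in your real network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# The Store holds intermediate Petastorm data and checkpoints; on a real
# cluster this would be an HDFS/S3 path rather than a local one.
store = Store.create("/tmp/horovod_store")

estimator = hvd.KerasEstimator(
    num_proc=4,                      # number of parallel training processes
    store=store,
    model=model,
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
    feature_cols=["features"],
    label_cols=["label"],
    batch_size=32,
    epochs=5,
)

# fit() materializes the DataFrame in Petastorm format and runs distributed
# training; the result is a Spark ML Transformer you can use for inference.
keras_model = estimator.fit(train_df)
predictions = keras_model.transform(train_df)
```

Horovod handles the gradient allreduce across workers, and the Store lets each training process stream its shard of the data instead of loading everything on one machine.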
If you're at a few hours, it may not be worth it - it's not hard to set up, but it is more involved. You may do better with a larger GPU or multiple GPUs on a single VM first (a quick sketch of that option is below the quoted message). But it's totally possible; I've definitely seen this work.

On Wed, Oct 13, 2021 at 11:01 PM 刘沛文 <john....@drbrain.com.cn> wrote:

> Hi,
> My name is Peiwen. I'm working with Dr. Brain, an AI company focused on
> medical image processing and deep learning. Our website is
> http://drbrain.net/index_en.aspx
> We basically do 2 major things: 1. image processing, such as lesion
> drawing; 2. deep learning for neural disease prediction, such as stroke
> and Alzheimer's disease.
> Currently we use TensorFlow and other deep learning frameworks. Due to the
> size of the medical images (1 ~ 5 GB per record), with a traditional
> framework on a single computer it takes a long time (a few hours) for data
> processing and model training before we get a result.
> I'm writing to check whether Apache Spark offers a good solution to
> accelerate the computation.
> I know TensorFlow can work with Spark. I'd just like a rough sense of how
> much faster Apache Spark could make this compared to traditional
> TensorFlow, say with a cluster of 10 nodes.
>
> Thank you very much!
>
> Peiwen
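For reference, here is a minimal sketch of the single-VM multi-GPU option mentioned above, using TensorFlow's built-in tf.distribute.MirroredStrategy; the model and dataset are placeholders for your own code:

```python
import tensorflow as tf

# MirroredStrategy replicates the model across all GPUs visible on this VM
# and averages gradients across replicas; no cluster setup is required.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Placeholder dataset; in practice this would stream your preprocessed images.
features = tf.random.normal([1024, 128])
labels = tf.random.uniform([1024], maxval=2, dtype=tf.int32)
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .batch(32 * strategy.num_replicas_in_sync)
)

with strategy.scope():
    # The model must be built and compiled inside the strategy scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

model.fit(dataset, epochs=5)
```

It's a much smaller change than moving to a cluster, so it's worth benchmarking before investing in the Spark/Horovod setup.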