All of these tools are reasonable choices. I don't think the Spark project itself has a view on what works best. They also do different things, so they aren't direct alternatives: Petastorm, for example, is not a training framework but a way to feed data to a distributed DL training process on Spark. For what it's worth, Databricks ships Horovod and Petastorm, but that doesn't mean the other projects are second-class.
On Tue, Jun 1, 2021 at 4:59 PM Gourav Sengupta <gourav.sengupta.develo...@gmail.com> wrote:

> Dear TD, Matei, Michael, Reynold,
>
> I hope all of you and your loved ones are staying safe and doing well.
>
> as a member of the community the direction from the SPARK mentors is
> getting to be a bit confusing for me and I was wondering if I can seek your
> help.
>
> We have to make long term decisions which is aligned with the open source
> SPARK compatibility and directions and it will be wonderful to know what is
> the most dependable route to get data from SPARK to tensorflow, is it:
> 1. petastorm
> 2. horovod
> 3. tensorflowonspark
> 4. spark_tensorflow_distributor
> or something else.
>
> Any comments from you will be super useful.
>
> If I am not wrong, seamless integration between SPARK to tensorflow/
> pytorch was one of the most exciting visions of SPARK 3.x
>
> While using SPARK ML has its own favourite space, I think that tensorflow
> and pytorch will see a lot of focused development as well.
>
> Regards,
> Gourav Sengupta