IIUC, Spark isn't strongly bound to HDFS. It goes through Hadoop's common FileSystem abstraction, which supports multiple filesystem implementations; HDFS is just one of them. You could also use S3 as the backing filesystem, and from Spark's point of view the choice of implementation is transparent.
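As a rough sketch, switching the backend from HDFS to S3 is mostly a matter of configuration, assuming the hadoop-aws module (and its AWS SDK dependency) is on the classpath; the credentials and endpoint values below are placeholders:

```
# spark-defaults.conf -- use S3A as the storage backend
spark.hadoop.fs.s3a.access.key   YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key   YOUR_SECRET_KEY
# Optional: point at an S3-compatible object store for on-premise setups
spark.hadoop.fs.s3a.endpoint     s3.example.internal
```

Application code then stays the same and only the URI scheme changes, e.g. spark.read.parquet("s3a://my-bucket/data") instead of an hdfs:// path.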
On Sun, Feb 12, 2017 at 5:32 PM, ayan guha <guha.a...@gmail.com> wrote:
> How about adding more NFS storage?
>
> On Sun, 12 Feb 2017 at 8:14 pm, Sean Owen <so...@cloudera.com> wrote:
>
>> Data has to live somewhere -- how do you not add storage but store more
>> data? Alluxio is not persistent storage, and S3 isn't on your premises.
>>
>> On Sun, Feb 12, 2017 at 4:29 AM Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>> Has anyone got some advice on how to remove the reliance on HDFS for
>> storing persistent data? We have an on-premise Spark cluster. It seems like
>> a waste of resources to keep adding nodes solely because of a lack of
>> storage space. I would rather add more powerful nodes at a less frequent
>> rate to address the lack of processing power, than add less powerful nodes
>> at a more frequent rate just to handle the ever-growing data. Can anyone
>> point me in the right direction? Is Alluxio a good solution? S3? I would
>> like to hear your thoughts.
>>
>> Cheers,
>> Ben
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>
> --
> Best Regards,
> Ayan Guha