s/paying a role/playing a role/

On Thu, Aug 25, 2016 at 12:51 PM, Mark Hamstra <m...@clearstorydata.com>
wrote:
> One way you can start to make this make more sense, Sean, is if you
> exploit the code/data duality so that the non-distributed data that you
> are sending out from the driver is actually paying a role more like code
> (or at least parameters). What is sent from the driver to an Executor is
> then used (typically as seeds or parameters) to execute some procedure
> on the Worker node that generates the actual data on the Workers. After
> that, you proceed to execute in a more typical fashion with Spark using
> the now-instantiated distributed data.
>
> But I don't get the sense that this meta-programming-ish style is really
> what the OP was aiming at.
>
> On Thu, Aug 25, 2016 at 12:39 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> Without a distributed storage system, your application can only create
>> data on the driver and send it out to the workers, and collect data
>> back from the workers. You can't read or write data in a distributed
>> way. There are use cases for this, but they are pretty limited (unless
>> you're running on one machine).
>>
>> I can't really imagine a serious use of (distributed) Spark without
>> (distributed) storage, in the same way that I don't think many apps
>> exist that don't read/write data.
>>
>> The premise here is not just replication, but partitioning data across
>> compute resources. With a distributed file system, your big input
>> exists across a bunch of machines and you can send the work to the
>> pieces of data.
>>
>> On Thu, Aug 25, 2016 at 7:57 PM, kant kodali <kanth...@gmail.com>
>> wrote:
>>
>>> @Mich I understand why I would need Zookeeper. It is there for fault
>>> tolerance, given that Spark is a master-slave architecture: when a
>>> master goes down, Zookeeper will run a leader election algorithm to
>>> elect a new leader. However, DevOps hate Zookeeper; they would be much
>>> happier to go with etcd & consul, and it looks like if we use the
>>> Mesos scheduler we should be able to drop Zookeeper.
>>>
>>> I am still trying to understand why I would need HDFS for Spark. I
>>> understand the purpose of distributed file systems in general, but I
>>> don't understand it in the context of Spark, since many people say you
>>> can run a distributed Spark cluster in standalone mode, but I am not
>>> sure what the pros/cons are if we do it that way. In the Hadoop world
>>> I understand that one of the reasons HDFS is there is replication; in
>>> other words, if we write some data to HDFS it will store each block
>>> across different nodes, such that if one of the nodes goes down it can
>>> still retrieve that block from the other nodes. In the context of
>>> Spark I am not really sure, because 1) I am new and 2) the Spark paper
>>> says it doesn't replicate data; instead it stores the lineage (all the
>>> transformations) such that it can reconstruct the data.
>>>
>>> On Thu, Aug 25, 2016 9:18 AM, Mich Talebzadeh
>>> mich.talebza...@gmail.com wrote:
>>>
>>>> You can use Spark on Oracle as a query tool.
>>>>
>>>> It all depends on the mode of the operation.
>>>>
>>>> If you are running Spark with yarn-client/cluster then you will need
>>>> YARN. It comes as part of Hadoop core (HDFS, MapReduce and YARN).
>>>>
>>>> I have not gone and installed YARN without installing Hadoop.
>>>>
>>>> What is the overriding reason to have Spark on its own?
>>>>
>>>> You can use Spark in Local or Standalone mode if you do not want
>>>> Hadoop core.
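
For concreteness, a minimal sketch of the two Hadoop-free modes described
above; the app name and the standalone master URL are hypothetical
placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Local mode: driver and executors run inside a single JVM; no
    // cluster manager and no Hadoop components involved at all.
    val localConf = new SparkConf()
      .setAppName("demo")
      .setMaster("local[*]")

    // Standalone mode: Spark's own built-in cluster manager; still no
    // YARN or HDFS required. The master URL points at the standalone
    // master process ("master-host" is made up here).
    val standaloneConf = new SparkConf()
      .setAppName("demo")
      .setMaster("spark://master-host:7077")

    val sc = new SparkContext(localConf)

Either master setting can equally be passed on the command line via
spark-submit's --master flag instead of being hard-coded.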
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> Disclaimer: Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which
>>>> may arise from relying on this email's technical content is
>>>> explicitly disclaimed. The author will in no case be liable for any
>>>> monetary damages arising from such loss, damage or destruction.
>>>>
>>>> On 24 August 2016 at 21:54, kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>> What do I lose if I run Spark without using HDFS or Zookeeper? Which
>>>> of them is almost a must in practice?
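
Mark's code/data-duality suggestion can be sketched roughly as below; the
seed count and the random-number generator are made-up placeholders. The
driver ships only a handful of small seeds, and each worker instantiates
its partition of the actual data locally. It also touches kant's lineage
question: a lost partition is recomputed from the lineage (seed ->
flatMap), not fetched from a replica.

    import org.apache.spark.{SparkConf, SparkContext}

    object SeededGeneration {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("seeded-generation").setMaster("local[*]"))

        // Only these tiny seed values travel from the driver to the
        // executors; no large dataset is ever serialized on the driver.
        val seeds = sc.parallelize(0L until 8L, numSlices = 8)

        // Each worker expands its seed into the real data in place.
        val data = seeds.flatMap { seed =>
          val rng = new scala.util.Random(seed)
          Iterator.fill(100000)(rng.nextDouble())
        }

        // From here on it is ordinary distributed computation over the
        // now-instantiated distributed data.
        println(data.sum() / data.count())

        sc.stop()
      }
    }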