Hm, perhaps some intro will help.
An event is a change in state, or an update, of some key business system. An event-driven architecture (EDA) consists of an event producer, an event consumer and an event broker. If we follow this line of thought, an event is a primitive construct that travels from the producer to the consumer via the event broker, and it carries a state change that happened in the producer. For example, new trade prices were placed, or the temperature reading at a certain time was 21 degrees Celsius. A sequence of related events is commonly called a stream, which is where tools like Spark Structured Streaming come in.

EDAs are ideal for flexibility and moving quickly. They are commonly found in modern applications that use microservices, an architectural style that structures an application as a collection of services that are highly scalable, loosely coupled, independently deployable and organised around business capabilities. The name comes from each function of the application operating as an independent service, or microservice, and a microservice typically runs within a container. (A container is a resource allocation and sharing technology, all about packaging software for deployment.)

Back to Spark in an EDA. Spark relies on Kafka, Google Pub/Sub or another messaging system to deliver the related streaming data, via a topic or topics. At this stage Spark does not need to know how this streaming data is produced; it relies on the appropriate API to read the data from Kafka
<https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html>
or from Google Pub/Sub
<https://cloud.google.com/pubsub/lite/docs/spark#console_1>.

For example, if you are processing temperature readings, you construct a streaming DataFrame that subscribes to a topic, say "temperature". As long as you have the correct jar files to interface with Kafka, this should work. In reality Kafka will be running in its own container(s), plus the ZooKeeper containers. Your streaming DataFrame is built as below:

streamingDataFrame = self.spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
    .option("subscribe", "temperature") \
    .load()

All you need to know are the values for *kafka.bootstrap.servers*, which are read from a yaml file at startup, so you can change them without a code change. Someone else can look after the Kafka part, and most likely Kafka will be running on its own Kubernetes (k8s) cluster. Coming to Spark, you can also deploy Spark on k8s for scalability and availability. At this stage you can build a complete decision engine on Spark Structured Streaming running on k8s: you can work out average temperatures over a period and write them to a time series database via the Spark API, or you can decide to publish them to another Kafka topic. In short, you can see that this architecture is pretty agile, so to speak, with loose deployment of microservices. A few sketches follow below.
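For illustration, here is a minimal sketch of reading those broker settings from a yaml file at startup. The file name config.yml and the MDVariables/bootstrapServers layout are just assumptions that match the snippet above; adjust them to your own setup:

import yaml

# Load the settings once at startup; pointing the job at different
# brokers then needs an edit to the yaml file, not to the code.
with open("config.yml", "r") as f:
    config = yaml.safe_load(f)

bootstrapServers = config['MDVariables']['bootstrapServers']  # e.g. "host1:9092,host2:9092"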
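If the broker is Google Pub/Sub Lite rather than Kafka, the read side looks much the same through the connector in the Google doc linked above. Roughly, with the project number, location and subscription id as placeholders:

streamingDataFrame = spark \
    .readStream \
    .format("pubsublite") \
    .option("pubsublite.subscription",
            "projects/PROJECT_NUMBER/locations/LOCATION/subscriptions/SUBSCRIPTION_ID") \
    .load()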
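And a sketch of the decision engine side: averaging the temperature over five-minute windows and publishing the result to another Kafka topic. The column names, the ten-minute watermark and the assumption that each message value is the reading itself are mine, not fixed by anything above:

from pyspark.sql import functions as F

# Parse the Kafka payload into a numeric temperature, keeping the
# message timestamp for windowing.
temperatures = streamingDataFrame \
    .select(F.col("timestamp"),
            F.col("value").cast("string").cast("double").alias("temperature"))

# Average per 5-minute window, tolerating events up to 10 minutes late.
avgTemperatures = temperatures \
    .withWatermark("timestamp", "10 minutes") \
    .groupBy(F.window("timestamp", "5 minutes")) \
    .agg(F.avg("temperature").alias("avgTemperature"))

# The Kafka sink expects a string "value" column, and streaming writes
# need a checkpoint location.
query = avgTemperatures \
    .select(F.to_json(F.struct("window", "avgTemperature")).alias("value")) \
    .writeStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
    .option("topic", "avgTemperature") \
    .option("checkpointLocation", "/tmp/avgTemperature/checkpoint") \
    .outputMode("update") \
    .start()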
Some of these terms, like Kubernetes, containers, microservices, pods, Docker, Docker images etc., I have tried to explain here
<https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/>

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Fri, 14 Jan 2022 at 19:33, ashok34...@yahoo.com.INVALID <ashok34...@yahoo.com.invalid> wrote:

> Hi gurus,
>
> I am trying to understand the role of Spark in an event driven
> architecture. I know Spark deals with massive parallel processing.
> However, does Spark follow event driven architecture like Kafka as
> well? Say handling producers, filtering and pushing the events to
> consumers like database etc.
>
> thanking you