Multiple destination single source

2021-05-14 Thread abhilash.kr
I have a single source of data. The processed records have to be directed to multiple destinations, i.e. 1. read the source data 2. based on a condition, route to the following destinations: 1. Kafka for error records 2. store success records meeting a certain condition in an S3 bucket, bucket name : "A
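
The routing described above can be sketched in plain Python, with no Spark dependency, to show the shape of the decision. This is only an illustration under stated assumptions: the record fields (`status`, `error`) and the destination labels (`kafka_errors`, `s3_success`, `unrouted`) are hypothetical and do not come from the original message.

```python
from collections import defaultdict


def route(record):
    """Return a destination label for one record.

    The field names and labels are hypothetical placeholders.
    """
    if record.get("error") is not None:
        return "kafka_errors"   # error records would go to Kafka
    if record.get("status") == "success":
        return "s3_success"     # qualifying success records would go to S3
    return "unrouted"           # everything else is left for another rule


def split_by_destination(records):
    """Group records by destination in a single pass over the source."""
    buckets = defaultdict(list)
    for record in records:
        buckets[route(record)].append(record)
    return dict(buckets)


sample = [
    {"id": 1, "status": "success", "error": None},
    {"id": 2, "status": "failed", "error": "bad schema"},
    {"id": 3, "status": "pending", "error": None},
]
routed = split_by_destination(sample)
```

In a Spark job, the same predicates would probably become one `filter` per destination on a cached DataFrame, each followed by its own write, though the exact sinks and options depend on the setup.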

Re: Understanding what happens when a job is submitted to a cluster

2021-05-13 Thread abhilash.kr
Thank you. This was helpful. I have follow-up questions. 1. How does Spark know the data size is 5 million? 2. Are there any books or documentation that take one simple job and go deeper into what happens under the hood? -- Sent from: http://apache-spark-user-list.10015

Understanding what happens when a job is submitted to a cluster

2021-05-13 Thread abhilash.kr
Hello, What happens when a job is submitted to a cluster? I know the 10,000-foot overview of the Spark architecture, but I need the minute details of how Spark estimates the resources to request from YARN, what YARN's response is, etc. I need the *step by step* understanding of the complete