Big Burst of Streaming Changes

2018-07-29 Thread ayan guha
Hi, we have a situation where we are ingesting a high volume of streaming data coming from an Oracle table. The requirement: whenever there is a change in the Oracle table, a CDC process will write out the change to a Kafka or Event Hub stream, and the stream will be consumed by a Spark Streaming application.
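
For reference, a minimal sketch of the consuming side, assuming a Structured Streaming job; the topic name "oracle_cdc", the broker address, and the console sink are placeholders, not details from the thread:

    // Consume CDC change records from Kafka with Spark Structured Streaming.
    import org.apache.spark.sql.SparkSession

    object CdcIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("oracle-cdc-ingest")
          .getOrCreate()

        // Each Kafka record carries one change event emitted by the CDC process.
        val changes = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
          .option("subscribe", "oracle_cdc")                // assumed topic name
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Write the raw change stream out; sink and checkpoint path are placeholders.
        val query = changes.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/cdc-checkpoint")
          .start()

        query.awaitTermination()
      }
    }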

[SparkContext] Will the application stop immediately after sc.stop()?

2018-07-29 Thread bsikander
Is it possible that a job keeps on running for some time after onApplicationEnd is fired? For example, I have a Spark job which still has 10 batches to process, and let's say that processing them will take 10 minutes. If I execute sparkContext.stop(), I will receive onApplicationEnd
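
To make the question concrete, here is a minimal sketch, not from the thread, of registering a SparkListener whose onApplicationEnd callback is invoked as part of sc.stop(); the app name and workload are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

    object ListenerDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("listener-demo").setMaster("local[*]"))

        // The callback fires when the application-end event is posted; by itself
        // it does not prove that all in-flight batches have finished.
        sc.addSparkListener(new SparkListener {
          override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
            println(s"onApplicationEnd fired at ${end.time}")
        })

        sc.parallelize(1 to 1000).map(_ * 2).count()

        // stop() shuts the context down and triggers the listener above.
        sc.stop()
      }
    }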

modeling timestamp in Avro messages (read using Spark Structured Streaming)

2018-07-29 Thread karan alang
I have a question regarding modeling a timestamp column in Avro messages. The options are:
- ISO 8601 "string" (UTC time)
- "int": 32-bit signed UNIX epoch time
- "long" (modeled as a logical type, timestamp, in the schema)
What would be the best way to model the timestamp? FYI, we are using Apache
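
As an illustration of the third option, a minimal sketch (the record and field names are assumed, not from the thread) of an Avro schema whose long field is annotated with the timestamp-millis logical type, parsed with the Avro library:

    import org.apache.avro.Schema

    object TimestampSchema {
      // A long field carrying epoch milliseconds, tagged with the
      // timestamp-millis logical type so consumers can map it to a timestamp.
      val schemaJson: String =
        """{
          |  "type": "record",
          |  "name": "Event",
          |  "fields": [
          |    {"name": "id", "type": "string"},
          |    {"name": "event_time",
          |     "type": {"type": "long", "logicalType": "timestamp-millis"}}
          |  ]
          |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val schema = new Schema.Parser().parse(schemaJson)
        // Prints the logical type attached to the field's schema.
        println(schema.getField("event_time").schema().getLogicalType)
      }
    }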