Hi
We have a situation where we are ingesting a high-volume stream of changes
coming from an Oracle table.
The requirement:
Whenever there is a change in the Oracle table, a CDC process writes the
change out to a Kafka or Event Hub stream, and the stream is consumed by a
Spark Streaming application.
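To make the pipeline concrete, here is a minimal sketch of how the consumer side might unwrap one change event. The envelope shape (`op`, `before`, `after`, `ts_ms`) follows the Debezium convention and is an assumption here; the actual CDC tool and schema were not stated in the post.

```python
import json

# Hypothetical Debezium-style change event, as a CDC tool might publish it
# to a Kafka/Event Hub topic. The field names are an assumption, not taken
# from the original post.
raw_event = json.dumps({
    "op": "u",  # c = insert, u = update, d = delete
    "before": {"id": 42, "status": "PENDING"},
    "after": {"id": 42, "status": "SHIPPED"},
    "ts_ms": 1700000000000,
})

def parse_change(message: str) -> dict:
    """Extract the row state implied by one CDC envelope."""
    event = json.loads(message)
    if event["op"] == "d":
        # Deletes carry the last known row image in "before".
        return {"op": "delete", "row": event["before"]}
    # Inserts and updates carry the new row image in "after".
    return {"op": "upsert", "row": event["after"]}

print(parse_change(raw_event))
```

In a real Spark Structured Streaming job this parsing would run over the `value` column of a Kafka source; the sketch above only shows the per-message logic.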
Is it possible that a job keeps on running for some time after
onApplicationEnd is fired?
For example,
I have a Spark job that still has 10 batches to process, and let's say that
processing them will take 10 minutes. If I execute sparkContext.stop(),
I will receive onApplicationEnd.
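The ordering concern in the question can be modeled outside Spark. The toy sketch below is not Spark's actual shutdown semantics, just an illustration of how an "application end" callback can fire while a worker is still draining queued batches:

```python
import queue
import threading

# Toy model (NOT Spark itself): a worker drains 10 pending "batches" while
# the main thread fires an application-end event right away, the way a
# listener callback would fire on stop().
events = []
batches = queue.Queue()
for i in range(10):
    batches.put(i)

def worker():
    while not batches.empty():
        batches.get()          # simulate processing one batch
        events.append("batch")

t = threading.Thread(target=worker)
t.start()
events.append("onApplicationEnd")  # fired immediately, not after the work
t.join()                           # worker may still be draining batches

# Depending on timing, "onApplicationEnd" can land before the final
# "batch" entries, which is exactly the ordering the question asks about.
print(events.count("batch"))
```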
I have a question regarding modeling a timestamp column in Avro messages.
The options are:
- ISO 8601 "string" (UTC time)
- "int": 32-bit signed UNIX epoch time
- "long", modeled as a logical type (timestamp) in the schema
What would be the best way to model the timestamp?
FYI, we are using Apache
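The trade-offs between the three options can be sketched with one concrete instant. Note that a 32-bit signed epoch-seconds value caps out at 2147483647, i.e. 2038-01-19 (the "Year 2038" limit), while Avro's `timestamp-millis` logical type annotates a `long`, so it has millisecond precision and no 2038 problem:

```python
from datetime import datetime, timezone

# One concrete instant: epoch second 1_700_000_000.
ts = datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc)

# Option 1: ISO 8601 string in UTC. Human-readable and self-describing,
# but larger on the wire and needs parsing on every read.
iso = ts.strftime("%Y-%m-%dT%H:%M:%SZ")

# Option 2: 32-bit signed epoch seconds. Compact, but second precision
# only, and it overflows past 2147483647 (2038-01-19).
epoch_s = int(ts.timestamp())

# Option 3: epoch milliseconds in a long, which is what Avro's
# timestamp-millis logical type stores.
epoch_ms = int(ts.timestamp() * 1000)

print(iso, epoch_s, epoch_ms)
```

The logical-type option keeps the compact `long` encoding while letting Avro-aware readers recover a proper timestamp, which is why it is usually the preferred choice of the three.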