Re: [Spark Streaming] Spark Streaming with S3 vs Kinesis

2018-06-28 Thread Farshid Zavareh
Thanks. A workaround I can think of is to rename/move the objects which have been processed to a different prefix (which is not monitored). But with the StreamingContext.textFileStream method there doesn't seem to be a way to know where each record is coming from. Is there another way to do this? On
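
The rename/move workaround above might be sketched roughly as follows. This is only an illustration of the key mapping, not code from the thread: the helper names `processed_key` and `plan_moves` and the prefixes `incoming/` and `processed/` are hypothetical. S3 has no rename operation, so each planned move would have to be carried out as a copy followed by a delete, which is left out here.

```python
# Minimal sketch, assuming processed objects are moved from a monitored
# prefix to an unmonitored one. All names and prefixes are hypothetical.

def processed_key(key: str, monitored_prefix: str, processed_prefix: str) -> str:
    """Map a key under the monitored prefix to the same path under the processed prefix."""
    if not key.startswith(monitored_prefix):
        raise ValueError(f"{key!r} is not under {monitored_prefix!r}")
    # Keep everything after the monitored prefix so the sub-path is preserved.
    return processed_prefix + key[len(monitored_prefix):]


def plan_moves(keys, monitored_prefix: str, processed_prefix: str):
    """Return (source, destination) pairs; keys outside the monitored prefix are skipped."""
    return [
        (key, processed_key(key, monitored_prefix, processed_prefix))
        for key in keys
        if key.startswith(monitored_prefix)
    ]


if __name__ == "__main__":
    keys = ["incoming/2018/06/a.csv", "other/b.csv"]
    for src, dst in plan_moves(keys, "incoming/", "processed/"):
        # A real job would copy src to dst and then delete src.
        print(src, "->", dst)
```

This only helps once the application knows which objects a batch has read, which is the open question in the message above.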

Re: [Spark Streaming] Spark Streaming with S3 vs Kinesis

2018-06-26 Thread Steve Loughran
On 25 Jun 2018, at 23:59, Farshid Zavareh <fhzava...@gmail.com> wrote: I'm writing a Spark Streaming application where the input data is put into an S3 bucket in small batches (using Database Migration Service - DMS). The Spark application is the only consumer. I'm considering two poss

[Spark Streaming] Spark Streaming with S3 vs Kinesis

2018-06-25 Thread Farshid Zavareh
I'm writing a Spark Streaming application where the input data is put into an S3 bucket in small batches (using Database Migration Service - DMS). The Spark application is the only consumer. I'm considering two possible architectures: Have Spark Streaming watch an S3 prefix and pick up new objects