Hi All,

We are working with Apache Spark's Kafka integration, using the DirectStream 
approach. To avoid data loss with this approach, we currently read the offset 
ranges of each batch and save those offsets into MongoDB.
We would like some clarification on whether Spark stores any offsets 
internally. For example: the first RDD batch covers offsets 0 to 5, but the 
application crashes unexpectedly before finishing. When we restart the 
application, does the new job fetch events 0 to 5 again, or does it resume 
from where the previous job stopped?
We are not committing any offsets in the above process, because offsets have 
to be committed manually in the DirectStream approach. So does the new job 
fetch events from the 0th position?
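For reference, the save-and-resume pattern described above can be sketched as 
follows. This is a simplified stand-in, not our actual code: a plain Python 
dict takes the place of the MongoDB collection, and the `save_offsets` / 
`load_offsets` names are illustrative only.

```python
# Sketch of the manual offset-checkpoint pattern: persist the last
# processed offset per topic-partition, and on restart resume from it.

offset_store = {}  # stand-in for a MongoDB collection

def save_offsets(topic, partition, until_offset):
    """Persist the next offset to read for a topic-partition."""
    offset_store[(topic, partition)] = until_offset

def load_offsets(topic, partition, default=0):
    """On restart, resume from the last saved offset (or the default)."""
    return offset_store.get((topic, partition), default)

# First run: the batch covers offsets 0..5; process, then save.
events = list(range(0, 6))       # offsets 0,1,2,3,4,5
next_offset = events[-1] + 1     # next offset to read is 6
save_offsets("mytopic", 0, next_offset)

# Simulated crash + restart: without the saved offsets the job would
# start from 0 again; with them it resumes at offset 6.
resume_at = load_offsets("mytopic", 0)
print(resume_at)  # 6
```

In real Spark code, the loaded offsets would be passed back to 
`KafkaUtils.createDirectStream` via its `fromOffsets` parameter so the new 
job starts reading exactly where the previous one stopped.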


Thanks & Regards,
Ganga Phani Charan Adabala | Software Engineer
o:  +91-40-23116680 | c:  +91-9491418099
e:  char...@eiqnetworks.com
EiQ Networks(r), Inc. | www.eiqnetworks.com
www.socvue.com | www.eiqfederal.com

Blog: http://blog.eiqnetworks.com/ | Twitter: https://twitter.com/eiqnetworks 
| LinkedIn: http://www.linkedin.com/company/eiqnetworks | Facebook: 
http://www.facebook.com/eiqnetworks


