I'll have to get back to you on the code bits, but I'm pretty sure we're doing 
simple sync batching. We're not in production yet, but after some months of 
development I haven't seen any failures, even when pushing load doing multiple 
years' backfill. I think the real question is why you are losing tablet 
servers. The only instability we ever had with Kudu was a weird NTP sync 
issue that was fixed, I think, in 1.6. What version are you running?

Anyway, I would think that infinite loop should be catchable somewhere. Our 
pipeline is set to fail/retry with Flink snapshots. I imagine there is 
something similar with Spark. Sorry I can't be of more help!
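
As a rough illustration of making a stuck call catchable, here is a minimal 
sketch in plain Java that runs a blocking call under a deadline, so a hang 
surfaces as a TimeoutException the pipeline can fail/retry on. It does not use 
the Kudu API: BoundedCall and callWithDeadline are hypothetical names, and the 
Callable only stands in for whatever write or flush is blocking.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Kudu client.
final class BoundedCall {
    private BoundedCall() {}

    /**
     * Runs a blocking call on a worker thread and waits at most timeoutMs.
     * Throws TimeoutException if the call has not finished by then, so the
     * caller can fail the batch and let the framework retry it.
     */
    static <T> T callWithDeadline(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the stuck call to stop by interrupting its thread.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Rethrow the original failure from the call when possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            // Interrupts the worker if it is still running.
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the blocking step, for example 
callWithDeadline(() -> writeBatch(records), 30000), where writeBatch is a 
placeholder for the actual write. Note that the interrupt only helps if the 
blocked code responds to interruption; otherwise the worker thread keeps 
running in the background even though the caller has moved on.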



On Feb 26, 2018 9:10 PM, Ravi Kanth <ravikanth....@gmail.com> wrote:
Cliff,

Thanks for the response. Well, I do agree that it's simple and seamless. In my 
case, I am able to upsert ~25000 events/sec into Kudu. But I run into a 
problem when any of the Kudu tablet servers or the master server is down. I am 
not able to get hold of the exception from the client. The client goes into an 
infinite loop trying to connect to Kudu. Meanwhile, I am losing my records. I 
tried handling the errors through getPendingErrors(), but that does not help. 
I am using AsyncKuduClient to establish the connection and retrieving the 
syncClient from it to open the session and table. Any help?

Thanks,
Ravi

On 26 February 2018 at 18:00, Cliff Resnick <cre...@gmail.com> wrote:
While I can't speak for Spark, we do use the client API from Flink streaming 
and it's simple and seamless. It's especially nice if you require an Upsert 
semantic.

On Feb 26, 2018 7:51 PM, "Ravi Kanth" <ravikanth....@gmail.com> wrote:
Hi,

Is anyone using Spark Streaming to ingest data into Kudu with the Kudu Client 
API rather than the traditional KuduContext API? I am stuck at a point and 
couldn't find a solution.

Thanks,
Ravi

