Yes, we create connectors on the fly, and in some cases after a new deployment the MSK cluster's public access is disabled, thereby making its bootstrap brokers unavailable to the currently running connectors.
I am curious to know whether it's possible to momentarily catch exceptions in such cases without stopping or failing the connector tasks. Perhaps this is not the right solution, but catching exceptions in the connector's poll() method has created strange behaviour where it keeps spawning new tasks (a rough sketch of the pattern is below the quoted thread).

On Thu, 13 Jun 2024, 03:40 Alex Craig, <alexcrai...@gmail.com> wrote:

> Hi Akash, if your connector doesn't have the appropriate permissions for a
> topic then it can't run, so I'm not sure what value there is in trying to
> handle or tolerate that kind of exception. If someone is changing ACLs in
> such a way that it breaks a connector, then you probably want that
> connector to go into a failed state. I don't think there's a way to change
> your connector code to handle this, since consuming from and producing to
> topics is handled by the Connect framework. But in my opinion, if people
> are repeatedly breaking ACLs, then that might be better fixed with a
> process change? (If this is an ongoing problem, then it sounds like there's
> a larger issue happening that should be addressed.)
>
> When you say "bootstrap brokers are missing", are you referring to someone
> misconfiguring a connector such that the bootstrap servers config is
> missing?
>
> - alex
>
> On Wed, Jun 12, 2024 at 4:16 PM Akash Dhiman <akashdhiman...@gmail.com>
> wrote:
>
> > Hello,
> > we have a requirement to make our Kafka connectors more fault tolerant,
> > in that we don't want them to fail on certain kinds of errors, i.e.
> > errors where the bootstrap brokers are missing or where we don't have
> > sufficient permission to read data from a topic (we are reading from
> > AWS MSK).
> >
> > We tried basic error handling for such exceptions: in the poll method we
> > swallow SaslException and the config exceptions raised when bootstrap
> > brokers are missing, and retry after t amount of time. But this seems to
> > make the connector create lots of unassigned tasks, even for a connector
> > with tasks.max set to 1.
> >
> > 1. It is unclear to us why that happens.
> > 2. We are looking for guidance on a more standardised way to ensure
> > resilient connectors that don't transition to the FAILED state on errors
> > which we expect can happen.
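For reference, here is a minimal sketch of the poll()-level retry being discussed. The class name, backoff value, and the fetchFromSource() helper are illustrative only and not the actual connector code; it assumes the task itself creates the clients that talk to the source MSK cluster. As Alex points out above, errors from the worker's own producer/consumer surface outside poll() and cannot be caught this way.

// Hypothetical SourceTask sketching the retry-inside-poll pattern described in
// the thread. Names and values are illustrative, not the actual connector code.
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigException;
import org.apache.kafka.common.errors.SaslAuthenticationException;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class ResilientSourceTask extends SourceTask {

    private static final long RETRY_BACKOFF_MS = 30_000L; // illustrative backoff

    @Override
    public String version() {
        return "0.0.1";
    }

    @Override
    public void start(Map<String, String> props) {
        // Initialise clients for the upstream (source) cluster here.
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        try {
            return fetchFromSource();
        } catch (SaslAuthenticationException | ConfigException e) {
            // Option A: back off inside poll() and return no records; the worker
            // simply calls poll() again, so the task stays in the RUNNING state
            // and no new tasks are spawned.
            Thread.sleep(RETRY_BACKOFF_MS);
            return Collections.emptyList();

            // Option B: rethrow as a RetriableException and let the Connect
            // runtime handle the retry instead of failing the task. Exact
            // behaviour depends on the Connect version in use.
            // throw new RetriableException(e);
        }
    }

    @Override
    public void stop() {
        // Release any resources acquired in start().
    }

    // Illustrative placeholder for whatever actually reads from the source system.
    private List<SourceRecord> fetchFromSource() {
        return Collections.emptyList();
    }
}

Note that the repeated task spawning described above would not normally come from a poll() like this; it is worth checking whether the connector's taskConfigs() or a custom rebalance/restart path is being triggered on each retry.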