Hi Jay,
  Many thanks for the info. All of that makes sense, but from an API
standpoint, when something is labelled async and returns a Future, that
will be misconstrued: developers will place sends in critical
client-facing request/response paths that should never block. If the app
comes up with a bad config, it will hang all incoming connections.
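
To illustrate, this is the shape of code I'm worried about. A minimal
sketch against the new producer API (the topic name, config, and
serializer settings are placeholders, and the 0.8.2-beta types/configs
may differ slightly):

    import java.util.Properties;
    import java.util.concurrent.Future;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class RequestHandler {
        private final KafkaProducer<byte[], byte[]> producer;

        public RequestHandler(Properties props) {
            // Placeholder serializer config, as in the 0.8.2 producer.
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            this.producer = new KafkaProducer<byte[], byte[]>(props);
        }

        public Future<RecordMetadata> handle(byte[] payload) {
            // Looks async because it returns a Future, but if no metadata
            // is cached for "events" (e.g. the brokers were unreachable at
            // startup), this call blocks in waitOnMetadata for up to
            // metadata.fetch.timeout.ms before the Future is even returned.
            return producer.send(
                new ProducerRecord<byte[], byte[]>("events", payload));
        }
    }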

Obviously there is a spectrum of use cases with regard to message loss,
and the defaults cannot cater to all of them. I like that the defaults
tend towards best-effort guarantees, but I am not sure that justifies the
inconsistency in the API.

1) It sounds like the client is already structured to handle partition
changes on the fly. I am sure I am oversimplifying the case where no
metadata is available, but my naive approach would be to assume some
number of partitions and then, when metadata does arrive, treat it as a
partition change event. If there is more unknown than just the partition
count, that probably won't work.
2) Pretty much makes sense, especially now that I see people on this
discussion list wanting a million topics (good luck).
3) I agree client creation shouldn't fail, but any sends should probably
fast fail, or the call itself should make the choice explicit (a
config-level approximation of this is sketched below).
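
To make 3) more concrete: the closest approximation I can see today is
shrinking metadata.fetch.timeout.ms so that an un-primed send gives up
quickly instead of hanging. A rough sketch (the broker address, timeout
value, and serializers are placeholders):

    import java.util.Properties;

    public class FastFailProducerProps {
        public static Properties build() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            // With no cached metadata for the topic, the first send gives up
            // after ~100 ms (surfacing a timeout error) instead of blocking
            // for the much larger default.
            props.put("metadata.fetch.timeout.ms", "100");
            return props;
        }
    }

That still isn't an explicit, per-call choice, though, which is what I'd
really want.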

I'm still thinking about how I am going to make the client behave as I'd
like. I think I need a background process, kicked off at startup, to
prime the metadata for the topics I am interested in. Until that process
completes, any sends through the producer will need to fast fail instead
of hang. This would still leave a window for blocking if you send to a
topic your app didn't know it would send to, but now we're getting into
truly obscure corner cases.
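
Roughly the shape I have in mind; just a sketch with hypothetical names,
priming via partitionsFor() as you suggest, and glossing over shutdown
and error-handling details:

    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.Future;
    import java.util.concurrent.atomic.AtomicBoolean;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    // Hypothetical wrapper: primes metadata for known topics in the
    // background and fast-fails sends until priming has completed.
    public class PrimedProducer {
        private final KafkaProducer<byte[], byte[]> producer;
        private final AtomicBoolean ready = new AtomicBoolean(false);

        public PrimedProducer(Properties props, final List<String> topics) {
            this.producer = new KafkaProducer<byte[], byte[]>(props);
            Thread primer = new Thread(new Runnable() {
                public void run() {
                    for (String topic : topics) {
                        while (true) {
                            try {
                                // Blocks up to metadata.fetch.timeout.ms,
                                // but off the request path.
                                producer.partitionsFor(topic);
                                break;
                            } catch (Exception e) {
                                // Brokers not reachable yet; retry shortly.
                                try { Thread.sleep(1000); }
                                catch (InterruptedException ie) { return; }
                            }
                        }
                    }
                    ready.set(true);
                }
            }, "producer-metadata-primer");
            primer.setDaemon(true);
            primer.start();
        }

        public Future<RecordMetadata> send(ProducerRecord<byte[], byte[]> record) {
            if (!ready.get()) {
                // Fast fail instead of letting send() block in waitOnMetadata.
                throw new IllegalStateException("Producer metadata not primed yet");
            }
            return producer.send(record);
        }
    }

A send to a topic outside the primed list would still hit the blocking
path once the flag flips, which is the corner case I mentioned above.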

Would something like that be accepted as a baked-in option in the
mainline Kafka clients?

A quick win might be to clarify the documentation so that it is clear that
this API will block in cases XYZ (maybe this is mentioned somewhere and I
missed it).

Thanks,
Paul


On Thu, Dec 18, 2014 at 1:17 PM, Jay Kreps <j...@confluent.io> wrote:
>
> Hey Paul,
>
> Here are the constraints:
> 1. We wanted the storage of messages to be in their compact binary form so
> we could bound memory usage. This implies partitioning prior to enqueue.
> And as you note partitioning requires having metadata (even stale metadata)
> about topics.
> 2. We wanted to avoid prefetching metadata for all topics since there may
> be quite a lot of topics.
> 3. We wanted to make metadata fetching lazy so that it would be possible to
> create a client without having an active network connection. This tends to
> be important when services are brought up in development or test
> environments where it is annoying to have to control the dependency graph
> when starting things.
>
> This blocking isn't too bad as it only occurs on the first request for each
> topic. Our feeling was that many things tend to get set up on a first
> request (DB connections are established, caches populated, etc.) so this was
> not unreasonable.
>
> If you want to pre-initialize the metadata to avoid blocking on the first
> request you can do so by fetching the metadata using the
> producer.partitionsFor(topic) API at start-up.
>
> -Jay
>
> On Thu, Dec 18, 2014 at 9:07 AM, Paul Pearcy <ppea...@gmail.com> wrote:
> >
> > Hello,
> >
> >   Playing around with the 0.8.2-beta producer client. One of my test
> > cases is to ensure producers can deal with Kafka being down when the
> > producer is created. My tests failed miserably because of the default
> > blocking in the producer with regard to metadata.fetch.timeout.ms. The
> > first thing the new producer's send() does is waitOnMetadata, which
> > blocks.
> >
> > I can handle this case by loading topic metadata on init and setting
> > metadata.fetch.timeout.ms to a very low value, and then either throwing
> > away messages or creating my own internal queue to buffer.
> >
> > I'm surprised the metadata sync isn't done async. If it fails, return
> > that in the future/callback. That way the API could actually be
> > considered safely async, and the producer buffer could try to hold on
> > to things until block.on.buffer.full kicks in to either drop messages
> > or block. You'd probably need a partition callback since numPartitions
> > wouldn't be available.
> >
> > The implication is that people's apps will work fine if the first
> > messages are sent while the Kafka server is up; however, if Kafka is
> > down when they restart their app, the new producer will block all sends
> > and blow things up if they haven't written the app to be aware of this
> > edge case.
> >
> >
> > Thanks,
> >
> > Paul
> >
>
