+1 Jun.
On Mon, Feb 10, 2014 at 2:17 PM, Sriram Subramanian <srsubraman...@linkedin.com> wrote:

+1 on Jun's suggestion.

On 2/10/14 2:01 PM, "Jun Rao" <jun...@gmail.com> wrote:

I actually prefer to see those at INFO level. The reason is that the config system in an application can be complex. Some configs can be overridden in different layers, and it may not be easy to determine what the final binding value is. The logging in Kafka will serve as the source of truth.

For reference, the ZK client logs all overridden values during initialization. It's a one-time thing during startup, so it shouldn't add much noise, and it's very useful for debugging subtle config issues.

Exposing final configs programmatically is potentially useful. If we don't want to log overridden values out of the box, an app can achieve the same thing using the programmatic API. The only missing piece is that we won't know the unused property keys, which is probably less important than seeing the overridden values.

Thanks,

Jun

On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

Hey Jun,

I think that is reasonable, but would you object to having it be debug logging? I think logging out a bunch of noise during normal operation in a client library is pretty ugly. Also, is there value in exposing the final configs programmatically?

-Jay

On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <jun...@gmail.com> wrote:

+1 on the new config. Just one comment. Currently, when instantiating a config (e.g. ProducerConfig), we log the overridden property values and the unused property keys (likely due to misspelling). This has been very useful for config verification. It would be good to add similar support in the new config.

Thanks,

Jun

On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

We touched on this a bit in previous discussions, but I wanted to draw out the approach to config specifically as an item of discussion.

The new producer and consumer use a similar key-value config approach as the existing Scala clients, but have different implementation code to help define these configs. The plan is to use the same approach on the server once the new clients are complete; so if we agree on this approach, it will be the new default across the board.

Let me split this into two parts. First I will try to motivate the use of key-value pairs as a configuration API. Then I will discuss the mechanics of specifying and parsing these. If we agree on the public API, then the implementation details are interesting, as the implementation will be shared across producer, consumer, and broker, and potentially some tools; but if we disagree about the API, then there is no point in discussing the implementation.

Let me explain the rationale for this. In a sense, a key-value map of configs is the worst possible API for the programmer using the clients. Let me contrast the pros and cons versus a POJO and motivate why I think it is still superior overall.
Pro: An application can externalize the configuration of its Kafka clients into its own configuration. Whatever config management system the client application is using will likely support key-value pairs, so the application should be able to directly pull whatever configurations are present and use them in its client. This means that any configuration the client supports can be added to any application at runtime. With the POJO approach, the client application has to expose each POJO getter as some config parameter. The result of many applications doing this is that the config is different for each one, and it is very hard to have a standard client config shared across them. Moving config into config files allows the usual tooling (version control, review, audit, config deployments separate from code pushes, etc.).

Pro: Backwards and forwards compatibility. Provided we stick to our Java API, many internals can evolve and expose new configs. The application can support both the new and the old client by just specifying a config that will be unused in the older version (and of course the reverse: we can remove obsolete configs).

Pro: We can use a similar mechanism for both the client and the server. Since most people run the server as a stand-alone process, it needs a config file.

Pro: Systems like Samza that need to ship configs across the network can easily do so, as configs have a natural serialized form. This can be done with POJOs using Java serialization, but it is ugly and has bizarre failure cases.

Con: The IDE gives nice auto-completion for POJOs.

Con: There are some advantages to javadoc as a documentation mechanism for Java people.

Basically, to me this is about operability versus niceness of API, and I think operability is more important.

Let me now give some details of the config support classes in kafka.common.config and how they are intended to be used.

The goals of this code are the following:
1. Make specifying configs and their expected types (strings, numbers, lists, etc.) simple and declarative.
2. Allow for simple validation checks (numeric range checks, etc.).
3. Make the config "self-documenting", i.e. we should be able to write code that generates the configuration documentation off the config def.
4. Specify default values.
5. Track which configs actually get used.
6. Make it easy to get config values.

There are two classes there: ConfigDef and AbstractConfig. ConfigDef defines the specification of the accepted configurations, and AbstractConfig is a helper class for implementing the configuration class. The difference is kind of like the difference between a "class" and an "object": ConfigDef is for specifying the configurations that are accepted; AbstractConfig is the base class for an instance of these configs.
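[To make the ConfigDef/AbstractConfig split concrete, a config class along these lines might look roughly like the sketch below. The define(...) calls follow the example later in this message; the Map-based constructor and the getList/getLong helpers are assumptions about AbstractConfig's helper methods, not a quote of the actual code.]

    import java.util.List;
    import java.util.Map;

    // Sketch only: ConfigDef is the "class" (the specification of what is
    // accepted); an AbstractConfig subclass is the "object" (one parsed,
    // validated instance of those values).
    public class MyClientConfig extends AbstractConfig {

        // Declarative spec: name, type, (default, validator,) documentation.
        private static final ConfigDef CONFIG = new ConfigDef()
            .define("bootstrap.brokers", Type.LIST, "documentation")
            .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation");

        public MyClientConfig(Map<String, Object> originals) {
            // Assumed behavior: the base class parses and validates the
            // supplied key-value pairs against CONFIG and tracks which keys
            // are actually accessed, so unused (e.g. misspelled) keys can be
            // reported afterwards.
            super(CONFIG, originals);
        }

        public List<String> bootstrapBrokers() {
            return getList("bootstrap.brokers");
        }

        public long metadataTimeoutMs() {
            return getLong("metadata.timeout.ms");
        }
    }

[An application would then construct this directly from its externalized key-value configuration (a Properties file, a deployment system's key-value store, etc.), which is what the externalization "pro" above relies on.]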
You can see this in action here:

https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD

(Ignore the static config names in there for now... I'm not actually sure that is the best approach.)

So the way this works is that the config specification is defined as:

    config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
                            .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
                            .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");

This is used in a ProducerConfig class, which extends AbstractConfig to get access to some helper methods as well as the logic for tracking which configs get accessed.

Currently I have included static String variables for each of the config names in that class. However, I actually think that is not very helpful, as the javadoc for them doesn't give the constant's value and requires duplicating the documentation. To understand this point, look at the javadoc and note that the doc on the string is not the same as what we define in the ConfigDef. We could just have the javadoc for the config string be the source of truth, but it is actually pretty inconvenient for that, as it doesn't show you the value of the constant, just the variable name (unless you discover how to unhide it). That is fine for the clients, but for the server it would be very weird, especially for non-Java people. We could attempt to duplicate documentation between the javadoc and the ConfigDef, but given our struggle to get well-documented config in a single place, this seems unwise.

So I recommend we have a single source of documentation for these: the website documentation on configuration, covering both clients and server, generated off the config defs. The javadoc on KafkaProducer will link to this table, so it should be quite convenient to discover. This makes things a little more typo-prone, but that should be easily caught by the unused-key detection. This will also make it possible for us to retire configs in the future without causing compile failures, and to add configs without having their use break backwards compatibility. This is useful during upgrades, where you want to be compatible with both the old and the new version so you can roll forwards and backwards.

-Jay
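[The kind of generator Jay recommends, producing the website config table straight off the config defs, can be sketched as below. This assumes ConfigDef exposes its declared keys with their name, type, default, and documentation; the configKeys() accessor and the ConfigKey fields are assumptions for illustration, not the thread's actual code.]

    // Sketch: render a ConfigDef as an HTML table for the website docs, so
    // the ConfigDef stays the single source of truth for documentation.
    public static String toHtmlTable(ConfigDef def) {
        StringBuilder b = new StringBuilder();
        b.append("<table>\n");
        b.append("<tr><th>Name</th><th>Type</th><th>Default</th><th>Description</th></tr>\n");
        for (ConfigDef.ConfigKey key : def.configKeys().values()) {
            b.append("<tr><td>").append(key.name)
             .append("</td><td>").append(key.type)
             .append("</td><td>").append(key.defaultValue == null ? "" : key.defaultValue)
             .append("</td><td>").append(key.documentation)
             .append("</td></tr>\n");
        }
        b.append("</table>");
        return b.toString();
    }

[Generating the table this way means retiring or adding a config changes only the generated documentation, never a compiled constant, which is what makes the rolling-upgrade compatibility Jay describes possible.]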