+1 on Jun's suggestion.

On 2/10/14 2:01 PM, "Jun Rao" <jun...@gmail.com> wrote:

>I actually prefer to see those at INFO level. The reason is that the
>config system in an application can be complex. Some configs can be
>overridden in different layers, and it may not be easy to determine what
>the final binding value is. The logging in Kafka will serve as the source
>of truth.
>
>For reference, the ZK client logs all overridden values during
>initialization. It's a one-time thing during startup, so it shouldn't add
>much noise. It's very useful for debugging subtle config issues.
>
>Exposing the final configs programmatically is potentially useful. If we
>don't want to log overridden values out of the box, an app can achieve
>the same thing using the programmatic api. The only missing piece is that
>we won't know the unused property keys, which is probably less important
>than seeing the overridden values.
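>
>For concreteness, here is a rough sketch of what that one-time startup
>logging could look like (the class and parameter names are illustrative
>only, not a committed api):
>
>    import java.util.Map;
>    import java.util.Set;
>    import org.slf4j.Logger;
>    import org.slf4j.LoggerFactory;
>
>    public class ConfigLogging {
>        private static final Logger log = LoggerFactory.getLogger(ConfigLogging.class);
>
>        // Log every final binding once at startup, plus any unrecognized keys
>        public static void logAll(Map<String, Object> values, Set<String> unused) {
>            for (Map.Entry<String, Object> entry : values.entrySet())
>                log.info("Config value {} = {}", entry.getKey(), entry.getValue());
>            for (String key : unused)
>                log.warn("Property '{}' was supplied but is not a known config", key);
>        }
>    }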
>
>Thanks,
>
>Jun
>
>
>On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Hey Jun,
>>
>> I think that is reasonable, but would you object to having it be debug
>> logging? I think logging out a bunch of noise during normal operation
>> in a client library is pretty ugly. Also, is there value in exposing
>> the final configs programmatically?
>>
>> -Jay
>>
>>
>>
>> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <jun...@gmail.com> wrote:
>>
>> > +1 on the new config. Just one comment. Currently, when initializing
>> > a config (e.g. ProducerConfig), we log the overridden property values
>> > and unused property keys (likely due to misspelling). This has been
>> > very useful for config verification. It would be good to add similar
>> > support in the new config.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> >
>> > > We touched on this a bit in previous discussions, but I wanted to
>> > > draw out the approach to config specifically as an item of
>> > > discussion.
>> > >
>> > > The new producer and consumer use a similar key-value config
>> > > approach as the existing scala clients but have different
>> > > implementation code to help define these configs. The plan is to
>> > > use the same approach on the server once the new clients are
>> > > complete; so if we agree on this approach it will be the new
>> > > default across the board.
>> > >
>> > > Let me split this into two parts. First I will try to motivate the
>> > > use of key-value pairs as a configuration api. Then let me discuss
>> > > the mechanics of specifying and parsing these. If we agree on the
>> > > public api, then the implementation details are interesting, as
>> > > this will be shared across producer, consumer, and broker and
>> > > potentially some tools; but if we disagree about the api then there
>> > > is no point in discussing the implementation.
>> > >
>> > > Let me explain the rationale for this. In a sense a key-value map
>> > > of configs is the worst possible API to the programmer using the
>> > > clients. Let me contrast the pros and cons versus a POJO and
>> > > motivate why I think it is still superior overall.
>> > >
>> > > Pro: An application can externalize the configuration of its kafka
>> > > clients into its own configuration. Whatever config management
>> > > system the client application is using will likely support
>> > > key-value pairs, so the client should be able to directly pull
>> > > whatever configurations are present and use them in its client.
>> > > This means that any configuration the client supports can be added
>> > > to any application at runtime. With the pojo approach the client
>> > > application has to expose each pojo getter as some config
>> > > parameter. The result of many applications doing this is that the
>> > > config is different for each, and it is very hard to have a
>> > > standard client config shared across applications. Moving config
>> > > into config files allows the usual tooling (version control,
>> > > review, audit, config deployments separate from code pushes,
>> > > etc.).
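>> > >
>> > > For illustration, a minimal sketch of that externalization (the
>> > > file name and the Properties-accepting constructor are assumptions
>> > > for the example, not a committed api):
>> > >
>> > >     import java.io.FileInputStream;
>> > >     import java.util.Properties;
>> > >
>> > >     // Pull whatever key-value pairs the app's own config system provides
>> > >     Properties props = new Properties();
>> > >     try (FileInputStream in = new FileInputStream("producer.properties")) {
>> > >         props.load(in);
>> > >     }
>> > >     // Pass them straight through: any config the client supports can be
>> > >     // set at runtime without a code change in the application
>> > >     KafkaProducer producer = new KafkaProducer(props);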
>> > >
>> > > Pro: Backwards and forwards compatibility. Provided we stick to
>> > > our java api, many internals can evolve and expose new configs.
>> > > The application can support both the new and old client by just
>> > > specifying a config that will be unused in the older version (and
>> > > of course the reverse--we can remove obsolete configs).
>> > >
>> > > Pro: We can use a similar mechanism for both the client and the
>> > > server. Since most people run the server as a stand-alone process,
>> > > it needs a config file.
>> > >
>> > > Pro: Systems like Samza that need to ship configs across the
>> > > network can easily do so, as configs have a natural serialized
>> > > form. This can be done with pojos using java serialization, but it
>> > > is ugly and has bizarre failure cases.
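>> > >
>> > > (As a quick illustration of the "natural serialized form" point: a
>> > > key-value config round-trips through the standard Properties io
>> > > with no custom classes on the remote classpath. A sketch:)
>> > >
>> > >     import java.io.StringReader;
>> > >     import java.io.StringWriter;
>> > >     import java.util.Properties;
>> > >
>> > >     Properties props = new Properties();
>> > >     props.setProperty("metadata.timeout.ms", "60000");
>> > >
>> > >     // Serialize to a plain string...
>> > >     StringWriter out = new StringWriter();
>> > >     props.store(out, "kafka client config");
>> > >
>> > >     // ...and reconstruct it on the other side of the wire
>> > >     Properties copy = new Properties();
>> > >     copy.load(new StringReader(out.toString()));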
>> > >
>> > > Con: The IDE gives nice auto-completion for pojos.
>> > >
>> > > Con: There are some advantages to javadoc as a documentation
>> > > mechanism for java people.
>> > >
>> > > Basically to me this is about operability versus niceness of api,
>> > > and I think operability is more important.
>> > >
>> > > Let me now give some details of the config support classes in
>> > > kafka.common.config and how they are intended to be used.
>> > >
>> > > The goals of this code are the following:
>> > > 1. Make specifying configs and their expected type (strings,
>> > > numbers, lists, etc.) simple and declarative
>> > > 2. Allow for simple validation checks (numeric range checks, etc.)
>> > > 3. Make the config "self-documenting", i.e. we should be able to
>> > > write code that generates the configuration documentation off the
>> > > config def.
>> > > 4. Specify default values.
>> > > 5. Track which configs actually get used.
>> > > 6. Make it easy to get config values.
>> > >
>> > > There are two classes there: ConfigDef and AbstractConfig.
>> > > ConfigDef defines the specification of the accepted configurations
>> > > and AbstractConfig is a helper class for implementing the
>> > > configuration class. The difference is kind of like the difference
>> > > between a "class" and an "object": ConfigDef is for specifying the
>> > > configurations that are accepted, while AbstractConfig is the base
>> > > class for an instance of these configs.
>> > >
>> > > You can see this in action here:
>> > >
>> > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>> > >
>> > > (Ignore the static config names in there for now...I'm not
>> > > actually sure that is the best approach.)
>> > >
>> > > So the way this works is that the config specification is defined
>> > > as:
>> > >
>> > >     config = new ConfigDef()
>> > >         .define("bootstrap.brokers", Type.LIST, "documentation")
>> > >         .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>> > >         .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");
>> > >
>> > >
>> > > This is used in a ProducerConfig class, which extends
>> > > AbstractConfig to get access to some helper methods as well as the
>> > > logic for tracking which configs get accessed.
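>> > >
>> > > (Roughly, a sketch of that subclass; the constructor and accessor
>> > > signatures here are illustrative, not final:)
>> > >
>> > >     public class ProducerConfig extends AbstractConfig {
>> > >
>> > >         private static final ConfigDef config = new ConfigDef()
>> > >             .define("bootstrap.brokers", Type.LIST, "documentation")
>> > >             .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>> > >             .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");
>> > >
>> > >         public ProducerConfig(Map<?, ?> props) {
>> > >             // parsing, validation, defaulting, and usage tracking live in the base class
>> > >             super(config, props);
>> > >         }
>> > >     }
>> > >
>> > >     // usage: typed accessors instead of string parsing at each call site
>> > >     long timeout = new ProducerConfig(props).getLong("metadata.timeout.ms");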
>> > >
>> > > Currently I have included static String variables for each of the
>> > > config names in that class. However I actually think that is not
>> > > very helpful, as the javadoc for them doesn't give the constant
>> > > value and requires duplicating the documentation. To understand
>> > > this point, look at the javadoc and note that the doc on the
>> > > string is not the same as what we define in the ConfigDef. We
>> > > could just have the javadoc for the config string be the source of
>> > > truth, but it is actually pretty inconvenient for that, as it
>> > > doesn't show you the value of the constant, just the variable name
>> > > (unless you discover how to unhide it). That is fine for the
>> > > clients, but for the server it would be very weird, especially for
>> > > non-java people. We could attempt to duplicate documentation
>> > > between the javadoc and the ConfigDef, but given our struggle to
>> > > get well-documented config in a single place this seems unwise.
>> > >
>> > > So I recommend we have a single source of documentation for these:
>> > > the website documentation on configuration, covering both clients
>> > > and server, generated off the config defs. The javadoc on
>> > > KafkaProducer will link to this table, so it should be quite
>> > > convenient to discover. This makes things a little more typo
>> > > prone, but that should be easily caught by the unused-key
>> > > detection. This will also make it possible for us to retire
>> > > configs in the future without causing compile failures, and to add
>> > > configs without having use of them break backwards compatibility.
>> > > This is useful during upgrades, where you want to be compatible
>> > > with the old and new version so you can roll forwards and
>> > > backwards.
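>> > >
>> > > (To make the doc generation concrete, a rough sketch; the
>> > > configKeys() accessor and ConfigKey fields here are hypothetical:)
>> > >
>> > >     // Walk the defined configs and emit one documentation row per key
>> > >     StringBuilder table = new StringBuilder("<table>\n");
>> > >     for (ConfigKey key : config.configKeys().values()) {
>> > >         table.append("<tr><td>").append(key.name)
>> > >              .append("</td><td>").append(key.type)
>> > >              .append("</td><td>").append(key.defaultValue)
>> > >              .append("</td><td>").append(key.documentation)
>> > >              .append("</td></tr>\n");
>> > >     }
>> > >     table.append("</table>");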
>> > >
>> > > -Jay
>> > >
>> >
>>
