There were many great replies to this; I am following up on them and
on the comments made elsewhere in one go here.

On 2016-01-30 5:45, Corey Osman wrote:
I wanted to bring up a conversation in hopes that we as a community can
create a specification for something I am calling module schemas.
Before I get into that, I want to provide a little background info.

This all started a few years ago when hiera first came out. Data
separation in the form of parameters and automatic hiera lookups quickly
became the norm, and reusable modules exploded into what the forge is
today. Because of hiera's popularity, though, data validation is now a
major problem. Without good data, excellent modules become useless.

Puppet 4 and stdlib brought many new functions and ways to validate
incoming data, and I now consider puppet 4 to be a loosely typed
language. Hell, there was even this a long time ago:
https://github.com/puppetlabs/puppetlabs-kwalify  But puppet only does
so much, and while having validation reside in code might make
troubleshooting a snap, there is still a delay in the feedback loop when
the code is tightly coupled with an external “database” of data. That
data is inserted by non-puppet developers who don't know YAML or data
structures.

So with that said, I want to introduce something new to puppet module
development, called module schemas. A module schema is a specification
that details the inner workings of a module. For right now this means
a detailed specification of all the parameters for classes and
definitions used inside a module, whose goal is to make it impossible to
insert a bad data structure. But ideally we can specify much more
(functions, types, providers, templates), even hiera calls in weird
places like templates and functions: things that usually do not get
documented, are hard to reference, and normally require looking at
source code.

What does such a schema look like?

Here is an example schema for the apache module, which contains 446
parameters!
https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml

The most immediate use case for such a schema is hiera validation, as I
have outlined here:
http://logicminds.github.io/blog/2016/01/16/testing-hiera-data. It
works AWESOME! We are validating hiera data (not just YAML syntax) and
doing it in under 500 ms for every commit on every single file.
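To make the idea concrete, here is a minimal sketch in Python of what
"validating hiera data, not just YAML" can mean: checking the parsed
values against a key-to-type map. This is purely illustrative (the
schema keys and types are made up) and is not the kwalify-based tooling
from the blog post above:

```python
# Hypothetical sketch: validate hiera-style data against a key -> type
# schema. Keys and types below are invented for illustration only.

SCHEMA = {
    "apache::default_vhost": bool,
    "apache::serveradmin": str,
    "apache::timeout": int,
}

def validate(data, schema):
    """Return error strings for keys whose values do not match the
    expected type. Missing keys are ignored (module defaults apply)."""
    errors = []
    for key, expected in schema.items():
        if key in data and not isinstance(data[key], expected):
            errors.append(
                "%s: expected %s, got %s (%r)"
                % (key, expected.__name__, type(data[key]).__name__, data[key])
            )
    return errors

hiera_data = {
    "apache::default_vhost": "yes",   # wrong: a string, not a boolean
    "apache::timeout": 120,           # ok
}
for err in validate(hiera_data, SCHEMA):
    print(err)
```

A pre-commit hook running something like this against every changed
hiera file gives the fast feedback loop described above.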

As a community we need a solution for validating hiera data. It's my
belief that schemas are the way to go. After all, hiera data now lives
in modules with no easy way to validate it.

Other use cases that come to mind:

   - Generating documentation. Many modules on the forge contain a
static map of the parameters used inside the module; if a schema was
present, we could generate that same map automatically.
   - Usefulness to other 3rd-party tools like puppet-strings.
   - Parameter specification lookup. Imagine a face that shows internal
puppet module specifications. I am not talking about puppet-strings;
this would detail the parameters given a class, or an example parameter
value given a parameter name.
     Scenario:
       - puppet module puppetlabs/apache (outputs all the parameters and
classes for that module) in a specified format (json or yaml)
       - puppet module puppetlabs-apache::class_name (outputs all the
parameters for the class) in a specified format (json or yaml)
       - puppet module puppetlabs-apache::class_name::param1 (outputs an
example value for that parameter, as well as the default value) in a
specified format (json or yaml)

Foreman and Puppet Console need this level of detail as well.
Currently, both of these solutions spend quite a bit of time parsing
code to show parameters for UI display. It would be much easier if a
schema was available that detailed this level of data. Think of the
speed improvements that could be had if this information was “cached” in
a file. These solutions currently load or intelligently scan all the
puppet code in every puppet environment to get the parameters and
defaults.
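As a sketch of the caching idea (the file layout and field names below
are invented for illustration and are not Foreman's or the Console's
actual formats), a UI could answer "what are this class's parameters?"
with a single JSON read instead of parsing manifests:

```python
import json

# Hypothetical cached schema file; structure and field names are made
# up to illustrate the idea, not taken from any real tool.
CACHED = json.loads("""
{
  "apache": {
    "classes": {
      "apache": {
        "parameters": {
          "serveradmin": {"type": "String", "default": "root@localhost"},
          "timeout":     {"type": "Integer", "default": 120}
        }
      }
    }
  }
}
""")

def parameters_for(module, klass):
    """Look up a class's parameters from the cached schema -
    no manifest parsing or catalog compilation required."""
    return CACHED[module]["classes"][klass]["parameters"]

params = parameters_for("apache", "apache")
print(sorted(params))  # -> ['serveradmin', 'timeout']
```

The schema file would be regenerated only when the module changes, so
every UI request after that is a cheap dictionary lookup.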

Here is how we can create a schema:
http://logicminds.github.io/blog/2016/01/15/how-to-build-a-module-schema/
(which I have even automated with retrospect-puppet:
https://github.com/nwops/puppet-retrospec.git).

However, we all need to agree on something before schemas can ever be a
“thing”: we need a schema for module schemas. This is important
because if 3rd-party tools or scripts start to use schemas and we later
decide the schema needs changing, everything breaks. Tools need a
specification to work from.

So, with this in mind and with the example schema here:
https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml
how can this be improved? What should we add?

About the only change I was pondering was adding another object for the
types themselves.
https://github.com/logicminds/puppet_module_schemas/blob/master/specification_with_types.yaml

What are your thoughts?  What steps do we need to take to make this a
supported specification?  What would you desire in a module schema?

Am I the only one that thinks this is a killer solution?


Corey Osman


First, about schemas/meta-models:

We are working on the Puppet Type System to make it powerful enough to describe a complete schema. We are doing this so that there is a meta level (schema level) in puppet that can serve as the foundation for serialization and model/schema transformations, i.e. so that you can take such a puppet meta-model (i.e. a schema) and transform it into some other kind of schema.

The new meta-model is being based on the Puppet 4.x type system.

As R.I. pointed out, typing everything in Puppet 4.x does define all of the constraints on any data provided via data binding.

What is a bit more difficult is to automatically extract the expectations on data keys, and the type constraints for keys that are simply looked up. This cannot be achieved until runtime, since the keys (and also the expectations) are dynamically evaluated. Thus, to be able to validate, there would need to be a static declaration of the expectations for these keys.

In addition, it would be a bad design if a module depended on something else to supply data (i.e. data that comes without defaults).

With puppet 4, you can write the validation in puppet itself.
A module could call a mymodule::verify_data_expectations() function
(written in .pp syntax). From the command line, you can then run that
with puppet apply:

  puppet apply -e 'mymodule::verify_data_expectations()'

Or, always run this at runtime.

The function itself looks like a schema - simply map keys to types and iterate.

{ 'foo::bar'     => Integer[1,10],
  'a_hash_merge' => Struct[{ a => Integer, b => String[1] }],
  ...
}.each |$key, $type| {

  $val = lookup($key) # handle missing key here
  assert_type($type, $val) |$t, $v| {
    fail("The lookup of key ${key} expected a type of ${t}, but got the non compliant value: ${v}")
  }
}

...or some variation on that - there are several options on both lookup and assert_type that can be used, and assert_type IIRC uses heuristics to point to where the type diverges from the wanted type (which is better for complex types), so it may be more helpful than a manually crafted error message. And if the automatic assertion is enough, the expected type can be given directly to lookup and it will do the type checking.

Then, if everyone does this the same way, the puppet hash is the schema, and it could be referenced in module metadata - perhaps by giving the name of the function that validates.

Then, it would be easy to write something that iterates over all modules in an environment and calls each module's data-expectation-validation function.

There are other alternatives once we are done with the meta-model in puppet. The hash mapping keys to types could be expressed that way and, if so desired, transformed to some other schema that can be used with tools of your choice to validate a data file in some format loaded by some hiera backend (i.e. without having to evaluate and run any puppet code). Meanwhile, puppet validate could call all of the functions.

Just some ideas about how to achieve the goal of validating data expectations.

- henrik

--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

--
You received this message because you are subscribed to the Google Groups "Puppet 
Developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/n8o6vd%24dh3%241%40ger.gmane.org.