This is an automated email from the ASF dual-hosted git repository. cdutz pushed a commit to branch develop in repository https://gitbox.apache.org/repos/asf/plc4x.git
The following commit(s) were added to refs/heads/develop by this push: new 099ca20d09 docs: Updated some documentation on the code-generation. 099ca20d09 is described below commit 099ca20d0998974559bbc2a75bef3458cb7e56b5 Author: Christofer Dutz <cd...@apache.org> AuthorDate: Fri Mar 22 22:22:50 2024 +0100 docs: Updated some documentation on the code-generation. --- src/site/asciidoc/developers/code-gen/index.adoc | 111 +++++---- .../developers/code-gen/protocol/mspec.adoc | 263 +++++++++++++++------ 2 files changed, 260 insertions(+), 114 deletions(-) diff --git a/src/site/asciidoc/developers/code-gen/index.adoc b/src/site/asciidoc/developers/code-gen/index.adoc index 92726df6fa..83019d265f 100644 --- a/src/site/asciidoc/developers/code-gen/index.adoc +++ b/src/site/asciidoc/developers/code-gen/index.adoc @@ -54,17 +54,17 @@ The `Types Base` module provides all the structures the `Protocol` modules outpu `Protocol Base` and `Language Base` hereby just provide the interfaces that reference these types and provide the API for the `plc4x-maven-plugin` to use. -These modules are also maintained in a repository which is separate from the rest of the PLC4X code. +These modules are also maintained in a link:https://github.com/apache/plc4x-build-tools/tree/develop/code-generation[repository] which is separate from the rest of the PLC4X code. -This is due to some restrictions in the Maven build system. If you are interested in understanding the reasons - please read the chapter on `Problems with Maven` near the end of this page. +This is generally only due to some restrictions in the Maven build system. If you are interested in understanding the reasons - please read the chapter on `Problems with Maven` near the end of this page. -Concrete protocol spec parsers and templates that actually generate code are implemented in derived modules. 
+Concrete link:https://github.com/apache/plc4x/tree/develop/code-generation/protocol-base-mspec[protocol spec parsers], link:https://github.com/apache/plc4x/tree/develop/code-generation/language-base-freemarker[code generators] as well as link:https://github.com/apache/plc4x/tree/develop/code-generation/language-java[templates] that actually generate code are implemented in derived modules all located under the link:https://github.com/apache/plc4x/tree/develop/code-generation[code-generat [...] -We didn't want to tie ourselves to only one way to specify protocols and to generate code. Generally multiple types of formats for specifying drivers are thinkable and the same way multiple ways of generating code are possible. Currently however we only have one parser: `MSpec` and one generator: `Freemarker`. +We didn't want to tie ourselves to only one way to specify protocols and to generate code. Generally multiple formats for specifying drivers are conceivable and, in the same way, multiple ways of generating code are possible. Currently, however, we only have one parser: `MSpec` and one generator: `Freemarker`. These add more layers to the hierarchy. -So for example in case of generating a Siemens S7 Driver for Java this would look like this: +So for example in case of generating a `Siemens S7` Driver for `Java` this would look like this: [ditaa,code-generation-intro-s7-java] .... @@ -129,14 +129,17 @@ So in general it is possible to add new forms of providing protocol definitions For the formats of specifying a protocol we have tried out numerous tools and frameworks, however the results were never quite satisfying. Usually using them required a large amount of workarounds, which made the solution quite complicated. +This is mainly because tools like Thrift, Avro, gRPC, etc. are all made for transferring an object structure from A to B.
They focus on keeping the structure of the object intact and do not offer ways to control the format used for transferring it. -In the end only DFDL and the corresponding Apache project https://daffodil.apache.org[Apache Daffodil] seemed to provide what we were looking for. +Existing industry standards, such as `ASN.1`, unfortunately mostly relied on large portions of text to describe parts of the parsing or serializing logic, which made them pretty much useless for fully automated code generation. + +In the end only `DFDL` and the corresponding Apache project link:https://daffodil.apache.org[Apache Daffodil] seemed to provide what we were looking for. With this we were able to provide first driver versions fully specified in XML. -The downside was, that the PLC4X community regarded this XML format as pretty complicated and when implementing an experimental code generator we quickly noticed that generating a nice object model would not be possible, due to the lack ability to model the inheritance of types in DFDL. +The downside was that the PLC4X community regarded this XML format as pretty complicated, and when implementing an experimental code generator we quickly noticed that generating a nice object model would not be possible, due to the lack of an ability to model inheritance of types in a DFDL schema. -In the end we came up with our own solution which we called `MSpec` and is described in the link:protocol/mspec.html[MSpec Format description]. +In the end we came up with our own format which we called `MSpec` and is described in the link:protocol/mspec.html[MSpec Format description]. === Configuration @@ -144,19 +147,28 @@ The `plc4x-maven-plugin` has a very limited set of configuration options. In general all you need to specify is the `protocolName` and the `languageName`. -An additional option `outputFlavor` allows generating multiple versions of a driver for a given language.
This can come in handy if we want to be able to generate `read-only` or `passive mode` driver variants. +An additional option `outputFlavor` allows generating multiple versions of a driver for a given language. +This can come in handy if we want to be able to generate `read-only` or `passive mode` driver variants. + +In order to be able to refactor and improve protocol specifications without having to update all drivers for a given protocol, we recently added a `protocolVersion` attribute that allows us to provide and use multiple versions of one protocol. +So in case we update the fictional `wombat-protocol`, we could add a `version 2` `mspec` for it, then use version 2 in the java-driver and continue to use version 1 in all other languages. +Once all drivers are updated, we can eliminate the old version again. Last but not least, we have a pretty generic `options` config option, which is a Map type. -With options is it possible to pass generic options to the code-generation. So if a driver or language requires further customization, these options can be used. +With options it is possible to pass generic options to the code-generation. +So if a driver or language requires further customization, these options can be used. +For a list of all supported options for a given language template, please refer to the corresponding language page. Currently, the `Java` module makes use of such an option for specifying the Java `package` the generated code uses. -If no `package` option is provided, the default package `org.apache.plc4x.{language-name}.{protocol-name}.{output-flavor}` is used, but especially when generating custom drivers, which are not part of the Apache PLC4X project, different package names are better suited. So in these cases, the user can simply override the default package name.
+If no `package` option is provided, the default package `org.apache.plc4x.{language-name}.{protocol-name}.{output-flavor}` is used, but especially when generating custom drivers, which are not part of the Apache PLC4X project, different package names are better suited. +So in these cases, the user can simply override the default package name. There is also an additional parameter: `outputDir`, which defaults to `${project.build.directory}/generated-sources/plc4x/` and usually shouldn't require being changed in case of a `Java` project, but usually requires tweaking when generating code for other languages. Here's an example of a driver pom for building a `S7` driver for `java`: +[subs=attributes+] .... <?xml version="1.0" encoding="UTF-8"?> <!-- @@ -185,7 +197,7 @@ Here's an example of a driver pom for building a `S7` driver for `java`: <parent> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation</artifactId> - <version>0.6.0-SNAPSHOT</version> + <version>{current-last-released-version}</version> </parent> <artifactId>test-java-s7-driver</artifactId> @@ -217,13 +229,13 @@ Here's an example of a driver pom for building a `S7` driver for `java`: <dependency> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation-driver-base-java</artifactId> - <version>0.6.0-SNAPSHOT</version> + <version>{current-last-released-version}</version> </dependency> <dependency> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation-language-java</artifactId> - <version>0.6.0-SNAPSHOT</version> + <version>{current-last-released-version}</version> <!-- Scope is 'provided' as this way it's not shipped with the driver --> <scope>provided</scope> </dependency> @@ -231,7 +243,7 @@ Here's an example of a driver pom for building a `S7` driver for `java`: <dependency> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation-protocol-s7</artifactId> - <version>0.6.0-SNAPSHOT</version> + 
<version>{current-last-released-version}</version> <!-- Scope is 'provided' as this way it's not shipped with the driver --> <scope>provided</scope> </dependency> @@ -244,33 +256,42 @@ So the plugin configuration is pretty straight forward, all that is specified, i The dependency: +[subs=attributes+] +.... <dependency> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation-driver-base-java</artifactId> - <version>0.6.0-SNAPSHOT</version> + <version>{current-last-released-version}</version> </dependency> +.... For example contains all classes the generated code relies on. The definitions of both the `s7` protocol and `java` language are provided by the two dependencies: +[subs=attributes+] +.... <dependency> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation-language-java</artifactId> - <version>0.6.0-SNAPSHOT</version> + <version>{current-last-released-version}</version> <!-- Scope is 'provided' as this way it's not shipped with the driver --> <scope>provided</scope> </dependency> +.... and: +[subs=attributes+] +.... <dependency> <groupId>org.apache.plc4x.plugins</groupId> <artifactId>plc4x-code-generation-protocol-s7</artifactId> - <version>0.6.0-SNAPSHOT</version> + <version>{current-last-released-version}</version> <!-- Scope is 'provided' as this way it's not shipped with the driver --> <scope>provided</scope> </dependency> +.... The reason for why the dependencies are added as code-dependencies and why the scope is set the way it is, is described in the <<Why are the protocol and language dependencies done so strangely?>> section. 
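+Condensed down to the plugin itself (stripped of the surrounding pom), the configuration described above generally boils down to something like the following sketch. The concrete values (the `read-write` flavor and the example package name) are illustrative assumptions, not taken from a real driver pom:
+
+[subs=attributes+]
+....
+<plugin>
+  <groupId>org.apache.plc4x.plugins</groupId>
+  <artifactId>plc4x-maven-plugin</artifactId>
+  <version>{current-last-released-version}</version>
+  <configuration>
+    <protocolName>s7</protocolName>
+    <languageName>java</languageName>
+    <outputFlavor>read-write</outputFlavor>
+    <!-- Generic, language-specific options (here: the Java package name) -->
+    <options>
+      <package>com.example.plc4x.s7</package>
+    </options>
+  </configuration>
+</plugin>
+....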
@@ -282,15 +303,14 @@ The plugin uses the https://docs.oracle.com/javase/7/docs/api/java/util/ServiceL In order to provide a new protocol module, all that is required is to create a module containing a `META-INF/services/org.apache.plc4x.plugins.codegenerator.protocol.Protocol` file referencing an implementation of the `org.apache.plc4x.plugins.codegenerator.protocol.Protocol` interface. -This interface is located in the `org.apache.plc4x.plugins:plc4x-code-generation-protocol-base` module and generally only defines two methods: +This interface is located in the `org.apache.plc4x.plugins:plc4x-code-generation-protocol-base` module and generally only defines three methods: .... package org.apache.plc4x.plugins.codegenerator.protocol; -import org.apache.plc4x.plugins.codegenerator.types.definitions.ComplexTypeDefinition; import org.apache.plc4x.plugins.codegenerator.types.exceptions.GenerationException; -import java.util.Map; +import java.util.Optional; public interface Protocol { @@ -302,28 +322,35 @@ public interface Protocol { String getName(); /** - * Returns a map of complex type definitions for which code has to be generated. + * Returns a map of type definitions for which code has to be generated. * * @return the Map of types that need to be generated. * @throws GenerationException if anything goes wrong parsing. */ - Map<String, TypeDefinition> getTypeDefinitions() throws GenerationException; + TypeContext getTypeContext() throws GenerationException; + + + /** + * @return the protocolVersion, if applicable + */ + default Optional<String> getVersion() { + return Optional.empty(); + } } .... -These implementations could use any form of way to generate the Map of `ComplexTypeDefinition`'s. -They could even be hard coded. +The `name` is used by the plugin to find the right protocol module, so the result of `getName()` needs to match the value provided in the maven config-option `protocolName`.
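+Such a service-registration file is plain text and simply contains the fully qualified class name of the implementation. For a hypothetical protocol implementation class (the class name below is made up purely for illustration), the file `META-INF/services/org.apache.plc4x.plugins.codegenerator.protocol.Protocol` would contain the single line:
+
+....
+org.example.plc4x.wombat.WombatProtocol
+....
+
+The `ServiceLoader` mechanism then instantiates this class via its public no-arg constructor.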
-However, we have currently implemented utilities for universally providing input: +As mentioned before, we support multiple versions of a protocol, so if `getVersion()` returns a non-empty value, it is used to select the protocol version. -- link:protocol/mspec.html[MSpec Format] PLC4X proprietary format. +The most important method for the actual code-generation, however, is `getTypeContext()`, which returns a `TypeContext` object that generally contains a list of all parsed types for the given protocol. ==== Language Modules -Analog to the <<Protocol Modules>> the Language modules are constructed equally. +Analogous to the <<Protocol Modules>>, the Language modules are constructed very similarly. -The `Language` interface is very simplistic too and is located in the `org.apache.plc4x.plugins:plc4x-code-generation-language-base` module and generally only defines two methods: +The `LanguageOutput` interface is very simplistic too and is located in the `org.apache.plc4x.plugins:plc4x-code-generation-language-base` module and generally only defines four methods: ....
package org.apache.plc4x.plugins.codegenerator.language; @@ -353,7 +380,7 @@ public interface LanguageOutput { */ Set<String> supportedOptions(); - void generate(File outputDir, String languageName, String protocolName, String outputFlavor, + void generate(File outputDir, String version, String languageName, String protocolName, String outputFlavor, Map<String, TypeDefinition> types, Map<String, String> options) throws GenerationException; } @@ -361,9 +388,11 @@ public interface LanguageOutput { The file for registering Language modules is located at: `META-INF/services/org.apache.plc4x.plugins.codegenerator.language.LanguageOutput` -Same as with the protocol modules, the language modules could also be implemented in any thinkable way, however we have already implemented some helpers for using: +The `name` is used by the plugin to find the language output module and needs to match the maven config option `languageName`. + +`supportedOutputFlavors` provides the list of possible flavors that can be referred to by the maven config option `outputFlavor`. -- link:language/freemarker.html[Apache Freemarker Format] Generate output using https://freemarker.apache.org[Apache Freemarker] Project. +`supportedOptions` provides a list of `options` that the current language module is able to use and which can be passed to the maven configuration using the `options` setting.
-While during normal development, Maven will probably just download the latest `SNAPSHOT` from our Maven repository and be happy with this and not complain that this version will be overwritten later on in the build. +In case of using a Maven plugin in a project which also builds the maven plugin itself, this is guaranteed to fail - Especially during releases. +While during normal development, Maven will probably just download the latest `SNAPSHOT` from our Maven repository and will be happy with this and not complain even if this version will be overwritten later on in the build. It will just use the new version as soon as it has to. During releases however the release plugin changes the version to a release version and then spawns a build. -In this case the build will fail because there is no Plugin with that version to download. -In this case the only option would be to manually build and install the plugin in the release version and to re-start the release (Which is not a nice thing for the release manager). +In this case the build will fail because there is no Plugin with that version to download from anywhere. +In this case the only option would be to manually build and deploy the plugin in the release version and to re-start the release (Which is not a nice thing for the release manager). -For this reason we have stripped down the plugin and its dependencies to an absolute minimum and have released (or will release) that separately from the rest, hoping due to the minimality of the dependencies that we will not have to do it very often. +For this reason we have stripped down the plugin and its dependencies to an absolute minimum and have released that separately from the rest, hoping due to the minimality of the dependencies that we will not have to do it very often. As soon as the tooling is released, the version is updated in the PLC4X build and the release version is used without any complications. 
==== Why are the protocol and language dependencies done so strangely? -It would certainly be a lot cleaner, if we provided the modules as plugin dependencies. +It would certainly be a lot cleaner, if we provided the dependencies to protocol and language modules as plugin dependencies. -However, as we mentioned in the previous sub-chapter, Maven tries to download and configure the plugins prior to running the build. +However, as we mentioned in the previous subchapter, Maven tries to download and configure the plugins prior to running the build. So during a release the new versions of the modules wouldn't exist yet, which would cause the build to fail. We could release the protocol- and the language modules separately too, but we want the language and protocol modules to be part of the project, to not over-complicate things - especially during a release. -So the Maven plugin is built in a way, that it uses the modules dependencies and creates its own Classloader to contain all of these modules at runtime. +In order to keep the build and the release as simple as possible, we built the Maven plugin in a way that it uses the modules' dependencies and creates its own Classloader to contain all of these modules at runtime. This brings the benefit of being able to utilize Maven's capability of determining the build order and dynamically creating the modules' build classpath. Adding a normal dependency, however, would make Maven deploy the artifacts with the rest of the modules. -We don't want that as the modules are useless as soon as they have been used to generate the code. +We don't want that as both the protocol as well as the language-modules are useless as soon as they have been used to generate the code. So we use a trick that is usually used in Web applications, for example: Here the vendor of a Servlet engine is expected to provide an implementation of the `Servlet API`.
diff --git a/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc b/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc index 0c6050f10e..8d19b56480 100644 --- a/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc +++ b/src/site/asciidoc/developers/code-gen/protocol/mspec.adoc @@ -20,28 +20,29 @@ The `MSpec` format (Message Specification) was a result of a brainstorming session after evaluating a lot of other options. -We simply sat down and started to write some imaginary format (`imaginary` was even the initial Name we used) and created parses for this afterwards and fine-tuned spec and parsers as part of the process of implementing first protocols and language templates. +We simply sat down and started to write some imaginary format (`imaginary` was even the initial name we used, before it became the Machine-Readable SPEC = `mspec`). +After we had an initial format that seemed to do the trick, we then started creating parsers for it and iteratively fine-tuned both spec and parsers as part of the process of implementing new protocols and language templates. It's a text-based format. At the root level of these specs are a set of `type`, `discriminatedType`, `dataIo` and `enum` blocks. -`type` elements are objects who's content is independent of the input. +`type` elements are objects whose content and structure are independent of the input. -An example would be the `TPKTPacket` of the S7 format: +An example would be the `TPKTPacket` of the `S7` format: .... [type TPKTPacket - [const uint 8 protocolId 0x03] - [reserved uint 8 '0x00'] - [implicit uint 16 len 'payload.lengthInBytes + 4'] - [field COTPPacket 'payload'] + [const uint 8 protocolId 0x03] + [reserved uint 8 '0x00'] + [implicit uint 16 len 'payload.lengthInBytes + 4'] + [simple COTPPacket('len - 4') payload] ] ....
+A `discriminatedType` type, in contrast, is an object whose content and structure are influenced by the input. -Every discriminated type can contain an arbitrary number of `discriminator` fields and exactly one `typeSwitch` element. +Every discriminated type can contain an arbitrary number of normal fields but must contain exactly one `typeSwitch` element. For example part of the spec for the S7 format looks like this: @@ -51,47 +52,52 @@ For example part of the spec for the S7 format looks like this: [discriminator uint 8 messageType] [reserved uint 16 '0x0000'] [simple uint 16 tpduReference] - [implicit uint 16 parameterLength 'parameter.lengthInBytes'] - [implicit uint 16 payloadLength 'payload.lengthInBytes'] - [typeSwitch 'messageType' + [implicit uint 16 parameterLength 'parameter != null ? parameter.lengthInBytes : 0'] + [implicit uint 16 payloadLength 'payload != null ? payload.lengthInBytes : 0'] + [typeSwitch messageType ['0x01' S7MessageRequest ] - ['0x03' S7MessageResponse + ['0x02' S7MessageResponse [simple uint 8 errorClass] - [simple uint 8 errorCode ] + [simple uint 8 errorCode] + ] + ['0x03' S7MessageResponseData + [simple uint 8 errorClass] + [simple uint 8 errorCode] ] ['0x07' S7MessageUserData ] ] - [simple S7Parameter('messageType') parameter] - [simple S7Payload('messageType', 'parameter') payload ] + [optional S7Parameter ('messageType') parameter 'parameterLength > 0'] + [optional S7Payload ('messageType', 'parameter') payload 'payloadLength > 0' ] ] .... -A types start is declared by an opening square bracket `[` and ended with a closing one `]`. - -Also, to both provide a name as first argument. +A type's start is declared by an opening square bracket `[` followed by the `type` or `discriminatedType` keyword, which is directly followed by a name. +A type definition is ended with a closing square bracket `]`. -Every type definition contains a list of fields that can have different types.
+Every type definition contains a list of so-called fields. -The list of available types are: +The list of available field types are: -- abstract: used in the parent type declaration do declare a field that has to be defined with the identical type in all sub-types (reserved for `discriminatedType`). - array: array of simple or complex typed objects. +- abstract: used in the parent type declaration to declare a field that has to be defined with the identical type in all subtypes (reserved for `discriminatedType`). - array: array of simple or complex typed objects. +- assert: generally similar to `const` fields, however they throw `AssertionExceptions` instead of hard `ParseExceptions`. They are used in combination with optional fields. - checksum: used for calculating and verifying checksum values. - const: expects a given value and causes a hard exception if the value doesn't match. - discriminator: special type of simple typed field which is used to determine the concrete type of object (reserved for `discriminatedType`). - enum: special form of field, used if an enum type's property is to be used instead of its primary value. - implicit: a field required for parsing, but is usually defined through other data, so it's not stored in the object, but calculated on serialization. -- assert: generally similar to `constant` fields, however do they throw `AssertionExceptions` instead of hard `ParseExceptions`. They are used in combination with optional fields. - manualArray: like an array field, however the logic for serializing, parsing, number of elements and size have to be provided manually. - manual: simple field, where the logic for parsing, serializing and size have to be provided manually. - optional: simple or complex typed object, that is only present if an optional condition expression evaluates to `true` and no `AssertionException` is thrown when parsing the referenced type. - padding: field used to add padding data to make data structures aligned.
+- peek: field that tries to parse a given structure without actually consuming the bytes. - reserved: expects a given value, but only warns if condition is not met. - simple: simple or complex typed object. -- typeSwitch: not a real field, but indicates the existence of sub-types, which are declared inline (reserved for `discriminatedType`). +- typeSwitch: not a real field, but indicates the existence of subtypes, which are declared inline (reserved for `discriminatedType`). - unknown: field used to declare parts of a message that still have to be defined. Generally used when reverse-engineering a protocol. Messages with `unknown` fields can only be parsed and not serialized. +- validation: this field is not actually a real field, it's more a condition that is checked during parsing and if the check fails, it throws a validation exception, which is handled by surrounding `optional` fields. - virtual: generates a field in the message that is generally only used for simplification. It's not used for parsing or serializing. The full syntax and explanations of these types follow in the following chapters. @@ -113,15 +119,20 @@ The base types available are currently: - *bit*: Simple boolean value or bit. - *byte*: Special value fixed to 8 bit, which defaults to either signed or unsigned depending on the programming language (in Java it defaults to signed integer values and in C and Go it defaults to unsigned integers). -- *uint*: The input is treated as unsigned integer value. - *int*: The input is treated as signed integer value. +- *uint*: The input is treated as unsigned integer value. - *float*: The input is treated as floating point number. - *string*: The input is treated as string. -All above types take a `size` value which provides how many `bits` should be read. -All except the `bit` type, which is fixed to one single bit.
+Then for `dataIo` types we have some additional types: +- *time*: The input is treated as a time representation. +- *date*: The input is treated as a date representation. +- *dateTime*: The input is treated as a date with time. + +All except the `bit` and `byte` types take a `size` value which provides how many `bits` should be read. +For the `bit` field this obviously defaults to 1, and for the `byte` field it defaults to 8. -So reading an unsigned byte would be: `uint 8`. +So reading an unsigned 8-bit integer would be: `uint 8`. There is currently one special type, reserved for string values, whose length is determined by an expression instead of a fixed number of bits. It is considered a variable length string: @@ -129,7 +140,7 @@ There is currently one special type, reserved for string values, whose length is === Complex Types -In contrast to simple types, complex type reference other complex types (Root elements of the spec document). +In contrast to simple types, complex types reference other complex types (root elements of the spec document). How the parser should interpret them is defined in the referenced type's definition. @@ -142,9 +153,11 @@ In the example above, for example the `S7Parameter` is defined in another part o An `array` field is exactly what you expect. It generates a field which is not a single-value element but an array or list of elements. - [array {simple-type} {size} '{name}' {'count', 'length', 'terminated'} '{expression}'] + [array {bit|byte} {name} {count|length|terminated} '{expression}'] - [array {complex-type} '{name}' {'count', 'length', 'terminated'} '{expression}'] + [array {simple-type} {size} {name} {count|length|terminated} '{expression}'] + + [array {complex-type} {name} {count|length|terminated} '{expression}'] Array types can be both simple and complex typed and have a name. An array field must specify the way its length is determined as well as an expression defining its length.
@@ -153,11 +166,32 @@ Possible values are: - `length`: In this case a given number of bytes are being read. So if an element has been parsed and there are still bytes left, another element is parsed. - `terminated`: In this case the parser will continue reading elements until it encounters a termination sequence. +==== assert Field + +An assert field is pretty much identical to a `const` field. +The main difference, however, is how the case is handled if the parsed value does not match the expected value. + + [assert {bit|byte} {name} '{assert-value}'] + + [assert {simple-type} {size} {name} '{assert-value}'] + +While a `const` field would abort parsing entirely with an error, an `assert` field will also abort parsing, but the error will only bubble up the stack until the first `optional` field is found. + +In this case the parser will be rewound to the position before starting to parse the `optional` field and continue parsing with the next field, skipping the `optional` field. + +If there is no upstream `optional` field, then parsing of the message terminates with an error. + +See also: +- validation field: Similar to an `assert` field, however no parsing is done, and instead simply a condition is checked. +- optional field: `optional` fields are aware of the types of parser errors produced by `assert` and `validation` fields. + ==== checksum Field A checksum field can only operate on simple types. - [checksum {simple-type} {size} '{name}' '{checksum-expression}'] + [checksum {bit|byte} {name} '{checksum-expression}'] + + [checksum {simple-type} {size} {name} '{checksum-expression}'] When parsing, a given simple type is parsed and then the result is compared to the value the `checksum-expression` provides. If they don't match, an exception is thrown. @@ -175,9 +209,11 @@ See also: A const field simply reads a given simple type and compares to a given reference value.
- [const {simple-type} {size} '{name}' {reference}] + [const {bit|byte} {name} {reference}] -When parsing it makes the parser throw an Exception if the parsed value does not match. + [const {simple-type} {size} {name} {reference}] + +When parsing, the parser throws an Exception if the parsed value does not match the expected one. When serializing, it simply outputs the expected constant. @@ -190,7 +226,10 @@ See also: Discriminator fields are only used in `discriminatedType`s. - [discriminator {simple-type} {size} '{name}'] + [discriminator {simple-type} {size} {name}] + +They are used in cases where the value of a field determines the concrete type of a discriminated type. +In this case we don't have to waste memory on storing the discriminator value, as it can be statically assigned to the type. When parsing, a discriminator field just results in a locally available variable. @@ -204,7 +243,9 @@ See also: Implicit types are fields that get their value implicitly from the data they contain. - [implicit {simple-type} {size} '{name}' '{serialization-expression}'] + [implicit {bit|byte} {name} '{serialization-expression}'] + + [implicit {simple-type} {size} {name} '{serialization-expression}'] When parsing, an implicit field is available as a local variable and can be used by other expressions. @@ -216,30 +257,41 @@ This field doesn't keep any data in memory.
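+As a combined illustration of the fields described so far, here is a minimal, hypothetical type (not part of any real protocol, made up purely for this example) that uses only constructs already shown above: a `const` magic byte that must match on parsing, an `implicit` length that is re-computed from the payload on serialization, and a `byte` array whose `count` expression uses the implicit field as a local variable:
+
+....
+[type ExampleFrame
+    [const    uint 8 magicByte 0x42]
+    [implicit uint 8 len       'payload.lengthInBytes']
+    [array    byte   payload   count 'len']
+]
+....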
==== manualArray Field

- [manualArray {simple-type} {size} '{name}' {'count', 'length', 'terminated'} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+ [manualArray {bit|byte} {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']

- [manualArray {complex-type} '{name}' {'count', 'length', 'terminated'} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+ [manualArray {simple-type} {size} {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+
+ [manualArray {complex-type} {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']

==== manual Field

- [manual {simple-type} {size} '{name}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+ [manual {bit|byte} {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+
+ [manual {simple-type} {size} {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']

- [manual {complex-type} '{name}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
+ [manual {complex-type} {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']

==== optional Field

An optional field is a type of field that can also be `null`.

- [optional {simple-type} {size} '{name}' '{optional-expression}']
+ [optional {bit|byte} {name} ('{optional-expression}')?]

- [optional {complex-type} '{name}' '{optional-expression}']
+ [optional {simple-type} {size} {name} ('{optional-expression}')?]

-When parsing the `optional-expression` is evaluated. If this results in`false` nothing is output, if it evaluates to `true` it is serialized as a `simple` field.
+ [optional {complex-type} {name} ('{optional-expression}')?]
+
+The `optional-expression` attribute is optional. If it is provided, the `optional-expression` is evaluated.
+If this results in `false`, nothing is parsed; if it evaluates to `true`, it is parsed.
+
+In any case, if an `assert` or `validation` field fails while parsing the content of an `optional` field, the parser is rewound to the position before starting to parse the `optional` field; the `optional` field is then skipped and the parser continues with the next field.

When serializing, if the field is `null` nothing is output, if it is not `null` it is serialized normally.

See also:

- simple field: In general `optional` fields are identical to `simple` fields except the ability to be `null` or be skipped.
+- `assert`: Assert fields are similar to `const` fields, but can abort parsing of an `optional` field.
+- `validation`: If a validation field in any of the subtypes fails, this aborts parsing of the `optional` field.

==== padding Field

A padding field allows aligning of data blocks.
It outputs additional padding data, given amount of times specified by padding expression. Padding is added only when result of expression is bigger than zero.

- [padding {simple-type} {size} '{pading-value}' '{padding-expression}']
+ [padding {bit|byte} {name} '{padding-value}' '{times-padding}']
+
+ [padding {simple-type} {size} {name} '{padding-value}' '{times-padding}']

-When parsing a `padding` field is just consumed without being made available as property or local variable if the `padding-expression` evaluates to value greater than zero.
-If it doesn't, it is just skipped.
+When a `padding` field is being parsed, the `times-padding` expression determines how often the `padding-value` should be read. So it doesn't really check whether the read values match the `padding-value`; it just ensures that the same number of bits is read. The read values are simply discarded.
+
+When serializing, the `times-padding` defines how often the `padding-value` should be written.

This field doesn't keep any data in memory.

+==== peek Field
+
+// TODO: Implement
+
==== reserved Field

Reserved fields are very similar to `const` fields, however they don't throw exceptions, but instead log messages if the values don't match.
-The reason for this is that in general reserved fields have the given value until they start to be used.
+The reason for this is that in general reserved fields have the given value until they start being used.
If the field starts to be used this shouldn't break existing applications, but it should raise a flag as it might make sense to update the drivers.

- [reserved {simple-type} {size} '{name}' '{reference}']
+ [reserved {bit|byte} {name} '{reference}']
+
+ [reserved {simple-type} {size} {name} '{reference}']
+
+When parsing, the value of a `reserved` field is parsed, compared to the reference value and then discarded.
-When parsing the values is parsed and the result is compared to the reference value.
-If the values don't match, a log message is sent.
+If the values don't match, a log message is written.

This field doesn't keep any data in memory.

@@ -275,48 +337,42 @@ See also:

==== simple Field

Simple fields are the most common types of fields.
-A `simple` field directly mapped to a normally typed field.

- [simple {simple-type} {size} '{name}']
+A `simple` field is directly mapped to a normally typed field of a message type.

- [simple {complex-type} '{name}']
-
-When parsing, the given type is parsed (can't be `null`) and saved in the corresponding model instance's property field.
+ [simple {bit|byte} {name}]

-When serializing it is serialized normally.
+ [simple {simple-type} {size} {name}]

-==== virtual Field
-
-Virtual fields have no impact on the input or output.
-They simply result in creating artificial get-methods in the generated model classes.
+
+ [simple {complex-type} {name}]
-
- [virtual {simple-type} {size} '{name}' '{value-expression}']
-
- [virtual {complex-type} '{name}' '{value-expression}']
+When parsing, the given type is parsed (can't be `null`) and saved in the corresponding model instance's property field.
-Instead of being bound to a property, the return value of a `virtual` property is created by evaluating the `value-expression`.
+When serializing, it is serialized normally, either using a simple-type serializer or by delegating serialization to the complex type.

==== typeSwitch Field

+// TODO: Finish this ...
+
These types of fields can only occur in discriminated types.
A `discriminatedType` must contain *exactly one* `typeSwitch` field, as it defines the sub-types.

- [typeSwitch '{arument-1}', '{arument-2}', ...
- ['{argument-1-value-1}' {subtype-1-name}
+ [typeSwitch {field-or-attribute-1}(,{field-or-attribute-2}, ...)
+ ['{field-1-value-1}' {subtype-1-name}
 ... Fields ...
 ]
- ['{vargument-1-value-2}', '{argument-2-value-1}' {subtype-2-name}
+ ['{field-1-value-2}', '{field-2-value-1}' {subtype-2-name}
 ... Fields ...
 ]
- ['{vargument-1-value-3}', '{argument-2-value-2}' {subtype-2-name} [uint 8 'existing-attribute-1', uint 16 'existing-attribute-2']
+ ['{field-1-value-3}', '{field-2-value-2}' {subtype-2-name} [uint 8 'existing-attribute-1', uint 16 'existing-attribute-2']
 ... Fields ...
 ]

A type switch element must contain a list of at least one argument expression.
Only the last option can stay empty, which results in a default type.

-Each sub-type declares a comma-separated list of concrete values.
+Each subtype declares a comma-separated list of concrete values.
It must contain at most as many elements as arguments are declared for the type switch.

@@ -326,18 +382,54 @@ If it matches and there are no more values, the type is found, if more values ar

If no type is found, an exception is thrown.
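+A hypothetical example of the syntax above (all type names and discriminator values are invented for illustration; the last, value-less case acts as the default type):
+
+....
+[discriminatedType HypotheticalFrame
+    // The value of this field selects the concrete subtype.
+    [discriminator uint 8 frameType]
+    [typeSwitch frameType
+        ['0x01' HypotheticalRequest
+            [simple uint 16 requestId]
+        ]
+        ['0x02' HypotheticalResponse
+            [simple uint 16 requestId]
+            [simple uint 8  status]
+        ]
+        [HypotheticalUnknownFrame
+        ]
+    ]
+]
+....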
-Inside each sub-type can declare fields using a subset of the types (`discriminator` and `typeSwitch` can't be used here)
+Each subtype can declare fields using a subset of the field types (`discriminator` and `typeSwitch` can't be used here).

-The third case in above code-snippet also passes a named attribute to the sub-type.
+The third case in the above code snippet also passes a named attribute to the subtype.
The name must be identical to any argument or named field parsed before the switchType.
These arguments are then available for expressions or passing on in the subtypes.

+// TODO: Wildcard names
+
See also:

- `discriminatedType`

+==== unknown Field
+
+// TODO: Finish this ...
+
+This type of field is mainly used when working on reverse-engineering a new protocol.
+It allows parsing any type of information, storing and using it and serializing it back.
+
+In general it's similar to a `simple` field; it just explicitly states that we don't yet quite know how to handle the content.
+
+==== validation Field
+
+As mentioned before, a `validation` field is not really a field; it's a check that is added to the type parser.
+
+// TODO: Finish this ...
+
+If the expression provided in the `validation` field fails, the parser aborts parsing and goes up the stack until it finds the first `optional` field.
+If it finds one, it rewinds the parser to the position just before starting to parse the `optional` field, then skips the `optional` field and continues with the next field.
+
+If there is no `optional` field up the stack, then parsing fails.
+
+
+==== virtual Field
+
+Virtual fields have no impact on the input or output.
+They simply result in creating artificial get-methods in the generated model classes.
+
+ [virtual {bit|byte} {name} '{value-expression}']
+
+ [virtual {simple-type} {size} {name} '{value-expression}']
+
+ [virtual {complex-type} {name} '{value-expression}']
+
+Instead of being bound to a property, the return value of a `virtual` property is created by evaluating the `value-expression`.
+
==== Parameters

-Some times it is necessary to pass along additional parameters.
+Sometimes it is necessary to pass along additional parameters.

If a complex type requires parameters, these are declared in the header of that type.

@@ -361,8 +453,33 @@ If a complex type requires parameters, these are declared in the header of that

 ]
....

-Therefore wherever a complex type is referenced an additional list of parameters can be passed to the next type.
+Therefore, wherever a complex type is referenced, an additional list of parameters can be passed to the next type.

Here comes an example of this in above snippet:

 [field S7Payload 'payload' ['messageType', 'parameter']]
+
+==== Serializer and Parser-Arguments
+
+Arguments influence the way the parser or serializer operates.
+
+Wherever a parser-argument is used, it also applies in all subtypes the parser processes.
+
+===== byteOrder
+
+A `byteOrder` argument can set or change the byte-order used by the parser.
+
+We currently support two variants:
+
+- BIG_ENDIAN
+- LITTLE_ENDIAN
+
+===== encoding
+
+Each simple type has a default encoding, which is ok for a very high percentage of cases.
+
+Signed integers, for example, use two's-complement notation, floating-point values are encoded in IEEE 754 single- or double-precision encoding, and Strings are encoded as UTF-8 per default.
+
+However, in some cases an alternate encoding needs to be used. Especially when dealing with Strings, different encodings, such as ASCII, UTF-16 and many more, can be used. But also for numeric values, different encodings might be used.
+For example, KNX uses a 16-bit floating-point encoding which is not standard, and in S7 drivers a special encoding is used to encode numeric values so that they represent the number in hex format.
+
+An `encoding` attribute can be used to select a non-default encoding.
\ No newline at end of file
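+As a hypothetical sketch (the exact attribute syntax is assumed here and should be checked against the mspec grammar), such arguments could be attached to individual fields like this:
+
+....
+[type HypotheticalValue
+    // Parse this one field as little-endian, regardless of the default byte-order.
+    [simple uint 16 littleEndianValue byteOrder='LITTLE_ENDIAN']
+    // Read a 64-bit string using ASCII instead of the UTF-8 default.
+    [simple string 64 shortText encoding='"ASCII"']
+]
+....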