This is an automated email from the ASF dual-hosted git repository. jorgebg pushed a commit to branch TINKERPOP-1942 in repository https://gitbox.apache.org/repos/asf/tinkerpop.git
commit 2fc2683c0d518e2a27b8b3eba9c136be1d9ee9e8 Author: Jorge Bay Gondra <jorgebaygon...@gmail.com> AuthorDate: Thu Apr 19 11:35:47 2018 +0200 GraphBinary specification --- docs/src/dev/io/graphbinary.asciidoc | 616 +++++++++++++++++++++++++++++++++++ docs/src/dev/io/index.asciidoc | 2 + 2 files changed, 618 insertions(+) diff --git a/docs/src/dev/io/graphbinary.asciidoc b/docs/src/dev/io/graphbinary.asciidoc new file mode 100644 index 0000000..2412a5d --- /dev/null +++ b/docs/src/dev/io/graphbinary.asciidoc @@ -0,0 +1,616 @@ +//// +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. + +//// + +[[graphbinary]] += GraphBinary + +GraphBinary is a binary serialization format that is designed to reduce serialization overhead on both the client +and the server, as well as limiting the size of the payload that is transmitted over the wire. + +It describes arbitrary object graphs with a fully-qualified format: + +[source] +---- +{type_code}{value} +---- + +Where: + +- `{type_code}` is a single byte representing the type number. +- `{value}` is a sequence of bytes which content is determined by the type. + +All encodings are big-endian. + +Quick examples, using hexadecimal notation to represent each byte: + +- `01 00 00 00 01`: a 32-bit integer number, that represents the decimal number 1. It’s composed by the +type_code `0x01` and four bytes to describe the value. +- `01 00 00 00 ff`: a 32-bit integer, representing the 256. +- `02 00 00 00 00 00 00 00 01`: a 64-bit integer number 1. It’s composed by the type_code `0x02` and eight bytes +to describe the value. + +== Version 1.0 + +=== Forward Compatibility + +The serialization format supports new types being added without the need to introduce a new version. + +Changes to existing types require new revision. + +=== Request Message + +Represents a message from the client to the server. + +Format: `{version}{request_id}{op}{processor}{args}` + +Where: + +- `{version}` is a `Byte` representing the protocol version, with the most significant bit set to one. For this version of the protocol, the value expected is `0x81` (`10000001`). +- `{request_id}` is a `UUID`. +- `{op}` is a `String`. +- `{processor}` is a `String`. +- `{args}` is a `Map`. + +The total length is not part of the message as the transport layer will provide it. For example: WebSockets, +as a framing protocol, defines payload length. + +=== Response Message + +Format: `{version}{id_present}{request_id}{status_code}{status_message}{status_attributes}{result_meta}{result_data}` + +Where: + +- `{version}` is a `Byte` representing the protocol version, with the most significant bit set to one. For this version of the protocol, the value expected is `0x81` (`10000001`). +- `{id_present}` is a single `Byte` representing whether a request id is present with only two possible values 0 and 1. +- `{request_id}` is a `UUID`. +- `{status_code}` is an `Int`. +- `{status_message}` is a `String`. +- `{status_attributes}` is a `Map`. +- `{result_meta}` is a `Map`. +- `{result_data}` is a fully qualified typed value composed of `{type_code}{value}`. + +The total length is not part of the message as the transport layer will provide it. + +=== Data Type Codes + +==== Core Data Types + +- `0x01`: Int +- `0x02`: Long +- `0x03`: String +- `0x04`: Date +- `0x05`: Timestamp +- `0x06`: Class +- `0x07`: Double +- `0x08`: Float +- `0x09`: List +- `0x0a`: Map +- `0x0b`: Set +- `0x0c`: UUID +- `0x0d`: Edge +- `0x0e`: Path +- `0x0f`: Property +- `0x10`: TinkerGraph +- `0x11`: Vertex +- `0x12`: VertexProperty +- `0x13`: Barrier +- `0x14`: Binding +- `0x15`: Bytecode +- `0x16`: Cardinality +- `0x17`: Column +- `0x18`: Direction +- `0x19`: Operator +- `0x1a`: Order +- `0x1b`: Pick +- `0x1c`: Pop +- `0x1d`: Lambda +- `0x1e`: P +- `0x1f`: Scope +- `0x20`: T +- `0x21`: Traverser +- `0x22`: BigDecimal +- `0x23`: BigInteger +- `0x24`: Byte +- `0x25`: ByteBuffer +- `0x26`: Short +- `0x00`: Custom + +==== Extended Types + +- `0x80`: Char +- `0x81`: Duration +- `0x82`: InetAddress +- `0x83`: Instant +- `0x84`: LocalDate +- `0x85`: LocalDateTime +- `0x86`: LocalTime +- `0x87`: MonthDay +- `0x88`: OffsetDateTime +- `0x89`: OffsetTime +- `0x90`: Period +- `0x92`: Year +- `0x93`: YearMonth +- `0x94`: ZonedDateTime +- `0x95`: ZoneOffset + +=== Data Type Formats + +==== Int + +Format: 4-byte two's complement integer. + +Example values: + +- `00 00 00 01`: 32-bit integer number 1. +- `00 00 01 01`: 32-bit integer number 256. +- `ff ff ff ff`: 32-bit integer number -1. +- `ff ff ff fe`: 32-bit integer number -2. + +==== Long + +Format: 4-byte two's complement integer. + +Example values + +- `00 00 00 00 00 00 00 01`: 64-bit integer number 1. +- `ff ff ff ff ff ff ff fe`: 64-bit integer number -2. + +==== String + +Format: `{length}{text_value}` + +Where: + +- `{length}` is an `Int` describing the byte length of the text. Negative value -1 represents the null string. +- `{text_value}` is a sequence of bytes representing the string value in UTF8 encoding. + +Example values + +- `00 00 00 03 61 62 63`: the string 'abc'. +- `00 00 00 04 61 62 63 64`: the string 'abcd'. + +==== Date + +Format: An 8-byte two's complement signed integer representing a millisecond-precision offset from the unix epoch. + +Example values + +- `00 00 00 00 00 00 00 00`: The moment in time 1970-01-01T00:00:00.000Z. +- `ff ff ff ff ff ff ff ff`: The moment in time 1969-12-31T23:59:59.999Z. + +==== Timestamp + +Format: The same as `Date`. + +==== Class + +Format: A `String` containing the fqcn. + +==== Double + +Format: 8 bytes representing IEEE 754 double-precision binary floating-point format. + +Example values + +- `3f f0 00 00 00 00 00 00`: Double 1 +- `3f 70 00 00 00 00 00 00`: Double 0.00390625 +- `3f b9 99 99 99 99 99 9a`: Double 0.1 + +==== Float + +Format: 4 bytes representing IEEE 754 single-precision binary floating-point format. + +Example values + +- `3f 80 00 00`: Float 1 +- `3e c0 00 00`: Float 0.375 + +==== List + +An ordered collection of items. + +Format: `{length}{item_0}...{item_n}` + +Where: + +- `{length}` is an unsigned 4-byte two's complement integer describing the length of the list. +- `{item_0}...{item_n}` are the items of the list. `{item_i}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== Set + +A collection that contains no duplicate elements. + +Format: Same as `List`. + +==== Map + +A dictionary of keys to values. + +Format: `{length}{item_0}...{item_n}` + +Where: + +- `{length}` is an unsigned 4-byte two's complement integer describing the length of the map. +- `{item_0}...{item_n}` are the items of the map. `{item_i}` is sequence of 2 fully qualified typed values one representing the key and the following representing the value, each composed composed of `{type_code}{value}`. + +==== UUID + +A 128-bit universally unique identifier. + +Format: 16 bytes representing the uuid. + +Example + +- `00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff`: Uuid 00112233-4455-6677-8899-aabbccddeeff. + +==== Edge + +Format: `{id}{label}{inVLabel}{outVLabel}{inV}{outV}{properties}` + +Where: + +- `{id}` is a fully qualified typed value composed of `{type_code}{value}`. +- `{label}` is a `String` value. +- `{inVLabel}` is a `String` value. +- `{outVLabel}` is a `String` value. +- `{inV}` is a fully qualified typed value composed of `{type_code}{value}`. +- `{outV}` is a fully qualified typed value composed of `{type_code}{value}`. +- `{properties}` is a `List` of `VertexProperty` items. + +==== Path + +Format: `{labels}{objects}` + +Where: + +- `{labels}` is a `List` in which each item is a `Set` of `String`. +- `{objects}` is a `List` of fully qualified typed values. + +==== Property + +Format: `{key}{value}` + +Where: + +- `{key}` is a `String` value. +- `{value}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== TinkerGraph + +A collection of vertices and edges. + +Format: `{vertices}{edges}` + +Where: + +- `{vertices}` is a `List` in which each item is a `Vertex`. +- `{edges}` is a `List` in which each item is a `Edge`. + +==== Vertex + +Format: `{id}{label}{properties}` + +Where: + +- `{id}` is a fully qualified typed value composed of `{type_code}{value}`. +- `{label}` is a `String` value. +- `{properties}` is a `List` of `VertexProperty` items. + +==== VertexProperty + +Format: `{id}{label}{value}` + +Where: + +- `{id}` is a fully qualified typed value composed of `{type_code}{value}`. +- `{label}` is a `String` value. +- `{value}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== Barrier + +Format: a single `String` representing the enum value. + +==== Binding + +Format: `{key}{value}` + +Where: + +- `{key}` is a `String` value. +- `{value}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== Bytecode + +Format: `{steps_length}{step_0}...{step_n}{sources_length}{source_0}...{source_n}` + +Where: + +* `{steps_length}` is an `Int` value describing the amount of steps. +* `{step_i}` is composed of `{name}{values_length}{value_0}...{value_n}`, where: +** `{name}` is a String. +** `{values_length}` is an `Int` describing the amount values. +** `{value_i}` is a fully qualified typed value composed of `{type_code}{value}` describing the step argument. +* `{sources_length}` is an `Int` value describing the amount of source instructions. +* `{source_i}` is composed of `{name}{values_length}{value_0}...{value_n}`, where: +** `{name}` is a `String`. +** `{values_length}` is an `Int` describing the amount values. +** `{value_i}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== Cardinality + +Format: a single `String` representing the enum value. + +==== Column + +Format: a single `String` representing the enum value. + +==== Direction + +Format: a single `String` representing the enum value. + +==== Operator + +Format: a single `String` representing the enum value. + +==== Order + +Format: a single `String` representing the enum value. + +==== Pick + +Format: a single `String` representing the enum value. + +==== Pop + +Format: a single `String` representing the enum value. + +==== Lambda + +Format: `{language}{script}{arguments_length}` +Where: + +- `{language}` is a `String`. +- `{script}` is a `String`. +- `{arguments_length}` is an `Int`. + +==== P + +Format: `{predicate}{values_length}{value_0}...{value_n}` + +Where: + +- `{name}` is a String. +- `{values_length}` is an `Int` describing the amount values. +- `{value_i}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== Scope + +Format: a single `String` representing the enum value. + +==== T + +Format: a single `String` representing the enum value. + +==== Traverser + +Format: `{bulk}{value}` + +Where: + +- `{name}` is an `Int`. +- `{value}` is a fully qualified typed value composed of `{type_code}{value}`. + +==== BigDecimal + +Represents an arbitrary-precision signed decimal number, consisting of an arbitrary precision integer unscaled value and a 32-bit integer scale. + +Format: `{scale}{unscaled_value}` + +Where: + +- `{scale}` is an `Int`. +- `{unscaled_value}` is a `BigInteger`. + +==== BigInteger + +A variable-length two's complement encoding of a signed integer. + +Example values + +- `00`: Integer 0. +- `01`: Integer 1. +- `127`: Integer 7f. +- `00 80`: Integer 128. +- `ff`: Integer -1. +- `80`: Integer -128. +- `ff 7f`: Integer -129. + +==== Byte + +An unsigned 8-bit integer. + +==== ByteBuffer + +Format: `{length}{value}` + +Where: + +- `{length}` is an `Int` representing the amount of bytes contained in the value. +- `{value}` sequence of bytes. + +==== Short + +Format: 2-byte two's complement integer. + +==== Custom + +A custom type, represented with a name and a blob value. + +Format: `{name}{blob}` + +Where: + +- `{name}` is `String`. +- `{blob}` is a `ByteBuffer`. + +==== Char + +Format: one to four bytes representing a single UTF8 char, according to the Unicode standard. + +For characters `0x00`-`0x7F`, UTF-8 encodes the character as a single byte. + +For characters `0x80`-`0x7FF`, UTF-8 uses 2 bytes: the first byte is binary `110` followed by the 5 high bits of the character, while the second byte is binary 10 followed by the 6 low bits of the character. + +The 3 and 4-byte encodings are similar to the 2-byte encoding, except that the first byte of the 3-byte encoding starts with `1110` and the first byte of the 4-byte encoding starts with `11110`. + +Example values (hex bytes) + +- `97`: Character 'a'. +- `c2 a2`: Character '¢'. +- `e2 82 ac`: Character '€' + +==== Duration + +A time-based amount of time. + +Format: `{seconds}{nanos}` + +Where: + +- `{seconds}` is a `Long`. +- `{nanos}` is an `Int`. + +==== InetAddress + +Format: Same as `ByteBuffer`. + +==== Instant + +An instantaneous point on the time-line. + +Format: `{seconds}{nanos}` + +Where: + +- `{seconds}` is a `Long`. +- `{nanos}` is an `Int`. + +==== LocalDate + +A date without a time-zone in the ISO-8601 calendar system. + +Format: `{year}{month}{day}` + +Where: + +- `{year}` is an `Int` from -999,999,999 to 999,999,999. +- `{month}` is a `Byte` to represent, from 1 (January) to 12 (December) +- `{day}` is a `Byte` from 1 to 31. + +==== LocalDateTime + +Format: `{date}{time}` + +Where: + +- `{date}` is `LocalDate`. +- `{time}` is a `LocalTime`. + +==== LocalTime +A time without a time-zone in the ISO-8601 calendar system. + +Format: An 8 byte two's complement long representing nanoseconds since midnight. + +Valid values are in the range 0 to 86399999999999 + +==== MonthDay + +A month-day in the ISO-8601 calendar system. + +Format: `{month}{day}` + +Where: + +- `{month}` is `Byte` value from 1 to 12. +- `{day}` is `Byte` value from 1 to 31. + +==== OffsetDateTime + +A date-time with an offset from UTC/Greenwich in the ISO-8601 calendar system, such as 2007-12-03T10:15:30+01:00. + +Format: `{local_date_time}{offset}` + +Where: + +- `{local_date_time}` is `LocalDateTime`. +- `{offset}` is `ZoneOffset`. + +==== OffsetTime + +A time with an offset from UTC/Greenwich in the ISO-8601 calendar system, such as 10:15:30+01:00. + +Format: `{local_time}{offset}` + +Where: + +- `{local_time}` is `LocalTime`. +- `{offset}` is `ZoneOffset`. + +==== Period + +A date-based amount of time in the ISO-8601 calendar system, such as '2 years, 3 months and 4 days'. + +Format: `{years}{month}{days}` + +Where: + +`{years}`, `{month}` and `{days}` are `Int` values. + +==== Year + +A year in the ISO-8601 calendar system, such as 2018. + +Format: An `Int` representing the years. + +==== YearMonth + +A year-month in the ISO-8601 calendar system, such as 2007-12. + +Format: `{year}{month}` + +Where: + +- `{year}` is an `Int`. +- `{month} is a `Byte` from 1 to 12. + +==== ZonedDateTime + +A date-time with a time-zone in the ISO-8601 calendar system. + +Format: `{local_date_time}{zone_id}` + +Where: + +- `{local_date_time}` is `LocalDateTime`. +- `{time}` is a `LocalTime`. + +==== ZoneOffset + +A time-zone offset from Greenwich/UTC, such as +02:00. + +Format: An `Int` representing total zone offset in seconds. diff --git a/docs/src/dev/io/index.asciidoc b/docs/src/dev/io/index.asciidoc index 5620fc3..73f5d5c 100644 --- a/docs/src/dev/io/index.asciidoc +++ b/docs/src/dev/io/index.asciidoc @@ -34,3 +34,5 @@ include::graphml.asciidoc[] include::graphson.asciidoc[] include::gryo.asciidoc[] + +include::graphbinary.asciidoc[] \ No newline at end of file