Re: [swift-evolution] [Proposal] Foundation Swift Archival & Serialization

Brent Royal-Gordon via swift-evolution Wed, 15 Mar 2017 21:19:50 -0700

> On Mar 15, 2017, at 3:40 PM, Itai Ferber via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> Hi everyone,
> 
> The following introduces a new Swift-focused archival and serialization API 
> as part of the Foundation framework. We’re interested in improving the 
> experience and safety of performing archival and serialization, and are happy 
> to receive community feedback on this work.


Thanks to all of the people who've worked on this. It's a great proposal.

> Specifically:
> 
>       • It aims to provide a solution for the archival of Swift struct and 
> enum types

I see a lot of discussion here of structs and classes, and an example of an 
enum without associated values, but I don't see any discussion of enums with 
associated values. Can you sketch how you see people encoding such types?

For example, I assume that `Optional` is going to get some special treatment, 
but if it doesn't, how would you write its `encode(to:)` method?

What about a more complex enum, like the standard library's 
`UnicodeDecodingResult`:

        enum UnicodeDecodingResult {
                case emptyInput
                case error
                case scalarValue(UnicodeScalar)
        }

Or, say, an `Error`-conforming type from one of my projects:

        public enum SQLError: Error {
            case connectionFailed(underlying: Error)
            case executionFailed(underlying: Error, statement: SQLStatement)
            case noRecordsFound(statement: SQLStatement)
            case extraRecordsFound(statement: SQLStatement)
            case columnInvalid(underlying: Error, key: ColumnSpecifier, statement: SQLStatement)
            case valueInvalid(underlying: Error, key: AnySQLColumnKey, statement: SQLStatement)
        }

(You can assume that all the types in the associated values are `Codable`.)

I don't necessarily assume that the compiler should write conformances to these 
sorts of complicated enums for me (though that would be nice!); I'm just 
wondering what the designers of this feature envision people doing in cases 
like these.
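
To make the question concrete, one hand-written approach is a discriminator key plus one key per payload. The following is only a sketch, written against the `Codable` API as it exists in Foundation today, which may differ from the draft under discussion. `DecodingResult` is a hypothetical stand-in for `UnicodeDecodingResult` (the scalar is stored as a `UInt32`), and the key names `kind` and `scalar` are assumptions, not anything the proposal prescribes.

```swift
import Foundation

// Hypothetical stand-in for UnicodeDecodingResult.
enum DecodingResult: Codable, Equatable {
    case emptyInput
    case error
    case scalarValue(UInt32)

    // Assumed key names; the proposal does not specify a layout for enums.
    private enum CodingKeys: String, CodingKey {
        case kind, scalar
    }

    // Discriminator that records which case was encoded.
    private enum Kind: String, Codable {
        case emptyInput, error, scalarValue
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .emptyInput:
            try container.encode(Kind.emptyInput, forKey: .kind)
        case .error:
            try container.encode(Kind.error, forKey: .kind)
        case .scalarValue(let scalar):
            try container.encode(Kind.scalarValue, forKey: .kind)
            try container.encode(scalar, forKey: .scalar)
        }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        switch try container.decode(Kind.self, forKey: .kind) {
        case .emptyInput:
            self = .emptyInput
        case .error:
            self = .error
        case .scalarValue:
            self = .scalarValue(try container.decode(UInt32.self, forKey: .scalar))
        }
    }
}
```

The same pattern extends to cases with several associated values, at the cost of one key per payload field.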

>       • protocol Codable: Adopted by types to opt into archival. Conformance 
> may be automatically derived in cases where all properties are also Codable.

Have you given any consideration to supporting types which only need to decode? 
That seems likely to be common when interacting with web services.
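
As an illustration of what a decode-only type could look like, the sketch below assumes the protocol is split into separate decoding and encoding halves. That is how the API in today's Swift standard library is factored (`Decodable` and `Encodable`), but it is not something the draft quoted above states. `ServiceStatus` and `parseStatus` are made-up names.

```swift
import Foundation

// Hypothetical response type: only ever read from the service, never written back.
struct ServiceStatus: Decodable {
    let name: String
    let healthy: Bool
}

// Decodes a status from a JSON string; throws if the payload does not match.
func parseStatus(_ json: String) throws -> ServiceStatus {
    return try JSONDecoder().decode(ServiceStatus.self, from: Data(json.utf8))
}
```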

>       • protocol CodingKey: Adopted by types used as keys for keyed 
> containers, replacing String keys with semantic types. Conformance may be 
> automatically derived in most cases.
>       • protocol Encoder: Adopted by types which can take Codable values and 
> encode them into a native format.
>               • class KeyedEncodingContainer<Key : CodingKey>: Subclasses of 
> this type provide a concrete way to store encoded values by CodingKey. Types 
> adopting Encoder should provide subclasses of KeyedEncodingContainer to vend.
>               • protocol SingleValueEncodingContainer: Adopted by types which 
> provide a concrete way to store a single encoded value. Types adopting 
> Encoder should provide types conforming to SingleValueEncodingContainer to 
> vend (but in many cases will be able to conform to it themselves).
>       • protocol Decoder: Adopted by types which can take payloads in a 
> native format and decode Codable values out of them.
>               • class KeyedDecodingContainer<Key : CodingKey>: Subclasses of 
> this type provide a concrete way to retrieve encoded values from storage by 
> CodingKey. Types adopting Decoder should provide subclasses of 
> KeyedDecodingContainer to vend.
>               • protocol SingleValueDecodingContainer: Adopted by types which 
> provide a concrete way to retrieve a single encoded value from storage. Types 
> adopting Decoder should provide types conforming to 
> SingleValueDecodingContainer to vend (but in many cases will be able to 
> conform to it themselves).

I do want to note that, at this point in the proposal, I was sort of thinking 
you'd gone off the deep end modeling this. Having read the whole thing, I now 
understand what all of these things do, but this really is a very large 
subsystem. I think it's worth asking if some of these types can be eliminated 
or combined.

> Structured types (i.e. types which encode as a collection of properties) 
> encode and decode their properties in a keyed manner. Keys may be 
> String-convertible or Int-convertible (or both),

What does "may" mean here? That, at runtime, the encoder will test for the 
preferred key type and fall back to the other one? That seems a little bit 
problematic.

I'm also quite worried about how `Int`-convertible keys will interact with code 
synthesis. The obvious way to assign integers—declaration order—would mean that 
reordering declarations would invisibly break archiving, potentially (if the 
types were compatible) without breaking anything in an error-causing way even 
at runtime. You could sort the names, but then adding a new property would 
shift the integers of the properties "below" it. You could hash the names, but 
then there's no obvious relationship between the integers and key cases.
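
The declaration-order hazard is easy to reproduce with plain `Int`-backed enums, independent of any archival API. In this sketch the type names are hypothetical; the point is only that reordering cases silently changes the implicit raw values, while explicitly pinned values survive.

```swift
// Version 1 of a hypothetical key type: integers follow declaration order.
enum KeysV1: Int {
    case id, name
}

// Version 2 after an innocent-looking reorder: same cases, different integers.
enum KeysV2: Int {
    case name, id
}

// Pinning the values explicitly keeps the mapping stable across reordering.
enum PinnedKeys: Int {
    case name = 1
    case id = 0
}
```

An archive written with `KeysV1` stores `id` under 0; read back with `KeysV2`, key 0 now means `name`.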

At the same time, I also think that using arbitrary integers is a poor match 
for ordering. If you're making an ordered container, you don't want arbitrary 
integers wrapped up in an abstract type. You want adjacent integers forming 
indices of an eventual array. (Actually, you may not want indices at all—you 
may just want to feed elements in one at a time!)

So I would suggest the following changes:

* The coding key always converts to a string. That means we can eliminate the 
`CodingKey` protocol and instead use `RawRepresentable where RawValue == 
String`, leveraging existing infrastructure. That also means we can call the 
`CodingKeys` associated type `CodingKey` instead, which is the correct name for 
it—we're not talking about an `OptionSet` here.

* If, to save space on disk, you also want to allow people to use integers as the 
serialized representation of a key, we might introduce a parallel 
`IntegerCodingKey` protocol for that, but every `CodingKey` type should map to 
`String` first and foremost. Using a protocol here ensures that it can be 
statically determined at compile time whether a type can be encoded with 
integer keys, so the compiler can select an overload of `container(keyedBy:)`.

* Intrinsically ordered data is encoded as a single-value container of type 
`Array<Codable>`. (I considered having an `orderedContainer()` method and type, 
but as I thought about it, I couldn't think of an advantage it would have over 
`Array`.)
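
The compile-time selection described in the second bullet can be sketched with two generic overloads. Everything here is hypothetical: `IntegerCodingKey` is the protocol floated above, and `keyRepresentation(for:)` merely stands in for an overloaded `container(keyedBy:)`. The sketch relies on Swift preferring the more constrained generic overload when both apply.

```swift
// Hypothetical opt-in protocol for keys that also have an integer form.
protocol IntegerCodingKey {
    var intValue: Int { get }
}

// Applies to any string-backed key.
func keyRepresentation<Key: RawRepresentable>(for key: Key) -> String
    where Key.RawValue == String {
    return "string(\(key.rawValue))"
}

// More constrained overload, picked statically when the key also opts into integers.
func keyRepresentation<Key: RawRepresentable & IntegerCodingKey>(for key: Key) -> String
    where Key.RawValue == String {
    return "int(\(key.intValue))"
}

// A key type with only a string form.
enum NameKeys: String {
    case first, last
}

// A key type that opts into integer keys with explicitly chosen values.
enum CompactKeys: String, IntegerCodingKey {
    case id, title

    var intValue: Int {
        switch self {
        case .id: return 1
        case .title: return 2
        }
    }
}
```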

>     /// Returns an encoding container appropriate for holding a single 
> primitive value.
>     ///
>     /// - returns: A new empty single value container.
>     /// - precondition: May not be called after a prior 
> `self.container(keyedBy:)` call.
>     /// - precondition: May not be called after a value has been encoded 
> through a previous `self.singleValueContainer()` call.
>     func singleValueContainer() -> SingleValueEncodingContainer

Speaking of which, I'm not sure about single value containers. My first 
instinct is to say that methods should be moved from them to the `Encoder` 
directly, but that would probably cause code duplication. But...isn't there 
already duplication between the `SingleValue*Container` and the 
`Keyed*Container`? Why, yes, yes there is. So let's talk about that.

>     open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
>     open func encode(_ value: Bool?,   forKey key: Key) throws
>     open func encode(_ value: Int?,    forKey key: Key) throws
>     open func encode(_ value: Int8?,   forKey key: Key) throws
>     open func encode(_ value: Int16?,  forKey key: Key) throws
>     open func encode(_ value: Int32?,  forKey key: Key) throws
>     open func encode(_ value: Int64?,  forKey key: Key) throws
>     open func encode(_ value: UInt?,   forKey key: Key) throws
>     open func encode(_ value: UInt8?,  forKey key: Key) throws
>     open func encode(_ value: UInt16?, forKey key: Key) throws
>     open func encode(_ value: UInt32?, forKey key: Key) throws
>     open func encode(_ value: UInt64?, forKey key: Key) throws
>     open func encode(_ value: Float?,  forKey key: Key) throws
>     open func encode(_ value: Double?, forKey key: Key) throws
>     open func encode(_ value: String?, forKey key: Key) throws
>     open func encode(_ value: Data?,   forKey key: Key) throws

Wait, first, a digression for another issue: I'm concerned that, if you look at 
the `decode` calls, there are plain `decode(…)` calls which throw if a `nil` 
was originally encoded and `decodeIfPresent` calls which return optional. The 
result is, essentially, that the encoding system eats a level of optionality 
for its own purposes—seemingly good, straightforward-looking code like this:

        struct MyRecord: Codable {
                var id: Int?
                …
                
                func encode(to encoder: Encoder) throws {
                        let container = encoder.container(keyedBy: CodingKey.self)
                        try container.encode(id, forKey: .id)
                        …
                }
                
                init(from decoder: Decoder) throws {
                        let container = decoder.container(keyedBy: CodingKey.self)
                        id = try container.decode(Int.self, forKey: .id)
                        …
                }
        }

Will crash. (At least, I assume that's what will happen.)
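
For comparison, the distinction can be observed with `JSONDecoder` as it exists in Foundation today; the draft API discussed here may differ in details. In this sketch the record types and the `decodes` helper are hypothetical. With that decoder, `decode(_:forKey:)` throws rather than trapping, but the effect on an optional property is the same surprise.

```swift
import Foundation

struct LenientRecord: Decodable {
    var id: Int?

    enum CodingKeys: String, CodingKey { case id }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Returns nil when the key is missing or holds null.
        id = try container.decodeIfPresent(Int.self, forKey: .id)
    }
}

struct StrictRecord: Decodable {
    var id: Int?

    enum CodingKeys: String, CodingKey { case id }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Throws when the key is missing or holds null, even though `id` is optional.
        id = try container.decode(Int.self, forKey: .id)
    }
}

// Hypothetical helper: reports whether a JSON string decodes as the given type.
func decodes<T: Decodable>(_ type: T.Type, from json: String) -> Bool {
    return (try? JSONDecoder().decode(type, from: Data(json.utf8))) != nil
}
```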

I think we'd be better off having `encode(_:forKey:)` not take an optional; 
instead, we should have `Optional` conform to `Codable` and behave in some 
appropriate way. Exactly how to implement it might be a little tricky because 
of nested optionals; I suppose a `none` would have to measure how many levels 
of optionality there are between it and a concrete value, and then encode that 
information into the data. I think our `NSNull` bridging is doing something 
broadly similar right now.

I know that this is not the design you would use in Objective-C, but Swift uses 
`Optional` differently from how Objective-C uses `nil`. Swift APIs consider 
`nil` and absent to be different things; where they can both occur, good Swift 
APIs use doubled-up Optionals to be precise about the situation. I think the 
design needs to be a little different to accommodate that.
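
The "doubled-up Optionals" idiom is what `Dictionary` already does when its values are themselves optional. A small sketch, with hypothetical function names, showing the three states a lookup can distinguish:

```swift
// Distinguishes "no entry" from "entry whose value is nil" from "entry with a value".
func describe(_ value: Int??) -> String {
    switch value {
    case .none:
        return "absent"
    case .some(.none):
        return "present but nil"
    case .some(.some(let number)):
        return "present: \(number)"
    }
}

// A dictionary with optional values yields a doubly optional result on lookup.
func lookup(_ key: String, in fields: [String: Int?]) -> String {
    return describe(fields[key])
}
```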

Now, back to the `SingleValue*Container`/`Keyed*Container` issue. The list 
above is, frankly, gigantic. You specify a *lot* of primitives in 
`Keyed*Container`; there's a lot to implement here. And then you have to 
implement it all *again* in `SingleValue*Container`:

>     func encode(_ value: Bool) throws
>     func encode(_ value: Int) throws
>     func encode(_ value: Int8) throws
>     func encode(_ value: Int16) throws
>     func encode(_ value: Int32) throws
>     func encode(_ value: Int64) throws
>     func encode(_ value: UInt) throws
>     func encode(_ value: UInt8) throws
>     func encode(_ value: UInt16) throws
>     func encode(_ value: UInt32) throws
>     func encode(_ value: UInt64) throws
>     func encode(_ value: Float) throws
>     func encode(_ value: Double) throws
>     func encode(_ value: String) throws
>     func encode(_ value: Data) throws


This is madness.

Look, here's what we do. You have two types: `Keyed*Container` and 
`Value*Container`. `Keyed*Container` looks something like this:

        final public class KeyedEncodingContainer<EncoderType: Encoder, Key: RawRepresentable> where Key.RawValue == String {
            public let encoder: EncoderType
            
            public let codingKeyContext: [RawRepresentable where RawValue == String]
            // Hmm, we might need a CodingKey protocol after all.
            // Still, it could just be `protocol CodingKey: RawRepresentable where RawValue == String {}`
            
            subscript (key: Key) -> ValueEncodingContainer {
                return encoder.makeValueEncodingContainer(forKey: key)
            }
        }

It's so simple, it doesn't even need to be specialized. You might even be able 
to get away with combining the encoding and decoding variants if the subscript 
comes from a conditional extension. `Value*Container` *does* need to be 
specialized; it looks like this (modulo the `Optional` issue I mentioned above):

        public protocol ValueEncodingContainer {
            func encode<Value : Codable>(_ value: Value?) throws
            func encode(_ value: Bool?) throws
            func encode(_ value: Int?) throws
            func encode(_ value: Int8?) throws
            func encode(_ value: Int16?) throws
            func encode(_ value: Int32?) throws
            func encode(_ value: Int64?) throws
            func encode(_ value: UInt?) throws
            func encode(_ value: UInt8?) throws
            func encode(_ value: UInt16?) throws
            func encode(_ value: UInt32?) throws
            func encode(_ value: UInt64?) throws
            func encode(_ value: Float?) throws
            func encode(_ value: Double?) throws
            func encode(_ value: String?) throws
            func encode(_ value: Data?) throws
            
            func encodeWeak<Object : AnyObject & Codable>(_ object: Object?) throws
            
            var codingKeyContext: [CodingKey]
        }

And use sites would look like:

        func encode(to encoder: Encoder) throws {
                let container = encoder.container(keyedBy: CodingKey.self)
                try container[.id].encode(id)
                try container[.name].encode(name)
                try container[.birthDate].encode(birthDate)
        }
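
A toy version of this shape compiles today, which gives a feel for the call sites. This is a minimal sketch, not the proposed API: `ToyStorage`, `ToyValueContainer`, `ToyKeyedContainer`, and `Person` are all hypothetical, values are stored as strings for simplicity, and error handling is omitted.

```swift
// Shared backing store so that containers can be cheap value types.
final class ToyStorage {
    var values: [String: String] = [:]
}

// The only place that knows how to write primitives.
struct ToyValueContainer {
    let storage: ToyStorage
    let key: String

    func encode(_ value: Int) { storage.values[key] = String(value) }
    func encode(_ value: String) { storage.values[key] = value }
}

// The keyed container is just a subscript that vends value containers.
struct ToyKeyedContainer<Key: RawRepresentable> where Key.RawValue == String {
    let storage: ToyStorage

    subscript(key: Key) -> ToyValueContainer {
        return ToyValueContainer(storage: storage, key: key.rawValue)
    }
}

struct Person {
    enum Key: String { case id, name }

    var id: Int
    var name: String

    func encode(into storage: ToyStorage) {
        let container = ToyKeyedContainer<Key>(storage: storage)
        container[.id].encode(id)
        container[.name].encode(name)
    }
}
```

Because the keyed container holds no primitive overloads of its own, each format only has to implement the value container once.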

Decoding is slightly trickier. You could either make the subscript `Optional`, 
which would be more like `Dictionary` but would be inconsistent with `Encoder` 
and would give the "never force-unwrap anything" crowd conniptions, or you 
could add a `contains()` method to `ValueDecodingContainer` and make 
`decode(_:)` throw. Either one works.

Also, another issue with the many primitives: swiftc doesn't really like large 
overload sets very much. Could this set be reduced? I'm not sure what the logic 
was in choosing these particular types, but many of them share protocols in 
Swift—you might get away with just this:

        public protocol ValueEncodingContainer {
            func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
            func encode(_ value: Bool?,   forKey key: Key) throws
            func encode<Integer: SignedInteger>(_ value: Integer?, forKey key: Key) throws
            func encode<UInteger: UnsignedInteger>(_ value: UInteger?, forKey key: Key) throws
            func encode<Floating: FloatingPoint>(_ value: Floating?, forKey key: Key) throws
            func encode(_ value: String?, forKey key: Key) throws
            func encode(_ value: Data?,   forKey key: Key) throws
            
            func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws
            
            var codingKeyContext: [CodingKey]
        }
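
A reduced overload set along these lines can be sketched in current Swift. `PrimitiveWriter` is hypothetical and simply logs what it receives. One deliberate deviation: the floating-point overload is constrained to `BinaryFloatingPoint` rather than `FloatingPoint`, because the generic conversion to `Double` used here requires it.

```swift
// Toy sink that widens each family of primitives to one canonical type.
struct PrimitiveWriter {
    private(set) var log: [String] = []

    // Covers Int, Int8, Int16, Int32, Int64.
    mutating func encode<Integer: SignedInteger>(_ value: Integer) {
        log.append("signed:\(Int64(value))")
    }

    // Covers UInt, UInt8, UInt16, UInt32, UInt64.
    mutating func encode<UInteger: UnsignedInteger>(_ value: UInteger) {
        log.append("unsigned:\(UInt64(value))")
    }

    // Covers Float and Double.
    mutating func encode<Floating: BinaryFloatingPoint>(_ value: Floating) {
        log.append("float:\(Double(value))")
    }
}
```

Three generic entry points replace twelve concrete ones; a format that cares about exact widths would still need to inspect the static type.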

To accommodate my previous suggestion of using arrays to represent ordered 
encoded data, I would add one more primitive:

            func encode(_ values: [Codable]) throws

(Also, is there any sense in adding `Date` to this set, since it needs special 
treatment in many of our formats?)

> Encoding Container Types
> 
> For some types, the container into which they encode has meaning. Especially 
> when coding for a specific output format (e.g. when communicating with a JSON 
> API), a type may wish to explicitly encode as an array or a dictionary:
> 
> // Continuing from before
> public protocol Encoder {
>     func container<Key : CodingKey>(keyedBy keyType: Key.Type, type 
> containerType: EncodingContainerType) -> KeyedEncodingContainer<Key>
> }
> 
> /// An `EncodingContainerType` specifies the type of container an `Encoder` 
> should use to store values.
> public enum EncodingContainerType {
>     /// The `Encoder`'s preferred container type; equivalent to either 
> `.array` or `.dictionary` as appropriate for the encoder.
>     case `default`
>     
>     /// Explicitly requests the use of an array to store encoded values.
>     case array
> 
>     /// Explicitly requests the use of a dictionary to store encoded values.
>     case dictionary
> }

I see what you're getting at here, but I don't think this is fit for purpose, 
because arrays are not simply dictionaries with integer keys—their elements are 
adjacent and ordered. See my discussion earlier about treating inherently 
ordered containers as simply single-value `Array`s.

> Nesting
> 
> In practice, some types may also need to control how data is nested within 
> their container, or potentially nest other containers within their container. 
> Keyed containers allow this by returning nested containers of differing key 
> types:

[snip]

> This can be common when coding against specific external data representations:
> 
> // User type for interfacing with a specific JSON API. JSON API expects 
> encoding as {"id": ..., "properties": {"name": ..., "timestamp": ...}}. Swift 
> type differs from encoded type, and encoding needs to match a spec:

This comes very close to addressing something else I'm concerned about, but it 
doesn't quite get there. What's the preferred way to handle differences in 
serialization to different formats?

Here's what I mean: Suppose I have a BlogPost model, and I can both fetch and 
post BlogPosts to a cross-platform web service, and store them locally. But 
when I fetch and post remotely, I need to conform to the web service's formats; 
when I store an instance locally, I have a freer hand in designing my storage, 
and perhaps need to store some extra metadata. How do you imagine handling that 
sort of situation? Is the answer simply that I should use two different types?
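
For reference, the "two different types" answer could look roughly like the sketch below, written against the `Codable` API as it exists in Foundation today. Every name here is an assumption made for illustration: the `lastSyncedRevision` field, the `RemoteBlogPost` wire type, and the `post_id` key that the hypothetical web service expects.

```swift
import Foundation

// Local model: free-form storage, including metadata the service never sees.
struct BlogPost: Codable, Equatable {
    var id: Int
    var title: String
    var lastSyncedRevision: Int   // hypothetical local-only metadata
}

// Wire representation: matches the hypothetical web service's field names.
struct RemoteBlogPost: Codable {
    var id: Int
    var title: String

    enum CodingKeys: String, CodingKey {
        case id = "post_id"
        case title
    }

    // Projects the local model onto the fields the service understands.
    init(_ post: BlogPost) {
        id = post.id
        title = post.title
    }
}
```

This works, but it duplicates every shared field, which is part of why the question seems worth asking.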

> To remedy both of these points, we adopt a new convention for 
> inheritance-based coding — encoding super as a sub-object of self:

[snip]

>         try super.encode(to: container.superEncoder())

This seems like a good idea to me. However, it brings up another point: What 
happens if you specify a superclass of the originally encoded class? In other 
words:

        let joe = Employee(…)
        let payload = try SomeEncoder().encode(joe)
        …
        let someone = try SomeDecoder().decode(Person.self, from: payload)
        // Person, Employee, or does `decode(_:from:)` fail?
        print(type(of: someone))

> The encoding container types offer overloads for working with and processing 
> the API's primitive types (String, Int, Double, etc.). However, for ease of 
> implementation (both in this API and others), it can be helpful for these 
> types to conform to Codable themselves. Thus, along with these overloads, we 
> will offer Codable conformance on these types:

[snip]

> Since Swift's function overload rules prefer more specific functions over 
> generic functions, the specific overloads are chosen where possible (e.g. 
> encode("Hello, world!", forKey: .greeting) will choose encode(_: String, 
> forKey: Key) over encode<T : Codable>(_: T, forKey: Key)). This maintains 
> performance over dispatching through the Codable existential, while allowing 
> for the flexibility of fewer overloads where applicable.

How important is this performance? If the answer is "eh, not really that much", 
I could imagine a setup where every "primitive" type eventually represents 
itself as `String` or `Data`, and each `Encoder`/`Decoder` can use dynamic type 
checks in `encode(_:)`/`decode(_:)` to define whatever "primitives" it wants 
for its own format.
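
A rough sketch of that dynamic-check setup, assuming a toy text format: `ToyTextEncoder` is hypothetical, it treats only `Bool`, `Int`, and `String` as native, and everything else falls back to a string form.

```swift
// Toy encoder: decides at runtime which values it treats as native primitives.
struct ToyTextEncoder {
    func encode<Value>(_ value: Value) -> String {
        switch value {
        case let flag as Bool:
            return flag ? "true" : "false"
        case let number as Int:
            return "i\(number)"
        case let text as String:
            return "\"\(text)\""
        default:
            // Anything this format has no primitive for is stored as a string.
            return "\"\(String(describing: value))\""
        }
    }
}
```

The cost is one dynamic cast per candidate type on every call, which is the performance trade-off in question.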

* * *

One more thing. In Alternatives Considered, you present two designs—#2 and 
#3—where you generate a separate instance which represents the type in a fairly 
standardized way for the encoder to examine.

This design struck me as remarkably similar to the reflection system and its 
`Mirror` type, which is also a separate type describing an original instance. 
My question was: Did you look at the reflection system when you were building 
this design? Do you think there might be anything that can be usefully shared 
between them?
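
For readers unfamiliar with it, `Mirror` already exposes a value's stored properties as labeled children, which is the overlap being asked about. A minimal sketch, where `propertyLabels(of:)` and `Point` are hypothetical names:

```swift
// Lists the stored property labels that reflection exposes for a value.
func propertyLabels(of value: Any) -> [String] {
    return Mirror(reflecting: value).children.compactMap { $0.label }
}

// Hypothetical sample type.
struct Point {
    var x: Int
    var y: Int
}
```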

Thank you for your attention. I hope this was helpful!

-- 
Brent Royal-Gordon
Architechies

