Thanks for the feedback, Gregory. You raise a lot of good points; this is going to help me write a clearer PEP.

(0) Pretty much. The annotations can also be used as refinements by more advanced type checkers (e.g., for linear types).
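To make (0) concrete, here is a minimal sketch of a runtime consumer living alongside a static checker. It assumes `Annotated` is importable from `typing` and that `get_type_hints` accepts `include_extras=True` (both are the case in Python 3.9 and later). `ValueRange`, `set_volume`, and `find_metadata` are hypothetical names used only for illustration; they are not part of the proposal.

```python
from typing import Annotated, get_type_hints


class ValueRange:
    """Hypothetical metadata class, named after the one used in the proposal."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo = lo
        self.hi = hi


def set_volume(level: Annotated[int, "an unrelated note", ValueRange(0, 10)]) -> None:
    """A static checker is expected to treat 'level' as a plain int."""


def find_metadata(func, param, kind):
    """Return the metadata objects of class 'kind' attached to one parameter."""
    hints = get_type_hints(func, include_extras=True)
    # __metadata__ holds everything after the first argument of Annotated[...].
    extras = getattr(hints[param], "__metadata__", ())
    # Anything that is not an instance of 'kind' is simply ignored.
    return [m for m in extras if isinstance(m, kind)]


# Without include_extras, the metadata is stripped and only the type remains.
plain = get_type_hints(set_volume)["level"]
ranges = find_metadata(set_volume, "level", ValueRange)
```

The string `"an unrelated note"` stands in for an annotation this consumer does not understand; it is skipped, which is the graceful degradation the proposal asks of tools.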

(1a) I knew about postponed evaluation but hadn't read PEP 563 yet. Thanks for the heads-up.

(1b) I think you meant the `Intersection` type rather than the `Union` type. A value of type `Intersection[A, B]` is both of type `A` and of type `B`. If we had `Intersection` and allowed passing arguments decorated with `NoTypeCheck`, then we could do without `Annotated`. This could be a bit messy, though, because you'd probably want to make sure that `NoTypeCheck` only appears inside `Intersection`.

Another advantage of `Annotated` is that there's a clear "principal" type, so you can make calls to constructors transparent, e.g.:

    class A:
        ....

    A_with_info = Annotated[A, ...]
    A_with_info(5)  # creates the value A(5)

(2a) and (2b): I don't have any strong feelings when it comes to syntax; I tried to be consistent with the standard library (and maybe I got it wrong). My understanding is that [] is used to create a new type whereas () is used to create a new value:

    > Deque(range(2))
    deque([0, 1])
    > Deque[int]
    typing.Deque[int]

On Thu, 17 Jan 2019 at 19:35 Gregory P. Smith <g...@krypto.org> wrote:

> On Thu, Jan 17, 2019 at 2:34 PM Till <till.varoqu...@gmail.com> wrote:
>
>> We started a discussion in https://github.com/python/typing/issues/600
>> about adding support for extra annotations in the typing module.
>>
>> Since this is probably going to turn into a PEP I'm transferring the
>> discussion here to have more visibility.
>>
>> The document below has been modified a bit from the one in GH to reflect
>> the feedback I got:
>>
>> + Added a small blurb about how ``Annotated`` should support being used
>> as an alias
>>
>> Things that were raised but are not reflected in this document:
>>
>> + The dataclass example is confusing. I kept it for now because
>> dataclasses often come up in conversations about why we might want to
>> support annotations in the typing module. Maybe I should rework the
>> section.
>>
>> + `...` as a valid parameter for the first argument (if you want to add
>> an annotation but use the type inferred by your type checker). This is an
>> interesting idea; it's probably worth adding support for it if and only if
>> we decide to support it in other places. (cf.
>> https://github.com/python/typing/issues/276)
>>
>> Thanks,
>>
>> Add support for external annotations in the typing module
>> ==========================================================
>>
>> We propose adding an ``Annotated`` type to the typing module to decorate
>> existing types with context-specific metadata. Specifically, a type ``T``
>> can be annotated with metadata ``x`` via the typehint ``Annotated[T, x]``.
>> This metadata can be used for either static analysis or at runtime. If a
>> library (or tool) encounters a typehint ``Annotated[T, x]`` and has no
>> special logic for metadata ``x``, it should ignore it and simply treat the
>> type as ``T``. Unlike the `no_type_check` functionality that currently exists
>> in the ``typing`` module, which completely disables typechecking annotations
>> on a function or a class, the ``Annotated`` type allows for both static
>> typechecking of ``T`` (e.g., via MyPy or Pyre, which can safely ignore
>> ``x``) together with runtime access to ``x`` within a specific
>> application. We believe that the introduction of this type would address a
>> diverse set of use cases of interest to the broader Python community.
>>
>> Motivating examples:
>> ~~~~~~~~~~~~~~~~~~~~
>>
>> reading binary data
>> +++++++++++++++++++
>>
>> The ``struct`` module provides a way to read and write C structs directly
>> from their byte representation.
>> It currently relies on a string
>> representation of the C type to read in values::
>>
>>     from struct import unpack
>>     record = b'raymond   \x32\x12\x08\x01\x08'
>>     name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>>
>> The struct documentation [struct-examples]_ suggests using a named tuple
>> to unpack the values and make this a bit more tractable::
>>
>>     from collections import namedtuple
>>     Student = namedtuple('Student', 'name serialnum school gradelevel')
>>     Student._make(unpack('<10sHHb', record))
>>     # Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
>>
>>
>> However, this recommendation is somewhat problematic; as we add more
>> fields, it's going to get increasingly tedious to match the properties in
>> the named tuple with the arguments in ``unpack``.
>>
>> Instead, annotations can provide better interoperability with a type
>> checker or an IDE without adding any special logic outside of the
>> ``struct`` module::
>>
>>     from typing import Annotated, NamedTuple
>>     UnsignedShort = Annotated[int, struct.ctype('H')]
>>     SignedChar = Annotated[int, struct.ctype('b')]
>>
>>     @struct.packed
>>     class Student(NamedTuple):
>>         # MyPy typechecks 'name' field as 'str'
>>         name: Annotated[str, struct.ctype("<10s")]
>>         serialnum: UnsignedShort
>>         # 'school' is the second 'H' in '<10sHHb'
>>         school: UnsignedShort
>>         gradelevel: SignedChar
>>
>>     # 'unpack' only uses the metadata within the type annotations
>>     Student.unpack(record)
>>     # Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
>>
>>
>>
>> dataclasses
>> ++++++++++++
>>
>> Here's an example with dataclasses [dataclass]_ that is problematic
>> from the typechecking standpoint::
>>
>>     from dataclasses import dataclass, field
>>     from typing import List
>>
>>     @dataclass
>>     class C:
>>         myint: int = 0
>>         # the field tells the @dataclass decorator that the default action
>>         # in the constructor of this class is to set "self.mylist = list()"
>>         mylist: List[int] = field(default_factory=list)
>>
>> Even though one might expect that ``mylist`` is a class attribute
>> accessible via
>> ``C.mylist`` (like ``C.myint`` is) due to the assignment
>> syntax, that is not the case. Instead, the ``@dataclass`` decorator strips
>> out the assignment to this attribute, leading to an ``AttributeError`` upon
>> access::
>>
>>     C.myint   # Ok: 0
>>     C.mylist  # AttributeError: type object 'C' has no attribute 'mylist'
>>
>>
>> This can lead to confusion for newcomers to the library who may not
>> expect this behavior. Furthermore, the typechecker needs to understand the
>> semantics of dataclasses and know to not treat the above example as an
>> assignment operation (which translates to additional complexity).
>>
>> It makes more sense to move the information contained in ``field`` to an
>> annotation::
>>
>>     @dataclass
>>     class C:
>>         myint: int = 0
>>         mylist: Annotated[List[int], field(default_factory=list)]
>>
>>     # now, the AttributeError is more intuitive because there is no
>>     # assignment operator
>>     C.mylist  # AttributeError
>>
>>     # the constructor knows how to use the annotations to set the
>>     # 'mylist' attribute
>>     c = C()
>>     c.mylist  # []
>>
>> The main benefit of writing annotations like this is that it provides a
>> way for clients to gracefully degrade when they don't know what to do with
>> the extra annotations (by just ignoring them). If you used a typechecker
>> that didn't have any special handling for dataclasses and the ``field``
>> annotation, you would still be able to run checks as though the type were
>> simply::
>>
>>     class C:
>>         myint: int = 0
>>         mylist: List[int]
>>
>>
>> lowering barriers to developing new types
>> +++++++++++++++++++++++++++++++++++++++++
>>
>> Typically when adding a new type, we need to upstream that type to the
>> typing module and change MyPy [MyPy]_, PyCharm [PyCharm]_, Pyre [Pyre]_,
>> pytype [pytype]_, etc.
>> This is particularly important when working on
>> open-source code that makes use of our new types, seeing as the code would
>> not be immediately transportable to other developers' tools without
>> additional logic (this is a limitation of MyPy plugins [MyPy-plugins]_,
>> which allow for extending MyPy but would require a consumer of new
>> typehints to be using MyPy and have the same plugin installed). As a
>> result, there is a high cost to developing and trying out new types in a
>> codebase. Ideally, we should be able to introduce new types in a manner
>> that allows for graceful degradation when clients do not have a custom MyPy
>> plugin, which would lower the barrier to development and ensure some degree
>> of backward compatibility.
>>
>> For example, suppose that we wanted to add support for tagged unions
>> [tagged-unions]_ to Python. One way to accomplish this would be to annotate
>> ``TypedDict`` in Python such that only one field is allowed to be set::
>>
>>     Currency = Annotated[
>>         TypedDict('Currency', {'dollars': float, 'pounds': float},
>>                   total=False),
>>         TaggedUnion,
>>     ]
>>
>> This is a somewhat cumbersome syntax but it allows us to iterate on this
>> proof-of-concept and have people with non-patched IDEs work in a codebase
>> with tagged unions. We could easily test this proposal and iron out the
>> kinks before trying to upstream tagged unions to `typing`, MyPy, etc.
>> Moreover, tools that do not have support for parsing the ``TaggedUnion``
>> annotation would still be able to treat `Currency` as a ``TypedDict``,
>> which is still a close approximation (slightly less strict).
>>
>>
>> Details of proposed changes to ``typing``
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> syntax
>> ++++++
>>
>> ``Annotated`` is parameterized with a type and an arbitrary list of
>> Python values that represent the annotations.
>> Here are the specific details
>> of the syntax:
>>
>> * The first argument to ``Annotated`` must be a valid ``typing`` type or
>> ``...`` (to use the inferred type).
>>
>> * Multiple type annotations are supported (``Annotated`` supports variadic
>> arguments): ``Annotated[int, ValueRange(3, 10), ctype("char")]``
>>
>> * ``Annotated`` must be called with at least two arguments
>> (``Annotated[int]`` is not valid)
>>
>> * The order of the annotations is preserved and matters for equality
>> checks::
>>
>>     Annotated[int, ValueRange(3, 10), ctype("char")] != \
>>     Annotated[int, ctype("char"), ValueRange(3, 10)]
>>
>> * Nested ``Annotated`` types are flattened, with metadata ordered
>> starting with the innermost annotation::
>>
>>     Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == \
>>     Annotated[int, ValueRange(3, 10), ctype("char")]
>>
>> * Duplicated annotations are not removed: ``Annotated[int, ValueRange(3,
>> 10)] != Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]``
>>
>> * ``Annotated`` can be used as a higher-order alias::
>>
>>     T = TypeVar('T')
>>     Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]
>>     # Vec[int] == Annotated[List[Tuple[int, int]], MaxLen(10)]
>>
>>
>>
>> consuming annotations
>> ++++++++++++++++++++++
>>
>> Ultimately, how to interpret the annotations (if at all) is the
>> responsibility of the tool or library encountering the
>> `Annotated` type. A tool or library encountering an `Annotated` type can
>> scan through the annotations to determine if they are of interest (e.g.,
>> using `isinstance`).
>>
>> **Unknown annotations**
>> When a tool or a library does not support annotations or encounters an
>> unknown annotation, it should just ignore it and treat the annotated type
>> as the underlying type. For example, if we were to add an annotation that
>> is not an instance of `struct.ctype` to the annotation for name (e.g.,
>> `Annotated[str, 'foo', struct.ctype("<10s")]`), the unpack method should
>> ignore it.
>>
>> **Namespacing annotations**
>> We do not need namespaces for annotations since the class used by the
>> annotations acts as a namespace.
>>
>> **Multiple annotations**
>> It's up to the tool consuming the annotations to decide whether the
>> client is allowed to have several annotations on one type and how to merge
>> those annotations.
>>
>> Since the ``Annotated`` type allows you to put several annotations of
>> the same (or different) type(s) on any node, the tools or libraries
>> consuming those annotations are in charge of dealing with potential
>> duplicates. For example, if you are doing value range analysis you might
>> allow this::
>>
>>     T1 = Annotated[int, ValueRange(-10, 5)]
>>     T2 = Annotated[T1, ValueRange(-20, 3)]
>>
>> Flattening nested annotations, this translates to::
>>
>>     T2 = Annotated[int, ValueRange(-10, 5), ValueRange(-20, 3)]
>>
>> An application consuming this type might choose to reduce these
>> annotations via an intersection of the ranges, in which case ``T2`` would
>> be treated equivalently to ``Annotated[int, ValueRange(-10, 3)]``.
>>
>> An alternative application might reduce these via a union, in which
>> case ``T2`` would be treated equivalently to ``Annotated[int,
>> ValueRange(-20, 5)]``.
>>
>> Other applications may decide to not support multiple annotations and
>> throw an exception.
>>
>
> (0) Observation / TL;DR - This PEP really seems to be more of a way to
> declare multiple different arbitrary-purpose annotations all attached to a
> single callable/parameter/return/variable, so that static checkers
> continue to work, but runtime users of annotations for whatever purpose can
> also work at the same time.
>
> (1a) A struct.unpack supporting this will then need to evaluate
> annotations in the outer scope at runtime due to our desired long term
> PEP-563 `from __future__ import annotations` behavior.
> But that becomes
> true of anything else wanting to use annotations at runtime so we should
> really make a typing library function that does this for everyone to use.
>
> (1b) This proposal potentially expands the burden of type checkers... but
> it shouldn't. They should be free to take the first type listed in an
> Annotated[] block as the type of the variable, raising an error if someone
> has listed multiple types (telling them to use Union[] for that). A static
> checker *could* do useful things with multiple annotations it knows how
> to handle, but I think it'd be unwise to implement that in any manner where
> Annotated and Union could both be used for the same purpose.
>
> It makes me wonder if Annotated[] is meaningfully different from Union at
> all.
>
> (2a) At first glance I don't like that the `T1 = Annotated[int,
> SomeOtherInfo(23)]` syntax uses [] rather than () as it really is
> constructing a runtime type. It isn't clear what should use [] and what
> should use () so I'd suggest using () for everything there.
>
> (2b) Ask yourself: Why should SomeOtherInfo and ValueRange and
> struct.ctype be () calls yet none of `Annotated[Union[List[bytes],
> Dict[bytes, Optional[float]]]]` be calls? If you can come up with an
> answer to that, why _should_ anyone need to know that?
>
> -gps
>
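
Regarding (1a), here is a minimal sketch of how a runtime consumer could resolve postponed annotations. The annotation is written as a string to mimic what PEP 563 does to every annotation, and it is resolved with `typing.get_type_hints`. This assumes `Annotated` is importable from `typing` and that `get_type_hints` accepts `include_extras=True` (both are the case in Python 3.9 and later). `CType` is a hypothetical stand-in for the proposal's `struct.ctype`, and `read_serial` is a hypothetical function.

```python
from typing import Annotated, get_type_hints


class CType:
    """Hypothetical stand-in for the proposal's struct.ctype marker."""

    def __init__(self, fmt: str) -> None:
        self.fmt = fmt


# The annotation is a string, as it would be under postponed evaluation.
def read_serial(value: "Annotated[int, CType('H')]") -> None:
    ...


# The raw annotation has not been evaluated yet.
raw = read_serial.__annotations__["value"]

# get_type_hints evaluates the string in the function's global namespace.
resolved = get_type_hints(read_serial, include_extras=True)["value"]
fmt = resolved.__metadata__[0].fmt
```

Any library that reads metadata at runtime would need this resolution step, whichever helper ends up providing it.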
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/