Re: [sqlalchemy] Docstring recommendations for SQLalchemy models?

Mike Bayer Sun, 24 Jan 2021 19:14:57 -0800


On Sun, Jan 24, 2021, at 7:53 PM, Samuel Marks wrote:
> Again, my goal isn't related to Sphinx (although generating nice 
> documentation is of course, a nice-to-have).
> 
> One advantage of having the generated SQL code be commented is that I could 
> write parsers that go from SQL files to SQL alchemy models, complete with 
> docs.


Right now people use sqlacodegen for this: 
https://pypi.org/project/sqlacodegen/ . It doesn't work from a plain SQL file, 
though; the SQL would first have to be run against a database such as SQLite, 
which sqlacodegen then reads.
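
A minimal sketch of that extra step, using only the standard library: the DDL 
is loaded into a throwaway SQLite file that sqlacodegen can then be pointed at. 
The DDL text, the file name `schema.db`, and the helper `load_sql_into_sqlite` 
are hypothetical and only for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List

# Hypothetical DDL standing in for the contents of a plain .sql file.
DDL = """
CREATE TABLE model (
    dataset_name VARCHAR PRIMARY KEY,
    tfds_dir VARCHAR,
    K VARCHAR
);
"""


def load_sql_into_sqlite(ddl: str, db_path: Path) -> List[str]:
    """Run the DDL against a SQLite file and return the table names it created."""
    conn = sqlite3.connect(str(db_path))
    try:
        conn.executescript(ddl)
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


db_path = Path(tempfile.mkdtemp()) / "schema.db"
tables = load_sql_into_sqlite(DDL, db_path)
print(tables)

# The resulting file can then be handed to sqlacodegen on the command line:
#   sqlacodegen sqlite:///<path to schema.db>
```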


> 
> The issue with having loose strings in the middle of a class like you've done 
> is that there is no built-in semantics, and it'll break all existing linters. 

Docstrings beneath attributes are known as "attribute docstrings" and are 
explicitly mentioned in PEP-257 
https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring  :

"String literals occurring elsewhere in Python code may also act as 
documentation. They are not recognized by the Python bytecode compiler and are 
not accessible as runtime object attributes (i.e. not assigned to __doc__), but 
two types of extra docstrings may be extracted by software tools:
 1. String literals occurring immediately after a simple assignment at the top 
level of a module, class, or __init__ method are called "attribute docstrings".
 2. String literals occurring immediately after another docstring are called 
"additional docstrings".
"

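As a rough illustration of what "extracted by software tools" can mean, here is 
a small sketch that uses the standard `ast` module to collect attribute 
docstrings from a class body. The sample class and the helper name 
`attribute_docstrings` are made up for this example, and only plain 
`name = value` assignments are handled (annotated assignments are ignored).

```python
import ast

SOURCE = '''
class Model:
    """Class docstring."""

    dataset_name = "mnist"
    """name of dataset"""

    tfds_dir = "~/tensorflow_datasets"
    "directory to look for models in"

    K = "np"
'''


def attribute_docstrings(source: str, class_name: str) -> dict:
    """Map attribute name -> string literal immediately following its assignment."""
    docs = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        # Look at each statement together with the statement that follows it.
        for stmt, following in zip(node.body, node.body[1:]):
            is_simple_assign = (
                isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
            )
            is_string_literal = (
                isinstance(following, ast.Expr)
                and isinstance(following.value, ast.Constant)
                and isinstance(following.value.value, str)
            )
            if is_simple_assign and is_string_literal:
                docs[stmt.targets[0].id] = following.value.value
    return docs


docs = attribute_docstrings(SOURCE, "Model")
print(docs)
```

Attributes without a trailing string literal, such as `K` here, are simply 
absent from the result.
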
PEP 257 led to PEP 258 
https://www.python.org/dev/peps/pep-0258/#attribute-docstrings, which was 
rejected in the sense that docutils never became part of Python itself, but 
docutils is the standard tool used for documenting Python and is used to 
generate Python's own documentation:

"A string literal immediately following an assignment statement is interpreted 
by the docstring extraction machinery as the docstring of the target of the 
assignment statement, under the following conditions <etc>"

They are the only technique that is usable in all cases, since the attribute may 
refer to a value such as ``None``, a singleton that can't carry a docstring of 
its own. All Python linters I'm familiar with accept this style of docstring, 
formatters such as Black will format them, and this is what I use in all my 
projects including SQLAlchemy itself. If you have something like 
"my_constant = 1" as your attribute, that's really your only option other than 
a pound sign comment, which from my googling around actually seems to be more 
common but is also not runtime-discoverable.

There is of course the disadvantage that the bytecode doesn't have access to 
them but this is a limitation of Python itself that most projects I'm familiar 
with have learned to live with.  
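
A short sketch of that limitation (the class and attribute names are 
hypothetical): the class docstring survives at runtime, while the string 
literals under the attributes are nowhere to be found on the class.

```python
class Settings:
    """Class-level docstring."""

    my_constant = 1
    """how many widgets to allocate"""

    placeholder = None
    """value filled in later"""


# The class docstring is stored on the class object.
class_doc = Settings.__doc__

# Asking the value for __doc__ only reaches the docstring of its type (int).
same_as_int_doc = Settings.my_constant.__doc__ == int.__doc__

# The attribute docstring text is not kept anywhere in the class namespace.
literal_kept = any(
    "how many widgets" in str(value) for value in vars(Settings).values()
)

print(class_doc, same_as_int_doc, literal_kept)
```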

To the extent that people talk about "how should we document attributes?", 
using a string literal below the value is usually what you'll find. I googled a 
bit and found, for example, this style guide for a major observatory: 
https://developer.lsst.io/python/numpydoc.html#py-docstring-attribute-constants-structure

> 
> Sure, I could extend the linters and traverse the body of the class, 
> inferring out the semantics. But that would be incredibly non-standard. I'm 
> trying to generate code that could be considered *the standard*.

"attribute docstrings" are the standard as discussed in PEP-257, and there is 
no competing standard of any kind that I'm aware of.



> 
> So comment, doc, or an ivar/cvar [Sphinx treats these as the same: 
> https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#info-field-lists]
>  is what I'll generate to/from. I can generate them all, but that would be 
> hard for a human to maintain. The idea with the generated code is that it 
> needs to be human maintainable, as well as machine maintainable.

If you insist upon having runtime discoverability, then you would use the "doc" 
parameter of Column and a declarative base class that copies "doc" into 
"comment" on the Column objects, but from my end I would never call this a 
"standard". There's no standard for runtime discoverability, unfortunately. If 
I had to pick one, I'd want string literals underneath attributes, with Python 
providing a means of collecting them at runtime, similar to the way PEP 526 
variable annotations are collected.
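
One possible shape for such a base class, as a sketch only and not an official 
SQLAlchemy recipe. It assumes SQLAlchemy 1.4 or later and the metaclass-based 
`declarative_base()`; the class name `DocAsComment` is invented here. The hook 
runs while the class is being created, before declarative builds the Table, so 
it can copy `doc` onto `comment` wherever no explicit comment was given.

```python
from sqlalchemy import Column, String
from sqlalchemy.orm import declarative_base


class DocAsComment:
    """Hypothetical base: reuse each Column's ``doc`` as its DDL ``comment``."""

    def __init_subclass__(cls, **kwargs):
        # At this point the class body still holds the plain Column objects.
        for value in vars(cls).values():
            if isinstance(value, Column) and value.doc and value.comment is None:
                value.comment = value.doc
        super().__init_subclass__(**kwargs)


Base = declarative_base(cls=DocAsComment)


class Model(Base):
    __tablename__ = "model"

    dataset_name = Column(String, primary_key=True, doc="name of dataset")
    tfds_dir = Column(
        String,
        doc="directory to look for models in",
        comment="explicit comment wins",
    )


columns = Model.__table__.c
print(columns.dataset_name.comment)
print(columns.tfds_dir.comment)
```

Each description is then written once in the source, and backends that support 
column comments would pick it up when the DDL is rendered.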

The lack of runtime discoverability for attribute docstrings is an "unsolved" 
problem in Python itself. The evidence is Python's very own "class declarative" 
system, dataclasses, described in PEP 557 
https://www.python.org/dev/peps/pep-0557/ : it has absolutely nothing to say 
about how to inline-document declared attributes, nor does the actual field() 
function (https://docs.python.org/3/library/dataclasses.html#dataclasses.field) 
have any notion of a "doc" parameter. As always, you can put your own 
docstrings into the "metadata" collection if you want, but once again there's 
no "standard" here of any kind.

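For illustration, the "metadata" route might look like the sketch below. The 
`"doc"` key is an ad hoc convention chosen for this example, not something 
dataclasses defines, and `field_docs` is a hypothetical helper.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Model:
    dataset_name: str = field(
        default="mnist", metadata={"doc": "name of dataset"}
    )
    tfds_dir: str = field(
        default="~/tensorflow_datasets",
        metadata={"doc": "directory to look for models in"},
    )
    K: str = "np"  # no metadata, so no documentation is recorded


def field_docs(cls) -> dict:
    """Collect the ad hoc "doc" metadata entry of every dataclass field."""
    return {f.name: f.metadata.get("doc") for f in fields(cls)}


docs = field_docs(Model)
print(docs)
```

This is discoverable at runtime, but any tool reading it has to know about the 
chosen key, which is exactly the missing-standard problem.
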
This is definitely a bit of a problem in Python, and I welcome efforts toward a 
standard means of runtime discoverability for attribute docstrings. But as a 
wise leader once said, it's "beyond my pay grade" as far as SQLAlchemy is 
concerned: anything we propose, such as "Column(.. doc)", is just an API feature 
that projects may use if they want runtime discoverability. In my own 
experience runtime discoverability is not necessary, since the two purposes I 
have for docstrings are a. reading them in source code and b. generating docs 
with tools that know how to read them.




> 
> (all my code generators go both ways, so you can edit the generated [cli] 
> code and generate [class] code from it, and edit the [class] code and 
> generate [cli] code from it)
> 
> Samuel Marks
> Charity <https://sydneyscientific.org/> | consultancy <https://offscale.io/> 
> | open-source <https://github.com/offscale> | LinkedIn 
> <https://linkedin.com/in/samuelmarks>
> 
> 
> On Mon, Jan 25, 2021 at 10:59 AM Mike Bayer <mike...@zzzcomputing.com> wrote:
>> 
>> 
>> On Sun, Jan 24, 2021, at 5:20 PM, Samuel Marks wrote:
>>> Dear Mike,
>>> 
>>> My tool works at the AST level, and converts between:
>>>  * different docstring formats;
>>>  * having types in docstring or explicitly annotated;
>>>  * argparse parser augmenting function, class [plain old python class], 
>>> methods/functions
>>> The next step is to add support for SQLalchemy models, routes, and tests.
>>> As you saw from my example code above, the duplication in SQLalchemy is 
>>> intense.
>>> 
>>> Columns can be documented in the docstring, and/or on a column itself with 
>>> `comment` and/or `doc`.
>>> 
>>> So if I'm going to generate these SQLalchemy models, and generate classes 
>>> &etc. from these SQLalchemy models, then I'll need a clean, consistent way 
>>> of documenting each model.
>>> 
>>> What is that way?
>> 
>> The "comment" field applies to the DDL rendered to the database and is 
>> separate from the docstring that would be present in the ORM model, so 
>> there is currently no means for these to be "unified"; they are two 
>> different concerns.
>> 
>> Within the docstrings, I note the use of ":cvar:", which IIUC means "class 
>> variable". SQLAlchemy ORM models do make use of class variables at this 
>> level, but there they represent SQL expressions, so text like ":cvar K: 
>> backend engine, e.g., `np` or `tf`. Defaults to np" doesn't necessarily make 
>> sense unless the attributes are documented as "instance variables". I'm 
>> also not really sure, from a Sphinx pov, why one would have both ":cvar:" 
>> in the top-level docstring and per-attribute docstrings, which is what 
>> Column(..., doc="doc") provides.
>> 
>> My understanding of Sphinx is that it has a feature that extracts 
>> class-attribute-level docstrings from source code and associates them with 
>> the attributes. So if I wanted ORM models that were documented, I'd want 
>> them to look like this:
>> 
>> class Model(Base): 
>>     """
>>     Acquire from the official tensorflow_datasets model zoo, or the
>>     ophthalmology focussed ml-prepare library
>> 
>>     """
>> 
>>     __tablename__ = "model"
>> 
>>     dataset_name = Column(String, primary_key=True, default="mnist")
>>     """name of dataset"""
>> 
>>     tfds_dir = Column(String, default="~/tensorflow_datasets")
>>     "directory to look for models in"
>> 
>>     K = Column(String, default="np")
>>     "backend engine, e.g. np or tf"
>> 
>> That is, like a standard Python class, with nothing special used. If that 
>> isn't working for docs tools, then the issue has to be fixed at that level.
>> 
>> I'm not actually sure why Column() has a "doc" keyword, given that Sphinx 
>> should be able to scan these from the source. The "doc" keyword assigns 
>> the given docstring to the "__doc__" attribute of the descriptor, but IMO 
>> this should not be necessary unless Sphinx is still buggy in this regard. 
>> We added that parameter many years ago, perhaps to work around limitations 
>> in Sphinx; I'm not really sure.
>> 
>>> 
>>> Samuel Marks
>>> 
>>> 
>>> On Mon, Jan 25, 2021 at 6:39 AM Mike Bayer <mike...@zzzcomputing.com> wrote:
>>>> hey there, sorry I hadn't responded to this.
>>>> 
>>>> Is your tool reformatting Python code? I don't see anything "wrong" with 
>>>> it other than that the code looks kind of verbose. This is a matter of 
>>>> personal preference, but if it were me I'd want each attribute to have its 
>>>> string description written out only once in the source code, so that it 
>>>> can be edited directly. Then, as far as how it appears in Sphinx and/or 
>>>> DDL, there would be transparent extensions that make that happen.
>>>> 
>>>> 
>>>> 
>>>> On Sun, Jan 24, 2021, at 12:14 AM, Samuel Marks wrote:
>>>>> Would be great to have some insight here. If I'm going to start 
>>>>> generating to/fro SQLalchemy models, then I need to get the column 
>>>>> descriptions right
>>>>> 
>>>>> Samuel Marks
>>>>> 
>>>>> 
>>>>> On Tue, Jul 28, 2020 at 5:57 PM Samuel Marks <sam...@offscale.io> wrote:
>>>>>> I have created a little tool—at the AST level—to translate between 
>>>>>> docstrings, methods, classes, and argparse. 
>>>>>> https://github.com/SamuelMarks/doctrans
>>>>>> 
>>>>>> Now looking at adding SQLalchemy support.
>>>>>> 
>>>>>> Using the mock 
>>>>>> <https://github.com/SamuelMarks/doctrans/tree/f35963b/doctrans/tests/mocks>
>>>>>>  I've been using throughout, does this look like the 'right' kind of 
>>>>>> SQLalchemy code?
>>>>>> 
>>>>>> class Model(Base):
>>>>>>     """
>>>>>>     Acquire from the official tensorflow_datasets model zoo, or the
>>>>>>     ophthalmology focussed ml-prepare library
>>>>>> 
>>>>>>     :cvar dataset_name: name of dataset. Defaults to mnist
>>>>>>     :cvar tfds_dir: directory to look for models in. Defaults to
>>>>>>         ~/tensorflow_datasets
>>>>>>     :cvar K: backend engine, e.g., `np` or `tf`. Defaults to np
>>>>>>     :cvar as_numpy: Convert to numpy ndarrays
>>>>>>     :cvar data_loader_kwargs: pass this as arguments to data_loader
>>>>>>         function
>>>>>>     """
>>>>>>     __tablename__ = 'model'
>>>>>> 
>>>>>>     dataset_name = Column(String, primary_key=True, default='mnist',
>>>>>>                           comment='name of dataset',
>>>>>>                           doc='name of dataset')
>>>>>>     tfds_dir = Column(String, default='~/tensorflow_datasets',
>>>>>>                       comment='directory to look for models in',
>>>>>>                       doc='directory to look for models in')
>>>>>>     K = Column(String, default='np',
>>>>>>                comment='backend engine, e.g., `np` or `tf`',
>>>>>>                doc='backend engine, e.g., `np` or `tf`')
>>>>>>     as_numpy = Column(Boolean,
>>>>>>                       comment='Convert to numpy ndarrays',
>>>>>>                       doc='Convert to numpy ndarrays')
>>>>>>     data_loader_kwargs = Column('data_loader_kwargs', JSON,
>>>>>>                                 comment='pass this as arguments to '
>>>>>>                                         'data_loader function',
>>>>>>                                 doc='pass this as arguments to '
>>>>>>                                     'data_loader function')
>>>>>> 
>>>>>>     # _return_type = 'Train and tests dataset splits. Defaults to
>>>>>>     #                (np.empty(0), np.empty(0))'
>>>>>> 
>>>>>>     def __repr__(self):
>>>>>>         """
>>>>>>         :returns: String representation of constructed object
>>>>>>         :rtype: ```str```
>>>>>>         """
>>>>>>         return '<Model(dataset_name={self.dataset_name!r},' \
>>>>>>                '       tfds_dir={self.tfds_dir!r},' \
>>>>>>                '       K={self.K!r},' \
>>>>>>                '       as_numpy={self.as_numpy!r},' \
>>>>>>                '       data_loader_kwargs={self.data_loader_kwargs!r}' \
>>>>>>                ')>'.format(self=self)
>>>>>> 
>>>>>> If not, what should it look like?
>>>>>> 
>>>>>> Thanks for your suggestions
>>>>>> 

>>>>>> -- 
>>>>>> SQLAlchemy - 
>>>>>> The Python SQL Toolkit and Object Relational Mapper
>>>>>>  
>>>>>> http://www.sqlalchemy.org/
>>>>>>  
>>>>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>>>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>>>>> description.
>>>>>> --- 
>>>>>> You received this message because you are subscribed to a topic in the 
>>>>>> Google Groups "sqlalchemy" group.
>>>>>> To unsubscribe from this topic, visit 
>>>>>> https://groups.google.com/d/topic/sqlalchemy/xZAh5zPswM0/unsubscribe.
>>>>>> To unsubscribe from this group and all its topics, send an email to 
>>>>>> sqlalchemy+unsubscr...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/sqlalchemy/6576d789-d088-4e68-a7f7-a17b5c96a810o%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/sqlalchemy/6576d789-d088-4e68-a7f7-a17b5c96a810o%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/925cbdca-2fa0-443d-811e-ba59e1c231ca%40www.fastmail.com.
