Re: [sqlalchemy] How to SELECT computed data using SQLAlchemy?

Mike Bayer Tue, 14 Dec 2021 10:16:04 -0800


On Tue, Dec 14, 2021, at 12:26 PM, 'Daniel Grindrod' via sqlalchemy wrote:
> Hi Michael,
> 
> Thanks for such a quick reply.
> I enjoyed reading it! I actually inherited this API (I swear I'm not just 
> making excuses!) from a colleague who left a few months earlier, so it's very 
> much been a case of 'Figuring it out as I go along'.
> 
> Apologies for the incomplete code - despite it not being particularly 
> exciting code, I wanted to double check that I'm allowed to post it publicly. 
> So the original (complete) code for this function is as follows:
> 
> def similar_structure_matches(smiles, similarity_threshold): 
> 
>   struc_sim_query = db.select([structures_tbl, text(":q_smiles as 
> query_smiles, jc_tanimoto(canonical_smiles, :q_smiles) as 
> similarity").bindparams(q_smiles=smiles)]). \      
> where(text("jc_tanimoto(canonical_smiles, :q_smiles) >= :q_sim"). 
> bindparams(q_smiles=smiles, q_sim=similarity_threshold)) 
> 
>   struc_sim_res = struc_sim_query.execute().fetchall()
> 
>   if len(struc_sim_res) ==   0: 
>     return '', 204 
> 
> returnMatchLimaSchema(many=True).dump(struc_sim_res) 
> The above code is used to generate tanimoto (similarity) scores for the 
> queried structure against each structure in the database( SMILES describe 
> chemical structures).
> As I understand it, the jc_tanimoto function comes from the Chemaxon 
> Cartridge which we have installed on our Oracle server (Cartridge API | 
> ChemAxon Docs 
> <https://docs.chemaxon.com/display/docs/cartridge-api.md#src-1803469-cartridgeapi-jc-tanimoto>).
> 
> I'm not entirely sure how to call this function, without it being wrapped by 
> text().
> As I understand it, the code you sent across would be applying the comparison 
> (now jc_tanimoto) function in the Python; not within Oracle itself (of 
> course, that was impossible for you to predict with the incomplete code I 
> sent across).
> 
> Could you please advise on how to correctly structure this query?


There's a construct in SQLAlchemy called func that renders a SQL -function-like 
syntax for any arbitrary word, like this:

f>>> from sqlalchemy import func
>>> from sqlalchemy import select
>>> print(select([func.jc_tanimoto('some data').label("my_label")]))
SELECT jc_tanimoto(:jc_tanimoto_1) AS my_label

so as long as there's no unusual SQL syntaxes in play you can use func.<name> 
to generate SQL for any SQL function with parameters.





> 
> Thanks again,
> Dan
> 
> On Tuesday, 14 December 2021 at 13:31:12 UTC Mike Bayer wrote:
>> 
>> 
>> On Tue, Dec 14, 2021, at 5:40 AM, 'Daniel Grindrod' via sqlalchemy wrote:
>>> Hi all,
>>> 
>>> I'm working on a REST API which is built using Flask-SQLAlchemy and 
>>> Connexion. I'm fairly new to SQLAlchemy, but it's been brilliant so far :) 
>>> This API uses SQLAlchemy 1.3.16, and connects to an Oracle Database (12c 
>>> 12.1.0.1.0 64bit). 
 <https://stackoverflow.com/posts/70341129/timeline>
>>> I'm having an issue generating the correct SQL from a SQLAlchemy query. I 
>>> would really appreciate any help. The troublesome function is shown below.
>>> 
>>> def similar_matches(input_descriptor, threshold, lim=None, offset): 
>>> 
>>>   query = db.select([tbl, text(":q_descriptors as query_descriptors, 
>>> comparison(descriptors, :q_descriptors) as 
>>> similarity")bindparams(q_descriptor=input_descriptor).\
>>>   where( text("comparison(descriptors, :q_descriptors) >=  
>>> q_threshold").bindparams(q_descriptor=input_descriptor, q_threshold = 
>>> threshold) 
>> 
>> heya -
>> 
>> it's early here but I almost want to be able to tell a story about that 
>> pattern above, which has select(text("cola, colb, colc, ...))  in it.   It's 
>> kind of an "anti-unicorn" for me, as I've done many refactorings to the 
>> result-processing part of SQLAlchemy's engine and each time I do so, there's 
>> some internal handwringing over, "what if someone is SELECTING from a text() 
>> that has multiple columns comma-separated in them?", which specifically is a 
>> problem because it means we can't positionally link the columns we get back 
>> from the cursor to the Python expressions that are in the select() object, 
>> and each time it's like, "nah, nobody would do that", or, "nah, nobody 
>> *should* do that", but yet, as there's not a straightforward way to 
>> detect/warn for that, there's a whole set of code / commentary at 
>> https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/engine/cursor.py#L325
>>  which wonders if we'd ever see this.   
>> 
>> and here it is!  :)   the dark unicorn.    So, it's also the source of the 
>> issue here, because the Oracle dialect has to restructure the query to 
>> simulate limit/offset.   Soooo.... back into the barn w/ the unicorn and 
>> what we do here is make sure the select() has enough structure so that 
>> SQLAlchemy knows what's going on and here that would look like (note I'm 
>> making some syntactical assumptions about the code above which seems to be 
>> incomplete ):
>> 
>> from sqlalchemy import literal, func
>> 
>> query = db.select(
>>     [
>>         tbl,
>>         literal(input_descriptor).label("query_descriptors"),
>>         func.comparison(tbl.c.descriptors, q_descriptors).label("similarity")
>>     ]).
>>   where(
>>     func.comparison(tbl.c.descriptors, q_descriptors) >= threshold
>> 
>>   )
>> 
>> that way your select() will have .selected_columns entries for every column 
>> in "tbl" plus columns "query_descriptors" and "similarity", and these will 
>> export on outwards to the subquery that the Oracle dialect creates.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> 
>>> 
>>>   res = query.execute().fetchall() 
>>> 
>>>   if len(res)=0 return '', 204 
>>> 
>>>   return MatchLimaScheme(many = True).dump(res) 
>>> This SQLAlchemy code takes two inputs (descriptor and threshold), and 
>>> searches through each descriptor in an Oracle database, calculating a 
>>> similarity measure  between the queried descriptor and each stored 
>>> descriptor. All rows where similarity score >= threshold are returned in a 
>>> JSON.
>>> 
>>> The above code works fine, but returns all results - whereas I want to also 
>>> be able to include a .offset() and a .limit() (for lazy loading). The code 
>>> above generates SQL along these lines:
>>> 
>>> SELECT ID, last_modified, descriptors, :q_descriptors as query_descriptors, 
>>> comparison(descriptors, :q_descriptors) as similarity' FROM tbl WHERE 
>>> compare(descriptors, :q_descriptors) >= :q_threshold 
>>> which works well. However, when I add .limit() or .offset() on the end of 
>>> my query i.e.
>>> 
>>> query = db.select([tbl, text(":q_descriptors as query_descriptors, 
>>> comparison(descriptors, :q_descriptors) as similarity" 
>>> ).bindparams(q_descriptor=input_descriptor).where( 
>>> text("comparison(descriptors, :q_descriptors) >= :q_threshold") 
>>> .bindparams(q_descriptor=input_descriptor,q_threshold = 
>>> threshold).limit(limit) 
>>> The SQL generated changes to be along these lines: 
>>> 
>>> SELECT ID, last_modified, descriptors FROM (SELECT tbl.ID as ID, 
>>> tbl.last_modified as last_modified, tbl.descriptors as descriptors, 
>>> :q_descriptors as query_descriptors, comparison(descriptors, 
>>> :q_descriptors) as similarity) FROM tbl WHERE compare(descriptors, 
>>> :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit 
>>> As a raw SQL query this is fine, but I'm no longer including the 
>>> query_descriptors and similarity metrics in my SELECT clause. Thus I get a 
>>> columnNotFoundError. How do I adjust the .select() function above so that 
>>> my SQL looks more like:
>>> 
>>> SELECT ID, last_modified, descriptors, query_descriptors, similarity FROM 
>>> (SELECT tbl.ID as ID, tbl.last_modified as last_modified, tbl.descriptors 
>>> as descriptors,:q_descriptors as query_descriptors comparison(descriptors, 
>>> :q_descriptors) as similarity' FROM tbl WHERE compare(descriptors, 
>>> :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit OFFSET :q_offset 
>>> Basically, I'm looking to explicitly tell SQLAlchemy that I want to SELECT 
>>> tbl AND query_descriptors AND similarity.
>>> 
>>> I've also been informed that it's bad practice to not include a .order_by() 
>>> in these queries, but I don't think that is what's causing the issue here. 
>>> It is on my to do list though.
>>> 
>>> Please let me know if I need to provide more information.
>>> 
>>> Thanks for any help,
>>> 
>>> Dan
>>> 
>>> 
>>> 
>>> -- 
>>> SQLAlchemy - 
>>> The Python SQL Toolkit and Object Relational Mapper
>>>  
>>> http://www.sqlalchemy.org/
>>>  
>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>> description.
>>> --- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/sqlalchemy/adf3ee50-d198-4193-bcfb-e866cad62e52n%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/sqlalchemy/adf3ee50-d198-4193-bcfb-e866cad62e52n%40googlegroups.com?utm_medium=email&utm_source=footer>.
>> 
> 
> 
> -- 
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/ebbe2a3f-4c2b-4512-a093-870c9d842a8en%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/ebbe2a3f-4c2b-4512-a093-870c9d842a8en%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/c9802c3a-4ee6-4175-bd36-fff044a167a1%40www.fastmail.com.

Re: [sqlalchemy] How to SELECT computed data using SQLAlchemy?

Reply via email to