Hi Michael,
Thanks for such a quick reply.
I enjoyed reading it! I actually inherited this API (I swear I'm not just
making excuses!) from a colleague who left a few months earlier, so it's
very much been a case of 'Figuring it out as I go along'.
Apologies for the incomplete code - despite it not being particularly
exciting code, I wanted to double check that I'm allowed to post it
publicly.
So the original (complete) code for this function is as follows:
def similar_structure_matches(smiles, similarity_threshold):
struc_sim_query = db.select([structures_tbl, text(":q_smiles as
query_smiles, jc_tanimoto(canonical_smiles, :q_smiles) as
similarity").bindparams(q_smiles=smiles)]). \
where(text("jc_tanimoto(canonical_smiles, :q_smiles) >= :q_sim").
bindparams(q_smiles=smiles, q_sim=similarity_threshold))
struc_sim_res = struc_sim_query.execute().fetchall()
if len(struc_sim_res) == 0:
return '', 204
returnMatchLimaSchema(many=True).dump(struc_sim_res)
The above code is used to generate tanimoto (similarity) scores for the
queried structure against each structure in the database( SMILES describe
chemical structures).
As I understand it, the jc_tanimoto function comes from the Chemaxon
Cartridge which we have installed on our Oracle server (Cartridge API |
ChemAxon Docs
<https://docs.chemaxon.com/display/docs/cartridge-api.md#src-1803469-cartridgeapi-jc-tanimoto>
).
I'm not entirely sure how to call this function, without it being wrapped
by text().
As I understand it, the code you sent across would be applying the
comparison (now jc_tanimoto) function in the Python; not within Oracle
itself (of course, that was impossible for you to predict with the
incomplete code I sent across).
Could you please advise on how to correctly structure this query?
Thanks again,
Dan
On Tuesday, 14 December 2021 at 13:31:12 UTC Mike Bayer wrote:
>
>
> On Tue, Dec 14, 2021, at 5:40 AM, 'Daniel Grindrod' via sqlalchemy wrote:
>
> Hi all,
>
> I'm working on a REST API which is built using Flask-SQLAlchemy and
> Connexion. I'm fairly new to SQLAlchemy, but it's been brilliant so far
> :) This API uses SQLAlchemy 1.3.16, and connects to an Oracle Database (12c
> 12.1.0.1.0 64bit).
> <https://stackoverflow.com/posts/70341129/timeline>
>
> I'm having an issue generating the correct SQL from a SQLAlchemy query. I
> would really appreciate any help. The troublesome function is shown below.
> def similar_matches(input_descriptor, threshold, lim=None, offset):
>
> query = db.select([tbl, text(":q_descriptors as query_descriptors,
> comparison(descriptors, :q_descriptors) as
> similarity")bindparams(q_descriptor=input_descriptor).\
> where( text("comparison(descriptors, :q_descriptors) >=
> q_threshold").bindparams(q_descriptor=input_descriptor, q_threshold =
> threshold)
>
>
> heya -
>
> it's early here but I almost want to be able to tell a story about that
> pattern above, which has select(text("cola, colb, colc, ...)) in it.
> It's kind of an "anti-unicorn" for me, as I've done many refactorings to
> the result-processing part of SQLAlchemy's engine and each time I do so,
> there's some internal handwringing over, "what if someone is SELECTING from
> a text() that has multiple columns comma-separated in them?", which
> specifically is a problem because it means we can't positionally link the
> columns we get back from the cursor to the Python expressions that are in
> the select() object, and each time it's like, "nah, nobody would do that",
> or, "nah, nobody *should* do that", but yet, as there's not a
> straightforward way to detect/warn for that, there's a whole set of code /
> commentary at
> https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/engine/cursor.py#L325
>
> which wonders if we'd ever see this.
>
> and here it is! :) the dark unicorn. So, it's also the source of the
> issue here, because the Oracle dialect has to restructure the query to
> simulate limit/offset. Soooo.... back into the barn w/ the unicorn and
> what we do here is make sure the select() has enough structure so that
> SQLAlchemy knows what's going on and here that would look like (note I'm
> making some syntactical assumptions about the code above which seems to be
> incomplete ):
>
> from sqlalchemy import literal, func
>
> query = db.select(
> [
> tbl,
> literal(input_descriptor).label("query_descriptors"),
> func.comparison(tbl.c.descriptors,
> q_descriptors).label("similarity")
> ]).
> where(
> func.comparison(tbl.c.descriptors, q_descriptors) >= threshold
>
> )
>
> that way your select() will have .selected_columns entries for every
> column in "tbl" plus columns "query_descriptors" and "similarity", and
> these will export on outwards to the subquery that the Oracle dialect
> creates.
>
>
>
>
>
>
>
>
>
> res = query.execute().fetchall()
>
> if len(res)=0 return '', 204
>
> return MatchLimaScheme(many = True).dump(res)
>
> This SQLAlchemy code takes two inputs (descriptor and threshold), and
> searches through each descriptor in an Oracle database, calculating a
> similarity measure between the queried descriptor and each stored
> descriptor. All rows where similarity score >= threshold are returned in a
> JSON.
>
> The above code works fine, but returns all results - whereas I want to
> also be able to include a .offset() and a .limit() (for lazy loading). The
> code above generates SQL along these lines:
> SELECT ID, last_modified, descriptors, :q_descriptors as
> query_descriptors, comparison(descriptors, :q_descriptors) as similarity'
> FROM tbl WHERE compare(descriptors, :q_descriptors) >= :q_threshold
>
> which works well. However, when I add .limit() or .offset() on the end of
> my query i.e.
> query = db.select([tbl, text(":q_descriptors as query_descriptors,
> comparison(descriptors, :q_descriptors) as similarity"
> ).bindparams(q_descriptor=input_descriptor).where(
> text("comparison(descriptors, :q_descriptors) >= :q_threshold")
> .bindparams(q_descriptor=input_descriptor,q_threshold =
> threshold).limit(limit)
>
> The SQL generated changes to be along these lines:
> SELECT ID, last_modified, descriptors FROM (SELECT tbl.ID as ID,
> tbl.last_modified as last_modified, tbl.descriptors as descriptors,
> :q_descriptors as query_descriptors, comparison(descriptors,
> :q_descriptors) as similarity) FROM tbl WHERE compare(descriptors,
> :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit
>
> As a raw SQL query this is fine, but I'm no longer including the
> query_descriptors and similarity metrics in my SELECT clause. Thus I get a
> columnNotFoundError. How do I adjust the .select() function above so that
> my SQL looks more like:
> SELECT ID, last_modified, descriptors, query_descriptors, similarity FROM
> (SELECT tbl.ID as ID, tbl.last_modified as last_modified, tbl.descriptors
> as descriptors,:q_descriptors as query_descriptors comparison(descriptors,
> :q_descriptors) as similarity' FROM tbl WHERE compare(descriptors,
> :q_descriptors) >= :q_threshold WHERE ROWNUM <= :q_limit OFFSET :q_offset
>
> Basically, I'm looking to explicitly tell SQLAlchemy that I want to SELECT
> tbl AND query_descriptors AND similarity.
>
> I've also been informed that it's bad practice to not include a
> .order_by() in these queries, but I don't think that is what's causing the
> issue here. It is on my to do list though.
>
> Please let me know if I need to provide more information.
>
> Thanks for any help,
>
> Dan
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sqlalchemy/adf3ee50-d198-4193-bcfb-e866cad62e52n%40googlegroups.com
>
> <https://groups.google.com/d/msgid/sqlalchemy/adf3ee50-d198-4193-bcfb-e866cad62e52n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
>
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable
Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/sqlalchemy/ebbe2a3f-4c2b-4512-a093-870c9d842a8en%40googlegroups.com.