Re: [sqlalchemy] `func.similarity` performance

Robert Smith Fri, 01 Apr 2016 10:25:38 -0700

Mike Bayer:

Thank you for your explanation. Yes, I thought `func.XYZ` was more aware of 
its functions. In the case of `%` and indexes, I think there is an 
improvement if there is an index, but the query still works as usual even 
in the absence of an index. However, if the implementation is not trivial, 
it is better to look for alternatives.




On Thursday, March 31, 2016 at 6:15:41 PM UTC-6, Mike Bayer wrote:
>
>
>
> On 03/31/2016 06:40 PM, Robert Smith wrote: 
> > Mike Bayer: 
> > 
> > Thank you for your response. I wasn't really asking whether the 
> > optimization I described above (using `%` instead of `similarity`) is 
> > correct or not. Based on some resources (e.g. Super Fuzzy Searching on 
> > PostgreSQL 
> > <
> http://webcache.googleusercontent.com/search?q=cache:59hB5zYwRzcJ:bartlettpublishing.com/site/bartpub/blog/3/entry/350+&cd=9&hl=en&ct=clnk&gl=mx>,
>  
>
> > FUZZY STRING MATCHING WITH TRIGRAM AND TRIGRAPHS 
> > <
> http://www.postgresonline.com/journal/archives/169-Fuzzy-string-matching-with-Trigram-and-Trigraphs.html>),
>  
>
> > I think it is correct although I wouldn't mind being proven wrong. I was 
> > mainly asking, why sqlalchemy doesn't seem to take advantage of this 
> > improvement? 
>
> The Core keeps everyone sane by emitting to as much a degree as is 
> possible exactly the SQL you tell it to.   I see a hardcoded number ".5" 
> in your optimization as well as a significant structural difference, 
> which I don't think it would be reasonable for Core to take it upon 
> itself to emit with no intent coming from the user. 
>
> More generally, func.XYZ() has no awareness of most functions except a 
> small handful just for the purposes of helping with return types. 
>
> Similarly, there are no attempts to dig into the expressions used within 
> functions and try to make decisions based on indexes associated with the 
> elements of those expressions; parsing SQL and applying indexes to 
> expressions is the job of the database itself and it wouldn't be 
> reasonable for SQLAlchemy to start attempting to duplicate all that 
> extremely rich, complex, and idiosyncratic logic that we get for free 
> from the database.  If this optimization is truly applicable by default 
> I'd ask why doesn't Postgresql itself apply it automatically behind the 
> scenes; if the database engine is failing to take advantage of its own 
> optimizations which can be established as appropriate by default without 
> any additional intent from the user, that's a bug on their end. 
>
> > 
> > Your second comment intrigued me: `though this is not really important 
> > in the case of automapping as indexes do not affect ORM mappings.`. I'm 
> > pretty new to sqlalchemy, so I think I don't understand what you mean. 
> > Can you describe this a bit more? 
>
> An ORM mapping links the attributes in an object-oriented class to rows 
> in a database table.   The purpose of this mapping is to generate 
> SELECT, INSERT, UPDATE, and DELETE statements to be emitted to the 
> database, and to retrieve result data from these statements in the form 
> of database rows.  None of these operations have any requirement to 
> explicitly state the names of indexes nor are any indexes required for 
> any of those operations to function.  So awareness of indexes on the 
> SQLAlchemy side for the purposes of emitting and retrieving data from 
> DQL or DML is unnecessary.  The SQLAlchemy Index() construct is only 
> useful for DDL (e.g. CREATE INDEX and related statements), which is a 
> concern separate from object relational mapping. 
>
>
>
>
>
> > 
> > Regards 
> > 
> > On Thursday, March 31, 2016 at 7:43:11 AM UTC-6, Mike Bayer wrote: 
> > 
> > 
> > 
> >     On 03/30/2016 09:50 PM, Robert Smith wrote: 
> >      > I'm using sqlalchemy 1.0.12 with postgreSQL 9.4.6 with the 
> pg_trgm 
> >      > extension enabled. Basically, I'm performing a similarity-based 
> >     query on 
> >      > a single column: 
> >      > 
> >      > In [26]: 
> >      > 
> >     
> str(session.query(Model).order_by(desc(func.similarity(Model.description, 
> >     'string'))).limit(100)) 
> >      > Out[26]: 'SELECT model.id <http://model.id> AS model_id, 
> >     model.description AS 
> >      > model_description, \nFROM model ORDER BY 
> >     similarity(model.description, 
> >      > :similarity_1) DESC\n LIMIT :param_1' 
> >      > 
> >      > This query works okay, but I'm worried that looking at the query 
> >     that is 
> >      > being used, it doesn't seem to take advantage of some performance 
> >      > improvements that are available in this kind of situation. As a 
> >     simple 
> >      > example, I noticed that people recommend not to use `similarity` 
> and 
> >      > instead rely on `%` or alternatively, use `WHERE 
> >     model.description  % 
> >      > 'string' AND similarity(model.description, 'string') > 0.5`. This 
> is 
> >      > apparently related to the efficient use of a "gist" index. 
> > 
> >     this is more of a Postgresql optimization question so I dont have 
> any 
> >     insight on that. 
> > 
> >      > 
> >      > Another question: In the case of having an index for the trigram 
> >      > matches, does sqlalchemy pick up that information automatically 
> >     when I'm 
> >      > using `automap_base()` to reflect my database? 
> > 
> >     Table reflection will reflect table indexes, though this is not 
> really 
> >     important in the case of automapping as indexes do not affect ORM 
> >     mappings. 
> > 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to sqlalchemy+...@googlegroups.com <javascript:> 
> > <mailto:sqlalchemy+unsubscr...@googlegroups.com <javascript:>>. 
> > To post to this group, send email to sqlal...@googlegroups.com 
> <javascript:> 
> > <mailto:sqlal...@googlegroups.com <javascript:>>. 
> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] `func.similarity` performance

Reply via email to