Re: [sqlalchemy] Re: "in_()" operator is not currently implemented for many-to-one-relations - alternatives?

Conor Fri, 19 Mar 2010 08:19:08 -0700

Stodge wrote:
> Thanks. That doesn't quite work. Based on my data, the following
> should (and does) work because it only returns document id=1, which
> only has these two tags:
>
> tag_list = ['my document', 'source code']
> session.query(Document).\
>       filter(Document.tags.any(Tag.tag.in_([t for t in tag_list]))).\
>       filter(~Document.tags.any(~Tag.tag.in_([t for t in tag_list])))
>
> The following should return no records, as there is no document that
> has only these tags. Instead it returns document id=2, which only has
> the tag 'random stuff':
>
> tag_list = ['my document', 'source code', 'random stuff']
> session.query(Document).\
>       filter(Document.tags.any(Tag.tag.in_([t for t in tag_list]))).\
>       filter(~Document.tags.any(~Tag.tag.in_([t for t in tag_list])))
>
>


Try this query instead:

tag_list = ['my document', 'source code', 'random stuff']
q = session.query(Document)
for t in tag_list:
    q = q.filter(Document.tags.any(Tag.tag == t))
q = q.filter(~Document.tags.any(~Tag.tag.in_(tag_list)))

Alternatively, if your database supports aggregating row sets into
arrays/strings (e.g. PostgreSQL supports ARRAY(SELECT ...) to collect
row sets into an array, MySQL and others have GROUP_CONCAT() that I
believe you can use for this purpose), you may prefer a different
technique. I will show an example with PostgreSQL and ARRAY().

subq = session.query(Tag.tag)
subq = subq.join(document_tags)
subq = subq.filter(document_tags.c.document_id == Document.id)
subq = subq.order_by(Tag.tag.asc())
subq = subq.correlate(Document)
subq = subq.subquery()

q = session.query(Document)
q = q.filter(func.array(subq) == sorted(tag_list))
q = q.correlate(s) # May not be needed

This should generate SQL like:

SELECT <Document columns>
FROM document
WHERE ARRAY(
  SELECT tag.tag
  FROM tag
  JOIN document_tags ON document_tags.tag_id = tag.id
  WHERE document_tags.document_id = document.id
  ORDER BY tag.tag ASC
) = %(array_1)s

where array_1 would be ['my document', 'random stuff', 'source code'].
Note how both the subquery and array_1 have to be sorted. I tend to
prefer this type of query since its complexity doesn't grow as you add
more tags in your search criteria.

-Conor

>
> On Mar 19, 10:15 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
>   
>> select document.* from document join tags on document.id=tags.document_id
>> where tags.tag='foo' and tags.tag='bar' and tags.tag=....
>>
>> am I missing something ?  that would return no rows in most cases.
>>
>> if you want to find documents that have an exact list of tags, you'd have to 
>> do something like the IN query we started with, and additionally ensure no 
>> extra tags remain.
>>
>> like:
>>
>> sess.query(Document).\
>>                 filter(Document.tags.any(Tag.id.in_([t.id for t in 
>> tag_list])).\
>>                 filter(~Document.tags.any(~Tag.id.in_([t.id for t in 
>> tag_list]))
>>
>> On Mar 19, 2010, at 8:31 AM, Stodge wrote:
>>
>>     
>>> Now we're getting somewhere:
>>>       
>>> expressions = []
>>> for tag in tag_list:
>>>    expressions += [Tag.tag==tag]
>>> documents =
>>> session.query(Document).join(Document.tags).filter(and_(*expressions))
>>>       
>>> Thanks to a Storm example I found. :)
>>>       
>>> On Mar 19, 8:12 am, Stodge <sto...@gmail.com> wrote:
>>>       
>>>> Ok so far I have this:
>>>>         
>>>> expressions = []
>>>> for tag in tag_list:
>>>>         expressions += session.query(Document).filter(Tag.tag==tag)
>>>> documents =
>>>> session.query(Document).join(Document.tags).filter(and_(*expressions))
>>>>         
>>>> Doesn't work but it's progress! :)
>>>>         
>>>> On Mar 18, 2:37 pm, Stodge <sto...@gmail.com> wrote:
>>>>         
>>>>> Thanks that worked beautifully.
>>>>>           
>>>>> On a similar note, how would I match documents with only the tags that
>>>>> I specify in the list? My naive attempt is:
>>>>>           
>>>>> for tag in tag_list:
>>>>>     session.query(Document).join(Document.tags).filter_by(tag=tag)
>>>>>           
>>>>> But that doesn't work.
>>>>>           
>>>>> On Mar 15, 10:54 am, "Michael Bayer" <mike...@zzzcomputing.com> wrote:
>>>>>           
>>>>>> Stodgewrote:
>>>>>>             
>>>>>>> I have two classes with a third table:
>>>>>>>               
>>>>>>> document_tags = Table('document_tags', metadata,
>>>>>>>    Column('document_id', Integer, ForeignKey('documents.id')),
>>>>>>>    Column('tag_id', Integer, ForeignKey('tags.id'))
>>>>>>> )
>>>>>>>               
>>>>>>> class Document(Base):
>>>>>>>    __tablename__ = 'documents'
>>>>>>>               
>>>>>>>    id = Column(Integer, primary_key=True)
>>>>>>>    title = Column(String)
>>>>>>>    filename = Column(String)
>>>>>>>    tags = relation('Tag', secondary=document_tags, backref='tags')
>>>>>>>               
>>>>>>>    def __init__(self, title, filename):
>>>>>>>            self.title = title
>>>>>>>            self.filename = filename
>>>>>>>               
>>>>>>> class Tag(Base):
>>>>>>>    __tablename__ = 'tags'
>>>>>>>               
>>>>>>>    id = Column(Integer, primary_key=True)
>>>>>>>    tag = Column(String)
>>>>>>>               
>>>>>>>    def __init__(self, tag):
>>>>>>>            self.tag = tag
>>>>>>>               
>>>>>>> I want to find all documents with tags in a given list of tags:
>>>>>>>               
>>>>>>> documents =
>>>>>>> session.query(Document).filter(Document.tags.in_(tag_list))
>>>>>>>               
>>>>>>> except I get the familiar message that the "in_()" operator is not
>>>>>>> currently implemented for many-to-one-relations.
>>>>>>>               
>>>>>>> I've searched and found some alternatives but I can't get any to work.
>>>>>>> Is there an easy example that will make this work? Thanks
>>>>>>>               
>>>>>> if the error message says "many-to-one" then that's a bug.  Your relation
>>>>>> is many-to-many.
>>>>>>             
>>>>>> in this case the syntactically easiest method is to use any().
>>>>>> Document.tags.any(Tag.id.in_([t.id for t in tag_list])).
>>>>>>             
>>>>>> A join could be more performant, which would be:
>>>>>>             
>>>>>> query.join(Document.tags).filter(Tag.id.in_([t.id for t in tag_list]))

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Re: "in_()" operator is not currently implemented for many-to-one-relations - alternatives?

Reply via email to