[sqlalchemy] Re: wrong number of rows returned

DavidG Fri, 28 Aug 2009 15:54:58 -0700

Here is what I am doing now (even forgetting the limit):
---------------------
username = u'hal'


subq = SES.query(Bar).filter(Bar.username == username).\
     subquery()
valias = aliased(Bar, subq)
q = SES.query(Foo, valias).order_by(Foo.mdata).\
     outerjoin(Foo.bar, valias)

recs = q.all()
print "recs=", recs
print "len=", len(recs)
----------------------
results:

recs= [(Foo(u'f1'), Bar(u'b1')), (Foo(u'f2'), Bar(u'b2')),
(Foo(u'f3'), None), (Foo(u'f4'), None), (Foo(u'f5'), None)]
len= 5

This *looks* correct!

However, when you look at the sql, here is what you have:

SELECT foo.id AS foo_id, foo.mdata AS foo_mdata, anon_1.id
AS anon_1_id,
anon_1.mdata AS anon_1_mdata, anon_1.username
AS anon_1_username,
anon_1.foo_id AS anon_1_foo_id
FROM foo LEFT OUTER JOIN bar ON foo.id = bar.foo_id
LEFT OUTER JOIN (SELECT bar.id AS id, bar.mdata
AS mdata, bar.username
AS username, bar.foo_id AS foo_id
FROM bar
WHERE bar.username = 'hal') AS
anon_1 ON foo.id = anon_1.foo_id ORDER BY foo.mdata

And when you run this you get:

foo_id,foo_mdata,anon_1_id,anon_1_mdata,anon_1_username,anon_1_foo_id

1,f1,1,b1,hal,1
2,f2,2,b2,hal,2
2,f2,2,b2,hal,2
3,f3,NULL,NULL,NULL,NULL
4,f4,NULL,NULL,NULL,NULL
5,f5,NULL,NULL,NULL,NULL

6 rows in set (0.00 sec)

Note the duplicate records (foo_id=2) (Mike, you pointed
this out before that this could account for the record
descrepency).

However, this is *incorrect* (at least, not what is wanted).
So no wonder, when offset,limit is applied, we don't get
what we want!

If you are still here, thanks for paying attention this far....

David

On Aug 28, 12:31 pm, DavidG <dgel...@gmail.com> wrote:
> I guess I should restate the question.
>
> I have two tables:
>
> class Foo(Base):
>     __tablename__ = 'foo'
>
>     id = Column(Integer, primary_key=True)
>     mdata = Column(UnicodeText)
>
>     bar = relation("Bar")
>
>     def __repr__(self):
>         return "Foo(%r)" % self.mdata
>
> class Bar(Base):
>     __tablename__ = 'bar'
>     id = Column(Integer, primary_key=True)
>
>     mdata = Column(UnicodeText)
>     username = Column(UnicodeText)
>     foo_id = Column(Integer,
>              ForeignKey('foo.id'))
>
>     def __repr__(self):
>         return "Bar(%r)" % self.mdata
>
> Foo contains records:
>
> mysql> select * from foo;
> +----+-------+
> | id | mdata |
> +----+-------+
> |  1 | f1    |
> |  2 | f2    |
> |  3 | f3    |
> |  4 | f4    |
> |  5 | f5    |
> +----+-------+
> 5 rows in set (0.00 sec)
>
> Bar contains records:
>
> mysql> select * from bar;
> +----+-------+----------+--------+
> | id | mdata | username | foo_id |
> +----+-------+----------+--------+
> |  1 | b1    | hal      |      1 |
> |  2 | b2    | hal      |      2 |
> |  3 | b3    | homer    |      2 |
> +----+-------+----------+--------+
> 3 rows in set (0.00 sec)
>
> Note there are at most 1 Bar record for a given Foo record,
> for a given username (but there could be none).
>
> Given a particular username, I would like to select a range
> of Foo records (offset,limit), outer-joined with Bar
> records (for that username), and ordered by, say Foo.mdata
> (ordered *before* the range).
>
> For example, say I am looking at username = u'hal'. I would
> like to construct an efficient query q such that:
>
> q[0:3] gives me something like:
>
> [(Foo(u'f1'), Bar(u'b1')),
> (Foo(u'f2'), Bar(u'b2')),
> (Foo(u'f3'), None)]
>
> I have tried constructs similar to the ones in my original
> question, but so far nothing has worked quite right.
>
> Thank you for your time, and please, forgive my ignorance, but
> I am very confused!
>
> David
>
> On Aug 25, 2:02 am, DavidG <dgel...@gmail.com> wrote:
>
> > Thanks, Mike. I'll work on that. I do have a test case, but there
> > everything works fine. There must be something in my real data. I just
> > can't figure it out what it is!
>
> > D.
>
> > On Aug 24, 8:45 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> > > On Aug 24, 2009, at 7:44 PM, DavidG wrote:
>
> > > > First, there is at most a single Feedback record (for a given user,
> > > > hence the subquery) per quote (and in the one case I have been banging
> > > > my head on, I am certain of this).
>
> > > > And this:  why would I get different results from pasting the echoed
> > > > sql into the online mysql query vs from sqlalchemy directly?
>
> > > for the reasons I've given - entities are uniqued, primary keys with  
> > > null columns are skipped by default.   for further explanation you  
> > > need to illustrate a test case illustrating the identical behavior here.
>
> > > > Thanks,
> > > > David.
>
> > > > On Aug 24, 6:39 pm, "Michael Bayer" <mike...@zzzcomputing.com> wrote:
> > > >> DavidG wrote:
>
> > > >>> Hi Mike -
>
> > > >>> Confused. Why would it be different with the limit() or not?
>
> > > >> well there's not enough detail to say exactly but you're applying the
> > > >> limit() to a query with outer join.  So if Quote number one had five
> > > >> related Feedback entries, you'd get one row back for all five of  
> > > >> those,
> > > >> unless the Feedback entries were part of the returned results.
>
> > > >> another thing that happens, but is probably not happening here, is if
> > > >> Quote is mapped to a join that might return NULL for some primary  
> > > >> keys,
> > > >> those aren't going to be turned into entities either unless the  
> > > >> mapping
> > > >> specifies allow_null_pks=True.  In 0.6 this option is just turned on
> > > >> permanently since it turned out nobody wants it the other way.
>
> > > >> Without
>
> > > >>> the limit() I get *all* the Quote records (>1000) which is  
> > > >>> correct. If
> > > >>> I have something like limit(10), I'll get *less then 10*.
>
> > > >>> Also, I didn't know about the "unique entities" limitation. In any
> > > >>> event, the Quote objects are all unique (via their unique  
> > > >>> primary_key
> > > >>> "id").
>
> > > >>> OK, more details. Here are the classes (summary):
>
> > > >>> class Quote(Base):
> > > >>>     __tablename__ = "quote"
>
> > > >>>     id = Column(Integer, primary_key=True)
> > > >>>     date_create = Column(DateTime)
> > > >>>     feedback = relation('Feedback')
>
> > > >>> class Feedback(Base):
> > > >>>     __tablename__ = "feedback"
>
> > > >>>     id = Column(Integer, primary_key=True)
> > > >>>     username = Column(Unicode(20))      # NOTE: this is a ForeignKey
> > > >>> also, but ignore for now.
> > > >>>     quote_id = Column(Integer,
> > > >>>                    ForeignKey('quote.id'))
> > > >>>     vote = Column(Integer, default=0)   # -1 or +1
>
> > > >>> This *should* be so simple: there are bunches of quotes. There *may*
> > > >>> be a (single) Feedback record for each user for each quote. For a
> > > >>> given username, I want to display a range of quotes, sorted a
> > > >>> particular way, with the Feedback record for each quote (when it
> > > >>> exists) tacked on (in a tuple is fine).
>
> > > >>> Thanks!
>
> > > >>> On Aug 24, 4:42 pm, "Michael Bayer" <mike...@zzzcomputing.com>  
> > > >>> wrote:
> > > >>>> Query(), when called with entity classes as arguments, returns only
> > > >>>> unique
> > > >>>> entities or unique combinations thereof.   to get the raw data call
> > > >>>> Query
> > > >>>> with columns/attributes as arguments instead.
>
> > > >>>> DavidG wrote:
>
> > > >>>>> Hi,
>
> > > >>>>> I can give all the details, but let's start with a simple  
> > > >>>>> question.
>
> > > >>>>> I have a query, and it is returning the wrong number of rows!
>
> > > >>>>> Not only is the number wrong compared to what I would expect, but,
> > > >>>>> more importantly, when I paste the *exact sql* (except for
> > > >>>>> substituting a param) printed on the console with "echo on" into  
> > > >>>>> the
> > > >>>>> mysql prompt, the results are exactly what I would expect.
>
> > > >>>>> What, if anything, is known to cause the "printed sql" to give a
> > > >>>>> different result then sqlalchemy itself?
>
> > > >>>>> I am using:
> > > >>>>> sqlalchemy-0.5.5
> > > >>>>> mysql
> > > >>>>> python 2.6.2
>
> > > >>>>> I am using the orm, and I am doing basically:
>
> > > >>>>> recs = q.all()
> > > >>>>> print "len(recs)=", len(recs)
>
> > > >>>>> where q is the query.
>
> > > >>>>> Sample:
>
> > > >>>>> username = u'steve'
> > > >>>>> subq = SES.query(Feedback).filter(Feedback.username ==
> > > >>>>> username).subquery()
> > > >>>>> valias = aliased(Feedback, subq)
> > > >>>>> q = SES.query(Quote,  
> > > >>>>> valias).order_by(desc(Quote.n_votes)).outerjoin
> > > >>>>> (Quote.feedback, valias).limit(2)
>
> > > >>>>> Without the limit(), I get all the records (>1000), which seem
> > > >>>>> correct. *With* the limit, the number of records is completely  
> > > >>>>> kookie
> > > >>>>> (to me!). It seems to be always *less* then what the actual  
> > > >>>>> limit is.
>
> > > >>>>> But again, the sql printed on the console gives me the correct
> > > >>>>> results!
>
> > > >>>>> Any help would be most appreciated! Thank you. I will happily  
> > > >>>>> furnish
> > > >>>>> more details if needed.
>
> > > >>>>> David
>
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Re: wrong number of rows returned

Reply via email to