[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Anthony Fri, 03 May 2013 21:36:53 -0700


> def a_child_was_born_in(countryName, cityName):
>     city = db.Country(Name=countryName).City(Name=cityName).select().first
> ()
>     city.update_record(Population=city.Population + 1)
>
> def a_person_has_died_in(countryName, cityName):
>     city = db.Country(Name=countryName).City(Name=cityName).select().first
> ()
>     city.update_record(Population-city.Population - 1)
>
 
Technically, it would be db.Country(Name=countryName).City(db.City.Name == 
cityName).select().first().


# In context 1:
> a_child_was_born_in('France', 'Paris')
> ...
> # In context 2:
> a_person_has_died_in('France', 'Paris')
>
> This would issue 4 round-trips to the database - 2 selects and 2 updates.
>

The way you have coded it, it is actually 6 round trips -- there are 2 
selects per function -- one for the country, and a second for the city. But 
that's not how you would do it in web2py anyway. Instead, you would issue 
no selects and instead do it with just a single update -- so a total of 2 
round trips to the db (i.e., 2 updates) :

def a_child_was_born_in(countryName, cityName):
    query = (db.City.Name == cityName) & (db.City.Country.belongs(db.Country
.Name == countryName))
    db(query).update(Population=db.City.Population + 1)

So, in order to do the update, we do not first have to query the database 
to retrieve the record. This is actually an advantage over the ORM, which 
requires that you first retrieve the record before updating it. The ORM 
will issue two queries to get the record if lazy loading is used, or one if 
eager loading, a join, or a subquery is used.
 

> Now, lets say we want to optimize that, so we do a "Lazy" version of those 
> functions.
>

There's not much to optimize here. If you don't know ahead of time that you 
will be making two updates to the same record (which may possibly negate 
each other), I think the minimum number of db hits is two. You could 
retrieve the record twice, defer the first update, recognize that the 
second update cancels the first, and then make no update -- which is still 
2 hits (well, 1 hit if you cache the query). Or you could just make the 2 
updates (as above). In any case, I believe the ORM actually requires a 
minimum of 4 hits (see below), so web2py is still doing a lot better.
 

> Now, here is the same code, using an ORM:
>
> def a_child_was_born_in(countryName, cityName):
>     city = Country(Name=countryName).City(Name=cityName)
>     city.Population += 1
>
> def a_person_has_died_in(countryName, cityName):
>     city = Country(Name=countryName).City(Name=cityName)
>     city.Population -= 1
>

Assuming this is SQLA, I don't think that's quite the right syntax -- it 
appears you are creating object instances rather than issuing queries. I 
believe it should be something like this:

def a_child_was_born_in(countryName, cityName):
    city = session.query(City).join(Country)\
           .filter(Country.Name == countryName)\
           .filter(City.Name == cityName).first()
    city.Population += 1

The above does a join and therefore gets it down to a single query for the 
select. Otherwise, you could just query for the country, then access the 
"City" attribute, which would lazily issue a second query (though only when 
the first function is called).

The syntactic difference is small, but the semantic implication is profound.
>

Yes, but not quite in the way you think.
 

> The automatic cache-mechanism in the ORM will detect that we 
> are querying the same record, and so would not query the database in the 
> second function - just return the same object already in memory.
>

Again, assuming this is SQLA, that's not how it works. SQLA does not cache 
queries -- when you run a query, it doesn't know what record will be 
retrieved, so it doesn't know whether it already has the associated object 
in the session. Hence, it will re-run the query both times. (The exception 
to this is when you use .get() to fetch a record by primary key, which we 
are not doing here.)
 

> But an ORM can have an "Identity Mapper", that would make sure they the 
> same object would be returned,
> It would be bound to two different name-spaces, but it would be the same 
> object.
> Now we could implement a "Truely" lazy update. The increment that is done 
> in the first function, would be reflected in the second one, because the 
> same object would be returned,
>

Another problem here. Whenever you execute a new query, SQLA flushes the 
pending changes. So, when you run the query in the second function, it will 
first issue the update to the database from the first change. Once it has 
done that, it will ultimately also have to issue the update from the second 
function (though perhaps at some later time) in order to have the correct 
value in the database.

So, I believe we have a minimum of 4 database hits with the ORM (5 if you 
lazy load the cities when running the initial query).

To summarize:

   - The ORM doesn't do direct updates to the database, so it must first 
   select the records before updating them, unlike web2py, which can issue a 
   direct update.
   - The ORM doesn't cache queries, so it must re-run the query twice, even 
   though the record in question is already in the session after the first 
   query.
   - The ORM flushes pending changes before each query, so the first update 
   goes to the database before the second query is run, ultimately 
   necessitating both updates.

Note, I am not a SQLA expert, so I may be mistaken about something above, 
but this is how I understand it.

These are the kinds of benefits an ORM may have.
>

One thing to keep in mind. Although the web2py DAL does not implement an 
ORM design pattern, it is still object-oriented. We have Table, Field, Set, 
Rows, Row, etc. These objects have various methods and attributes that 
enable quite a bit of functionality and flexibility. I think most of the 
behaviors you like so much in the SQLA ORM could in principle be 
implemented within the DAL's objects (e.g., optional eager loading of 
records within relationship attributes, or persistence upon first access 
when lazily loaded).

Anthony

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Reply via email to