[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-21 Thread fishwebby
Ah ok, that makes sense. Is it really that easy to come up with a web
request that takes more than thirty seconds? I'll have to watch out
for that...

I'm thinking now that I'm going about this the wrong way as regards
the design for the datastore - from what I've read, contains is an
efficient way to query based on a list:reference field, so I'm
considering having a field on the course model for people, so I can
say something like

courses = db(db.course.people.contains(person.id)).select()

This would make it more complicated in the controller when I'm
updating and inserting rows, but the queries would be efficient. As
far as sorting goes, if there is a risk of the request timing out I
could even do it in the browser (using a YUI datatable or something to
display the data).

Thank you for your help!


On Aug 20, 8:15 pm, howesc how...@umich.edu wrote:
 right, so when i need to do a belongs query for more than 30 items:

     items=db(db.my_table.end_user.belongs(id_list[0:30])).select()
     for i in range(30, len(id_list)+30, 30):
         items = items  \
                   db(db.my_table.end_user.belongs(id_list[i-30:i])).select()

 i'd say that web2py should just do that for you, but on GAE you want to be
 aware that this is happening because it can easily eat up all 30 seconds
 that you have to service a web request, so i like that i have to do it
 explicitly (i only try to do queries like this in taskqueue tasks where i
 know i can process more data per request)

 for one to many relationship you can use the list:reference property that
 maps to the GAE list property.  then when you get a row you have a property
 that has all the IDs of the records that 'belong' to it, and you can query
 for them (perhaps using the above if there is more than 30 in the list)

 hope that helps a little!

 christian

 while it takes more queries, there is nothing stopping you from:

   item_refs = db(db.join_table.sidea=ref_a).select()
   items = db(db.sideb_table.id.belongs(item_refs.as_dict().keys()).select()

 it feels yucky by comparison to SQL but it works.  i do try and avoid it
 because it is inefficient.


[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-21 Thread howesc
the 30 second timeout can be hit if you are doing something that triggers a 
few hundred sub-queries.

contains is very nice.  yes it's then a pain to add/remove items from the 
list, but GAE is all about query efficiency at the expense of write 
efficiency.

cfh


[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-20 Thread howesc
right, so when i need to do a belongs query for more than 30 items:

items=db(db.my_table.end_user.belongs(id_list[0:30])).select()
for i in range(30, len(id_list)+30, 30):
items = items  \
  db(db.my_table.end_user.belongs(id_list[i-30:i])).select()

i'd say that web2py should just do that for you, but on GAE you want to be 
aware that this is happening because it can easily eat up all 30 seconds 
that you have to service a web request, so i like that i have to do it 
explicitly (i only try to do queries like this in taskqueue tasks where i 
know i can process more data per request)

for one to many relationship you can use the list:reference property that 
maps to the GAE list property.  then when you get a row you have a property 
that has all the IDs of the records that 'belong' to it, and you can query 
for them (perhaps using the above if there is more than 30 in the list)

hope that helps a little!

christian

while it takes more queries, there is nothing stopping you from:

  item_refs = db(db.join_table.sidea=ref_a).select()
  items = db(db.sideb_table.id.belongs(item_refs.as_dict().keys()).select()

it feels yucky by comparison to SQL but it works.  i do try and avoid it 
because it is inefficient.


[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-19 Thread howesc
some thoughts.

 - belongs can only have 30 items in it per execution.  GAE limitation
 - i do many-to-many on GAE by using join tables, i just have to do more 
queries.  it does mean that you can't return 100 courses plus all the 
students in the course in a single query (if say each course had on average 
100 students), but for a single course with a few hundred students this 
would be ok.  so in this case query/view design can be used to help mitigate 
the issues.
 - i assert that though it will not be as pretty or SQL-like as you want it 
to be, you can do it on GAE.
 - i agree that it could be worth the design headaches upfront to not have 
to have a large IT staff to support the scaling of your servers.

good luck!

christian



[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-19 Thread fishwebby
That's good to know, thanks - so for more than 30 items, something
like this

people = db(db.person.id.belongs(course.students)).select()

isn't possible?

Would it be possible for you to give me an example of using join
tables?

(I think my brain is still in SQL mode and I can't work out how I'd do
it in the Google datastore)

Many thanks
Dave


On Aug 19, 7:43 pm, howesc how...@umich.edu wrote:
 some thoughts.

  - belongs can only have 30 items in it per execution.  GAE limitation
  - i do many-to-many on GAE by using join tables, i just have to do more
 queries.  it does mean that you can't return 100 courses plus all the
 students in the course in a single query (if say each course had on average
 100 students), but for a single course with a few hundred students this
 would be ok.  so in this case query/view design can be used to help mitigate
 the issues.
  - i assert that though it will not be as pretty or SQL-like as you want it
 to be, you can do it on GAE.
  - i agree that it could be worth the design headaches upfront to not have
 to have a large IT staff to support the scaling of your servers.

 good luck!

 christian


[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-17 Thread fishwebby
(posting Massimo's reply)

 e.g. 1, Physics 101, 4|5|6
 Is that how data has to be modelled using the GAE datastore?

No. On GAE 'list:reference' maps into a ListProperty of integers

 If so, is
 it possible to do the following:

 - paginate the denormalised data, for example show a paginated list of
 students on a course?

You cannot. ListProperty does not allow this.

 - have hundreds of IDs in the list:reference field (a course could
 have hundreds of students) - or is there an upper limit?

If there is it is a GAE limitation. I am not aware of it.


 Also, how are updates to these fields handled? For example, remove one
 student from a course that has hundreds enrolled? Are there mechanisms
 in web2py to handle this?

course = db.course(name='Phsysics 101')
course.students.remove(4) # remove student with id==4
course.update_record()

 Also, it would seem that the two list:reference fields would need to
 be kept synchronised somehow - if a student is deleted, any course
 that they were enrolled on needs to be updated too, which (with my
 mind still in relational database mode) would seem to be very
 inefficient.

I would use a many-2-many with a link table and I would not use GAE
for this.


[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-17 Thread fishwebby
So it sounds like it's possible to do what I want, although if I do
want to do many to many perhaps GAE isn't the best fit. I'm going to
persevere and see if I can get it working, as GAE is very attractive
as a hosting option where I don't have to worry at all about scaling,
server maintenance or any of the usual hosting hassles.

In case it's useful to anyone reading this thread, this video (which
I'm watching at the moment) 
http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html

has a great explanation of the datastore in GAE. It's from 2009 so it
may have changed a bit but the concepts are clearly explained by the
presenter.

Best wishes
Dave


On Aug 17, 1:40 pm, fishwebby pastelva...@gmail.com wrote:
 (posting Massimo's reply)

  e.g. 1, Physics 101, 4|5|6
  Is that how data has to be modelled using the GAE datastore?

 No. On GAE 'list:reference' maps into a ListProperty of integers

  If so, is
  it possible to do the following:

  - paginate the denormalised data, for example show a paginated list of
  students on a course?

 You cannot. ListProperty does not allow this.

  - have hundreds of IDs in the list:reference field (a course could
  have hundreds of students) - or is there an upper limit?

 If there is it is a GAE limitation. I am not aware of it.



  Also, how are updates to these fields handled? For example, remove one
  student from a course that has hundreds enrolled? Are there mechanisms
  in web2py to handle this?

 course = db.course(name='Phsysics 101')
 course.students.remove(4) # remove student with id==4
 course.update_record()

  Also, it would seem that the two list:reference fields would need to
  be kept synchronised somehow - if a student is deleted, any course
  that they were enrolled on needs to be updated too, which (with my
  mind still in relational database mode) would seem to be very
  inefficient.

 I would use a many-2-many with a link table and I would not use GAE
 for this.


[web2py] Re: Google datastore concepts - denormalising and list:reference fields

2011-08-17 Thread fishwebby
  - paginate the denormalised data, for example show a paginated list of
  students on a course?

 You cannot. ListProperty does not allow this.

How about something like this?

limitby = (0, 10)
students = db(db.student.id.belongs(course.students)).select(limitby =
limitby)

(where course.students is potentially quite large, but would that
cause a problem?)