> memo = {}
> copied_students = copy.deepcopy(students, memo)
> copied_supervisors = copy.deepcopy(supervisors, memo)
> copied_projects = copy.deepcopy(projects, memo)
>
> After you do this, memo will contain a record of all copied objects. You
> should examine memo.values() to see if it is copying more than you
> expected. If it did copy just what you expected, then my worries were
> unfounded.

I'll let you know how that turns out soonish. While I know it's my
data, is there anything you can suggest from your experience that you
consider to be "unexpected"?
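
For what it's worth, this is roughly the check I'm planning to run
(students, supervisors and projects are my existing dictionaries;
tallying the copied objects by type is just my way of spotting anything
unexpected):

import copy
from collections import defaultdict

memo = {}
copied_students = copy.deepcopy(students, memo)
copied_supervisors = copy.deepcopy(supervisors, memo)
copied_projects = copy.deepcopy(projects, memo)

# Count what actually got copied, grouped by type, so anything unexpected
# (e.g. ORM internals dragged in behind my own objects) stands out.
counts = defaultdict(int)
for obj in memo.values():
    counts[type(obj).__name__] += 1
print(sorted(counts.items()))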

> Yes, session_id/trial_id and stud_id can repeat, and you can still group
> things together by run_id. Alternatively, you could add an
> autoincrementing primary key to SimAllocation, but I believe it is
> redundant since the combination (run_id, session_id/trial_id, stud_id)
> should be unique anyway. run_id can definitely be a datetime, but I'm
> not sure how well sqlite (it sounds like you're using sqlite) supports
> datetimes in queries 
> (see http://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-an...).
> A GUID (or UUID) is just a 128-bit value (usually random); the benefit
> here is you can generate it on the client side and be confident that it
> will be unique on the server (to avoid duplicate primary key errors).
> Using datetimes or database sequences would also work. You can
> definitely pass the run_id as an argument to monteCarloBasic, or to each
> object's create_db_record method.

Also, I get why you mention three keys: run_id/guid/uuid and
session_id/trial_id alone won't suffice, but since the students are
unique within each single allocation run, adding stud_id makes the
combination unique. So I can get rid of the "ident" then? It serves no
other purpose really if I can get a key combination that's unique and
does the job.
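
Just to check I've understood the composite key, I'm picturing
something like this (declarative style, and the column types are my
guess; SimAllocation, run_id, trial_id and stud_id are the names from
this thread):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SimAllocation(Base):
    __tablename__ = 'sim_allocation'

    # Composite primary key: no separate "ident" needed, since the
    # combination (run_id, trial_id, stud_id) is already unique per row.
    run_id = Column(String(32), primary_key=True)   # UUID hex, same for a whole run
    trial_id = Column(Integer, primary_key=True)    # session/trial number within the run
    stud_id = Column(String(16), primary_key=True)  # student identifier
    # ... plus whatever columns hold the actual allocation result ...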

I am indeed using SQLite3. I take it my physical database has to be set
up with something like:
engine = create_engine('sqlite:///Database/spalloc.sqlite3', echo=False)?
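
Put together with the session, I'm imagining something like this (the
path is just where I plan to keep the file):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# File-based SQLite database stored under Database/spalloc.sqlite3.
engine = create_engine('sqlite:///Database/spalloc.sqlite3', echo=False)

# Bind a session factory to that engine, so session.query(...) and
# friends hit the spalloc.sqlite3 file on disk.
Session = sessionmaker(bind=engine)
session = Session()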

Also I take it I should generate the UUID
(http://docs.python.org/library/uuid.html) when I call the
monteCarloBasic function, right? Since it should be the same for each
"call", I take it I'll have to generate it before the loop.
Additionally, how would I actually query a 128-bit value? Say I have a
bit in my GUI where the supervisor can put in a UUID to pull the data
off the database. How would he actually know which UUID to put in? Any
ideas?
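
So something along these lines? (session and SimAllocation as
elsewhere in this thread; the "pick from a list" part is just my guess
at how the GUI would avoid anyone having to type a UUID by hand):

import uuid

# One run_id per call to monteCarloBasic, generated before the trial loop
# and stored as its 32-character hex string rather than a raw 128-bit value.
run_id = uuid.uuid4().hex

# ... every SimAllocation row written inside the loop gets this same run_id ...

# The supervisor never has to remember a UUID: the GUI can list the
# distinct run_ids already in the database and let him pick one.
available_runs = [row[0] for row in
                  session.query(SimAllocation.run_id).distinct()]
chosen_run_id = available_runs[-1]  # whichever run he picks from the list

# Querying by the chosen value is then an ordinary string comparison.
rows = session.query(SimAllocation).filter_by(run_id=chosen_run_id).all()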

Also, once I've got my stuff in the physical database and after my
program is done, I'd call session.close(), right? How do I access the
DB data then? Would I have to write some separate functions that allow
me to access the data without using (for example)
session.query(Student)...? This way the user (i.e. my supervisor)
won't have to keep running the readData, monteCarloBasic, etc.
functions just to access the DB (that would be poor indeed!).
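
Something like this is what I had in mind for those separate
functions; fetch_allocations is a made-up name, and it assumes the
mapped SimAllocation class can be imported from wherever I end up
defining it:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def fetch_allocations(run_id):
    """Read-only access to a finished run: reopen the SQLite file and
    query it, without going through readData or monteCarloBasic."""
    engine = create_engine('sqlite:///Database/spalloc.sqlite3')
    Session = sessionmaker(bind=engine)
    session = Session()
    try:
        return session.query(SimAllocation).filter_by(run_id=run_id).all()
    finally:
        session.close()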


On Jun 8, 3:07 pm, Conor <conor.edward.da...@gmail.com> wrote:
> On 06/07/2010 07:27 PM, Az wrote:
>
>
>
> >> By default, deepcopy will make one copy of everything in the object
> >> graph reachable by the object you feed it. The scary part is that,
> >> unless you also pass in a /memo/ argument to each call to deepcopy, it
> >> will copy the entire graph /every single call/. So if you deepcopy the
> >> students dictionary and then deepcopy the projects dictionary, each
> >> student's allocated_proj attribute will not match any instance in the
> >> projects dictionary. This is why a use-case-specific copy function is
> >> recommended: it is a lot easier to predict which objects will get copied
> >> and which objects will be shared.
>
> > Shouldn't it match? I mean the student can only get allocated a
> > project if it exists in the projects dictionary... or is that not the
> > point?
>
> > By use-case-specific, you mean I'll have to redefine deepcopy inside
> > each class like this: def __deepcopy__(self): something, something?
>
> > The only two places where this is an issue is for Supervisor's
> > "offered_proj" attribute (a set) where, naturally, each project is an
> > object and in Project where "proj_sup" is naturally a supervisor
> > object :D
>
> > The usefulness of my data structures comes back to bite me now...
>
> In theory, the following will work, ignoring ORM deepcopy issues
> discussed at the beginning of this thread:
>
> memo = {}
> copied_students = copy.deepcopy(students, memo)
> copied_supervisors = copy.deepcopy(supervisors, memo)
> copied_projects = copy.deepcopy(projects, memo)
>
> After you do this, memo will contain a record of all copied objects. You
> should examine memo.values() to see if it is copying more than you
> expected. If it did copy just what you expected, then my worries were
> unfounded.
>
> By use-case-specific, I meant define your own copy_objects function that
> explicitly specifies what is copied:
>
> def copy_objects(students, supervisors, projects):
>     memo = {}
>     copied_students = {}
>     copied_supervisors = {}
>     copied_projects = {}
>
>     def copy_student(student):
>         student_id = id(student)
>         if student_id in memo:
>             return memo[student_id]
>
>         copied_student = Student()
>         memo[student_id] = copied_student
>         copied_student.attr1 = student.attr1
>         [copy rest of student's attributes]
>         if you_need_to_copy_students_project:
>             copied_student.allocated_proj = copy_project(student.allocated_proj)
>         return copied_student
>
>     [define copy_supervisor]
>     [define copy_project]
>
>     copied_students = dict((key, copy_student(student))
>                            for (key, student) in students.iteritems())
>     copied_supervisors = dict((key, copy_supervisor(supervisor))
>                               for (key, supervisor) in supervisors.iteritems())
>     copied_projects = dict((key, copy_project(project))
>                            for (key, project) in projects.iteritems())
>     return (copied_students, copied_supervisors, copied_projects)
>
> As you can see, this makes it clear which objects are copied and which
> are shared. In retrospect, I think I assumed you didn't want to make
> copies of your supervisors or projects when I recommended the
> use-case-specific approach, which kind of violates the spirit of
> deepcopy. Oh well, my bad.
>
> >> class Student(object):
> >>     [existing definitions]
>
> >>     def create_db_record(self):
> >>         result = StudentDBRecord()
> >>         result.ee_id = self.ee_id
> >>         [copy over other attributes]
> >>         return result
>
> >> class StudentDBRecord(object):
> >>     pass
>
> > The create_db_record function... does it have to be called
> > explicitly somewhere or does it run automatically?
>
> You have to call it explicitly, e.g.:
>
> for unmapped_student in unmapped_students:
>     mapped_student = unmapped_student.create_db_record()
>     # I assume you want "find or create" behavior,
>     # so use session.merge instead of session.add.
>     mapped_student = session.merge(mapped_student)
>
>
>
> > [...]
>
> >> I think a primary key of
> >> (run_id, session_id/trial_id, stud_id) would be good
>
> > If I make them all primary keys I get a composite key right? Within an
> > entire M-C simulation the stud_id's would repeat in groups -- so if
> > there are 100 simulations, each stud_id appears 100 times in that
> > commit.
>
> > Run_id is a fantastic idea! I'd probably have it be the date and time?
> > Given that the simulation takes a while to run... the time will have
> > changed sufficiently for uniqueness. However, then querying becomes a
> > pain because of whatever format the date and time data will be in...
> > so in that case, what is a GUID and is that something we could give to
> > the Monte-Carlo ourselves before the run as some sort of argument? It
> > would be the same for an entire run but different from run to run (so
> > not unique from row to row, but unique from one run set to the other).
> > Any thoughts on this?
>
> Yes, session_id/trial_id and stud_id can repeat, and you can still group
> things together by run_id. Alternatively, you could add an
> autoincrementing primary key to SimAllocation, but I believe it is
> redundant since the combination (run_id, session_id/trial_id, stud_id)
> should be unique anyway. run_id can definitely be a datetime, but I'm
> not sure how well sqlite (it sounds like you're using sqlite) supports
> datetimes in queries 
> (see http://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-an...).
> A GUID (or UUID) is just a 128-bit value (usually random); the benefit
> here is you can generate it on the client side and be confident that it
> will be unique on the server (to avoid duplicate primary key errors).
> Using datetimes or database sequences would also work. You can
> definitely pass the run_id as an argument to monteCarloBasic, or to each
> object's create_db_record method.
>
> -Conor
