Re: [sqlalchemy] convert subset to dictionary

Justvuur Tue, 07 Jul 2020 09:47:10 -0700

Nope, I'm running it on a Xubuntu VM. The VM only has 4 gigs of RAM and 4 
cores assigned to it.
Nope, the students table has created_on, updated_on, id, 3 integer columns 
and a boolean column.
The subjects table has student_id, code and name columns. The code column 
is character varying(100) and the name column is character varying.


Yes, so its not over the net, its running locally.
That's a good idea! Maybe I'll try that!
Thanks for your help on this, I really appreciate it.



On Tuesday, 7 July 2020 15:56:04 UTC+2, Simon King wrote:
>
> 2:25 minutes, even for 1,267,000 rows, sounds like quite a long time 
> to me. Are you physically a long way from the database, querying 
> across the internet? Do the tables contain many columns, or perhaps 
> large BLOB or TEXT fields? 
>
> If there are large fields that you don't need very often, you could 
> consider marking them as "deferred", then they wouldn't be loaded by 
> default: 
>
>
> https://docs.sqlalchemy.org/en/13/orm/loading_columns.html#deferred-column-loading
>  
>
> I suppose if your database server was underpowered, loading over a 
> million rows might exhaust its memory and cause it to start swapping, 
> which would hurt performance, in which case querying for smaller 
> amounts of data might be better. 
>
> Simon 
>
> On Tue, Jul 7, 2020 at 12:53 PM Justvuur <just...@gmail.com <javascript:>> 
> wrote: 
> > 
> > I'm currently testing with 7000 students with 181 subjects. 
> > I first went over to the DB to run the query directly on there. I did an 
> innerjoin between the two tables. 
> > It took about 2:25 minutes. I then selected specific columns from the 
> two tables and the time dropped to about 2 minutes or so. 
> > I then tried a "select subjects where student ID in (...)" query and 
> that executed a lot quicker than the others. 
> > 
> > So then on the python side of things, I first fetched the students and 
> only selected id, created_on and updated_on columns using the with_entities 
> option. 
> > That seemed to take less than 1 second. I then, for each student, 
> executed a DB query to fetch the subjects using the with_entities option. 
> > For some reason, this seems to work faster than using the 
>  eager-loading/students.subjects/attribute_mapped_collection. They seem to 
> take 2 minutes longer than the above (what should be inefficient) approach. 
> > 
> > On Tuesday, 7 July 2020 at 12:50:13 UTC+2 Simon King wrote: 
> >> 
> >> How long is it taking? You mentioned 2000 students with 100 subjects 
> >> each, so there are something like 200,000 rows in the Subjects table, 
> >> and you need to load all of it. I wouldn't expect that to take longer 
> >> than a couple of seconds though. 
> >> 
> >> Simon 
> >> 
> >> On Mon, Jul 6, 2020 at 7:34 PM Justvuur <just...@gmail.com> wrote: 
> >> > 
> >> > I added the eager-loading but it seems to slow down the SQL query 
> quite a lot. 
> >> > It's as if now, the SQL query is taking longer but the generating of 
> the file is quicker hehe... I guess now the queries are being fired before 
> to populate subjects. 
> >> > It's still taking relatively the same amount of time though. 
> >> > 
> >> > 
> >> > On Friday, 3 July 2020 17:07:43 UTC+2, Simon King wrote: 
> >> >> 
> >> >> Are you eager-loading the "student.subjects" relationship? If not, 
> >> >> that will give you the biggest performance increase. Without that, 
> you 
> >> >> will be issuing a separate DB query for each of the students, to 
> load 
> >> >> that student's subjects. Eager-loading allows you to preload the 
> >> >> subjects for every student in a single query: 
> >> >> 
> >> >> 
> https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#joined-eager-loading
>  
> >> >> 
> >> >> Simon 
> >> >> 
> >> >> On Fri, Jul 3, 2020 at 1:36 PM Justvuur <just...@gmail.com> wrote: 
> >> >> > 
> >> >> > Hi Simon, thanks for the help! I've never used that before, it's 
> quite handy. 
> >> >> > 
> >> >> > I'm looping through all the students and printing them and their 
> subject details to a CSV file. 
> >> >> > What makes things a tad complicated is the subjects must appear in 
> a specific order. 
> >> >> > There is a table that has the subject code and order number 
> (ordered_subjects used below is the resultset from it). 
> >> >> > I printed out the timing and found the problem to be with a nested 
> for loop. 
> >> >> > 
> >> >> > I was hoping to reduce that process time by using a map that 
> automatically gets populated instead of having to create it on the fly. 
> >> >> > 
> >> >> > Before - subjects_collection "attribute_mapped_collection": 
> >> >> > 
> ********************************************************************************************
>  
>
> >> >> > for row in students: 
> >> >> > row_no += 1 
> >> >> > 
> >> >> > for subject in row.subjects: 
> >> >> > student_subjects[subject.code] = subject.value 
> >> >> > 
> >> >> > csv_row = [row_no] 
> >> >> > csv_row += [student_subjects.get(x.code, '') for x in 
> ordered_subjects] 
> >> >> > csv_row += [row.created_on, row.updated_on] 
> >> >> > 
> >> >> > writer.writerow([x.encode('utf-8') if type(x) == unicode else x 
> for x in csv_row]) 
> >> >> > 
> >> >> > 
> >> >> > After adding the subjects_collection 
> "attribute_mapped_collection", I unfortunately did not see a change in 
> performance. 
> >> >> > 
> >> >> > After - subjects_collection "attribute_mapped_collection": 
> >> >> > 
> ********************************************************************************************
>  
>
> >> >> > for row in students: 
> >> >> > row_no += 1 
> >> >> > csv_row = [row_no] 
> >> >> > csv_row += [row.subjects_collection.get(x.code, '').value for x in 
> ordered_subjects] 
> >> >> > csv_row += [row.created_on, row.updated_on] 
> >> >> > 
> >> >> > writer.writerow([x.encode('utf-8') if type(x) == unicode else x 
> for x in csv_row]) 
> >> >> > 
> >> >> > 
> >> >> > class Subject(db.Model): 
> >> >> > __tablename__ = 'subjects' 
> >> >> > 
> >> >> > student_id = db.Column(db.Integer, db.ForeignKey('students.id'), 
> primary_key=True) 
> >> >> > 
> >> >> > code = db.Column(db.String(50), primary_key=True) 
> >> >> > 
> >> >> > value= db.Column(db.String) 
> >> >> > 
> >> >> > def __init__(self, code , value): 
> >> >> > self.code = code 
> >> >> > self.value = value 
> >> >> > 
> >> >> > 
> >> >> > class Student(ResourceMixin, db.Model): 
> >> >> > __tablename__ = 'students' 
> >> >> > 
> >> >> > subjects= db.relationship('Subject', backref='student') 
> >> >> > 
> >> >> > id = db.Column(db.Integer, primary_key=True) 
> >> >> > 
> >> >> > subjects_collection = relationship("Subject", 
> collection_class=attribute_mapped_collection('code')) 
> >> >> > 
> >> >> > Can you see a way I can optimize this? Any ideas? 
> >> >> > 
> >> >> > 
> >> >> > On Friday, 3 July 2020 12:31:03 UTC+2, Simon King wrote: 
> >> >> >> 
> >> >> >> Are you trying to optimise the database access (ie. minimize the 
> >> >> >> number of queries), or provide a nice dictionary-style API for 
> your 
> >> >> >> Student objects? What do you mean when you say that looping over 
> >> >> >> student.subjects is quite heavy? 
> >> >> >> 
> >> >> >> An association proxy can be used to get dict-style access to a 
> relationship: 
> >> >> >> 
> >> >> >> 
> https://docs.sqlalchemy.org/en/13/orm/extensions/associationproxy.html#proxying-to-dictionary-based-collections
>  
> >> >> >> 
> >> >> >> There are also a couple of examples in the SQLAlchemy docs that 
> >> >> >> provide a dictionary-style API: 
> >> >> >> 
> >> >> >> 
> https://docs.sqlalchemy.org/en/13/orm/examples.html#module-examples.dynamic_dict
>  
> >> >> >> 
> >> >> >> 
> https://docs.sqlalchemy.org/en/13/orm/examples.html#module-examples.vertical 
> >> >> >> 
> >> >> >> Hope that helps, 
> >> >> >> 
> >> >> >> Simon 
> >> >> >> 
> >> >> >> On Thu, Jul 2, 2020 at 8:46 PM Justvuur <just...@gmail.com> 
> wrote: 
> >> >> >> > 
> >> >> >> > Hi there, 
> >> >> >> > 
> >> >> >> > I'm struggling to find an efficient way to get a two columned 
> subset into dictionary form. 
> >> >> >> > 
> >> >> >> > I have an entity that has a subset of data. The subset is 
> linked to the entity via Id. The order of the subset of data is defined in 
> another table. 
> >> >> >> > 
> >> >> >> > Example: 
> >> >> >> > Student - Id, firstname, lastname 
> >> >> >> > Subjects - StudentId, SubjectCode, SubjectName 
> >> >> >> > 
> >> >> >> > At the moment I'm looping through the SqlAlchemy result of 
> "student.subjects" in python and creating a dictionary from that. It's 
> quite heavy, especially when there are 2000+ students with a potential of 
> 100+ subjects each. 
> >> >> >> > 
> >> >> >> > For each student, how do I get the subjects as a dictionary for 
> a student where the key is the SubjectCode and the value is the 
> SubjectName? 
> >> >> >> > Better yet, how can I get a result set: Id, firstname, lastname 
> SubjectCode x, SubjectCode y, etc etc (where the SubjectName becomes the 
> value and the SubjectCode becomes the column)? 
> >> >> >> > 
> >> >> >> > Regards, 
> >> >> >> > Justin 
> >> >> >> > 
> >> >> >> > -- 
> >> >> >> > SQLAlchemy - 
> >> >> >> > The Python SQL Toolkit and Object Relational Mapper 
> >> >> >> > 
> >> >> >> > http://www.sqlalchemy.org/ 
> >> >> >> > 
> >> >> >> > To post example code, please provide an MCVE: Minimal, 
> Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve 
> for a full description. 
> >> >> >> > --- 
> >> >> >> > You received this message because you are subscribed to the 
> Google Groups "sqlalchemy" group. 
> >> >> >> > To unsubscribe from this group and stop receiving emails from 
> it, send an email to sqlal...@googlegroups.com. 
> >> >> >> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/57b74c9a-a6e5-494b-b468-d0bdcbcce60co%40googlegroups.com.
>  
>
> >> >> > 
> >> >> > -- 
> >> >> > SQLAlchemy - 
> >> >> > The Python SQL Toolkit and Object Relational Mapper 
> >> >> > 
> >> >> > http://www.sqlalchemy.org/ 
> >> >> > 
> >> >> > To post example code, please provide an MCVE: Minimal, Complete, 
> and Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description. 
> >> >> > --- 
> >> >> > You received this message because you are subscribed to the Google 
> Groups "sqlalchemy" group. 
> >> >> > To unsubscribe from this group and stop receiving emails from it, 
> send an email to sqlal...@googlegroups.com. 
> >> >> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/67439e94-2666-402a-a642-7f4e7de8fbffo%40googlegroups.com.
>  
>
> >> > 
> >> > -- 
> >> > SQLAlchemy - 
> >> > The Python SQL Toolkit and Object Relational Mapper 
> >> > 
> >> > http://www.sqlalchemy.org/ 
> >> > 
> >> > To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description. 
> >> > --- 
> >> > You received this message because you are subscribed to the Google 
> Groups "sqlalchemy" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send an email to sqlalchemy+...@googlegroups.com. 
> >> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/9c86dfc4-3666-454e-8b5e-2b9e2a32166ao%40googlegroups.com.
>  
>
> > 
> > -- 
> > SQLAlchemy - 
> > The Python SQL Toolkit and Object Relational Mapper 
> > 
> > http://www.sqlalchemy.org/ 
> > 
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description. 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to sqlal...@googlegroups.com <javascript:>. 
> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/29b184cc-9f5c-4fe2-a69d-4fd12651c88cn%40googlegroups.com.
>  
>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/717c10af-95b3-4ec9-a139-e6f8dfd2923ao%40googlegroups.com.

Re: [sqlalchemy] convert subset to dictionary

Reply via email to