I'm assuming this is data from a flat file or XML or something.
1. Read the rows in chunks. say 1000 at a time.
2. organize the system by which you will regenerate the primary key from
a given row. For each chunk, run this method on each row and associate
the primary key with each row in the chunk. This can be done all in
memory because we are limiting the size of the dataset.
3. Organize the chunk of data into a hashtable (e.g. Python dictionary),
keyed the primary key value that we've generated for each data row.
4. Construct a list of those primary key values. Organize a SQL
statement along the lines of: "SELECT primary_key FROM table WHERE
primary_key IN (pk1, pk2, pk3, pk4, ...)", for the full 1000 elements of
the chunk. SQLAlchemy ORM or SQLAlchemy Core can be used for this.
5. Read the result from the query, and as it is read, split the chunk
into two sets - the INSERT set, and the UPDATE set. As each element is
read from the SQL result set, look up that record in the dictionary you
made in step 3. That record goes into the UPDATE set, because its
primary key already exists.
6. All records that have not been located go into the INSERT set; these
primary keys do not exist.
7. If you are using ORM mappings, use ORM bulk_insert_mappings and
bulk_update_mappings separately on these sets to establish the data in
the database.
http://docs.sqlalchemy.org/en/latest/orm/session_api.html?highlight=bulk_insert#sqlalchemy.orm.session.Session.bulk_insert_mappings
http://docs.sqlalchemy.org/en/latest/orm/session_api.html?highlight=bulk_insert#sqlalchemy.orm.session.Session.bulk_update_mappings
On 05/24/2016 12:01 PM, Shankar Ganesh wrote:
Hi,
I need to know about what is the best way for doing update session data
(session data is incomplete - not having the entire data) . Say data
doesn't have primary key itself, How we can populate the primary key in
the SQL Alchemy session ?
Use Case:
* Have the arbitrary data currently, need to produce the primary key
column based on the unique key . Data set size may be around 10k rows .
We can use get_or_create method , but it looks only cover single
instance .
(http://stackoverflow.com/questions/2546207/does-sqlalchemy-have-an-equivalent-of-djangos-get-or-create)
Suggested ways:
We can put the entire data into temp table and update primary key and
other columns based on the unique data we have.
Is there are any ways using SQLAlchemy ORM ?
Thanks,
Shankar.
--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to sqlalchemy+unsubscr...@googlegroups.com
<mailto:sqlalchemy+unsubscr...@googlegroups.com>.
To post to this group, send email to sqlalchemy@googlegroups.com
<mailto:sqlalchemy@googlegroups.com>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.