SF Markus Elfring <elfr...@users.sourceforge.net> wrote:
> Hello,
>
> I try to write some data from a source code analysis, which is
> performed by a few processor cores simultaneously, into a single
> table. Now I stumble on a message like the following again.
>
> "…
> sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates
> unique constraint "pg_type_typname_nsp_index"
> DETAIL: Key (typname, typnamespace)=(positions_parallel1_line_seq, 2200)
> already exists.
> '\nCREATE TABLE positions_parallel1 (\n\tfunction […], pattern)\n)\n\n' {}
> …"
>
> The following software components were involved for this
> application test on my openSUSE Tumbleweed system.
>
> 1. SQLAlchemy 0.9.8-78.1
> 2. Psycopg 2.5.2-3.1
> 3. PostgreSQL 9.3.5-3.1
> 4. Python 2.7.9-2.1
> 5. Coccinelle spatch 1.0.0-rc23
> 6. make 4.1-2.2
>
> I have searched a bit on the internet.

Normally this is an integrity-constraint error raised by your database: it occurs when a process attempts to INSERT or UPDATE a row with a key that already exists, so when performing an INSERT or UPDATE you need to ensure the row will not duplicate a uniquely-constrained value already present in the table. In this case, however, it appears to be a side effect of issuing CREATE TABLE statements in parallel, and in particular of two different processes attempting to create a table of the same name at the same time. The PG developers describe this expected behavior here: http://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com. I'm not surprised that PG does not prioritize making this error more user-friendly, as this is a very odd and probably unnecessary use case.

> Are there any further software development challenges to consider
> for the parallel creation of database tables with the
> object-relational interface?

This has nothing to do with the ORM, nor does it really have much to do with SQLAlchemy as a whole.
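The underlying conflict is easy to see at the plain DBAPI level. Here is a minimal sketch using the stdlib sqlite3 module purely for illustration (the original error is PostgreSQL-specific and came through psycopg2): the second identical CREATE TABLE fails, just as the second of two parallel workers does.

```python
# The duplicate-CREATE race, reproduced at the plain DBAPI level.
# sqlite3 stands in for psycopg2/PostgreSQL purely for illustration;
# the table and column names mirror the error message above.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
ddl = "CREATE TABLE positions_parallel1 (function TEXT, pattern TEXT)"

cur.execute(ddl)          # first worker creates the table
second_failed = False
try:
    cur.execute(ddl)      # second worker races in with the same DDL
except sqlite3.OperationalError as exc:
    second_failed = True
    print("second CREATE TABLE failed:", exc)

# "IF NOT EXISTS" (or SQLAlchemy's checkfirst=True, which emits a
# check before the DDL) suppresses the error for a lone process,
# though it does not by itself close the race window between two
# processes checking and creating at the same instant.
cur.execute(
    "CREATE TABLE IF NOT EXISTS positions_parallel1 "
    "(function TEXT, pattern TEXT)")
```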
SQLAlchemy can be seen as a comprehensive system for automating the sending and receiving of messages along a database connection; you would have this issue in exactly the same way if you were emitting conflicting CREATE TABLE statements on the DBAPI cursor directly.

The concurrent creation of tables is an extremely unusual use case that I would not recommend. In particular, when two different threads/processes are contending to create the same table, that suggests the system attempts to encode data within the database schema itself; that is, the existence of a table "XYZ" in fact represents the data "XYZ" being present. There are some high-volume websites that do things like this, such as Reddit, but I think they're nuts. Certainly, for an application that isn't handling millions of simultaneous users or hundreds of billions of rows, such an architecture is absolutely not called for: data should be stored in rows, not in the names of tables.

Otherwise, the PG developers in that thread suggest this use case can be facilitated using Postgres advisory locks. I have no experience with those, but see that thread for more details.

> Regards,
> Markus
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
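For what it's worth, the advisory-lock approach the PG developers suggest could be sketched roughly as follows. `pg_advisory_lock`/`pg_advisory_unlock` are real PostgreSQL functions; the helper names and the CRC32-based key derivation are my own illustration, and the code assumes you pass in an open psycopg2 connection.

```python
# Sketch: serializing CREATE TABLE across processes with a PostgreSQL
# advisory lock. Hypothetical helpers; requires a psycopg2 connection
# to a running server to actually use.
import zlib

def lock_key(name):
    # Derive a stable non-negative 32-bit integer lock key from the
    # table name; any stable mapping shared by all workers works.
    return zlib.crc32(name.encode("utf-8")) & 0x7FFFFFFF

def create_table_serialized(conn, name, ddl):
    # Take the advisory lock, check for the table, create it only if
    # missing, then release the lock so other workers can proceed.
    cur = conn.cursor()
    key = lock_key(name)
    cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        cur.execute(
            "SELECT 1 FROM information_schema.tables "
            "WHERE table_name = %s", (name,))
        if cur.fetchone() is None:
            cur.execute(ddl)
            conn.commit()
    finally:
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
```

Because every worker funnels creation of a given table name through the same lock key, only one of them can be inside the check-then-create section at a time, which is exactly the race the IF-NOT-EXISTS style check alone cannot close.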