SF Markus Elfring <elfr...@users.sourceforge.net> wrote:
> Hello,
>
> I try to write some data from a source code analysis, which is
> performed by a few processor cores simultaneously, into a single
> table. Now I stumble on a message like the following again.
>
> "…
> sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates
> unique constraint "pg_type_typname_nsp_index"
> DETAIL: Key (typname, typnamespace)=(positions_parallel1_line_seq, 2200)
> already exists.
> '\nCREATE TABLE positions_parallel1 (\n\tfunction […], pattern)\n)\n\n' {}
> …"
>
> The following software components were involved for this
> application test on my openSUSE Tumbleweed system.
>
> 1. SQLAlchemy 0.9.8-78.1
> 2. Psycopg 2.5.2-3.1
> 3. PostgreSQL 9.3.5-3.1
> 4. Python 2.7.9-2.1
> 5. Coccinelle spatch 1.0.0-rc23
> 6. make 4.1-2.2
>
> I have searched a bit on the internet.

Normally this is an integrity-constraint error raised by your database: it occurs when a process attempts to INSERT or UPDATE a row with a key that already exists, so when performing an INSERT or UPDATE you need to ensure the row will not duplicate a uniquely-constrained value already present in the table. In this case, however, it appears to be a side effect of issuing CREATE TABLE statements in parallel, and in particular of two different processes attempting to create a table of the same name at the same time. The PG developers describe this expected behavior here: http://www.postgresql.org/message-id/ca+tgmozadyvtwbfp1fl2smzbihcwt4uprzrlnnx1nb30ku3...@mail.gmail.com. I'm not surprised that PG does not prioritize making this error more user-friendly, as this is a very odd and probably unnecessary use case.

> Are there any further software development challenges to consider
> for the parallel creation of database tables with the
> object-relational interface?

This has nothing to do with the ORM, nor does it really have much to do with SQLAlchemy as a whole.
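The underlying conflict is easy to see at the plain DBAPI level. Here is a minimal sketch using the stdlib sqlite3 module purely for illustration (the original error is PostgreSQL-specific and came through psycopg2): the second identical CREATE TABLE fails, just as the second of two parallel workers does.

```python
# The duplicate-CREATE race, reproduced at the plain DBAPI level.
# sqlite3 stands in for psycopg2/PostgreSQL purely for illustration;
# the table and column names mirror the error message above.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
ddl = "CREATE TABLE positions_parallel1 (function TEXT, pattern TEXT)"

cur.execute(ddl)          # first worker creates the table
second_failed = False
try:
    cur.execute(ddl)      # second worker races in with the same DDL
except sqlite3.OperationalError as exc:
    second_failed = True
    print("second CREATE TABLE failed:", exc)

# "IF NOT EXISTS" (or SQLAlchemy's checkfirst=True, which emits a
# check before the DDL) suppresses the error for a lone process,
# though it does not by itself close the race window between two
# processes checking and creating at the same instant.
cur.execute(
    "CREATE TABLE IF NOT EXISTS positions_parallel1 "
    "(function TEXT, pattern TEXT)")
```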
SQLAlchemy can be seen as a comprehensive system for automating the sending and receiving of messages along a database connection; you would have this issue in exactly the same way if you were emitting conflicting CREATE TABLE statements on the DBAPI cursor directly.

The concurrent creation of tables is an extremely unusual use case that I would not recommend. In particular, when two different threads/processes are contending to create the same table, that suggests the system attempts to encode data within the database schema itself; that is, the existence of a table "XYZ" in fact represents the data "XYZ" being present. There are some high-volume websites that do things like this, such as Reddit, but I think they're nuts. Certainly, for an application that isn't handling millions of simultaneous users or hundreds of billions of rows, such an architecture is absolutely not called for: data should be stored in rows, not in the names of tables.

Otherwise, the PG developers in that thread suggest this use case can be facilitated using Postgres advisory locks. I have no experience with those, but see that thread for more details.

> Regards,
> Markus
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
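For what it's worth, the advisory-lock approach the PG developers suggest could be sketched roughly as follows. `pg_advisory_lock`/`pg_advisory_unlock` are real PostgreSQL functions; the helper names and the CRC32-based key derivation are my own illustration, and the code assumes you pass in an open psycopg2 connection.

```python
# Sketch: serializing CREATE TABLE across processes with a PostgreSQL
# advisory lock. Hypothetical helpers; requires a psycopg2 connection
# to a running server to actually use.
import zlib

def lock_key(name):
    # Derive a stable non-negative 32-bit integer lock key from the
    # table name; any stable mapping shared by all workers works.
    return zlib.crc32(name.encode("utf-8")) & 0x7FFFFFFF

def create_table_serialized(conn, name, ddl):
    # Take the advisory lock, check for the table, create it only if
    # missing, then release the lock so other workers can proceed.
    cur = conn.cursor()
    key = lock_key(name)
    cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        cur.execute(
            "SELECT 1 FROM information_schema.tables "
            "WHERE table_name = %s", (name,))
        if cur.fetchone() is None:
            cur.execute(ddl)
            conn.commit()
    finally:
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
```

Because every worker funnels creation of a given table name through the same lock key, only one of them can be inside the check-then-create section at a time, which is exactly the race the IF-NOT-EXISTS style check alone cannot close.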