On Nov 30, 6:21 pm, imgrey <[EMAIL PROTECTED]> wrote:
>
> But how do I determine whether a record exists, and update it if it does
> or insert it otherwise, without executing a SELECT, which would be very
> slow?  I'm using a nested begin_nested() here to avoid a rollback of the
> whole transaction.

you should select() for the record first, and if it's not present, insert()
it.  if you're concerned about issuing lots of select()s, you can implement a
simple cache in your application, such as a dictionary of ids that
already exist.  if there's concern over the cache growing too large,
you can use a least-recently-used (LRU) cache which limits its size.
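
for example, something along these lines; this is just a sketch using the
f_table / user_id / path names from your snippet, the 0.4-style select()
construct, a plain dict as the cache, and a hypothetical ensure_file_row()
helper:

from sqlalchemy import select, insert, and_

# f_table and session are the ones from your application
_seen = {}   # (user_id, path) -> True for rows we already know exist

def ensure_file_row(session, user_id, relpath, ls):
    key = (user_id, relpath)
    if key in _seen:
        return   # known to exist; no SELECT issued at all

    # one SELECT the first time we see this key
    row = session.execute(
        select([f_table.c.id],
               and_(f_table.c.user_id == user_id,
                    f_table.c.path == relpath))
    ).fetchone()

    if row is None:
        session.execute(insert(f_table),
                        {'user_id': user_id, 'path': relpath, 'ls': ls})

    _seen[key] = True

swapping the plain dict for an LRU cache is then just a matter of evicting
the oldest keys once it grows past some maximum size.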

>
> I'll try, but for now I cannot get rid of this error, which happens every
> time:
>
> nt, parameters, e)
> ProgrammingError: (ProgrammingError) current transaction is aborted,
> commands ignored until end of transaction block
>  'SELECT fs_file.id AS fs_file_id, fs_file.user_id AS fs_file_user_id,
> fs_file.path AS fs_file_path, fs_file.ls AS fs_file_ls \nFROM fs_file
> JOIN auth_user ON auth_user.id = fs_file.user_id \nWHERE auth_user.id
> = %(auth_user_id)s AND fs_file.path = %(fs_file_path)s ORDER BY
> fs_file.id \n LIMIT 1 OFFSET 0' {'fs_file_path': '/', 'auth_user_id':
> 1}
>
> the context:
> try:
>     session.execute(insert(f_table), {'user_id': theone.id,
>         'path': relpath,
>         'ls': ls})
> except IntegrityError:
>     stuff = session.query(...)  # line 85

OK, I guess you are using SAVEPOINT then, since the Postgres docs
recommend that as a workaround for the transaction abort it does
at the IntegrityError point... and the abort is bad for you
because you want to recover from it, meaning that you're essentially
using the IntegrityError as a way of checking for existing data.
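
If you do stay on the SAVEPOINT route, the shape of it would be roughly
this; a sketch only, assuming session.begin_nested() is what's issuing the
SAVEPOINT (as in your snippet), and using the 0.4-era exception import path:

from sqlalchemy import insert
from sqlalchemy.exceptions import IntegrityError   # sqlalchemy.exc in later versions

def insert_file(session, user_id, relpath, ls):
    session.begin_nested()   # SAVEPOINT
    try:
        session.execute(insert(f_table),
                        {'user_id': user_id, 'path': relpath, 'ls': ls})
        session.commit()     # releases the SAVEPOINT
    except IntegrityError:
        # rolls back to the SAVEPOINT only; the enclosing transaction
        # stays usable, so the query at your line 85 won't hit
        # "current transaction is aborted"
        session.rollback()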

I wouldn't write the application such that an IntegrityError is part
of the normal course of business.  I'd want to ensure that I don't
insert duplicate records in the first place by checking first, and if
the performance of the selects is an issue there, I'd look into some kind
of caching to reduce the overhead (here are a few LRU ideas; the second one
seems more straightforward:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498245
).  Consider even loading the whole table into memory; if it's only a
few thousand rows, memory is cheap.
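
A preload can be as simple as this (again just a sketch with the old-style
select() and the column names from your snippet; one SELECT up front, then
pure in-memory lookups):

from sqlalchemy import select

existing = set()
for user_id, path in session.execute(select([f_table.c.user_id, f_table.c.path])):
    existing.add((user_id, path))

def file_exists(user_id, relpath):
    return (user_id, relpath) in existing

Just remember to add the key to the set whenever you insert a new row.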

Another thing to look at is reducing or eliminating the prospect
of concurrent processes all looking for the same possibly nonexistent
record at the same time like that; if there's some "central data
structure" of rows that many threads all need to modify frequently
(where they continually modify data in overlapping rows), the
actual modifications to the structure can be serialized using a
producer/consumer model, where a single event queue handles all
modifications to the structure.  While this will serialize a lot of
your threads' work, if it's truly a monolithic and highly mutable
structure, then some kind of "serialization" is needed in any case.
The producer/consumer approach is straightforward, not prone
to deadlocks, and might be faster than having all those threads
constantly compete to hold open transactions over the same set of
rows.
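
As a rough illustration of the shape of it (a sketch only; Queue and the
single writer thread are the point here, and apply_change() stands in for
whatever check-and-insert logic you settle on):

import threading
import Queue   # "queue" on Python 3

work = Queue.Queue()

def writer():
    # the only thread that touches the shared rows, so everything it
    # does is inherently serialized; no two transactions compete
    while True:
        user_id, relpath, ls = work.get()
        try:
            apply_change(user_id, relpath, ls)   # your check-and-insert logic
        finally:
            work.task_done()

t = threading.Thread(target=writer)
t.setDaemon(True)
t.start()

# producer threads just enqueue and move on
work.put((theone.id, relpath, ls))

work.join() blocks until everything queued so far has been processed, if a
producer needs a synchronization point.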
