Niphlod, regarding "choose proper pkeys and code a trigger (ON INSERT)": could you provide an example of this? I just want to make sure I understand it properly.
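Not from the thread, but here is a minimal sketch of the idea: pick the columns that define a duplicate as the primary key and let the database reject repeats, instead of hashing every existing row in Python. This example assumes SQLite (where `INSERT OR IGNORE` plays the role of a trigger); on PostgreSQL you would use `ON CONFLICT DO NOTHING` or a `BEFORE INSERT` trigger, and the table/column names are made up for illustration.

```python
# Sketch: let the backend enforce uniqueness via a composite primary key.
# Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        col_a TEXT,
        col_b TEXT,
        PRIMARY KEY (col_a, col_b)  -- the columns that define a duplicate
    )
""")

rows = [("x", "1"), ("y", "2"), ("x", "1")]  # third row duplicates the first

# INSERT OR IGNORE silently skips rows that violate the primary key,
# so no Python-side pre-fetching or hashing is needed.
conn.executemany("INSERT OR IGNORE INTO records VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 2 -- the duplicate was rejected by the database
```

The point is that the deduplication happens inside the backend as each row arrives, so memory use in the Python process stays flat regardless of table size.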
On Monday, March 16, 2015 at 8:20:50 PM UTC-4, Niphlod wrote:
>
> I still don't see the need for a dict-like something holding 10M hashes to discern, in some 10k lines, which ones to insert. Solutions:
>
> 1) If the files you're going to insert have fewer rows than the number of rows in the table, reverse the logic: fetch only the table rows that could match the files. Instead of fetching and hashing 10M things, you hash 10k of them.
>
> 2) Choose proper pkeys and code a trigger (ON INSERT). Let the backend do the work (guess what, they're engineered to manage data!), not a single Python process that fills the memory.
>
> 3) Store the hash in a separate column (or a separate table). Instead of fetching n rows * number of columns of values and then hashing them, you fetch the already-hashed value.
>
> On Tuesday, March 17, 2015 at 12:14:20 AM UTC+1, LoveWeb2py wrote:
>>
>> Thank you for the feedback, everyone.
>>
>> The main reason I fetch them all first is to make sure I'm not inserting duplicate records. We have a lot of files with thousands of records, and sometimes they're duplicates. I hash a few columns from each record, and if the value is the same, I don't insert the record. If there is a more efficient way to do this, please let me know.
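Niphlod's solution 3 can also be sketched briefly: compute the hash of the dedup columns once at insert time and store it in its own constrained column, so duplicate checks touch one indexed value instead of refetching and rehashing every column. This is an illustrative sketch with SQLite and invented names, not code from the thread.

```python
# Sketch of solution 3: store a precomputed hash of the dedup columns.
# Table, column, and function names are hypothetical.
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        col_a TEXT,
        col_b TEXT,
        row_hash TEXT UNIQUE  -- precomputed hash; UNIQUE rejects duplicates
    )
""")

def row_hash(*cols):
    """Hash the columns that define a duplicate record."""
    return hashlib.sha1("|".join(cols).encode()).hexdigest()

for a, b in [("x", "1"), ("y", "2"), ("x", "1")]:
    conn.execute(
        "INSERT OR IGNORE INTO records VALUES (?, ?, ?)",
        (a, b, row_hash(a, b)),
    )

n = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(n)  # 2 -- the duplicate hash was rejected
```

Compared with solution 2, this keeps the original columns free of primary-key constraints while still letting the backend, rather than a Python dict of 10M hashes, decide what is a duplicate.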