Niphlod,

Regarding option 2 — "choose proper pkeys and code a trigger (ON INSERT)" — could 
you provide an example of this? I just want to make sure I understand it properly.
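In the meantime, here is my rough reading of option 2 as a runnable sketch. It uses SQLite as a stand-in backend, and the table, column, and trigger names are all invented for illustration — not anything from the actual app:

```python
import sqlite3

# Sketch of "proper pkeys + an ON INSERT trigger": the backend itself
# refuses duplicate rows, so Python never holds 10M hashes in memory.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE records (
    id    INTEGER PRIMARY KEY,
    col_a TEXT,
    col_b TEXT
);

-- BEFORE INSERT trigger: silently skip any row whose "business key"
-- (col_a, col_b) already exists in the table.
CREATE TRIGGER skip_dupes BEFORE INSERT ON records
WHEN EXISTS (SELECT 1 FROM records
             WHERE col_a = NEW.col_a AND col_b = NEW.col_b)
BEGIN
    SELECT RAISE(IGNORE);
END;
""")

rows = [("a", "1"), ("b", "2"), ("a", "1")]  # third row is a duplicate
db.executemany("INSERT INTO records (col_a, col_b) VALUES (?, ?)", rows)
db.commit()

print(db.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 2
```

On a real backend (PostgreSQL, MySQL, ...) the trigger syntax differs, and a plain UNIQUE constraint on the dedup columns plus the backend's "insert-or-ignore" form would achieve the same thing without a trigger.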


On Monday, March 16, 2015 at 8:20:50 PM UTC-4, Niphlod wrote:
>
> I still don't see the need for a dict-like structure holding 10M hashes 
> just to decide which of some 10k lines to insert.
> Solutions:
> 1)
> If the files you're going to insert have fewer rows than the table, 
> reverse the logic: fetch only the table rows that could match the files. 
> Instead of fetching and hashing 10M things, you hash 10k of them.
> 2)
> Choose proper pkeys and code a trigger (ON INSERT). Let the backend do 
> the work (guess what, they're engineered to manage data!), not a single 
> Python process that fills up memory.
> 3)
> Store the hash in a separate column (or a separate table). Instead of 
> fetching n rows * number of columns of values and then hashing them, you 
> fetch the already-hashed value. 
>
> On Tuesday, March 17, 2015 at 12:14:20 AM UTC+1, LoveWeb2py wrote:
>>
>> Thank you for the feedback everyone.
>>
>> The main reason I fetch them all first is to make sure I'm not inserting 
>> duplicate records. We have a lot of files that have thousands of records 
>> and sometimes they're duplicates. I hash a few columns from each record and 
>> if the value is the same then I don't insert the record. If there is a more 
>> efficient way to do this please let me know.
>>
>>
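For anyone following along, options 1 and 3 can be combined into one pattern: keep a hash column, and before each import fetch only the hashes that could collide with the incoming file rather than the whole table. A minimal sketch (my own reading — the `records` table, `row_hash` helper, and column names are all made up):

```python
import hashlib
import sqlite3

def row_hash(row):
    """Hash the columns used for dedup into one comparable value."""
    return hashlib.sha1("|".join(row).encode("utf-8")).hexdigest()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (col_a TEXT, col_b TEXT, row_hash TEXT)")

file_rows = [("a", "1"), ("b", "2"), ("a", "1")]  # 10k rows in practice
incoming = {row_hash(r) for r in file_rows}

# Fetch only hashes that might match this file, never the full 10M-row
# table. (In practice you'd batch the IN list to stay under the
# backend's bind-variable limit.)
placeholders = ",".join("?" * len(incoming))
existing = {h for (h,) in db.execute(
    f"SELECT row_hash FROM records WHERE row_hash IN ({placeholders})",
    list(incoming))}

seen = set(existing)
for r in file_rows:
    h = row_hash(r)
    if h not in seen:  # duplicate (within file or table): skip
        db.execute("INSERT INTO records VALUES (?, ?, ?)", (*r, h))
        seen.add(h)
db.commit()

print(db.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 2
```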

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
