On Tuesday, March 17, 2015 at 12:03:05 PM UTC-7, LoveWeb2py wrote:
>
> Thank you for the advice everyone. Getting a lot of great input.
>
> Dave,
>
> I'm working with a large JSON file which I'm converting to a dictionary 
> and inserting the key values. I want to use web2py's bulk_insert feature, 
> but the data has duplicates, so I must first remove all the duplicates from 
> the JSON, recreate each dictionary record, append it to a list, and then 
> bulk_insert from there. I'm using MySQL; is there a preferred way to handle 
> bulk_inserts in web2py?
>
>
I'm not familiar enough with MySQL, so I'll let somebody else chime in. 
Aside from SQLite, my limited experience is with PostgreSQL, and our class 
didn't cover bulk inserts beyond initializing our "online store" with raw 
SQL statements. I do rather small bulk inserts in web2py by using appadmin 
and CSV files.
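
For what it's worth, the same kind of small CSV load can also be done 
programmatically through the DAL. A minimal sketch, assuming a hypothetical 
table db.mytable whose CSV header row matches the field names:

    # In a web2py model/controller context, where `db` is the DAL connection.
    # import_from_csv_file() expects the CSV header names to match the
    # table's field names.
    with open('records.csv', 'r') as csvfile:
        db.mytable.import_from_csv_file(csvfile)
    db.commit()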

If you're on a Linux system, would pre-processing the file with "sort | uniq" 
to strip the within-file duplicates be of any value in reducing the bulk?
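
If sort | uniq doesn't fit (say, a record only counts as a duplicate when a few 
particular columns match), here is roughly how I'd picture the hash-and-skip pass 
you described, done in Python before handing the list to bulk_insert. Just a 
sketch: the table name and the key columns are invented, so substitute your own.

    import hashlib
    import json

    def load_unique_records(db, json_path, key_columns=('vendor', 'date', 'amount')):
        """Read a JSON list of dicts, drop in-file duplicates by hashing a
        few columns, and bulk_insert the survivors."""
        with open(json_path, 'r') as f:
            records = json.load(f)              # assumes the file is a list of dicts

        seen = set()
        unique_rows = []
        for rec in records:
            key = '|'.join(str(rec.get(col, '')) for col in key_columns)
            digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
            if digest in seen:
                continue                        # duplicate within this file: skip it
            seen.add(digest)
            unique_rows.append(rec)

        db.mytable.bulk_insert(unique_rows)     # list of dicts keyed by field name
        db.commit()
        return len(unique_rows)

bulk_insert itself just takes a list of dicts keyed by field name, so the 
de-duplication pass is the only real work. It won't catch a record that already 
landed in the table from an earlier file, though; that's where letting the 
database enforce uniqueness (see the note at the bottom) comes in.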

/dps

 

> On Monday, March 16, 2015 at 7:58:15 PM UTC-4, Dave S wrote:
>>
>>
>>
>> On Monday, March 16, 2015 at 4:14:20 PM UTC-7, LoveWeb2py wrote:
>>>
>>> Thank you for the feedback everyone.
>>>
>>> The main reason I fetch them all first is to make sure I'm not inserting 
>>> duplicate records. We have a lot of files with thousands of records, and 
>>> sometimes they contain duplicates. I hash a few columns from each record, and 
>>> if the hash matches an existing record's then I don't insert it. If there is 
>>> a more efficient way to do this, please let me know.
>>>
>>>
>> Are you doing bulk inserts from files generated elsewhere?  Merging two 
>> (or more) databases?
>>
>> If you are doing individual inserts, I'd take the maybe-new record, do 
>> a select based on one or more of its fields, and then do the column-hash 
>> comparison on the results that come back.
>>
>> For bulk inserts, I'd be inclined to do the import with the DB Engine's 
>> management tools, and maybe lean on having the DBE enforce "must be unique" 
>> rules.
>>
>> /dps
>>
>>  
>>
>>> On Monday, March 16, 2015 at 6:26:50 PM UTC-4, Niphlod wrote:
>>>>
>>>> I have 300m+ records too, but I don't need to fetch all of those in a 
>>>> single query. 
>>>> It's not a matter of indexes or anything else: the "system hang" you're 
>>>> incurring is not the time the database takes from the moment you 
>>>> send the query to the first row returned, but the time you take to 
>>>> "accumulate" 1m rows in memory.
>>>>
>>>> On Monday, March 16, 2015 at 10:46:39 PM UTC+1, Dane Wright wrote:
>>>>>
>>>>> I have a table which currently contains 10m+ records (local government 
>>>>> spending in www.appgov.org). A native SQL count(*) takes 3-5 secs, 
>>>>> but reading all the records like this via the DAL will result in the 
>>>>> system hanging. I try not to read too many of these records in any one 
>>>>> online transaction by allowing only selections and calculating subtotals 
>>>>> offline. I also create MySQL indexes outside of web2py to speed things up 
>>>>> where necessary. Of course I also can't ever say db.laspend.id>0 in 
>>>>> the Database Admin facility either!
>>>>>

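Following up on the "have the DBE enforce must-be-unique rules" suggestion quoted 
above: one way to picture that in the DAL is to store the column hash in its own 
field marked unique, so MySQL itself rejects any duplicate that slips through. 
Again just a sketch, with invented table and field names:

    # In a web2py model file, where db and Field are already defined.
    # The sha1 hex digest of the "identity" columns gets its own column with
    # unique=True, so MySQL builds a unique index and refuses duplicate rows.
    db.define_table('spend_record',
        Field('vendor'),
        Field('spend_date', 'date'),
        Field('amount', 'double'),
        Field('rec_hash', length=40, unique=True),
    )

    # An insert whose rec_hash already exists then fails with an integrity
    # error from the MySQL driver instead of silently duplicating the row.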
