Re: [web2py] Re: Best way to insert 200k records?
Hi Andrew: Thanks for the explanation. I am going to assume that database neutrality is the main selling point of sticking with the DAL. I second your interest in a DAL method for genuine bulk loading, though I reckon that would be quite a beastly project.

I appreciate everyone's input. This community is uniquely helpful.
Re: [web2py] Re: Best way to insert 200k records?
Hi Martin,
It depends on the RDBMS. Some are still one row at a time, which makes insert and bulk_insert the same speed (it just makes the statement easier to write).

Hi Mike,
One of the goals of the DAL is to make the API database neutral, allowing you to switch between databases without changing your code (hopefully). What you sacrifice if you use a native bulk loader (today) is that you are locking yourself into a specific database platform. The DAL API doesn't have any platform-specific features (I think), although some features don't apply to all platforms.

What I was suggesting was a "native load" method defined within each database adapter, since they will all be different. Just a thought, although unlike ANSI SQL, every platform probably has its own syntax for its bulk loader.

Reiterating: I think bulk loading is something you would do as a batch/scheduler process, not something you'd use in your web end-user app. If that's the case, does it make sense for it to be part of the DAL, or perhaps in separate (contrib) modules targeting specific platforms?
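To make the idea concrete, here is a minimal sketch of what such a per-adapter hook could look like. Everything below is hypothetical: no native_load method exists in the DAL today, and the method name, signature, and the SQL routed through it are invented for illustration only.

# Hypothetical sketch of a per-adapter "native load" hook.
# None of this exists in the DAL today; names are invented.

class BaseAdapter(object):
    def native_load(self, table, filename, **kwargs):
        # Default: no native loader for this backend; a caller would
        # fall back to row-at-a-time inserts.
        raise NotImplementedError("no native bulk loader for this adapter")

class MySQLAdapter(BaseAdapter):
    def native_load(self, table, filename, **kwargs):
        # MySQL's bulk loader; needs the FILE privilege on the server.
        self.execute("LOAD DATA INFILE '%s' INTO TABLE %s "
                     "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';"
                     % (filename, table))

class PostgreSQLAdapter(BaseAdapter):
    def native_load(self, table, filename, **kwargs):
        # PostgreSQL's equivalent: server-side COPY.
        self.execute("COPY %s FROM '%s' WITH (FORMAT csv);" % (table, filename))

The platform-specific knobs (delimiters, error limits, parallel streams) could travel in the kwargs dict and be interpreted by each adapter, which is the "just pass a dict" idea from earlier in the thread.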
Re: [web2py] Re: Best way to insert 200k records?
"bulk insert is a way faster than regular insert when you have many rows" I think we need to clarify terms. By Massimo's own account in the web2py book, the DAL bulk insert is not faster than db.insert unless you are using the GAE. So are you talking about your db's native bulk methods or is the book wrong? Could someone just please answer what, if anything, is being sacrificed when you use your database's own bulk loading methods instead of using the DAL? Why the DAL religion about this? On Sunday, August 19, 2012 5:09:43 PM UTC-4, Martin.Mulone wrote: > > bulk insert is a way faster than regular insert when you have many rows. > If you are under mysql you can use load data infile, this is incredible > fast, but you need special privileges under mysql. > > 2012/8/19 Andrew > > >> Is it possible that we add a "native bulk insert" function which is coded >> up in each adapter. Even bulk_insert is an odbc 1 row at a time-slow for >> big files. I need to load huge files all the time and I am writing custom >> modules to do this with a native loader. Should this be a dal option? >> Worth noting that this type of operation is a batch, back end thing, I >> wouldn't do this for a end user web app. >> >> I would expect that each DBMS needs different info to start a bulk load, >> so the interface may be tricky, or just pass a dict and let the adapter >> work it out. >> What do you think? >> >> -- >> >> >> >> > > > -- > http://www.tecnodoc.com.ar > > --
Re: [web2py] Re: Best way to insert 200k records?
Bulk insert is way faster than a regular insert when you have many rows. If you are on MySQL you can use LOAD DATA INFILE, which is incredibly fast, but you need special privileges on the MySQL server.

--
http://www.tecnodoc.com.ar
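For reference, this is roughly what that looks like when issued through the DAL's raw-SQL escape hatch. The table and file names below are made-up examples, and the server account needs the FILE privilege for LOAD DATA INFILE to run.

# Sketch: MySQL's bulk loader issued via web2py's db.executesql.
# '/tmp/records.csv' and the 'thing' table are hypothetical examples.
db.executesql(
    "LOAD DATA INFILE '/tmp/records.csv' "
    "INTO TABLE thing "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 LINES;"  # skip the CSV header row
)
db.commit()  # commit explicitly when running outside a web2py request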
[web2py] Re: Best way to insert 200k records?
Is it possible that we add a "native bulk insert" function which is coded up in each adapter? Even bulk_insert goes through ODBC one row at a time, which is slow for big files. I need to load huge files all the time and I am writing custom modules to do this with a native loader. Should this be a DAL option? Worth noting that this type of operation is a batch, back-end thing; I wouldn't do this for an end-user web app.

I would expect that each DBMS needs different info to start a bulk load, so the interface may be tricky, or just pass a dict and let the adapter work it out. What do you think?
[web2py] Re: Best way to insert 200k records?
If it is a one-time data load and I have the proper constraints set in SQLite, I would use raw SQL. If this were repetitive use and I wanted web2py validation, I would use the DAL with bulk insert (making sure that I validate the data before adding it to the insert queue).
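A minimal sketch of that validate-then-queue pattern, assuming a hypothetical db.thing table and parsed_rows input; Field.validate applies the field's requires validators and returns (value, error), and bulk_insert takes the cleaned list.

# Sketch: validate rows first, queue the clean ones, then bulk insert.
# db.thing and parsed_rows are hypothetical examples.
queue = []
for raw in parsed_rows:  # parsed_rows: list of dicts from your source
    row, bad = {}, False
    for fieldname, value in raw.items():
        # Field.validate runs the field's 'requires' validators and
        # returns (cleaned_value, error_or_None).
        value, error = db.thing[fieldname].validate(value)
        if error:
            bad = True
            break
        row[fieldname] = value
    if not bad:
        queue.append(row)

db.thing.bulk_insert(queue)  # still one INSERT per row on most backends
db.commit()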
[web2py] Re: Best way to insert 200k records?
10 minutes is all right. Still, if there are two methods and they are qualitatively equal apart from speed, I'd prefer to use the faster one.

So posing the question once again: is there a compelling reason to use the DAL for bulk uploads? More specifically, what extras, if any, does the DAL add to inserts that a bulk insert using SQLite directly won't?
[web2py] Re: Best way to insert 200k records?
Point 2 is correct (as long as you are going through the DAL; db.executesql would be the non-DAL way from within web2py, and of course you could use your DB's native facilities).

Point 1 is correct if you are not doing this often. E.g., it might take 10 minutes as opposed to 1 minute without the DAL (just assuming, not based on any actual measurement). So what?

If you do this once an hour, then DAL processing and the individual record insertion (even if you use bulk_insert or csv) might make it too slow for you, and you would be better off looking at your database's native bulk loading facilities.
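For the SQLite case specifically, the fully non-DAL path is Python's own sqlite3 module. A minimal sketch, with the table name and database path as made-up examples; wrapping executemany in a single transaction is what buys the speed.

# Sketch: bypassing the DAL entirely with Python's sqlite3 module.
# 'storage.sqlite' and the 'thing' table are hypothetical examples.
import sqlite3

conn = sqlite3.connect('storage.sqlite')
rows = [('alpha',), ('beta',), ('gamma',)]  # stand-in for parsed data

# One transaction around executemany avoids a disk sync per row,
# which is where most of the per-insert cost goes.
with conn:
    conn.executemany('INSERT INTO thing (name) VALUES (?)', rows)
conn.close()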
[web2py] Re: Best way to insert 200k records?
The data will be coming from a large XML file, so my script will parse that and make inserts into several different tables. It's fairly straightforward.

So is it correct to say that:

1. There is no compelling reason to do this without the DAL.
2. My options in the DAL are bulk_insert, looping db.insert, and CSV import, and performance-wise they're similar?
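A sketch of that parse-and-insert flow using ElementTree's iterparse; the records.xml file, the record/name element names, and the db.thing table are hypothetical, and batching keeps memory flat on a large file.

# Sketch: streaming a large XML file into the DAL in batches.
# 'records.xml', the <record> elements, and db.thing are hypothetical.
import xml.etree.ElementTree as ET

BATCH = 1000
batch = []
for event, elem in ET.iterparse('records.xml', events=('end',)):
    if elem.tag == 'record':
        batch.append({'name': elem.findtext('name')})
        elem.clear()  # free the parsed element to keep memory use flat
        if len(batch) >= BATCH:
            db.thing.bulk_insert(batch)
            db.commit()
            batch = []
if batch:  # flush the final partial batch
    db.thing.bulk_insert(batch)
    db.commit()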
[web2py] Re: Best way to insert 200k records?
You can enter data either via db.table.insert(name=value, ...) or by a CSV file. Both are convenient. I suppose you could argue that CSV is convenient if your data are dumped from some other db and need no intermediate computation. If your data are less clean, or you need to do some complex intermediate logic and computation on the fly to come up with a value to insert into the db, or just for all-around flexibility, then db.table.insert(...) works great.

Both are equally useful. I use both all the time. Mostly, it depends on the source of the data and how ready they are to insert into the db without me having to muck with them.
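A short sketch of both paths; db.thing, its name field, the source_rows variable, and data.csv are made-up examples, and import_from_csv_file expects the first CSV row to name the columns.

# Sketch of the two paths above; db.thing and data.csv are hypothetical.

# Path 1: per-row insert, handy when values need computing on the fly.
for raw in source_rows:  # source_rows: your parsed input
    db.thing.insert(name=raw['name'].strip().title())

# Path 2: CSV import; the first CSV row must contain the column names.
with open('data.csv', 'r') as f:
    db.thing.import_from_csv_file(f)

db.commit()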