Re: [web2py] Re: Best way to insert 200k records?
Hi Andrew: Thanks for the explanation. I am going to assume that database neutrality is the main selling point of sticking with the DAL. I second your interest in a DAL method for genuine bulk loading, though I reckon that would be quite a beastly project.

I appreciate everyone's input. This community is uniquely helpful.
Re: [web2py] Re: Best way to insert 200k records?
Hi Martin,
It depends on the RDBMS. Some are still one row at a time, which makes insert and bulk_insert the same speed (it just makes the statement easier to write).

Hi Mike,
One of the goals of the DAL is to make the API database neutral, allowing you to switch between databases without changing your code (hopefully). What you sacrifice if you use a native bulk loader (today) is that you are locking yourself into a specific database platform. The DAL API doesn't have any platform-specific features (I think), although some features don't apply to all platforms.

What I was suggesting was a "native load" method defined within each database adapter, since they will all be different. Just a thought, although unlike ANSI SQL, every platform probably has its own syntax for its bulk loader.

Reiterating: I think bulk loading is something you would do as a batch/scheduler process, not something you'd use in your web end-user app. If that's the case, does it make sense for it to be part of the DAL, or perhaps in separate (contrib) modules targeting specific platforms?
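To make the idea concrete, here is a minimal sketch of what such a per-adapter hook could look like. Everything below is hypothetical: no native_load method exists in the DAL today, and the method name, signature, and the SQL routed through it are invented for illustration only.

# Hypothetical sketch of a per-adapter "native load" hook.
# None of this exists in the DAL today; names are invented.

class BaseAdapter(object):
    def native_load(self, table, filename, **kwargs):
        # Default: no native loader for this backend; a caller would
        # fall back to row-at-a-time inserts.
        raise NotImplementedError("no native bulk loader for this adapter")

class MySQLAdapter(BaseAdapter):
    def native_load(self, table, filename, **kwargs):
        # MySQL's bulk loader; needs the FILE privilege on the server.
        self.execute("LOAD DATA INFILE '%s' INTO TABLE %s "
                     "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';"
                     % (filename, table))

class PostgreSQLAdapter(BaseAdapter):
    def native_load(self, table, filename, **kwargs):
        # PostgreSQL's equivalent: server-side COPY.
        self.execute("COPY %s FROM '%s' WITH (FORMAT csv);" % (table, filename))

The platform-specific knobs (delimiters, error limits, parallel streams) could travel in the kwargs dict and be interpreted by each adapter, which is the "just pass a dict" idea from earlier in the thread.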
Re: [web2py] Re: Best way to insert 200k records?
"bulk insert is a way faster than regular insert when you have many rows" I think we need to clarify terms. By Massimo's own account in the web2py book, the DAL bulk insert is not faster than db.insert unless you are using the GAE. So are you talking about your db's native bulk methods or is the book wrong? Could someone just please answer what, if anything, is being sacrificed when you use your database's own bulk loading methods instead of using the DAL? Why the DAL religion about this? On Sunday, August 19, 2012 5:09:43 PM UTC-4, Martin.Mulone wrote: > > bulk insert is a way faster than regular insert when you have many rows. > If you are under mysql you can use load data infile, this is incredible > fast, but you need special privileges under mysql. > > 2012/8/19 Andrew > > >> Is it possible that we add a "native bulk insert" function which is coded >> up in each adapter. Even bulk_insert is an odbc 1 row at a time-slow for >> big files. I need to load huge files all the time and I am writing custom >> modules to do this with a native loader. Should this be a dal option? >> Worth noting that this type of operation is a batch, back end thing, I >> wouldn't do this for a end user web app. >> >> I would expect that each DBMS needs different info to start a bulk load, >> so the interface may be tricky, or just pass a dict and let the adapter >> work it out. >> What do you think? >> >> -- >> >> >> >> > > > -- > http://www.tecnodoc.com.ar > > --
Re: [web2py] Re: Best way to insert 200k records?
Bulk insert is way faster than a regular insert when you have many rows. If you are on MySQL you can use LOAD DATA INFILE, which is incredibly fast, but you need special privileges on the MySQL server.

--
http://www.tecnodoc.com.ar
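For reference, this is roughly what that looks like when issued through the DAL's raw-SQL escape hatch. The table and file names below are made-up examples, and the server account needs the FILE privilege for LOAD DATA INFILE to run.

# Sketch: MySQL's bulk loader issued via web2py's db.executesql.
# '/tmp/records.csv' and the 'thing' table are hypothetical examples.
db.executesql(
    "LOAD DATA INFILE '/tmp/records.csv' "
    "INTO TABLE thing "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 LINES;"  # skip the CSV header row
)
db.commit()  # commit explicitly when running outside a web2py request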
[web2py] Re: Best way to insert 200k records?
Is it possible that we add a "native bulk insert" function which is coded up in each adapter? Even bulk_insert goes through ODBC one row at a time, which is slow for big files. I need to load huge files all the time and I am writing custom modules to do this with a native loader. Should this be a DAL option? Worth noting that this type of operation is a batch, back-end thing; I wouldn't do this for an end-user web app.

I would expect that each DBMS needs different info to start a bulk load, so the interface may be tricky, or just pass a dict and let the adapter work it out. What do you think?
[web2py] Re: Best way to insert 200k records?
If it is a one-time data load and I have the proper constraints set in SQLite, I would use raw SQL. If this were repetitive use and I wanted web2py validation, I would use the DAL with bulk insert (making sure that I validate the data before adding it to the insert queue).
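A minimal sketch of that validate-then-queue pattern, assuming a hypothetical db.thing table and parsed_rows input; Field.validate applies the field's requires validators and returns (value, error), and bulk_insert takes the cleaned list.

# Sketch: validate rows first, queue the clean ones, then bulk insert.
# db.thing and parsed_rows are hypothetical examples.
queue = []
for raw in parsed_rows:  # parsed_rows: list of dicts from your source
    row, bad = {}, False
    for fieldname, value in raw.items():
        # Field.validate runs the field's 'requires' validators and
        # returns (cleaned_value, error_or_None).
        value, error = db.thing[fieldname].validate(value)
        if error:
            bad = True
            break
        row[fieldname] = value
    if not bad:
        queue.append(row)

db.thing.bulk_insert(queue)  # still one INSERT per row on most backends
db.commit()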
[web2py] Re: Best way to insert 200k records?
10 minutes is all right. Still, if there are two methods and they are qualitatively equal apart from speed, I'd prefer to use the faster one.

So posing the question once again: is there a compelling reason to use the DAL for bulk uploads? More specifically, what extras, if any, does the DAL add to inserts that a bulk insert using SQLite directly won't?
[web2py] Re: Best way to insert 200k records?
Point 2 is correct (as long as you are going through the DAL; db.executesql would be the non-DAL way from within web2py, and of course you could use your DB's native facilities).

Point 1 is correct if you are not doing this often. E.g., it might take 10 minutes as opposed to 1 minute without the DAL (just assuming, not based on any actual measurement). So what?

If you do this once an hour, then DAL processing and the individual record insertion (even if you use bulk_insert or csv) might make it too slow for you, and you would be better off looking at your database's native bulk loading facilities.
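For the SQLite case specifically, the fully non-DAL path is Python's own sqlite3 module. A minimal sketch, with the table name and database path as made-up examples; wrapping executemany in a single transaction is what buys the speed.

# Sketch: bypassing the DAL entirely with Python's sqlite3 module.
# 'storage.sqlite' and the 'thing' table are hypothetical examples.
import sqlite3

conn = sqlite3.connect('storage.sqlite')
rows = [('alpha',), ('beta',), ('gamma',)]  # stand-in for parsed data

# One transaction around executemany avoids a disk sync per row,
# which is where most of the per-insert cost goes.
with conn:
    conn.executemany('INSERT INTO thing (name) VALUES (?)', rows)
conn.close()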
[web2py] Re: Best way to insert 200k records?
The data will be coming from a large XML file, so my script will parse that and make inserts into several different tables. It's fairly straightforward.

So is it correct to say that:

1. There is no compelling reason to do this without the DAL.
2. My options in the DAL are bulk_insert, looping db.insert, and CSV import, and performance-wise they're similar?
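A sketch of that parse-and-insert flow using ElementTree's iterparse; the records.xml file, the record/name element names, and the db.thing table are hypothetical, and batching keeps memory flat on a large file.

# Sketch: streaming a large XML file into the DAL in batches.
# 'records.xml', the <record> elements, and db.thing are hypothetical.
import xml.etree.ElementTree as ET

BATCH = 1000
batch = []
for event, elem in ET.iterparse('records.xml', events=('end',)):
    if elem.tag == 'record':
        batch.append({'name': elem.findtext('name')})
        elem.clear()  # free the parsed element to keep memory use flat
        if len(batch) >= BATCH:
            db.thing.bulk_insert(batch)
            db.commit()
            batch = []
if batch:  # flush the final partial batch
    db.thing.bulk_insert(batch)
    db.commit()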
[web2py] Re: Best way to insert 200k records?
You can enter data either via db.table.insert(name=value, ...) or by a CSV file. Both are convenient. I suppose you could argue that CSV is convenient if your data are dumped from some other db and need no intermediate computation. If your data are less clean, or you need to do some complex intermediate logic and computation on the fly to come up with a value to insert into the db, or just for all-around flexibility, then db.table.insert(...) works great.

Both are equally useful. I use both all the time. Mostly, it depends on the source of the data and how ready they are to insert into the db without me having to muck with them.
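A short sketch of both paths; db.thing, its name field, the source_rows variable, and data.csv are made-up examples, and import_from_csv_file expects the first CSV row to name the columns.

# Sketch of the two paths above; db.thing and data.csv are hypothetical.

# Path 1: per-row insert, handy when values need computing on the fly.
for raw in source_rows:  # source_rows: your parsed input
    db.thing.insert(name=raw['name'].strip().title())

# Path 2: CSV import; the first CSV row must contain the column names.
with open('data.csv', 'r') as f:
    db.thing.import_from_csv_file(f)

db.commit()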