[web2py] Re: CSV Import performance improvement idea

2012-02-14 Thread kenji4569
I use gluon.scheduler to import large csv files in a production
environment, and execute db.commit() every 100 records inside the task.
This works well for me.
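
A minimal sketch of that pattern, assuming a hypothetical db.product table and
the usual scheduler setup in a model file (table, field and path names are
only illustrative):

# sketch only -- assumes db and db.product are defined in the models
import csv
from gluon.scheduler import Scheduler

def import_products(csv_path):
    with open(csv_path, 'r') as f:
        for i, row in enumerate(csv.DictReader(f), start=1):
            db.product.insert(name=row['name'], quantity=row['quantity'])
            if i % 100 == 0:
                db.commit()          # flush every 100 inserts
    db.commit()                      # commit the final partial batch

scheduler = Scheduler(db)
# queued e.g. from a controller:
# scheduler.queue_task(import_products, pvars=dict(csv_path='/tmp/products.csv'))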

Kenji

 On Feb 14, 4:34 PM, Johann Spies johann.sp...@gmail.com wrote:
 On 14 February 2012 00:54, Omi Chiba ochib...@gmail.com wrote:

  I have a problem with the performance of the CSV import, and I assume it
  generates an INSERT statement for every record, so there will be 8,000
  statements if the csv file has 8,000 records.

  Can we use the bulk_insert method instead, so there is always only one
  INSERT statement? That should improve performance significantly.

 As I understand it, the database (at least PostgreSQL) uses the COPY
 statement to import csv files.  That is much quicker than a series of
 individual inserts.

 Regards
 Johann

 --
 Because experiencing your loyal love is better than life itself,
 my lips will praise you.  (Psalm 63:3)
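
A rough sketch of the two ideas above, assuming a hypothetical db.product
table; the COPY route further assumes direct access to the PostgreSQL database
via psycopg2 (table, column and connection names are illustrative):

import csv

# Option A: the DAL's bulk_insert, as asked about above
def load_with_bulk_insert(csv_path):
    with open(csv_path, 'r') as f:
        rows = [dict(name=r['name'], quantity=r['quantity'])
                for r in csv.DictReader(f)]
    db.product.bulk_insert(rows)   # one DAL call for the whole batch
    db.commit()                    # (whether it maps to one SQL statement depends on the adapter)

# Option B: PostgreSQL COPY through psycopg2, bypassing the DAL entirely
import psycopg2

def load_with_copy(csv_path, dsn):
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    with open(csv_path, 'r') as f:
        cur.copy_expert("COPY product (name, quantity) FROM STDIN WITH CSV HEADER", f)
    conn.commit()
    conn.close()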


[web2py] Re: CSV Import performance improvement idea

2012-02-14 Thread Omi Chiba
Johann and Kenji-san, thank you!
DB2 has the CPYFRMIMPF command (I think it's the same as COPY on Postgres),
so I will try it.


[web2py] Re: CSV Import performance improvement idea

2012-02-14 Thread GoldenTiger
I remember I hit that performance problem too, importing 8,300 records from a
csv file using appadmin and a remote database (over 10 minutes).

These improvements worked for me:

- if the database is local, importing from appadmin was fast
- if the database is remote, either:
  A) use another DB manager to import the csv file (phpmyadmin, sqlbuddy, etc.), or
  B) copy the file to the remote host and call importcsv from code (see the sketch below)
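
A rough sketch of option B, assuming the file has already been copied to the
host and that a hypothetical db.product table matches the csv header (the DAL
exposes import_from_csv_file at both the db and the table level):

# run on the host that holds the database connection, e.g. from a controller or a script
def import_copied_file(path='/tmp/products.csv'):
    with open(path, 'rb') as f:
        db.product.import_from_csv_file(f)   # header names must match the table's fields
    db.commit()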




[web2py] Re: CSV Import performance improvement idea

2012-02-14 Thread Niphlod
You need to distinguish between the methods.

web2py stands in the middle; not all databases have facilities like the COPY
command of PostgreSQL.

Also, there are methods to export/import from csv at the database level and
at the table level. If you need to restore a table from the admin app, you
have to take into account the time to physically transfer the csv file.
web2py's methods can also use uuid fields to do upserts, deal with
references, list:* types, etc.

If you have huge tables, or machines that are not performant enough, you
shouldn't use the web2py methods and should code something tailored to your
needs. I don't have issues with normal tables of 50 records (file uploaded to
the remote host, then imported from the local file directly from code, using
the DAL).
Of course, if your table contains many columns or large blobs, YMMV.

PS @ Kenji: normally, if you restore a table or import something, you want
to know which records were committed and which weren't. If you're not
using pkeys or uuids for every single record, committing 100 records at
a time is normally considered bad behaviour: if your file contains a bad
record, you can't assure the consistency of the table.
The time taken to import 8000 records with a commit every 100 is more than
importing 8000 records and doing a single final commit, and you'll retain
consistency: if something falls apart, you have a functioning table
instead of a broken one.

Again, @ all, YMMV: if your csvs are millions of records and you don't
mind about database consistency (e.g., all records in the tables are later
polished or filtered in a smart way), you can commit every n records and
store the last committed line somewhere else, so if your import
crashes, you know where to restart (be aware, writing performant checks and
logic to import something incrementally into a db and make it work in every
possible condition is quite a daunting task).
My point is: don't be shy about using transactions, commits and rollbacks;
they are simple to use, very performant, and maybe the most wonderful
features of a relational database!
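
Purely as a sketch of the "commit every n records and store the last committed
line" idea above (the db.product table, the checkpoint path and the chunk size
are made up for illustration):

import csv, os

CHECKPOINT = '/tmp/import_products.checkpoint'   # hypothetical location
CHUNK = 1000                                     # commit every n records

def resumable_import(csv_path):
    # pick up the last committed line number if a previous run crashed
    start = 0
    if os.path.exists(CHECKPOINT):
        start = int(open(CHECKPOINT).read().strip() or 0)

    with open(csv_path, 'r') as f:
        for lineno, row in enumerate(csv.DictReader(f), start=1):
            if lineno <= start:
                continue                         # already committed in a previous run
            db.product.insert(name=row['name'], quantity=row['quantity'])
            if lineno % CHUNK == 0:
                db.commit()
                open(CHECKPOINT, 'w').write(str(lineno))   # record the last committed line

    db.commit()
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)                    # finished, drop the checkpoint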


[web2py] Re: CSV Import performance improvement idea

2012-02-14 Thread kenji4569
@ Niphlod: The situation for me is that non-programmer operators
upload 10,000-100,000 product records, mainly for inventory updates, via
an admin interface. It's not so huge, but it is large enough to exceed the
server response time limit. The upload is not a bulk copy operation,
but an insert/update operation for each record, which has a pkey. I think
the upload keeps consistency even when it is aborted halfway. I am concerned
that a single commit would cause high memory usage, which should be
avoided in a live environment.
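
For reference, a minimal sketch of that per-record upsert with the DAL,
assuming the pkey is a hypothetical 'sku' field on a db.product table, with a
commit every 100 records to keep memory and transaction size small:

import csv

def upsert_products(csv_path):
    with open(csv_path, 'r') as f:
        for i, row in enumerate(csv.DictReader(f), start=1):
            # update the row matching the pkey, or insert it if it doesn't exist yet
            db.product.update_or_insert(db.product.sku == row['sku'],
                                        sku=row['sku'],
                                        quantity=row['quantity'])
            if i % 100 == 0:
                db.commit()          # bound memory use and transaction size
    db.commit()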

Regards,
Kenji
