Dear Kees and Richard,

Much appreciated.


I tried the ordered-urls-insert the results were better, but it is still 
taking progressively longer time as the number of records increases.

A fundamental question to be asked here :

Shouldn't the time complexity (Big-O) of the insert operation be constant?

I even did a third test where the integer primary key is not auto increment; 
the same problem is observed.

Time complexity is clearly constant when there are no indexes at all (except 
for the implicit auto-increment integer primary key)

But otherwise, time taken increases incrementally (if not exponentially) with 
the number of existing records.

Acknowledging my ignorance on the subject; I really don't see a reason why 
this is happening except being a potential bug or performance improvement 
opportunity.

Regards,
- Kefah.



On Saturday 15 September 2007 12:51:02 Kees Nuyt wrote:
> Hi Kefah,
>
> On Sat, 15 Sep 2007 04:43:46 +0300, you wrote:
> >Thanks Kees,
> >
> >In fact using integer instead of string gives very similar results.
> >
> >Dropping the unique index constraint and then creating it again when
> > needed sounds interesting, as insertion without the unique index is
> > satisfactory and constact over time.
>
> Thanks for the feedback.
>
> >I tried this, but got a trivial question :
> >When creating the unique index, sqlite gives me :
> >SQL error: indexed columns are not unique
> >
> >What should be done here?
>
> Apparently the data in the text column is not unique.
> That is the disadvantage of building the index after the
> insertions: the database can't exercise the contraints on your
> data, so you would have to do that yourself, for example by a
> sort --unique step. My second suggestion for speeding things was
> sorting the input data, so now you have two reasons for a sort.
>
> Such a data cleaning step will take considerable time, so time
> gained in inserting may be lost again in preprocessing.
> It might be better to use the database constraints, and live
> with the slow insertions. Your benchmarks will tell you what's
> best.
>
> >Thanks again,
> >- Kefah.
>
> Good luck.
>
> >On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
> >> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
> >> >Dear All,
> >> >
> >> >I have been struggling with the performance of insertion in sqlite.
> >> >
> >> >Here we have a very simple case :
> >> >
> >> >A table with an integer autoincrement primary key and a text field that
> >> > is unique.
> >> >
> >> >CREATE TABLE my (id PRIMARY KEY, url);
> >> >
> >> >CREATE UNIQUE INDEX myurl ON my(url);
> >> >
> >> >
> >> >My application requires inserting up to 10 million records in batches
> >> > of 20 thousand records.
> >> >
> >> >I use :
> >> >    A. PRAGMA SYNCHRONOUS = OFF
> >> >    B. Prepared Statement
> >> >    C. setAutoCommit(false), then to true at the end of the batch.
> >> >
> >> >Using the above, the insertion starts off at a good speed, then
> >> > drastically slows down as more records are inserted.
> >> >
> >> >It goes like this :
> >> >
> >> >The first four inserstions (first 4*20K -> 60K records)
> >> >0        took    786
> >> >1        took    944
> >> >2        took    1001
> >> >3        took    1303
> >> >
> >> >After the first 1 Million records
> >> >50        took    2560
> >> >51        took    1921
> >> >55        took    1900
> >> >53        took    3990
> >> >
> >> >After the 2nd million records
> >> >2        took    22393
> >> >3        took    16789
> >> >4        took    29419
> >> >5        took    13993
> >> >
> >> >By the end of the the 3rd million records the insertion time goes up to
> >> > 30 seconds per 20K records.
> >> >
> >> >My app is running from a java code and I am using SqliteJDBC 0.37 (the
> >> > latest) on Sun JDK 1.6.0_02 on CentOS 5.0 4GB Ram, two duo core cpu
> >> > server with raid10 sata-II harddisk.
> >> >
> >> >
> >> >I know I might be stretching sqlite far beyond its limits, I just want
> >> > to verify that there is nothing more that can be done to make a case
> >> > for sqlite in this scenario. I am not sure whats the best next thing
> >> > to do either.
> >> >
> >> >Your feedback and input will be highly appreciated,
> >> >
> >> >- Kefah.
> >>
> >> Most probably the UNIQUE INDEX on the TEXT column is the
> >> culprit.
> >>
> >> My first try would be to create and fill the table first, and
> >> create the UNIQUE INDEX on the TEXT column afterwards.
> >>
> >> The second suggestion would be to INSERT the rows in sorted
> >> order, the sort key being the TEXT column.



-- 
Kefah T. Issa
Manager

>/. freesoft technologies llc
freesoft technologies, LLC.
Cell : +962 777 80 90 50
Office : +962 6 55 23 967
Fax : +962 6 55 61 967
Jabber IM (XMPP) : [EMAIL PROTECTED] 

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to