You're right - it is a data related issue. I finally made it work with
the live data on high speed but in a way that puzzles me. Do you know
the explanation ? (I'm using mySQL 4.1.1a-max-nt on Win2K)

The index of the primary key is using the same data set (content and
size) for both the test and live data. That was the reason I didn't get
it why the performance dropped when using the live data. 

The difference was the order in which the rows were inserted into the
table. When the order is right, the performance is high. The test data
is programmatically generated in the ascending order of the primary key.
The live data has no particular order.

I did a simple test to reproduce this:

* Create the table that will contain the live data.
* Disable the table's keys
* Load the live data using LOAD DATA INFILE ..., sorted in the random
order.
* Enable the table's keys
* Run the unit tests => 2000 seconds
* Dump that table's content into a text file, sorted on the ascending
order of the primary key.
* Create a second table identical to the first one.
* disable keys/LOAD DATA INFILE new text file/enable keys
* Run the same unit tests using the new table => 40 seconds

It looks like the physical order of the rows has an influence on the way
the primary key is building its index. Is this normal in mySQL ? Any
idea ?

Thanks,

Jan.
 

-----Original Message-----
From: Tim Cutts [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 25, 2004 9:32 AM
To: Jan Goyvaerts (jgoyvaer)
Subject: Re: Endless primary key build


On 24 May 2004, at 10:49 am, Jan Goyvaerts ((jgoyvaer)) wrote:

> My only conclusion is that getting the data first from another
database
> does something to mySQL...

The other difference is the data.  Your test data was probably evenly 
distributed, or normally distributed, depending on your random number 
generator.  The live data is probably distributed very differently.  
Maybe it's sparse.  Maybe it's skewed or clustered in interesting ways. 
  Either way, the one thing this will have a large impact on is the 
arrangement and performance of the index.

Tim

-- 
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute
Hinxton, Cambridge, CB10 1SA, UK


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to