Hello, I'm in the midst of using MySQL to search genetic information based on the GenBank data from the NCBI (National Center for Biotechnology Information). While doing some testing with MySQL, I began to wonder: would this data set be of interest as a benchmark for the database?
The following figures come from a recent run that loaded a portion (500,000 records) of the data. The full data set has almost 30M records, so it would likely not be pleasant to store or distribute, but the data is publicly available and substantial. Please take a look at the timings for the operations shown below.

Brad Eacker ([EMAIL PROTECTED])

Load the data (500,000 rows):

mysql> create table gb_locus (
    -> gbl_id int primary key,
    -> gbl_fileID int,
    -> gbl_locus varchar(20),
    -> gbl_size int,
    -> gbl_date date,
    -> gbl_phylum char(3),
    -> gbl_foffset int
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> load data infile '/hda3/beacker/gene/genbank/a' into table gb_locus
    -> fields terminated by ',';
Query OK, 500000 rows affected (10.58 sec)
Records: 500000  Deleted: 0  Skipped: 0  Warnings: 0

Storage used (MyISAM data and index files):

-rw-rw----  1 mysql  mysql  18141068 Dec  3 20:03 gb_locus.MYD
-rw-rw----  1 mysql  mysql   4098048 Dec  3 20:03 gb_locus.MYI

Access the data:

mysql> select gbl_phylum, count(*) from gb_locus group by gbl_phylum;
+------------+----------+
| gbl_phylum | count(*) |
+------------+----------+
| BCT        |   210778 |
| CON        |    11472 |
| EST        |   277750 |
+------------+----------+
3 rows in set (6.83 sec)

Raw data:

[EMAIL PROTECTED] genbank]$ ls -l a
-rw-rw-r--  1 beacker  beacker  25758542 Dec  3 17:33 a

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
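To put the "almost 30M records" remark into numbers, here is a rough back-of-the-envelope sketch in Python that scales the 500,000-row figures from the run above up to the full data set. It assumes exactly 30,000,000 records and purely linear scaling of load time, query time and file size. That is an optimistic simplification: index maintenance and caching behave differently at larger sizes. The constant and function names are made up for illustration.

```python
# Figures copied from the 500,000-row test run in this post.
SAMPLE_ROWS = 500_000
LOAD_SECONDS = 10.58        # LOAD DATA INFILE
GROUP_BY_SECONDS = 6.83     # SELECT ... GROUP BY gbl_phylum
MYD_BYTES = 18_141_068      # gb_locus.MYD (MyISAM row data)
MYI_BYTES = 4_098_048       # gb_locus.MYI (MyISAM index)
RAW_BYTES = 25_758_542      # comma-separated input file 'a'
PHYLUM_COUNTS = {"BCT": 210_778, "CON": 11_472, "EST": 277_750}

# Assumption: "almost 30M records" rounded up to 30,000,000.
FULL_ROWS = 30_000_000


def per_row(total_bytes: int) -> float:
    """Average number of bytes per record in the sample."""
    return total_bytes / SAMPLE_ROWS


def extrapolate(sample_value: float) -> float:
    """Scale a sample measurement linearly to the full data set.

    Linear scaling is an assumption; real load and query times
    will differ as the index grows and stops fitting in memory.
    """
    return sample_value * FULL_ROWS / SAMPLE_ROWS


def report() -> dict:
    """Derived throughput, per-row sizes and full-set estimates."""
    return {
        "load_rows_per_sec": SAMPLE_ROWS / LOAD_SECONDS,
        "myd_bytes_per_row": per_row(MYD_BYTES),
        "myi_bytes_per_row": per_row(MYI_BYTES),
        "raw_bytes_per_row": per_row(RAW_BYTES),
        "full_load_minutes": extrapolate(LOAD_SECONDS) / 60,
        "full_group_by_minutes": extrapolate(GROUP_BY_SECONDS) / 60,
        "full_table_gb": extrapolate(MYD_BYTES + MYI_BYTES) / 1e9,
        "full_raw_gb": extrapolate(RAW_BYTES) / 1e9,
    }


if __name__ == "__main__":
    # The per-phylum counts should account for every loaded row.
    assert sum(PHYLUM_COUNTS.values()) == SAMPLE_ROWS
    for name, value in report().items():
        print(f"{name:>24}: {value:,.2f}")
```

Under these assumptions the sample loads at roughly 47,000 rows per second, and the full set would need on the order of 10 minutes to load and about 1.3 GB for the table and index files, against about 1.5 GB of raw input.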