Hello,
     I'm in the midst of using MySQL for some genetic information
searching based upon the GenBank data from the NCBI (National Center
for Biotechnology Information).  While doing some testing with MySQL,
I began to wonder whether this data set would be of interest as a
benchmark for the database.

     The following information was taken from a recent run that loaded
a portion (500k records) of the data.  The full data set has almost
30M records, so it would not likely be pleasant to store and/or
distribute.  But the data is publicly available and substantial.

     Please take a look at the timings on some of the activities shown
below.
               Brad Eacker ([EMAIL PROTECTED])

Load in the data (500,000 rows):

mysql> create table gb_locus (
    ->         gbl_id          int primary key,
    ->         gbl_fileID      int,
    ->         gbl_locus       varchar(20),
    ->         gbl_size        int,
    ->         gbl_date        date,
    ->         gbl_phylum      char(3),
    ->         gbl_foffset     int
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> load data infile '/hda3/beacker/gene/genbank/a' into table gb_locus
    ->     fields terminated by ',';
Query OK, 500000 rows affected (10.58 sec)
Records: 500000  Deleted: 0  Skipped: 0  Warnings: 0

Storage used:
-rw-rw----    1 mysql    mysql    18141068 Dec  3 20:03 gb_locus.MYD
-rw-rw----    1 mysql    mysql     4098048 Dec  3 20:03 gb_locus.MYI

Access data:
mysql> select gbl_phylum, count(*) from gb_locus group by gbl_phylum;
+------------+----------+
| gbl_phylum | count(*) |
+------------+----------+
| BCT        |   210778 |
| CON        |    11472 |
| EST        |   277750 |
+------------+----------+
3 rows in set (6.83 sec)

Raw data:

[EMAIL PROTECTED] genbank]$ ls -l a
-rw-rw-r--    1 beacker  beacker  25758542 Dec  3 17:33 a
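
To put the figures above in perspective, here is a rough Python sketch
that derives rows per second and bytes per row from the numbers in the
session, then scales them to the full data set.  It assumes "almost 30M
records" can be rounded to 30 million and that load time and storage
grow linearly with row count.  Neither assumption was measured, so
treat the projections as ballpark values only.

```python
# Figures copied from the session output above.
ROWS_LOADED = 500_000
LOAD_SECONDS = 10.58        # LOAD DATA INFILE
GROUP_BY_SECONDS = 6.83     # SELECT ... GROUP BY gbl_phylum
MYD_BYTES = 18_141_068      # gb_locus.MYD (row data)
MYI_BYTES = 4_098_048       # gb_locus.MYI (index)
RAW_BYTES = 25_758_542      # comma-separated input file 'a'

# Assumption: "almost 30M records" rounded to 30 million rows.
FULL_ROWS = 30_000_000

# Throughput observed on the 500k-row sample.
load_rate = ROWS_LOADED / LOAD_SECONDS        # rows loaded per second
scan_rate = ROWS_LOADED / GROUP_BY_SECONDS    # rows aggregated per second

# Average on-disk cost per row.
data_bytes_per_row = MYD_BYTES / ROWS_LOADED
index_bytes_per_row = MYI_BYTES / ROWS_LOADED
raw_bytes_per_row = RAW_BYTES / ROWS_LOADED

# Assumption: time and storage scale linearly with row count.
scale = FULL_ROWS / ROWS_LOADED
est_full_bytes = (MYD_BYTES + MYI_BYTES) * scale
est_full_load_seconds = LOAD_SECONDS * scale

print(f"load rate:      {load_rate:,.0f} rows/sec")
print(f"group-by rate:  {scan_rate:,.0f} rows/sec")
print(f"bytes per row:  {data_bytes_per_row:.1f} data, "
      f"{index_bytes_per_row:.1f} index, {raw_bytes_per_row:.1f} raw")
print(f"full set (est): {est_full_bytes / 1e9:.2f} GB, "
      f"{est_full_load_seconds / 60:.1f} min to load")
```

On these numbers the load ran at roughly 47,000 rows per second, and
the table plus its index came out somewhat smaller than the raw text
file.  Under the linear assumption, the full set would need about
1.3 GB for data and index combined.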


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
