How fast do your disks rotate? What kind of disk controller are you using 
(ATA, SATA, SCSI, ...)? Are they in a RAID array or not? If they are, what 
RAID scheme are you using?

How big are your buffers and other working areas? (show variables like 
'%buff%';  show variables like '%size%';). Your CPUs are probably up to 
the task but we need to make sure you are using as much available memory 
as you can. I see PowerPC in your system description... how fast is your 
system bus (or does anyone else think that would make that much of a 
difference here)?
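
Of everything those two statements return, the one I would look at first
is key_buffer_size: with MyISAM it is the only cache MySQL itself keeps
for index blocks (the data pages ride on the OS file cache). Just as a
sketch, and with the 512M figure only a placeholder rather than a
recommendation for your box:

SHOW VARIABLES LIKE 'key_buffer_size';

# in my.cnf, followed by a mysqld restart
[mysqld]
key_buffer_size = 512M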

On another note... I was under the impression that you were trying to 
create a single "daily" table for testing purposes (hence the date in the 
table name), yet you seem to have 25M+ entries. Are you sure that's just 
one day's worth of information? In fact, from the time you sent the 
earliest message in this thread until you posted your table status, you 
added about 10M new rows.
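
A quick way to check would be something along these lines (just a
sketch; the MIN/MAX will have to scan the table since `time` is not
indexed, but it only needs to run once):

SELECT COUNT(*), MIN(time), MAX(time) FROM t20040909;

If the MIN and MAX both fall on 2004-09-09 then that really is one very
busy day; if not, rows from other days are landing in that table.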

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"J S" <[EMAIL PROTECTED]> wrote on 09/10/2004 05:56:12 AM:

> Hi Shawn,
> 
> Thanks for helping me out again! I've cut and pasted the answers below
> your questions:
> 
>>
>>I would love to say how adequate your hardware should be but I have
>>forgotten what hardware you have, sorry! Would you mind re-posting all of
>>the specs for your DB server?
> 
> 2 x 500MHz CPUs
> 8GB Memory
> mysql-max-4.0.20-ibm-aix4.3.3.0-powerpc
> 
>>Also could you post the results of:
>>
>>EXPLAIN select * From t20040908 where uid=454
>>
>>to make sure the query is using the `uid` key (it should be but it never
>>hurts to check).
> 
> mysql> EXPLAIN select * From t20040909 where uid=454;
> +-----------+------+---------------+------+---------+-------+------+-------------+
> | table     | type | possible_keys | key  | key_len | ref   | rows | Extra       |
> +-----------+------+---------------+------+---------+-------+------+-------------+
> | t20040909 | ref  | uid           | uid  |       4 | const | 4275 | Using where |
> +-----------+------+---------------+------+---------+-------+------+-------------+
> 1 row in set (0.07 sec)
> 
>>Could you also respond with the results of:
>>
>>SHOW TABLE STATUS LIKE 't20040908'
> 
> mysql> SHOW TABLE STATUS LIKE 't20040909';
> +-----------+--------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+----------------+---------+
> | Name      | Type   | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Create_options | Comment |
> +-----------+--------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+----------------+---------+
> | t20040909 | MyISAM | Fixed      | 25209614 |             32 |   806707648 |    137438953471 |    222495744 |         0 |           NULL | 2004-09-09 14:24:41 | 2004-09-10 01:01:35 | 2004-09-09 14:25:40 |                |         |
> +-----------+--------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+----------------+---------+
> 1 row in set (0.01 sec)
> 
> 
>>
>>Part of the time it takes to query any table is the time it takes to load
>>the correct index in from the disk so... maybe? Your hardware could be too
>>slow for this volume of information? I just can't say with any degree of
>>confidence one way or the other yet.
>>
>>I did notice that this query returned in only about 11% of the time it
>>took a similar query on the full dataset to return (6.87 seconds vs. 61
>>seconds for a similar lookup on "internet_usage"). Better, but not exactly
>>linearly related to the reduction in the data size. I would think this is
>>related to the fact that the time it takes to find records through a
>>B-TREE index grows only roughly logarithmically with the row count, so
>>cutting the data size does not cut the lookup time proportionally. I
>>thought about putting a HASH index on that column but those are only
>>available to the MEMORY table type (bummer).
>>
>>BTW, if you and I take this thread back into the list everyone else will
>>also have a chance to review your hardware specs and throw in their two
>>cents, too!
>>
>>Shawn Green
>>Database Administrator
>>Unimin Corporation - Spruce Pine
>>
>>"J S" <[EMAIL PROTECTED]> wrote on 09/09/2004 04:48:47 AM:
>>
>>> Hi Shawn,
>>>
>>> I'm working at the moment on the new database! I had a quick question
>>> for you. Here's my table structure:
>>>
>>> t20040908 | CREATE TABLE `t20040908` (
>>>   `uid` int(10) unsigned NOT NULL default '0',
>>>   `time` timestamp(14) NOT NULL,
>>>   `ip` int(10) unsigned NOT NULL default '0',
>>>   `urlid` int(10) unsigned NOT NULL default '0',
>>>   `timetaken` smallint(5) unsigned default '0',
>>>   `cs_size` int(10) unsigned default '0',
>>>   `sc_size` int(10) unsigned default '0',
>>>   `method_ID` tinyint(3) unsigned NOT NULL default '0',
>>>   `action_ID` tinyint(3) unsigned NOT NULL default '0',
>>>   `virus_ID` tinyint(3) unsigned NOT NULL default '0',
>>>   `useragent_ID` smallint(5) unsigned NOT NULL default '0',
>>>   KEY `uid` (`uid`)
>>> ) TYPE=MyISAM
>>>
>>> and it currently has 15263552 rows.
>>>
>>> I ran the command:
>>> select * from t20040908 where uid=454;
>>>
>>> which came back with 8347 rows in set (6.87 sec)
>>>
>>>
>>> Does 6.87 seconds seem rather slow to you? It could be the box is just
>>> too old and slow. But I just wanted to check in case there was anything
>>> else I should check?
>>>
>>> Cheers,
>>>
>>> JS.
>>>
>>>>There are two ways that I can think of that will combine smaller tables
>>>>into one longer (not wider) table: MERGE tables and UNION queries. A
>>>>third option is to manually add rows from multiple source tables to a
>>>>staging table using INSERT...SELECT... statements.
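>>>>
>>>>Just to sketch what the first option looks like, the two statements
>>>>below create an empty clone of internet_usage and turn it into a MERGE
>>>>table over two daily tables (the name usage_merge is made up and I have
>>>>not tested this exact pair of statements, so treat it as an
>>>>illustration only):
>>>>
>>>>CREATE TABLE usage_merge LIKE internet_usage;
>>>>ALTER TABLE usage_merge TYPE=MERGE, UNION=(`20040401`,`20040402`);
>>>>
>>>>A SELECT against usage_merge then sees the rows of every table in its
>>>>UNION list as if they were one table.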
>>>>
>>>>Since we never know from day to day what you will need to query on,
>>>>creating one big MERGE table would probably be as impractical as your
>>>>existing setup. That leaves the other two options available to run
>>>>reports against.
>>>>
>>>>Since your information comes in daily batches anyway, that makes not
>>>>only logical sense, and business sense, but physical sense too! Remember
>>>>a few emails ago you asked how to move rows from one table to another?
>>>>It's pretty simple if we are creating identical tables. Just script this
>>>>into a loop that traverses all of the dates in your range:
>>>>
>>>>CREATE TABLE `20040401` LIKE internet_usage;
>>>>
>>>>INSERT INTO `20040401`
>>>>SELECT * from internet_usage
>>>>WHERE time >= '20040401000000' and time <= '20040401235959';
>>>>
>>>>DELETE FROM internet_usage
>>>>WHERE time >= '20040401000000' and time <= '20040401235959';
>>>>
>>>>OPTIMIZE NO_WRITE_TO_BINLOG TABLE internet_usage;
>>>>
>>>>You have to run OPTIMIZE TABLE in order to recover the empty space
>>>>created in the table by the DELETE. This will take quite a while in the
>>>>beginning but it will get logarithmically faster as you clear out the
>>>>table. (http://dev.mysql.com/doc/mysql/en/OPTIMIZE_TABLE.html)
>>>>
>>>>Like I said before, you can improve performance (and disk space usage)
>>>>during this re-organization if you disable all logging (the general
>>>>query log and the binlog) while all of this is going on.
>>>>
>>>>To get an extract of a user's activity that spans multiple days you
>>>>could (post 4.1):
>>>>
>>>>create temporary table tmpUserActivity
>>>>SELECT *
>>>>FROM (
>>>>        (SELECT * from `20040401` where uid = @userid)
>>>>        UNION
>>>>        (SELECT * from `20040402` where uid = @userid)
>>>>        -- ... repeat for as many days as you need ...
>>>>) AS u
>>>>WHERE time between @starttime and @endtime;
>>>>
>>>>What makes this work is that MySQL recognizes the outermost () pair as
>>>>something like a single-table UNION query (I think they refer to it in
>>>>the docs as an "anonymous view"). Anyway if you can create UNION
>>>>queries, you have that technique available.
>>>>
>>>>It's not going to be quick to break up that freakin' huge table of yours
>>>>but I think you will be very happy with the performance when you do. The
>>>>downside is that writing some of your queries will now take more effort
>>>>(multiple insert ... select... statements to populate a temp table or
>>>>some UNION queries).
>>>>
>>>>Shawn
>>>>
----8<--- earlier responses clipped for room ---8<---------SG
