On 02Jul2019 15:23, Alex <mysqlstud...@gmail.com> wrote:
>I've since learned it takes entirely too long to copy 1.3TB to two 2TB
>disks. I can't keep the system down that long.

You don't need to.

The problem is the LSI hardware RAID. All eight ports are consumed
with the eight 240GB disks.

At least with mdadm you have fairly direct control of the RAID. Your LSI controller can also run the drives as a raidset, but it is harder to deal with. I have some scripts for monitoring and, to a limited extent, controlling these while the OS is up instead of via the BIOS interface.

However, the RAID arrangement is proprietary and different to mdadm and/or LVM. OTOH, I did once spend an hour on the phone with a very helpful LSI engineer trying to rescue one here.

So, using the LSI in JBOD (just a bunch of disks) mode, yes?

The two 2TB disks are connected to the onboard SATA controllers. I
forget the reason why I didn't just use the onboard SATA controllers
when I installed the system seven years ago. I know there's only six
ports, and I'm using eight disks on the LSI controller, but that
wasn't the reason - the decision was made to use the LSI when there
was only four regular SATA disks installed. Maybe that was the reason
- the onboard were too slow.

The LSI stuff is pretty good in my experience. Ran them in several IBM boxes and also at home for years.

Using the LSI makes me nervous - there have been one or two times when
I almost lost the array, but I'll probably keep using it.

The important thing is to be able to monitor them. I've some scripts for that - put them in a 5 minute cron job, or in your monitoring system, e.g. Nagios. Then you will get timely emails if a problem occurs.

I wrote the cs.app.megacli Python module for this (see PyPI) and have some small auxiliary scripts which wrap it.

This means I have to use an interim server to hold the 2TB of data
while rebuilding and restore the data to the original server.

I usually use the USB bus for this (copying user data outside the system so that I've a backup and can copy it back) - it leaves the system SATA stuff available for whatever. Have you got USB3 on this machine? Otherwise it will be much slower.
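
For a rough feel of the difference, here is a back-of-envelope sketch in Python. The sustained rates are assumptions on my part (about 30MB/s for USB2, about 120MB/s for a single spinning disk on USB3), not measurements from your machine, so plug in whatever you actually observe.

```python
def copy_hours(size_tb: float, mb_per_sec: float) -> float:
    """Hours to move size_tb terabytes (decimal units) at a sustained rate."""
    size_mb = size_tb * 1_000_000  # 1TB = 10**6 MB in decimal units
    return size_mb / mb_per_sec / 3600


# Assumed sustained rates in MB/s; these are illustrative, not measured.
ASSUMED_RATES_MB_S = {"USB2": 30.0, "USB3": 120.0}

if __name__ == "__main__":
    for bus, rate in ASSUMED_RATES_MB_S.items():
        print(f"{bus}: about {copy_hours(1.3, rate):.1f} hours for 1.3TB")
```

With those assumed rates the 1.3TB copy is roughly half a day over USB2 versus a few hours over USB3, which is why the question matters.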

I'll
probably set it up with the two 2TB regular disks, shift all the
services to it, rebuild the existing production system, copy the data
back, then shift the IP and services back to the original production
machine.

Another problem - just saw one of the 2TB disks I'm using for backup is failing:

Are they a RAID? Or 2 independent drives and filesystems?

If you've got USB3, a pair of WD Elements external USB bus powered drives can be convenient. Or the like.

[411086.090668] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[411086.091908] ata6.00: irq_stat 0x40000001
[411086.093071] ata6.00: failed command: READ DMA EXT
[411086.094218] ata6.00: cmd 25/00:00:80:82:b9/00:05:49:00:00/e0 tag 16 dma 655360 in
                        res 53/40:00:80:82:b9/00:00:49:00:00/00 Emask 0x8 (media error)
[411086.096519] ata6.00: status: { DRDY SENSE ERR }
[411086.097699] ata6.00: error: { UNC }
[411086.099676] ata6.00: NCQ Send/Recv Log not supported
[411086.101691] ata6.00: NCQ Send/Recv Log not supported
[411086.102885] ata6.00: configured for UDMA/133
[411086.104086] sd 5:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[411086.105329] sd 5:0:0:0: [sdb] tag#16 Sense Key : Vendor Specific(9) [current]
[411086.105950] sd 5:0:0:0: [sdb] tag#16 <<vendor>>ASC=0x80 ASCQ=0x0
[411086.106522] sd 5:0:0:0: [sdb] tag#16 CDB: Read(16) 88 00 00 00 00 00 49 b9 82 80 00 00 05 00 00 00
[411086.107675] print_req_error: I/O error, dev sdb, sector 1236894336
[411086.108296] ata6: EH complete

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
     1953381440 blocks super 1.2 [2/2] [UU]
     [======>..............]  check = 31.6% (618322048/1953381440) finish=115514.5min speed=192K/sec
     bitmap: 0/15 pages [0KB], 65536KB chunk

I think this recovered.

It also may not be the drive. I've had problems with dodgy SATA enclosures and old/frail SATA cables too.
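
Whatever the cause, it is worth having the cron job complain when an md array loses a member. The following is a minimal sketch in Python, not one of my existing scripts and not the cs.app.megacli module: it only understands the /proc/mdstat layout shown in your output, and the function names are made up for illustration.

```python
import re


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return the md arrays whose member map (e.g. [UU]) shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line looks like: "1953381440 blocks super 1.2 [2/2] [UU]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status and "_" in status.group(3):
            degraded.append(current)
    return degraded


def main(path: str = "/proc/mdstat") -> int:
    """Print degraded arrays and return a nonzero status if there are any."""
    with open(path) as f:
        bad = degraded_arrays(f.read())
    if bad:
        # cron mails any output, so only print when something is wrong.
        print("degraded md arrays:", ", ".join(bad))
        return 1
    return 0
```

Call main() from a small wrapper run by cron. An underscore in the member map (such as [U_]) is what marks a missing member; a running check like the one in your output does not.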

Do you have the hardware to assemble the new raidset with the new drives
and have both online at once (with two machines I suppose)?

If so you can do the cp-then-rsync directly to the new drives without
the intermediate 2TB volume. Which means there's no time consuming copy
back.

Because of the hardware RAID controller, I cannot.

Alas. Still, if you've got onboard SATA in addition to the LSI controller you could: copy the 1.3TB to external USB drives, install the new OS on the SATA bus, swap out the old raidset for the new raidset, then copy back from the USB drives.

So you can still win on the copy-out phase with cp-then-rsync, but you'll still have the long copy back phase. SATA is almost certainly more efficient than USB, though you could measure that.
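
A minimal sketch of the cp-then-rsync copy-out in Python, assuming GNU cp and rsync are installed. The paths are hypothetical and the function names are invented for illustration; the flags are the usual archive ones (rsync -H, -A and -X keep hard links, ACLs and extended attributes).

```python
import subprocess


def copy_plan(src: str, dst: str) -> list[list[str]]:
    """Commands for a bulk copy followed by an rsync catch-up pass."""
    src = src.rstrip("/")
    dst = dst.rstrip("/")
    return [
        # Bulk copy while services keep running; "src/." copies the
        # contents of src into the existing directory dst.
        ["cp", "-a", f"{src}/.", f"{dst}/"],
        # Catch-up pass: transfers only what changed since the bulk copy
        # and removes files that have since been deleted from src.
        ["rsync", "-aHAX", "--delete", f"{src}/", f"{dst}/"],
    ]


def run_plan(plan: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

For example, run_plan(copy_plan("/srv/data", "/mnt/usb")) with the services still up, then stop the services and run just the rsync command once more. Only that last pass needs downtime, and it should be short because little has changed by then.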

Cheers,
Cameron Simpson <c...@cskk.id.au>