Note: When adapting the below procedure to your own circumstances, please be careful to modify the device names and other details as necessary, and to carefully double-check each command before executing it. One mistake here could potentially destroy large amounts of data. It would be extremely unwise to simply copy’n’paste the below commands into a shell.
Note: RAID 5 is no substitute for backups. You should have up-to-date backups of your data stored on RAID 5 in any case; but it is even more important to ensure these backups are up-to-date before you begin a potentially dangerous operation such as the online reshape described here. Also bear in mind that what you think is a backup may not actually be a usable backup. A backup that you have not successfully performed a mock restore from is not a backup; it is merely a waste of storage.
The background: A server running Linux Software RAID and LVM
A server I administer contains a Linux software RAID 5 array
/dev/md1, consisting of three partitions on SATA disks:
Inside the VG
/dev/vgdata are various LVM Logical Volumes, in daily use for various purposes.
The problem: Lack of disk space
All of the space in VG
/dev/vgdata has been assigned to LVs. Most of these LVs contain filesystems which are full (or almost full) of data, so cannot be shrunk by useful amounts.
To shrink these filesystems, one would have to bring down services, create new smaller filesystems, copy data from old to new filesystems, bring services back up on the new filesystems, then delete the old filesystems. This would involve downtime, which is to be avoided if at all possible. Additionally, this would involve transient high demand for disk space during the copy operation, which cannot easily be arranged (else we would not be in the current situation). Even worse, this would be a laborious manual process, so would introduce significant potential for human error and consequent loss of time (or even loss of data, if backups were not available).
The other Volume Groups in this machine are in use for unrelated purposes, and experience significantly different workloads from that of
/dev/vgdata. Stealing some of their storage to expand the existing LVs would be a solution, but neither a clean nor an optimal one.
Thus, there is no good way to make better use of the currently-available storage without downtime.
The solution: Add an additional disk to the RAID 5 array
There remain unpopulated SATA ports and disk trays in the server. SATA disks are already fairly inexpensive, and continue to become cheaper with time.
Therefore an additional SATA disk was purchased, and installed in the server’s disk enclosure. As SATA disks are hot-swappable, this operation requires no downtime. The additional disk appears to Linux as the device
The procedure: Growing the RAID 5 array
Confirm that the new disk has been detected, and obtain its device node, which in this case is
Obtain the partition table of one of the existing disks in the RAID array – but do not modify any partitions on this disk in any way. The tools ‘sfdisk’ and ‘fdisk’ are both sufficient for this task, but I prefer ‘fdisk’.
sudo fdisk -l /dev/sdc
Use fdisk (or sfdisk if you prefer) to create a partition of exactly the same size on the new disk, with type
0xfd (Linux RAID auto-detect). The new partition need not have the same starting offset as the existing partitions, but it must not be smaller than those existing partitions.
sudo fdisk /dev/sdf
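Before adding the new partition to the array, it may be worth checking the size requirement mechanically rather than by eye. The following is a minimal sketch, not part of the original procedure: the helper name partition_large_enough is hypothetical, and the member partition /dev/sdc1 in the usage comment is an assumption about how the existing disk is laid out.

```shell
# Hypothetical helper: succeed only if the new partition is at least as
# large as an existing array member. Both sizes are in 512-byte sectors,
# which is the unit reported by `blockdev --getsz`.
partition_large_enough() {
    existing_sectors=$1
    new_sectors=$2
    [ "$new_sectors" -ge "$existing_sectors" ]
}

# Example usage (device names are assumptions; adapt them to your system):
#   partition_large_enough "$(sudo blockdev --getsz /dev/sdc1)" \
#                          "$(sudo blockdev --getsz /dev/sdf1)" \
#       && echo "OK to add" || echo "new partition is too small"
```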
Note: Use of partition type
0xfd is appropriate for kernel auto-detection of software RAID arrays, which has recently been deprecated in favour of a new initramfs-based system. I still prefer the older and simpler autodetection system, despite its drawbacks.
Note: If your disk is one of the newer ones with differing logical and physical sector sizes, for optimal performance you will need to align the partition to the physical sector size. Failure to do this can result in a radical decrease in disk performance.
You will know that this is necessary if the disk is an ‘Advanced Format’ model, or if fdisk output contains something like:
Sector size (logical/physical): 512 bytes / 4096 bytes
rather than:
Sector size (logical/physical): 512 bytes / 512 bytes
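The alignment rule comes down to simple arithmetic: the partition's starting byte offset (start sector multiplied by the logical sector size) should be a whole multiple of the physical sector size. A small sketch of that check follows; the function name partition_is_aligned is hypothetical, and the start sectors 2048 and 63 are merely illustrative values.

```shell
# Hypothetical helper: succeed if a partition whose first sector is
# START_SECTOR (counted in logical sectors) begins on a physical-sector
# boundary.
partition_is_aligned() {
    start_sector=$1
    logical_bytes=$2
    physical_bytes=$3
    [ $(( (start_sector * logical_bytes) % physical_bytes )) -eq 0 ]
}

# Illustrative values only: 512-byte logical, 4096-byte physical sectors.
partition_is_aligned 2048 512 4096 && echo "start sector 2048: aligned"
partition_is_aligned 63 512 4096 || echo "start sector 63: misaligned"
```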
Next, add the newly-created partition to the RAID 5 array, whereupon it will become a ‘spare’ volume:
sudo mdadm --add /dev/md1 /dev/sdf1
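To confirm that the new partition really has been accepted as a spare, one can look for the (S) marker that /proc/mdstat attaches to spare members. The sketch below assumes that marker format; the helper name member_is_spare is hypothetical.

```shell
# Hypothetical helper: read mdstat-formatted text on stdin and succeed if
# MEMBER is listed as a spare of ARRAY, i.e. appears as e.g. "sdf1[3](S)".
member_is_spare() {
    # $1: array name without /dev/ (e.g. md1)
    # $2: member name without /dev/ (e.g. sdf1)
    awk -v dev="$1" -v member="$2" '
        $1 == dev && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if (index($i, member "[") == 1 && $i ~ /\(S\)$/) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage:
#   member_is_spare md1 sdf1 < /proc/mdstat && echo "sdf1 is a spare"
```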
Instruct Linux to grow (aka ‘reshape’)
/dev/md1 from three RAID devices to four. This is an online but potentially lengthy operation, requiring a full pass over the member disk data:
sudo mdadm --grow /dev/md1 --raid-devices=4 --backup-file=/root/raid5-reshape-backup-file
In theory, after a (brief) critical time at the start of the operation has passed, the process can be interrupted and easily restarted without any loss of data; and even during the initial critical time, the backup file above can be used to manually resume the process, still without loss of data.
In practice, the ability to resume an interrupted reshape operation is something which I have never tested, and earnestly hope never to need to test.
Note: Further information on this step can be found in the Linux Software RAID Wiki, but that information refers to updating the file
/etc/mdadm/mdadm.conf, which depending on your set-up, you may not have or may not need to update.
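As a rough guide to what the reshape buys you: a RAID 5 array devotes the equivalent of one member to parity, so its usable capacity is (number of devices - 1) times the member size. The sketch below works through that arithmetic; the function name raid5_usable and the member size of 1000 (in whatever unit you like) are purely illustrative.

```shell
# Hypothetical helper: usable capacity of a RAID 5 array, in the same
# unit as MEMBER_SIZE. One member's worth of space goes to parity.
raid5_usable() {
    devices=$1
    member_size=$2
    echo $(( (devices - 1) * member_size ))
}

# Illustrative member size of 1000 units:
raid5_usable 3 1000   # before the reshape (three devices)
raid5_usable 4 1000   # after the reshape (four devices)
```

With these example figures, going from three to four devices raises usable capacity from two members' worth to three.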
The progress of the grow (aka reshape) operation can be observed with:
watch cat /proc/mdstat
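If you would rather log or script the progress than watch it, the percentage can be pulled out of /proc/mdstat. This is a sketch only: it assumes the usual "reshape = N%" progress line in the array's stanza, and the helper name reshape_progress is hypothetical.

```shell
# Hypothetical helper: read mdstat-formatted text on stdin and print the
# reshape percentage for ARRAY (e.g. "1.2%"). Prints nothing if that
# array has no reshape in progress.
reshape_progress() {
    # $1: array name without /dev/ (e.g. md1)
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { in_dev = 1; next }  # start of our stanza
        in_dev && NF == 0      { in_dev = 0 }        # blank line ends it
        in_dev && /reshape *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) { print $i; exit }
            }
        }
    '
}

# Example usage:
#   reshape_progress md1 < /proc/mdstat
```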
When the operation has completed, expand the PV on the newly-expanded MD device to fill the MD device. This is an online and fast operation:
sudo pvresize /dev/md1
By default, the PV expands to fill the device containing it.
/dev/vgdata now has additional free Physical Extents:
sudo vgdisplay /dev/vgdata
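The figure of interest in that output is the count of free extents. As a sketch, assuming vgdisplay's usual "Free  PE / Size" line, it can be extracted as follows; the helper name vg_free_extents is hypothetical.

```shell
# Hypothetical helper: read `vgdisplay` output on stdin and print the
# number of free Physical Extents, taken from the "Free  PE / Size" line.
vg_free_extents() {
    awk '/Free +PE/ { print $5; exit }'
}

# Example usage:
#   sudo vgdisplay /dev/vgdata | vg_free_extents
```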
Expand Logical Volume(s) as necessary:
sudo lvextend --size=<new_size> </logical/volume/path>
Expand filesystem(s) as necessary, using resize2fs for ext2/ext3/ext4 or xfs_growfs for XFS:
sudo resize2fs </block/device/node>
sudo xfs_growfs </mount/point/of/filesystem>
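Which tool applies depends on the filesystem type, and the two take different arguments. The sketch below prints the appropriate command rather than running it, so that it can be double-checked first. It assumes, as their man pages describe, that resize2fs is given the block device and xfs_growfs the mount point; the helper name grow_fs_command and the LV name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print (but do not execute) the command that would
# grow a filesystem of the given type to fill its underlying device.
grow_fs_command() {
    fs_type=$1
    device=$2
    mount_point=$3
    case "$fs_type" in
        ext2|ext3|ext4) echo "resize2fs $device" ;;
        xfs)            echo "xfs_growfs $mount_point" ;;
        *)              echo "unsupported filesystem type: $fs_type" >&2
                        return 1 ;;
    esac
}

# Example usage (the LV name "home" is an assumption):
#   grow_fs_command ext4 /dev/vgdata/home /home
```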