8/16/2008

LVM2 over MDADM over DM-Multipath

As mentioned in my previous post, we have faced several problems mirroring directly with LVM2, among them:

  1. Unable to lvextend a mirrored LV without breaking the mirror

  2. Unable to keep a mirrored LV synced across reboots without using a 3rd disk for the log

We finally decided to explore the mdadm approach to solve our problem.

Multipathing

The first question was whether to use DM-MP or MDADM for multipathing.

DM-MP:

It offers support for a lot of FC disk arrays and could also be used (we didn't test it) with iSCSI devices.

It supports real round-robin multipathing (multibus), allowing better performance.

Multipathing with MDADM:

No round-robin multipathing option, only failover with manual fallback.

We decided to go with DM-MP.

Mirroring

As mentioned above, the mirroring will be done with mdadm.

The question was whether or not to use "fd Linux raid autodetect" partitions.

We first set things up using fd partitions, but after rebooting the servers, we faced an unsolvable problem.

As mdadm is started in the rc.sysinit before DM-MP on RHEL 4 (and 5), we ended up with md raid1 arrays built with /dev/sdX devices rather than /dev/dm devices.

Instead of hacking the rc.sysinit script, we removed the fd partitions and created an /etc/init.d/mdadm script that starts mdadm after dm-multipath is loaded (a sketch of such a script is given at the end of this post). The raid1 arrays are defined in the /etc/mdadm.conf file.

No "fd Linux RAID autodetect" partitions.

Volume management

LVM2 is used for volume management, for its flexibility over raw partitions.

Detailed setup

Multipath

Our test server is dual FC connected to an HP EVA 8100 array. Both LUNs (VDISKs) come from the same disk array, but the goal in production is to present to the server one LUN from each of 2 different disk arrays located on 2 different sites.
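
Before writing multipath.conf, it can help to check that all paths to the LUNs are visible from the host and share the same WWID; a quick sketch (sdr is just one example path name, adapt it to your setup):

root@hostname# cat /proc/scsi/scsi
root@hostname# /sbin/scsi_id -g -u -s /block/sdr

Every path to the same VDISK should return the same WWID; this is the identifier DM-MP uses to group the paths into one map.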

Below is the content of /etc/multipath.conf:

defaults {
        polling_interval        5
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        no_path_retry           fail
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
}

devices {
        device {
                vendor                  "(HITACHI|HP)"
                product                 "OPEN-.*"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        }
        device {
                vendor                  "HP"
                product                 "HSV2[10]0"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        }
}
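
To load this configuration, something along these lines should work on RHEL 4/5 (a sketch, assuming the stock device-mapper-multipath package and its multipathd service):

root@hostname# modprobe dm-multipath
root@hostname# service multipathd start
root@hostname# chkconfig multipathd on
root@hostname# multipath -v2

The last command (re)builds the multipath maps and prints what was created.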

Check that Multipath is working fine:

root@hostname# multipath -ll -v2

3600508b40006baca0000c00002c80000
[size=5 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
\_ 0:0:2:4 sdr 65:16 [active][ready]
\_ 0:0:3:4 sdt 65:48 [active][ready]
\_ 0:0:4:4 sdv 65:80 [active][ready]
\_ 0:0:5:4 sdx 65:112 [active][ready]
\_ 1:0:2:4 sdz 65:144 [active][ready]
\_ 1:0:3:4 sdab 65:176 [active][ready]
\_ 1:0:4:4 sdad 65:208 [active][ready]
\_ 1:0:5:4 sdaf 65:240 [active][ready]

3600508b40006baca0000c00002c30000
[size=5 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=8][active]
\_ 0:0:2:3 sdq 65:0 [active][ready]
\_ 0:0:3:3 sds 65:32 [active][ready]
\_ 0:0:4:3 sdu 65:64 [active][ready]
\_ 0:0:5:3 sdw 65:96 [active][ready]
\_ 1:0:2:3 sdy 65:128 [active][ready]
\_ 1:0:3:3 sdaa 65:160 [active][ready]
\_ 1:0:4:3 sdac 65:192 [active][ready]
\_ 1:0:5:3 sdae 65:224 [active][ready]


RAID with mdadm

Create the md array using the disks' WWIDs rather than the dm aliases, as those names can change upon reboot.

root@hostname# mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/mapper/3600508b40006baca0000c00002c80000 /dev/mapper/3600508b40006baca0000c00002c30000
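
The initial resynchronisation of the mirror can be followed through /proc/mdstat, for example:

root@hostname# cat /proc/mdstat
root@hostname# watch -n 5 cat /proc/mdstat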

Check that the mirror is OK:

root@hostname# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Tue Aug 12 18:27:28 2008
     Raid Level : raid1
     Array Size : 5242816 (5.00 GiB 5.37 GB)
    Device Size : 5242816 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Aug 12 18:31:44 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 6bb39d9f:7f66b358:9a9584f1:6b3e0114
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0     253        3        0      active sync   /dev/dm-3
       1     253        4        1      active sync   /dev/dm-4



Put the raid1 array in /etc/mdadm.conf:
DEVICE /dev/mapper/*
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=6bb39d9f:7f66b358:9a9584f1:6b3e0114



Be careful to always use UUIDs rather than dm-X or sdX names, as these can change upon reboot.
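
One way to avoid typos in the UUID is to let mdadm generate the ARRAY line itself once the array is running (double-check the result and keep the DEVICE /dev/mapper/* line on top):

root@hostname# echo "DEVICE /dev/mapper/*" > /etc/mdadm.conf
root@hostname# mdadm --detail --scan >> /etc/mdadm.conf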

Volume management



The LVM part of the job is the most standard one, except that we will be using /dev/md1 as the PV.

root@hostname# pvcreate /dev/md1

Then the VG,

root@hostname# vgcreate ORAvg /dev/md1

and then the LV (here using all the free space in the VG)

root@hostname# lvcreate -n lvoldata -l 100%FREE ORAvg

and finally the appropriate filesystem

root@hostname# mke2fs -j /dev/ORAvg/lvoldata
root@hostname# tune2fs -c 0 /dev/ORAvg/lvoldata
root@hostname# tune2fs -i 0 /dev/ORAvg/lvoldata
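
To check the whole stack and actually use the filesystem, something like the following can be done (/oradata is only an example mount point):

root@hostname# pvdisplay /dev/md1
root@hostname# vgdisplay ORAvg
root@hostname# lvdisplay /dev/ORAvg/lvoldata
root@hostname# mkdir /oradata
root@hostname# mount /dev/ORAvg/lvoldata /oradata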

Do not forget to create an /etc/init.d/mdadm script that is started after the /etc/init.d/multipathd script.

For the start case, this script must run mdadm -A -s $DEVICE, where DEVICE is each of the arrays defined in /etc/mdadm.conf.

The stop case will fail as long as the LVM volumes on top of the array are still active: you have to stop LVM (deactivate the VG) first.
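
A minimal sketch of such a script is given below; the chkconfig priorities and the ARRAYS list are assumptions to adapt to your own setup (they just have to place the script after multipathd and before anything that activates LVM on top of the arrays):

#!/bin/bash
# chkconfig: 345 14 86
# description: assemble md raid1 arrays on top of dm-multipath devices

# arrays defined in /etc/mdadm.conf (assumed list, adapt it)
ARRAYS="/dev/md1"

case "$1" in
  start)
    for dev in $ARRAYS; do
        mdadm -A -s $dev      # assemble the array from /etc/mdadm.conf
    done
    ;;
  stop)
    # deactivate LVM first, e.g. vgchange -a n ORAvg, or this will fail
    for dev in $ARRAYS; do
        mdadm -S $dev         # stop the array
    done
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac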

2 comments:

Anonymous said...

Thank you!

bigzaqui said...

That was pretty cool. I'm thinking of doing something like this: RAID (mdadm) + AoE + ZFS to create a big pool with all my drives, with data integrity (using ZFS) and hardware integrity (using RAID).