Multipath on Debian
Installation
To make multipath work on Debian, you'll need the 'multipath-tools-initramfs' and 'multipath-tools' packages. But as noted in the 'multipath-tools-initramfs' bug list, you need to correct '/usr/share/initramfs/hooks/multipath_hook'.
When you look at the bug list at http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=multipath-tools-initramfs;dist=unstable, you'll see that some tools are missing.
The first, very important, things to add to 'multipath_hook' are:
manual_add_modules dm-multipath
manual_add_modules dm-mod
manual_add_modules dm-round-robin
Then you need to add this:
for helper in /sbin/mpath_prio_*; do
    copy_exec $helper /sbin
done
And finally, if you want to use aliases, add:
copy_exec /etc/multipath.conf /etc/
Optionally, you can comment out the line:
copy_exec /bin/readlink /bin/
And you're done with it. Your file should look like this:
#!/bin/sh
# The environment contains at least:
#
# CONFDIR -- usually /etc/mkinitramfs, can be set on mkinitramfs
#            command line.
#
# DESTDIR -- The staging directory where we are building the image.
#
PREREQ=""

prereqs()
{
    echo "$PREREQ"
}

case $1 in
# get pre-requisites
prereqs)
    prereqs
    exit 0
    ;;
esac

# You can do anything you need to from here on.
#
# Source the optional 'hook-functions' scriptlet, if you need the
# functions defined within it. Read it to see what is available to
# you. It contains functions for copying dynamically linked program
# binaries, and kernel modules into the DESTDIR.
#
. /usr/share/initramfs-tools/hook-functions

copy_exec /sbin/multipathd /sbin/
copy_exec /sbin/scsi_id /sbin/
copy_exec /sbin/kpartx /sbin/
copy_exec /bin/mountpoint /bin/
copy_exec /sbin/devmap_name /sbin/
copy_exec /sbin/multipath /sbin/
# Modified by tchetch
#copy_exec /bin/readlink /bin/
# Added by tchetch
copy_exec /etc/multipath.conf /etc/
for helper in /sbin/mpath_prio_*; do
    copy_exec $helper /sbin
done
manual_add_modules dm-multipath
manual_add_modules dm-mod
manual_add_modules dm-round-robin

mkdir -p $DESTDIR/lib || true
cp /lib/libgcc_s.so.1 $DESTDIR/lib/

exit 0
Configuration
This part depends on your hardware; I've been working only with a SAN from IBM. Now you'll need to configure the file '/etc/multipath.conf'. First, create aliases for your devices:
multipaths {
    multipath {
        wwid  3600a0b8000177d9400002e61463f2ed3
        alias system
    }
    multipath {
        wwid  3600a0b8000177bcc0000256645f7f166
        alias data
    }
}
- alias: the name you want to give to the Logical Drive attached to your Blade.
- wwid: World Wide ID, a unique ID assigned to each Logical Drive.
Now you can configure some options, like devices. For each device connected to your system you can define options. I've got only one SAN attached to my system, so it's easy:
devices {
    device {
        vendor                "IBM.*"
        product               "1722-600"
        path_grouping_policy  group_by_serial
        path_checker          tur
        path_selector         "round-robin 0"
        prio_callout          "/sbin/mpath_prio_tpc /dev/%n"
        failback              immediate
        features              "1 queue_if_no_path"
        no_path_retry         300
    }
}
- vendor: the name of the vendor of your system. This will be used to identify your SAN. For IBM, IBM.* works.
- product: the product name of your SAN. Mine is a DS4300, but Storage Manager reports Product ID: 1722-600.
- path_grouping_policy: depends on how you want to use your SAN. For example, multibus doesn't work on my SAN. I use group_by_serial because I've seen an IBM SAN document that uses it. Other options are failover and multibus. To find the best one, test (I personally tested all of them, and for me group_by_serial works best).
- path_checker: can be readsector0 or tur. On my SAN, readsector0 triggers path switching, so my SAN is not happy with it and reports a problem.
- prio_callout: this is where I spent most of my testing time. To find out which prio_callout helpers you've got, go to /sbin and list the mpath_prio_* binaries. Then test them and choose the one that returns the right value, but do the test in the initrd environment, because the behaviour there is different. I'll explain more later.
- failback: defines when to come back to the original path once it is up again. Set it to immediate, a value in seconds, or manual if you want to disable path failback.
- features: I don't know what it is, but it was used by someone working on an IBM SAN.
- no_path_retry: how many times to retry before failing. Can be a number of retries, fail for immediate failure, or queue to keep trying forever.
Now you can add default values for all the devices:
defaults {
    udev_dir                /dev
    polling_interval        2
    default_getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
    user_friendly_names     yes
}
- udev_dir: where the device filesystem lives.
- polling_interval: time in seconds between two checks on a path.
- default_getuid_callout: the command used to get the WWID.
- user_friendly_names: if no aliases are set, this defines whether the chosen name will be user friendly (mpathX) or system friendly (the WWID instead).
And finally, you should add this, taken from the Debian example file:
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
This just sets which devices won't be taken into account when building the multipath maps.
Your file should look like this:
##
## This is a template multipath-tools configuration file
## Uncomment the lines relevant to your environment
##
defaults {
    udev_dir                /dev
    polling_interval        2
    default_getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
    user_friendly_names     yes
}
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
devices {
    device {
        vendor                "IBM.*"
        product               "1722-600"
        path_grouping_policy  group_by_serial
        path_checker          tur
        path_selector         "round-robin 0"
        prio_callout          "/sbin/mpath_prio_tpc /dev/%n"
        failback              immediate
        features              "1 queue_if_no_path"
        no_path_retry         300
    }
}
multipaths {
    multipath {
        wwid  3600a0b8000177d9400002e61463f2ed3
        alias system
    }
    multipath {
        wwid  3600a0b8000177bcc0000256645f7f166
        alias data
    }
}
How to get the WWID
This is done with scsi_id. For example, for the sda device, you'd run:
/sbin/scsi_id -g -u -s /block/sda
Don't ask me why it's /block and not /dev !
You might notice something: on my system, for example, sda and sdc report the same ID. That's normal; sda is the first path and sdc is the second path, but the logical drive is the same.
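To check this for yourself, you can loop over all the SCSI disks and print the WWID each one reports; a minimal sketch, assuming scsi_id lives in /sbin and your paths show up as /dev/sd*:

```shell
# Print the WWID reported by each SCSI disk; two paths to the same
# logical drive will print the same ID.
for dev in /dev/sd[a-z]; do
    name=$(basename "$dev")
    echo "$name: $(/sbin/scsi_id -g -u -s /block/"$name")"
done
```

Devices that print identical WWIDs are different paths to the same logical drive.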
Building initrd
Now you're ready to build the initrd. If possible, use the same tool that your distribution uses when upgrading the kernel. On Debian you can do this:
dpkg-reconfigure linux-image-2.6.18-4-686
linux-image-2.6.18-4-686 is the package I installed for the kernel.
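Alternatively, initramfs-tools ships an update-initramfs command that rebuilds the image directly; a sketch (run as root, substituting your own kernel version):

```shell
# Rebuild the initramfs for the currently running kernel so the
# modified multipath hook gets picked up.
update-initramfs -u -k $(uname -r)
```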
Modifying your grub/fstab
Now you have to change grub and fstab to point to the right devices. If you used aliases, your device will likely be accessible as /dev/mapper/aliasX, where alias is the name you chose and X is the partition number.
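For example, with the aliases from the configuration above, the relevant /etc/fstab entries could look like this (a sketch; adapt device names, mount points, and filesystem types to your setup):

```
/dev/mapper/system1  /     ext3  defaults,errors=remount-ro  0  1
/dev/mapper/data     /srv  xfs   defaults                    0  2
```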
On Debian, don't forget to change the kopt value in /boot/grub/menu.lst, so that the next kernel upgrade won't break the hard work you've done:
##
## Start Default Options ##
## default kernel options
## default kernel options for automagic boot options
## If you want special options for specific kernels use kopt_x_y_z
## where x.y.z is kernel version. Minor versions can be omitted.
## e.g. kopt=root=/dev/hda1 ro
##      kopt_2_6_8=root=/dev/hdc1 ro
##      kopt_2_6_8_2_686=root=/dev/hdc2 ro
# kopt=root=/dev/mapper/system1 ro
Now reboot
When you reboot, it might fail and the root file system might not get mounted. In that case, wait until the initrd shell shows up and test the behaviour of the different parameters you set until you have only good values.
For example, on my system, calling mpath_prio_balance_units on the running system returns the correct value, but in the initrd it returns nothing. The environment is different, so you have to find a solution that works in the initrd and then adapt your configuration.
You can always chroot into your root filesystem from the initramfs shell. Use it to reconfigure your initrd.
If your system boots the first time, you're much luckier than I was. Now you can run:
bladeTest:~# multipath -ll
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 0:0:1:0 sdc 8:32 [active][ready]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=9.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=6][active]
 \_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:1:1 sdd 8:48 [active][ready]
And see all your paths. For me, a configuration with data set to multibus would give this kind of output:
bladeTest:~# multipath -ll
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 0:0:1:0 sdc 8:32 [active][ready]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=9.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=7][enabled]
 \_ 0:0:0:1 sdb 8:16 [active][ready]
 \_ 0:0:1:1 sdd 8:48 [active][ready]
Testing configuration
When your system boots perfectly but you want to try other configurations without rebooting every time, just attach another partition to your system and work on it. When a partition is mounted you cannot change its multipath table, but if it's not mounted you can clear the table with multipath -f alias and then rebuild a new one with multipath alias.
For example on my test system I've got system and data. So when the system is running I cannot change system because this is the root file system, but data can be used to test.
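Put together, one test cycle on the unmounted data device looks like this; a sketch, assuming data is the alias from the configuration above and is normally mounted on /srv:

```shell
# Flush the existing multipath table, rebuild it with the current
# /etc/multipath.conf, and remount the filesystem.
umount /srv
multipath -f data
multipath data
mount /dev/mapper/data /srv
```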
Hot-adding a host to the system (QLogic)
QLogic provides a little script that scans for new hosts, available at http://download.qlogic.com/ms/56615/readme_dynamic_lun_22.html.
This script will scan for new hosts. Then just run:
bladeTest:~# multipath
sdb: checker msg is "tur checker reports path is down"
sdd: checker msg is "tur checker reports path is down"
sdf: checker msg is "tur checker reports path is down"
sdg: checker msg is "tur checker reports path is down"
sdb: checker msg is "tur checker reports path is down"
sdf: checker msg is "tur checker reports path is down"
sdg: checker msg is "tur checker reports path is down"
and then :
bladeTest:~# multipath -ll
mpath2 (3600a0b8000177bcc0000256545f7aa8a) dm-4 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=6][enabled]
 \_ 0:0:0:2 sdf 8:80 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:1:2 sdg 8:96 [active][ready]
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 0:0:1:0 sdc 8:32 [active][ready]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=9.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=6][enabled]
 \_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:1:1 sdd 8:48 [active][ready]
As seen above, I have a new host, and I can configure it or just use it as /dev/mapper/mpath2 if it's for one-time use. Nice!
Using XFS
Resize partition
Resizing a partition on the SAN has a pretty simple solution. We have data and system as multipath partitions; we added 1G to data, so we need to rescan the whole stack. First, check which SCSI buses the partition sits on:
bladeTest:/# multipath -ll
sdc: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
system (3600a0b8000177d9400002e61463f2ed3) dm-0 IBM,1722-600
[size=5.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:0 sda 8:0  [active][ready]
 \_ 0:0:1:0 sdc 8:32 [failed][faulty]
data (3600a0b8000177bcc0000256645f7f166) dm-1 IBM,1722-600
[size=8.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:1 sdb 8:16 [active][ready]
 \_ 0:0:1:1 sdd 8:48 [failed][faulty]
So we see that data uses buses 0:0:0:1 and 0:0:1:1. On the SAN side, the resizing process must already be completed. We assume data is mounted on /srv.
We need to rescan the devices like this:
bladeTest:/# echo 1 > /sys/bus/scsi/devices/0\:0\:0\:1/rescan
bladeTest:/# echo 1 > /sys/bus/scsi/devices/0\:0\:1\:1/rescan
Then we unmount the partition and rebuild the multipath map:
bladeTest:/# umount /srv/
bladeTest:/# multipath -f data
bladeTest:/# multipath data
sdc: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
create: data (3600a0b8000177bcc0000256645f7f166) IBM,1722-600
[size=9.0G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
 \_ 0:0:0:1 sdb 8:16 [undef][ready]
 \_ 0:0:1:1 sdd 8:48 [undef][faulty]
bladeTest:/# mount /srv/
The process is quite short, and we can see that the size went up to 9G, which is what we wanted. But if we look at the disk usage we see:
bladeTest:/# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/system1   4.7G  618M  3.9G  14% /
tmpfs                1015M     0 1015M   0% /lib/init/rw
udev                   10M   84K   10M   1% /dev
tmpfs                1015M     0 1015M   0% /dev/shm
/dev/mapper/data      8.0G  384K  8.0G   1% /srv
The partition has not been resized … Why? Because resizing the underlying disk doesn't mean the filesystem on it has been resized. If you use a filesystem like XFS, you can grow it while it is mounted:
bladeTest:/# xfs_growfs /srv/
meta-data=/dev/mapper/data       isize=256    agcount=11, agsize=196608 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
data blocks changed from 2097152 to 2359296
And that's done. For other filesystems, see the filesystem's documentation.
Filesystem freeze
Filesystem freezes are designed to be used with snapshot/flashcopy systems. A freeze makes the filesystem suspend all I/O while a backup operation is running. The data on the filesystem is not lost, and once the unfreeze takes effect, the system runs normally again. To freeze the filesystem, we just do:
xfs_freeze -f /srv
and when we're finished with it, we unfreeze with:
xfs_freeze -u /srv
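A typical use is wrapping a snapshot/flashcopy operation so that the on-disk state stays consistent; a sketch, where take_snapshot stands in for whatever your SAN tooling provides (a hypothetical placeholder, not a real command):

```shell
# Freeze the XFS filesystem, take the snapshot, then always
# unfreeze, even if the snapshot command fails.
xfs_freeze -f /srv
take_snapshot /srv || true   # hypothetical SAN snapshot command
xfs_freeze -u /srv
```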
And everything is OK!
See also
There are a lot of links out there; I've got over 50 bookmarks just for multipath configuration, but I kept only those I've really been using.