Multipathing with RHEL4 on DataCore SANsymphony

First Steps

  • Minimal installation of RHEL 4
  • Install/update of
    • lsscsi-0.17-1tom.4EL
    • sysfsutils-1.2.0-1
    • device-mapper-multipath-0.4.5-16.1.RHEL4
    • device-mapper-1.02.07-4.0.RHEL4
    • kernel-smp-devel-2.6.9-42.0.10.EL
    • kernel-smp-2.6.9-42.0.10.EL, mkinitrd-4.2.1.8-1
  • Reboot into the current kernel
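Before rebooting it can help to confirm that the packages listed above are really installed. The following is only a sketch: the helper name missing_packages is made up for illustration, and it checks package names without the exact version strings.

```shell
#!/bin/sh
# Package names taken from the list above (version suffixes stripped).
REQUIRED="lsscsi sysfsutils device-mapper-multipath device-mapper kernel-smp-devel kernel-smp mkinitrd"

# Print every package from $REQUIRED that `rpm -q` does not report as installed.
# (missing_packages is a hypothetical helper, not part of any package.)
missing_packages() {
    for pkg in $REQUIRED; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

if command -v rpm >/dev/null 2>&1; then
    missing_packages
else
    echo "rpm not found; run this on the RHEL host" >&2
fi
```

An empty result means every package name is known to rpm; the installed versions still have to be compared by hand.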

See also: http://lsky.info/?id=MultipathIBMDS4000

lsscsi

First, check what is currently visible via SCSI. (Fibre Channel devices are included here as well, i.e. everything that shows up under /proc/scsi.)

lsscsi

[0:0:1:0]    no dev  DataCore SANsymphony            -
[1:0:0:0]    no dev  DataCore SANsymphony            -

qlogic driver version

The qlogic driver version included in the kernel mentioned above is 8.01.04-d7.

modprobe.conf

/etc/modprobe.conf

alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias scsi_hostadapter cciss
alias scsi_hostadapter1 qla2300
options qla2xxx ql2xextended_error_logging=1 displayConfig=1 ql2xretrycount=30 ql2xfailover=0 ql2xlbType=0
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd

Since failover is to be handled by multipathd, ql2xfailover must NOT be used. (Alternatively, the qlogic non-failover driver could be used.)
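A quick way to confirm this setting is to look for ql2xfailover=0 on the qla2xxx options line. The sketch below assumes the options live in /etc/modprobe.conf as shown above; the function name failover_disabled is invented for this example.

```shell
#!/bin/sh
# Succeeds if the qla2xxx options line in the given file sets ql2xfailover=0.
# (failover_disabled is a hypothetical helper name.)
failover_disabled() {
    grep -E '^[[:space:]]*options[[:space:]]+qla2xxx[[:space:]]' "$1" |
        grep -Eq '(^|[[:space:]])ql2xfailover=0([[:space:]]|$)'
}

conf=${1:-/etc/modprobe.conf}
if [ -r "$conf" ]; then
    if failover_disabled "$conf"; then
        echo "ql2xfailover=0: driver-level failover is off"
    else
        echo "WARNING: ql2xfailover=0 not found in $conf" >&2
    fi
fi
```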

qlogic driver options

See also the qlogic README

modinfo qla2xxx

filename:       /lib/modules/2.6.9-42.0.10.ELsmp/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
parm:           ql2xmaxqdepth:Maximum queue depth to report for target devices.
parm:           ql2xlogintimeout:Login timeout value in seconds.
parm:           qlport_down_retry:Maximum number of command retries to a port that returnsa PORT-DOWN status.
parm:           ql2xretrycount:Maximum number of mid-layer retries allowed for a command.  Default value is 20,
parm:           displayConfig:If 1 then display the configuration used in /etc/modprobe.conf.
parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices that are not present after a Fabric scan.  This is needed for several broken switches.Default is 0 - no PLOGI. 1 - perfom PLOGI.
parm:           ql2xenablezio:Option to enable ZIO:If 1 then enable it otherwise use the default set in the NVRAM. Default is 0 : disabled
parm:           ql2xintrdelaytimer:ZIO: Waiting time for Firmware before it generates an interrupt to the host to notify completion of request.
parm:           ConfigRequired:If 1, then only configured devices passed in through theql2xopts parameter will be presented to the OS
parm:           Bind:Target persistent binding method: 0 by Portname (default); 1 by PortID; 2 by Nodename.
parm:           ql2xsuspendcount:Number of 6-second suspend iterations to perform while a target returns a <NOT READY> status.  Default is 10 iterations.
parm:           ql2xdoinitscan:Signal mid-layer to perform scan after driver load: 0 -- no signal sent to mid-layer.
parm:           ql2xloginretrycount:Specify an alternate value for the NVRAM login retry count.
parm:           ql2xprocessnotready:Option to disable handling of NOT-READY in the driver. Default is 1 - Handled by the driver. Set to 0 - Disable the handling inside the driver
parm:           ql2xprocessrscn:Option to enable port RSCN handling via a series of lessfabric intrusive ADISCs and PLOGIs.
parm:           extended_error_logging:Option to enable extended error logging, Default is 0 - no logging. 1 - log errors.
parm:           ql2xfwloadbin:Option to specify location in which to load ISP24xx firmware:  2 -- load firmware via the request_firmware() (hotplug)       interface.  A file, ql2400_fw.bin, (containing the       firmware image) should be hotplug accessible. 1 -- load firmware from flash. 0 -- load firmware embedded with driver (default).
parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 - no FDMI. 1 - perfom FDMI.
author:         QLogic Corporation
description:    QLogic Fibre Channel HBA Driver
license:        GPL
version:        8.01.04-d7 1036BB22AA28C8D7180B7E1
vermagic:       2.6.9-42.0.10.ELsmp SMP 686 REGPARM 4KSTACKS gcc-3.4
depends:        scsi_mod,scsi_transport_fc

Adjust the initrd

mkinitrd -v --with=dm-multipath /boot/initrd-2.6.9-42.0.10.ELsmp-qlx.img 2.6.9-42.0.10.ELsmp

Creating initramfs
Looking for deps of module scsi_mod
Looking for deps of module sd_mod        scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module unknown
Looking for deps of module cciss         scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module qla2300       qla2xxx scsi_transport_fc scsi_mod
Looking for deps of module qla2xxx       scsi_transport_fc scsi_mod
Looking for deps of module scsi_transport_fc     scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module scsi_transport_fc     scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module ide-disk
Looking for deps of module ext3  jbd
Looking for deps of module jbd
Looking for deps of module dm-multipath  dm-mod
Looking for deps of module dm-mod
Using modules:  ./kernel/drivers/scsi/scsi_mod.ko ./kernel/drivers/scsi/sd_mod.ko ./kernel/drivers/block/cciss.ko ./kernel/drivers/scsi/scsi_transport_fc.ko ./kernel/drivers/scsi/qla2xxx/qla2xxx.ko ./kernel/drivers/scsi/qla2xxx/qla2300.ko ./kernel/fs/jbd/jbd.ko ./kernel/fs/ext3/ext3.ko ./kernel/drivers/md/dm-mod.ko ./kernel/drivers/md/dm-multipath.ko
/sbin/nash -> /tmp/initrd.nu4456/bin/nash
/sbin/insmod.static -> /tmp/initrd.nu4456/bin/insmod
/sbin/udev.static -> /tmp/initrd.nu4456/sbin/udev
/etc/udev/udev.conf -> /tmp/initrd.nu4456/etc/udev/udev.conf
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/scsi/scsi_mod.ko(elf32-i386) to /tmp/initrd.nu4456/lib/scsi_mod.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/scsi/sd_mod.ko(elf32-i386) to /tmp/initrd.nu4456/lib/sd_mod.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/block/cciss.ko(elf32-i386) to /tmp/initrd.nu4456/lib/cciss.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/scsi/scsi_transport_fc.ko(elf32-i386) to /tmp/initrd.nu4456/lib/scsi_transport_fc.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/scsi/qla2xxx/qla2xxx.ko(elf32-i386) to /tmp/initrd.nu4456/lib/qla2xxx.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/scsi/qla2xxx/qla2300.ko(elf32-i386) to /tmp/initrd.nu4456/lib/qla2300.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/fs/jbd/jbd.ko(elf32-i386) to /tmp/initrd.nu4456/lib/jbd.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/fs/ext3/ext3.ko(elf32-i386) to /tmp/initrd.nu4456/lib/ext3.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/md/dm-mod.ko(elf32-i386) to /tmp/initrd.nu4456/lib/dm-mod.ko(elf32-i386)
copy from /lib/modules/2.6.9-42.0.10.ELsmp/./kernel/drivers/md/dm-multipath.ko(elf32-i386) to /tmp/initrd.nu4456/lib/dm-multipath.ko(elf32-i386)
Loading module scsi_mod
Loading module sd_mod
Loading module cciss
Loading module scsi_transport_fc
Loading module qla2xxx
Loading module qla2300
Loading module jbd
Loading module ext3
Loading module dm-mod
Loading module dm-multipath

and reboot...

Boot process with dm-multipath etc.

SANsymphony - SAN Manager

  • Via Channels → AppServers, assign the type Linux or VMware to the matching application server.

Linux

  • Assign the 3PAP mirror volume to the AppServer via SAN Mappings

SCSI Bus Rescan

rescan-scsi-bus.sh

rescan-scsi-bus.sh - syslog

Apr 18 20:00:23 localhost kernel: qla2300 0000:07:01.0: scsi(0:0:0:0): Enabled tagged queuing, queue depth 32.
Apr 18 20:00:23 localhost kernel:   Vendor: DataCore  Model: SANsymphony       Rev: DCS
Apr 18 20:00:23 localhost kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Apr 18 20:00:23 localhost kernel: qla2300 0000:07:01.0: scsi(0:0:1:0): Enabled tagged queuing, queue depth 32.
Apr 18 20:00:23 localhost kernel: SCSI device sda: 50331648 512-byte hdwr sectors (25770 MB)
Apr 18 20:00:23 localhost kernel: SCSI device sda: drive cache: write back
Apr 18 20:00:23 localhost kernel: SCSI device sda: 50331648 512-byte hdwr sectors (25770 MB)
Apr 18 20:00:23 localhost kernel: SCSI device sda: drive cache: write back
Apr 18 20:00:23 localhost kernel:  sda: sda1
Apr 18 20:00:23 localhost kernel: Attached scsi disk sda at scsi0, channel 0, id 1, lun 0
Apr 18 20:00:23 localhost kernel: qla2300 0000:07:01.0: scsi(0:0:2:0): Enabled tagged queuing, queue depth 32.
Apr 18 20:00:23 localhost scsi.agent[4850]: disk at /devices/pci0000:00/0000:00:04.0/0000:06:00.0/0000:07:01.0/host0/target0:0:1/0:0:1:0
Apr 18 20:00:23 localhost kernel:   Vendor: DataCore  Model: SANsymphony       Rev: DCS
Apr 18 20:00:23 localhost kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Apr 18 20:00:23 localhost kernel: qla2300 0000:07:01.1: scsi(1:0:0:0): Enabled tagged queuing, queue depth 32.
Apr 18 20:00:23 localhost kernel: SCSI device sdb: 50331648 512-byte hdwr sectors (25770 MB)
Apr 18 20:00:23 localhost kernel: SCSI device sdb: drive cache: write back
Apr 18 20:00:23 localhost kernel: SCSI device sdb: 50331648 512-byte hdwr sectors (25770 MB)
Apr 18 20:00:23 localhost kernel: SCSI device sdb: drive cache: write back
Apr 18 20:00:23 localhost scsi.agent[5021]: disk at /devices/pci0000:00/0000:00:04.0/0000:06:00.0/0000:07:01.1/host1/target1:0:0/1:0:0:0
Apr 18 20:00:23 localhost kernel:  sdb: sdb1
Apr 18 20:00:23 localhost kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
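If the rescan script is not at hand, a rescan can usually be triggered through sysfs as well. This is a sketch under the assumption that the kernel exposes a scan attribute below /sys/class/scsi_host, as 2.6 kernels generally do; the function name rescan_all_hosts is invented here.

```shell
#!/bin/sh
# Ask every SCSI host below $1 (default /sys/class/scsi_host) to rescan
# all channels, targets and LUNs ("- - -" is the wildcard triple).
# (rescan_all_hosts is a hypothetical helper; needs root on a real system.)
rescan_all_hosts() {
    base=${1:-/sys/class/scsi_host}
    for scan in "$base"/host*/scan; do
        [ -e "$scan" ] || continue     # glob matched nothing
        echo "- - -" > "$scan"
    done
}

# On the host, as root:
#   rescan_all_hosts
```

The optional argument exists only so the function can be tried against a scratch directory instead of the real sysfs tree.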

scsi_id

scsi_id

[root@localhost bin]# scsi_id -g -u -s /block/sda
360030d904d4952524f525f4150000000
[root@localhost bin]# scsi_id -g -u -s /block/sdb
360030d904d4952524f525f4150000000

As expected, the disks sda and sdb report the same ID: they are the same volume, attached via 2 paths.

Excerpt from the man page

In order to generate unique values for either page 0x80 or page 0x83, the serial numbers or world wide names are prefixed as follows.

Identifiers based on page 0x80 are prefixed by the character ‘S’, the SCSI vendor, the SCSI product (model) and then the serial number returned by page 0x80. For example:

$ multipath -ll
mpathDS (3600a0b800017738900000008436a049c)
[size=100 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 2:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [enabled]
 \_ 3:0:0:0 sdc 8:32 [active][ready]

$ scsi_id -p 0x83 -g -u -s /block/sdb
3600a0b800017738900000008436a049c
$ scsi_id -p 0x83 -g -u -s /block/sdc
3600a0b800017738900000008436a049c
$ scsi_id -p 0x80 -g -u -s /block/sdb
SIBM_____1722-600_______1T45243281______
$ scsi_id -p 0x80 -g -u -s /block/sdc
SIBM_____1722-600_______1T53681398______

This clearly shows which option should be used for grouping. In this case one chooses “-p 0x83”; with “-p 0x80” the serial number of the DS4300 controller is shown instead. This can also be helpful for finding a specific path.

Identifiers based on page 0x83 are prefixed by the identifier type followed by the page 0x83 identifier. For example, a device with a NAA (Name Address Authority) type of 3 (also in this case the page 0x83 identifier starts with the NAA value of 6):

# /sbin/scsi_id -p 0x83 -s /block/sdg
3600a0b80000b174b000000d63efc5c8c

Example:

scsi_id -g -p 0x80 -u -s /block/sda

multipath.conf

multipath policies

failover            1 path per priority group (active/passive)
multibus            all paths in 1 priority group (active/active)
group_by_serial     1 priority group per serial
group_by_prio       1 priority group per priority value. Priorities are determined by callout programs specified as a global, per-controller or per-multipath option in the configuration file
group_by_node_name  1 priority group per target node name. Target node names are fetched in /sys/class/fc_transport/target*/node_name.
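The multipath.conf that was actually used is not shown on this page. The sketch below writes a draft that would be consistent with the outputs further down: one path per priority group (failover), the tur checker from the syslog, and the queue_if_no_path feature. All values are assumptions inferred from those outputs, not a configuration recommended by DataCore, and the function name write_draft_conf is invented.

```shell
#!/bin/sh
# Write a draft multipath.conf to the path given as $1.
# Every value is an assumption inferred from the outputs on this page.
# (write_draft_conf is a hypothetical helper name.)
write_draft_conf() {
    cat > "$1" <<'EOF'
defaults {
        user_friendly_names yes
}

devices {
        device {
                vendor                  "DataCore"
                product                 "SANsymphony"
                path_grouping_policy    failover
                path_checker            tur
                features                "1 queue_if_no_path"
        }
}
EOF
}

draft=$(mktemp)
write_draft_conf "$draft"
cat "$draft"          # review before copying anything to /etc/multipath.conf
rm -f "$draft"
```

The vendor and product strings are the ones the kernel prints for the volumes in the rescan log; the draft should be checked with "multipath -v 2 -d" before it is used.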

bindings

/var/lib/multipath/bindings contains the mapping of user-friendly names to WWIDs. The file is maintained by the multipath program itself, but it can also be edited by hand so that all nodes always use the same mapping.

bindings

# alias wwid
#
3pap0  360030d90335041500000000000000000
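When the file is edited by hand on several nodes, a small lookup helps to compare the result. This is a sketch with the invented helper name alias_for_wwid; it assumes the two-column format shown above.

```shell
#!/bin/sh
# Print the alias bound to a WWID in a bindings file; fail if there is none.
# $1 = bindings file, $2 = wwid   (alias_for_wwid is a hypothetical helper)
alias_for_wwid() {
    # Appending "" forces a string comparison even for all-digit WWIDs.
    awk -v w="$2" '!/^[[:space:]]*#/ && ($2 "") == (w "") { print $1; found = 1 }
                   END { exit !found }' "$1"
}

# Example on the host:
#   alias_for_wwid /var/lib/multipath/bindings 360030d90335041500000000000000000
```

Running the same lookup on every node and comparing the answers shows whether the bindings really agree.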

Test

multipath -v 2 -d

[size=24 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]

Preparing for production - rc.sysinit

  • chkconfig multipathd on

Since the multipath devices are created at system startup and multipath.static has to write to the file system, the file /etc/rc.sysinit (alternatively the init script) must be adapted.

Caution: At the point where “Setting up LVM” runs, only / (the root file system) is mounted. If the multipath names from the bindings are used in /etc/fstab, the bindings file therefore has to live in the root file system (not in the default location /var/lib/multipath/bindings).
touch /etc/multipath.bindings
ln -s /etc/multipath.bindings /var/lib/multipath/bindings
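On a host where /var/lib/multipath/bindings already exists, the plain ln -s fails and the existing aliases would not be carried over. A slightly more careful variant might look like the following sketch; the function name relocate_bindings is invented for illustration.

```shell
#!/bin/sh
# Move the bindings file to the root file system and leave a symlink behind.
# $1 = new location (on /), $2 = default location
# (relocate_bindings is a hypothetical helper name.)
relocate_bindings() {
    new=$1 old=$2
    if [ -L "$old" ]; then
        return 0                      # already a symlink, nothing to do
    fi
    mkdir -p "$(dirname "$old")"
    if [ -f "$old" ]; then
        mv "$old" "$new"              # keep existing alias assignments
    else
        touch "$new"
    fi
    ln -s "$new" "$old"
}

# On the host, as root:
#   relocate_bindings /etc/multipath.bindings /var/lib/multipath/bindings
```

Note that an existing file at the new location is overwritten by the move, so this is meant for a first-time setup only.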

/etc/rc.sysinit

# Multipath initialization
if [ -x /sbin/multipath.static ]; then
    # Load the device-mapper multipath module unless it is already loaded
    if ! LC_ALL=C fgrep -q "dm_multipath" /proc/modules 2>/dev/null ; then
        modprobe dm-multipath >/dev/null 2>&1
    fi
    # Probe quietly first; only announce the step if multipath can run
    if /sbin/multipath.static -v0 -b /etc/multipath.bindings > /dev/null 2>&1 ; then
        action $"Setting up Multipathing:" /sbin/multipath.static -v0 -b /etc/multipath.bindings
    fi
fi

Production

The multipath devices are present

multipath -ll

mpath0 (360030d904d4952524f525f4150000000)
[size=24 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]

LC_ALL=C fdisk -l /dev/mapper/mpath0

Disk /dev/mapper/mpath0: 25.7 GB, 25769803776 bytes
255 heads, 63 sectors/track, 3133 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

              Device Boot      Start         End      Blocks   Id  System
/dev/mapper/mpath0p1               1        3132    25157758+   7  HPFS/NTFS
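The different size figures (24 GB in multipath -ll, 25.7 GB in fdisk) describe the same volume; one is counted in binary units, the other in decimal units. A short check using the sector count from the kernel log above:

```shell
#!/bin/sh
sectors=50331648                      # "50331648 512-byte hdwr sectors" from the syslog
bytes=$((sectors * 512))
gib=$((bytes / 1024 / 1024 / 1024))

echo "bytes: $bytes"                  # same figure as in the fdisk output
echo "GiB:   $gib"                    # what multipath -ll reports as size
```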

Failover Test

For this test, one SDS is taken offline or a “Local Stop” is performed.

Throughout the entire test, data is written recursively to the mounted multipath partition. At NO time was the write process interrupted or aborted. Data integrity was preserved.
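The page does not say how the data was written or how integrity was checked. One possible approach, shown here purely as an assumption, is to write files with recorded checksums and re-read them after the failover; write_load and verify_load are invented names.

```shell
#!/bin/sh
# Write $2 small files below directory $1 and record a checksum for each.
# (write_load / verify_load are hypothetical helpers, not the original test.)
write_load() {
    dir=$1 count=$2 i=0
    : > "$dir/checksums"
    while [ "$i" -lt "$count" ]; do
        f="$dir/file_$i"
        # Deterministic payload so the example needs no random source.
        printf 'payload %s\n' "$i" > "$f" || return 1
        printf '%s %s\n' "$(cksum < "$f")" "$f" >> "$dir/checksums"
        i=$((i + 1))
    done
}

# Re-read every file and compare it against the recorded checksum and size.
verify_load() {
    while read -r sum size f; do
        [ "$(cksum < "$f")" = "$sum $size" ] || { echo "MISMATCH: $f" >&2; return 1; }
    done < "$1/checksums"
}

target=$(mktemp -d)   # on the host: a directory on the mounted multipath partition
write_load "$target" 5 && verify_load "$target" && echo "all files verified"
rm -rf "$target"
```

For a real failover test, write_load would run in a loop while the SDS is stopped, and verify_load afterwards.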

multipath_checker.sh

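#!/bin/sh
# Snapshot "multipath -ll" every 10 seconds. As long as nothing changes,
# the same pair of log files is overwritten; after a change the counter
# is incremented so that the differing snapshots are kept.

A=0
while :
do
  multipath -ll > A_$A.log
  sleep 10
  multipath -ll > B_$A.log
  if ! diff B_$A.log A_$A.log ; then
        A=$((A + 1))
  fi
done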

**initially everything ok**

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]

**mirror failed**

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:1:0 sda 8:0  [failed][faulty]
\_ round-robin 0 [active]
 \_ 1:0:0:0 sdb 8:16 [active][ready]

**back again**

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]
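The three states above differ only in the path flags, so a monitoring check can simply count the path lines marked as failed or faulty. A minimal sketch with the invented helper name count_bad_paths, fed with the "mirror failed" listing from above:

```shell
#!/bin/sh
# Count path lines marked [failed] or [faulty] in `multipath -ll` output (stdin).
# (count_bad_paths is a hypothetical helper name.)
count_bad_paths() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -Ec '\[(failed|faulty)\]' || true
}

# On the host: multipath -ll | count_bad_paths
count_bad_paths <<'EOF'
3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:1:0 sda 8:0  [failed][faulty]
\_ round-robin 0 [active]
 \_ 1:0:0:0 sdb 8:16 [active][ready]
EOF
```

Any result other than 0 means at least one path needs attention.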

syslog

Apr 19 15:22:07 localhost multipathd: 8:0: tur checker reports path is up
Apr 19 15:22:07 localhost multipathd: 8:0: reinstated
Apr 19 15:22:07 localhost multipathd: 3pap0: remaining active paths: 2
Apr 19 15:22:17 localhost multipathd: 3pap0: switch to path group #1

Now the other SDS is stopped.

syslog

Apr 19 15:35:19 localhost kernel: device-mapper: dm-multipath: Failing path 8:16.
Apr 19 15:35:19 localhost multipathd: 8:16: tur checker reports path is down
Apr 19 15:35:19 localhost multipathd: checker failed path 8:16 in map 3pap0
Apr 19 15:35:19 localhost multipathd: 3pap0: remaining active paths: 1

the second path is gone

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][faulty]

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [failed][faulty]

recovery in progress

recovery finished

everything is ok again

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]

syslog

Apr 19 15:44:27 localhost multipathd: 8:16: tur checker reports path is up
Apr 19 15:44:27 localhost multipathd: 8:16: reinstated
Apr 19 15:44:27 localhost multipathd: 3pap0: remaining active paths: 2
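How long a path was down can be derived from the checker messages in the syslog. The sketch below pairs each "path is down" line with the next "path is up" line for the same path; it assumes both events fall on the same day and only looks at tur checker lines. The function name outage_seconds is invented.

```shell
#!/bin/sh
# Reads syslog lines on stdin; for each path prints the seconds between
# "path is down" and the next "path is up" (same day assumed).
# (outage_seconds is a hypothetical helper name.)
outage_seconds() {
    awk '/tur checker reports path is (down|up)/ {
             split($3, t, ":"); now = t[1] * 3600 + t[2] * 60 + t[3]
             path = $6; sub(/:$/, "", path)
             if ($NF == "down") down[path] = now
             else if (path in down) { print path, now - down[path]; delete down[path] }
         }'
}

# The two checker lines for path 8:16 from the excerpts above:
printf '%s\n' \
    'Apr 19 15:35:19 localhost multipathd: 8:16: tur checker reports path is down' \
    'Apr 19 15:44:27 localhost multipathd: 8:16: tur checker reports path is up' |
    outage_seconds
```

For the sample lines this gives 548 seconds for path 8:16, i.e. a little over nine minutes between the stop of the SDS and the reinstatement of the path.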

... and once more the other server is taken down ...

syslog

Apr 19 15:48:55 localhost kernel: SCSI error : <0 0 1 0> return code = 0x20000
Apr 19 15:48:55 localhost kernel: end_request: I/O error, dev sda, sector 9165679
Apr 19 15:48:55 localhost kernel: device-mapper: dm-multipath: Failing path 8:0.
Apr 19 15:48:55 localhost kernel: end_request: I/O error, dev sda, sector 9165687
Apr 19 15:48:55 localhost kernel: SCSI error : <0 0 1 0> return code = 0x20000
...
Apr 19 15:49:39 localhost kernel: SCSI error : <0 0 1 0> return code = 0x10000
Apr 19 15:49:39 localhost kernel: end_request: I/O error, dev sda, sector 10055935
Apr 19 15:49:39 localhost kernel: end_request: I/O error, dev sda, sector 10055943
Apr 19 15:50:04 localhost multipathd: 3pap0: switch to path group #1

multipath -ll

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:1:0 sda 8:0  [failed][faulty]
\_ round-robin 0 [active]
 \_ 1:0:0:0 sdb 8:16 [active][ready]

SDS is back in production...

syslog

Apr 19 15:59:06 localhost multipathd: 8:0: tur checker reports path is up
Apr 19 15:59:06 localhost multipathd: 8:0: reinstated
Apr 19 15:59:06 localhost multipathd: 3pap0: remaining active paths: 2
Apr 19 15:59:16 localhost multipathd: 3pap0: switch to path group #1

**everything fine again**

3pap0 (360030d90335041500000000000000000)
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]
 
wissen/datacore/multipathing_mit_redhat_enterprise_linux_und_device_mapper_an_datacore_sansymphony_sds.txt · Last modified: 05.09.2010 20:31