Bug 953343 - [RFE] Hypervisor RHEV-H only connects to the Dell EqualLogic using one connection; it is expected to do multipathing and have 4 connections to it.
Summary: [RFE] Hypervisor RHEV-H only connects to the Dell EqualLogic using one connection
Keywords:
Status: CLOSED DUPLICATE of bug 1053900
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.1 RC
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Sergey Gotliv
QA Contact: Leonid Natapov
URL:
Whiteboard: storage
Depends On:
Blocks: 1053900
 
Reported: 2013-04-18 01:40 UTC by Simon Kong Win Chang
Modified: 2014-03-10 12:13 UTC
CC: 25 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Cloned to: 1053900 (view as bug list)
Environment:
Last Closed: 2014-01-25 18:52:32 UTC
oVirt Team: ---




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 753541 None None None Never

Description Simon Kong Win Chang 2013-04-18 01:40:05 UTC
Description of problem:

Hypervisor RHEV-H only connects to the Dell EqualLogic using one connection. It is expected to do multipathing and have 4 connections to it.


Version-Release number of selected component (if applicable):
rhevh-6.4-20120318.1.el6_4


How reproducible:


Steps to Reproduce:
1. install rhev hypervisor
2. connect to rhev-m
3. configure iscsi storage
  
Actual results:

hypervisor only has one connection to the EqualLogic (i.e. not using multipath)

Expected results:

supposed to have 4 connections to the Dell EqualLogic using multipathing

Additional info:

=======================
the way this was "fixed" (probably not the optimal config, but it works):

configure from console (F2) of the Hypervisor

-----------------------------------------------------------------------
[root@vh-7 network-scripts]# vi /var/lib/iscsi/ifaces/iSCSI:1
# BEGIN RECORD 6.2.0-873.2.el6
iface.iscsi_ifacename = iSCSI:1
iface.net_ifacename = iSCSI
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

-----------------------------------------------------------------------
[root@vh-7 network-scripts]# vi /var/lib/iscsi/ifaces/iSCSI:2
# BEGIN RECORD 6.2.0-873.2.el6
iface.iscsi_ifacename = iSCSI:2
iface.net_ifacename = iSCSI
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

-----------------------------------------------------------------------
[root@vh-7 admin]# vi /var/lib/iscsi/ifaces/iSCSI:3
# BEGIN RECORD 6.2.0-873.2.el6
iface.iscsi_ifacename = iSCSI:3
iface.net_ifacename = iSCSI
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

-----------------------------------------------------------------------
# Add the following under the "devices {}" section
[root@vh-7 ~]# vi /etc/multipath.conf
device {
	vendor			"EQLOGIC"
	product			"100E-00"
	path_grouping_policy	multibus
	getuid_callout          "/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
	features		"1 queue_if_no_path"
	path_checker		readsector0
	path_selector		"round-robin 0"
	failback		"immediate"
	rr_min_io		10
	rr_weight		priorities
}

-----------------------------------------------------------------------
vi /etc/mount.sh
#!/bin/sh

sleep 2m

iscsiadm -m discovery -t st -p 192.168.XXX.XXX:3260

iscsiadm -m node -l -T iqn.2001-05.com.equallogic:0-XXXXX-XXXX-XXXXXXXa-stor -p 192.168.XXX.XXX:3260

-----------------------------------------------------------------------
vi /etc/rc.local
nohup sh /etc/mount.sh > /dev/null &


-----------------------------------------------------------------------
vi /config/files
/etc/rc.local
/etc/mount.sh
/etc/multipath.conf
/var/lib/iscsi/ifaces/iSCSI:1
/var/lib/iscsi/ifaces/iSCSI:2
/var/lib/iscsi/ifaces/iSCSI:3

-----------------------------------------------------------------------
chmod +x /etc/mount.sh
cp /etc/rc.local /config/etc/rc.local
cp /etc/mount.sh /config/etc/mount.sh
cp /etc/multipath.conf /config/etc/multipath.conf
cp /var/lib/iscsi/ifaces/iSCSI:1 /config/var/lib/iscsi/ifaces/iSCSI:1
cp /var/lib/iscsi/ifaces/iSCSI:2 /config/var/lib/iscsi/ifaces/iSCSI:2
cp /var/lib/iscsi/ifaces/iSCSI:3 /config/var/lib/iscsi/ifaces/iSCSI:3
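For reference, the three near-identical iface records above can be generated from a single template instead of being typed into vi three times. A minimal sketch (not part of the original workaround; it writes to ./ifaces so it can run anywhere, whereas the real path on RHEV-H is /var/lib/iscsi/ifaces):

```shell
# Generate the three iface records used in the workaround from one
# template. Writes to ./ifaces for illustration; on the hypervisor the
# files live in /var/lib/iscsi/ifaces.
mkdir -p ifaces
for n in 1 2 3; do
    cat > "ifaces/iSCSI:$n" <<EOF
# BEGIN RECORD 6.2.0-873.2.el6
iface.iscsi_ifacename = iSCSI:$n
iface.net_ifacename = iSCSI
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD
EOF
done
ls ifaces
```

After copying the generated files into /var/lib/iscsi/ifaces (and into /config/var/lib/iscsi/ifaces as shown above), `iscsiadm -m iface` should list the three bindings.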

Comment 1 Mike Burns 2013-04-18 13:04:11 UTC
We do have RFEs filed (I don't have the numbers handy) for allowing changes to multipath.conf from the TUI.

How does RHEL/Fedora work?  Do you have to write those files manually?

This seems more like something that needs to be configured in the oVirt Engine/RHEV-M configuration and pushed down to all hypervisors, rather than in ovirt-node/RHEV-H directly.

Comment 2 Simon Kong Win Chang 2013-04-18 23:13:10 UTC
I configured the storage on the manager first; the hypervisor then had only one connection to the Dell EqualLogic.

I then made the modifications above so that it would have 4 connections to the EqualLogic.

Yes, the config should come from the manager, which is why I submitted this. It should also help anyone with an EqualLogic make their setup work while waiting for this feature to be implemented in the manager.

Also, the disadvantage now is that every time I upgrade the hypervisor, I will need to go in and change those configs again manually.

We are running the "Red Hat Enterprise Virtualization Manager", but I don't have access to submit a bug in that section, which is why I submitted it in the oVirt section.

Comment 3 Mike Burns 2013-04-25 15:17:20 UTC
Moving this to ovirt-engine for triage.

As for issues with RHEV-M, you should file a support ticket with RH Support.  They can get a bug filed for you.

Comment 4 Simon Kong Win Chang 2013-04-25 23:26:05 UTC
I have; the case number is 00811800.

Comment 5 Simon Kong Win Chang 2013-05-09 00:07:22 UTC
P.S. Upgrading the hypervisor using the RHEV-Manager did not wipe the config; it still worked fine after the upgrade.

Comment 6 Simon Kong Win Chang 2013-05-09 00:12:17 UTC
Added the 2 mkdir lines (marked with ==) in this section:


chmod +x /etc/mount.sh
cp /etc/rc.local /config/etc/rc.local
cp /etc/mount.sh /config/etc/mount.sh
cp /etc/multipath.conf /config/etc/multipath.conf
==mkdir /config/var/lib/iscsi==
==mkdir /config/var/lib/iscsi/ifaces==
cp /var/lib/iscsi/ifaces/iSCSI:1 /config/var/lib/iscsi/ifaces/iSCSI:1
cp /var/lib/iscsi/ifaces/iSCSI:2 /config/var/lib/iscsi/ifaces/iSCSI:2
cp /var/lib/iscsi/ifaces/iSCSI:3 /config/var/lib/iscsi/ifaces/iSCSI:3

Comment 7 Sebastian Antunez 2013-06-23 17:53:14 UTC
Hello

I have installed RHEV 3.1 connected to an EqualLogic storage, but only one connection is active. The problem persists when configuring RHEV following the standard Red Hat process.

Is there any additional procedure to get 2 paths visible instead of one?

Regards

Comment 8 Simon Kong Win Chang 2013-06-25 01:39:41 UTC
The workaround posted above adds 3 paths, for a total of 4.

If you only need two paths, create everything above except /var/lib/iscsi/ifaces/iSCSI:2 and /var/lib/iscsi/ifaces/iSCSI:3.

Comment 9 Donald Williams 2013-07-02 02:31:00 UTC
One thing I didn't see is the contents of the /etc/sysctl.conf file.

When you have multiple NICs on the same subnet, you need to add the following to the /etc/sysctl.conf file.

As a test, you should be able to ping -I <iscsi eth port> <EQL Group IP>. Try all the ports. Without the settings below, only one interface will work.

# EQL entries
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.rp_filter = 2

Optionally, I have found the following helpful with Oracle OVMS and SQL servers, also added to /etc/sysctl.conf:

# Increase network buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.wmem_default = 262144
net.core.rmem_default = 262144

Run sysctl -p afterwards and try the ping again. If that works, then re-do the discovery and login.
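A sketch of persisting these ARP settings: the block below writes them to a local drop-in file so it can run anywhere. The filename is illustrative; on a real host the file would go under /etc/sysctl.d/ (or be appended to /etc/sysctl.conf), followed by `sysctl -p` and the per-port ping test described above.

```shell
# Write the EQL ARP settings from this comment to a drop-in file.
# "99-eql-iscsi.conf" is an illustrative name; a real host would use
# /etc/sysctl.d/99-eql-iscsi.conf and then run `sysctl -p <file>`.
cat > 99-eql-iscsi.conf <<'EOF'
# EQL entries
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.rp_filter = 2
EOF
grep -c '^net\.' 99-eql-iscsi.conf   # prints 3
```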

Comment 10 Donald Williams 2013-07-02 02:37:41 UTC
Something else I just noticed: your ifaces' iface.iscsi_ifacename and iface.net_ifacename aren't the same. I've never seen that work, though I'm used to RHEL, Oracle Linux, Ubuntu, etc.

iface.iscsi_ifacename = iSCSI:3
iface.net_ifacename = iSCSI

On a standard config they are: 

# cat /etc/iscsi/ifaces/eth0
# BEGIN RECORD 2.0-873
iface.iscsi_ifacename = eth0
iface.net_ifacename = eth0
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

Comment 11 Simon Kong Win Chang 2013-07-02 05:31:04 UTC
Sorry, I should have given a bit more network configuration information with that.


The "iSCSI" is actually the name of the network interface, which is configured with an IP address. The iSCSI interface is a bridge interface built on top of an LACP bond interface, which in turn is built on top of the physical interfaces. The config below might make it less confusing.

The emX interfaces are the physical interfaces.

The bondX.X interfaces are the LACP bonding interfaces; we are actually bonding 4 physical interfaces.

The iSCSI interface is the bridge interface. It could have been labeled br0.130 instead; if that were the case, the config files would have been

# cat /etc/iscsi/ifaces/br0.130:1
# cat /etc/iscsi/ifaces/br0.130:2
# cat /etc/iscsi/ifaces/br0.130:3

The ":X" suffix after the interface name is usually used to configure alias interfaces in the network config, although in this case no alias interfaces were configured in /etc/sysconfig/network-scripts/.


=============network config files==============

# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE=em1
ONBOOT=yes
HWADDR=XX:XX:XX:XX:XX:XX
MASTER=bond0
SLAVE=yes
MTU=9000
NM_CONTROLLED=no
STP=no
PEERDNS=no

# cat /etc/sysconfig/network-scripts/ifcfg-em2
DEVICE=em2
ONBOOT=yes
HWADDR=XX:XX:XX:XX:XX:XX
MASTER=bond0
SLAVE=yes
MTU=9000
NM_CONTROLLED=no
STP=no
PEERDNS=no

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BONDING_OPTS=mode=4
MTU=9000
NM_CONTROLLED=no
STP=no
PEERDNS=no

# cat /etc/sysconfig/network-scripts/ifcfg-bond0.130
DEVICE=bond0.130
ONBOOT=yes
VLAN=yes
BRIDGE=iSCSI
MTU=9000
NM_CONTROLLED=no
STP=no
PEERDNS=no

# cat /etc/sysconfig/network-scripts/ifcfg-iSCSI
DEVICE=iSCSI
ONBOOT=yes
TYPE=Bridge
DELAY=0
IPADDR=192.168.XXX.XXX
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=9000
NM_CONTROLLED=no
STP=no
PEERDNS=no

====================================================
P.S. ping works fine.

A small extract of the output of
# iscsiadm -m session -P3

Target: iqn.2001-05.com.equallogic:0-XXXX
        Current Portal: 192.168.XXX.201:3260,1
        Persistent Portal: 192.168.XXX.200:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.XXXX
                Iface IPaddress: 192.168.XXX.148
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
.
.
.
       Current Portal: 192.168.XXX.202:3260,1
        Persistent Portal: 192.168.XXX.200:3260,1
                **********
                Interface:
                **********
                Iface Name: iSCSI:1
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.XXXX
                Iface IPaddress: 192.168.XXX.148 
                Iface HWaddress: <empty> 
                Iface Netdev: iSCSI
                SID: 2
.
.
.
        Current Portal: 192.168.XXX.203:3260,1
        Persistent Portal: 192.168.XXX.200:3260,1
                **********
                Interface:
                **********
                Iface Name: iSCSI:2
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.XXXX
                Iface IPaddress: 192.168.XXX.148
                Iface HWaddress: <empty>
                Iface Netdev: iSCSI
                SID: 3
.
.
.
        Current Portal: 192.168.XXX.204:3260,1
        Persistent Portal: 192.168.XXX.200:3260,1
                **********
                Interface:
                **********
                Iface Name: iSCSI:3
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.XXXX
                Iface IPaddress: 192.168.XXX.148
                Iface HWaddress: <empty>
                Iface Netdev: iSCSI
                SID: 4
.
.
.


===============
Statistics info from the EqualLogic confirms that there is data transfer on all the interfaces.

Hope this helps.

Comment 12 Simon Kong Win Chang 2013-07-02 05:33:19 UTC
Just to make it less confusing: the 192.168.XXX.148 is assigned to the iSCSI network interface.


# cat /etc/sysconfig/network-scripts/ifcfg-iSCSI
DEVICE=iSCSI
ONBOOT=yes
TYPE=Bridge
DELAY=0
IPADDR=192.168.XXX.148
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=9000
NM_CONTROLLED=no
STP=no
PEERDNS=no

Comment 13 Donald Williams 2013-07-02 13:30:27 UTC
I have never heard of anyone trying to run MPIO over a bonded interface with an EQL array. That does explain why ping works.

You're not generating new iSCSI sessions. EQL doesn't support bonding, so traffic over that bond isn't being spread over the available ports; one session ends up on one physical interface on the group.

With access to each interface, open-iscsi will be able to create unique sessions over those ports. Then devmapper can create multipath devices correctly.

I have two NICs defined for iSCSI: eth0/eth1.
When I discover them, I get one entry for each defined interface.

for example: 
# iscsiadm -m node
172.23.151.120:3260,1 iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test
172.23.151.120:3260,1 iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test

When I log in, I get two logins, so now I have two iSCSI connections.

iscsiadm -m node -T iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test -l
Logging in to [iface: eth0, target: iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test, portal: 172.23.151.120,3260] (multiple)
Logging in to [iface: eth1, target: iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test, portal: 172.23.151.120,3260] (multiple)
Login to [iface: eth0, target: iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test, portal: 172.23.151.120,3260] successful.
Login to [iface: eth1, target: iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test, portal: 172.23.151.120,3260] successful.


#multipath -ll
mpath4 (364ed2a9568e0e3d8271c859fa804c074) dm-2 EQLOGIC,100E-00
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 15:0:0:0 sdd 8:48 active ready  running
  `- 16:0:0:0 sde 8:64 active ready  running

Now multipathd will round-robin IO across those two iSCSI sessions.

Compare my session output to yours. 

# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 2.0-873
Target: iqn.2001-05.com.equallogic:4-52aed6-d8e3e0689-74c004a89f851c27-ubuntu-4k-vol-test
        Current Portal: 172.23.151.121:3260,1
        Persistent Portal: 172.23.151.120:3260,1
                **********
                Interface:
                **********
                Iface Name: eth0
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:8ab9cf5340f0
                Iface IPaddress: 172.23.71.231
                Iface HWaddress: <empty>
                Iface Netdev: eth0
                SID: 11
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 120
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 15 State: running
                scsi15 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdd          State: running
        Current Portal: 172.23.151.122:3260,1
        Persistent Portal: 172.23.151.120:3260,1
                **********
                Interface:
                **********
                Iface Name: eth1
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:8ab9cf5340f0
                Iface IPaddress: 172.23.74.186
                Iface HWaddress: <empty>
                Iface Netdev: eth1
                SID: 12
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 120
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 16 State: running
                scsi16 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sde          State: running

Comment 14 Donald Williams 2013-07-02 17:33:40 UTC
OK, I see what you did, but why do you have to create the bond? Why can't you use the network interfaces via their hypervisor object names?

Comment 15 Stefan Schueffler 2014-01-20 09:55:47 UTC
From my experience, a bonding interface in 802.3ad mode (built on, for example, 4 real network devices) does not work well with an EqualLogic.

In this case there will be just one iSCSI session, distributed on the server side according to the bonding logic. On the EqualLogic side, however, the session will be assigned to _one_ of the interfaces (as it is only one iSCSI session instead of 4), because the EqualLogic does not support distributing one session across multiple interfaces, so one would not gain any performance improvement at all.

A better approach is to have 4 distinct paths, as in comment 13 by Donald Williams. This works well as long as you have only _one_ EqualLogic. As soon as you have more than one stacked up, this setup again does not work or scale well.
Explanation: as soon as your data is distributed over more than one EqualLogic, there will be extra "intra-EqualLogic traffic" whenever you request data without knowing which EqualLogic the actual block is stored on. If your iSCSI session is connected to EQL 1 and you request blocks located on EQL 2, then EQL 1 requests that data from EQL 2 and sends it back to the requesting server, resulting in a lot of intra-EQL traffic.

The only available approach (to my knowledge) is to use the "intelligent" Dell EqualLogic driver called the "Linux Host Integration Tool Kit" (aka the Dell Linux HIT Kit). This driver is an optimized kind of multipath: it maintains, on the server side, a mapping table of which block of data is stored on which EQL, and connects the iSCSI session directly to the appropriate one, thus avoiding all the extra intra-EQL traffic.

This driver is available for RHEL 6.x and (as far as my personal wishlist goes) should be integrated into RHEV-H, and actively used if the user indicates an EQL as the iSCSI storage array.

Besides the intra-traffic optimization, the HIT Kit also optimizes some low-level iSCSI and network settings to improve throughput, similar to the ones in comment 9 (net.ipv4.conf.all.arp_ignore, arp_announce, rp_filter, ...).

Comment 16 Donald Williams 2014-01-20 13:05:06 UTC
Re: Stefan's comment about MPIO with multiple members in a pool.
While HIT/LE optimizes MPIO in a multiple-member pool so that IO requests are sent directly to the member holding the data, even without it the inter-group connection he mentioned, known as the MESH connection, works very efficiently, especially on writes and when jumbo frames are available. The array knows where the blocks need to go and moves them without much overhead. This has been the design since day one.

The next version of HIT/LE, v1.3 (Host Integration Tools/Linux Edition), is due out soon. It will support RHEL 5.x/6.x, OEL 5.x/6.x, and SuSE.

Comment 17 Sergey Gotliv 2014-01-25 18:52:32 UTC

*** This bug has been marked as a duplicate of bug 1053900 ***

