Bug 1357793 - udpong fails over ocrdma
Summary: udpong fails over ocrdma
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: librdmacm
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jarod Wilson
QA Contact: zguo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-19 07:56 UTC by zguo
Modified: 2017-05-19 13:04 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-19 13:04:32 UTC



Description zguo 2016-07-19 07:56:30 UTC
Description of problem:


Version-Release number of selected component (if applicable):
[root@rdma-qe-05 ~]$ uname -r
3.10.0-461.el7.x86_64
[root@rdma-qe-05 ~]$ rpm -qa | grep libocrdma
libocrdma-1.0.8-1.el7.x86_64
[root@rdma-qe-05 ~]$ rpm -qa | grep librdmacm
librdmacm-1.1.0-2.el7.x86_64
librdmacm-devel-1.1.0-2.el7.x86_64
librdmacm-utils-1.1.0-2.el7.x86_64
[root@rdma-qe-05 ~]$ grep -i distro /etc/motd
                           DISTRO=RHEL-7.3-20160707.2


How reproducible:
Always

Steps to Reproduce:
1. Server side:
[root@rdma-qe-04 ~]$ udpong -b roce-qe-04

2. Client side:
[root@rdma-qe-05 ~]$ udpong -s roce-qe-04 -S 1024 -C 10240000
name      bytes   xfers   total       time     Gb/sec    usec/xfer
rconnect: No such file or directory

3.

Actual results:


Expected results:


Additional info:

Comment 1 Jarod Wilson 2016-07-19 14:36:13 UTC
Doug, is this expected to work, or is there a known reason why it doesn't?

Comment 2 Doug Ledford 2016-07-19 15:04:28 UTC
When using RoCE instead of native IB, an app generally needs to support one of two things:

1) Use of librdmacm for connection establishment
2) Ability to specify the GID index when not using librdmacm to establish connections

The udpong application is part of librdmacm, so I can't imagine it not being written to use librdmacm to establish connections. As such, it should work. That it doesn't is either a bug in the stack or a bug in udpong. Given that I logged into rdma-qe-04/05 and used qperf to successfully run RoCE tests over the RoCE links, I would say the bug is in udpong and should be investigated there.

A simple strace of udpong shows that at some point during connection setup, the RDMA stack returns EINVAL for something udpong tried to do, and then udpong gives the error this bug lists. We just need to track down which command failed and why (probably because udpong made an assumption about device capabilities and requested something the ocrdma cards don't support), and then fix it.
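
One way to narrow down the failing call that the strace in this comment points to is to save the trace to a file and filter it for calls that returned EINVAL. The following is a minimal Python sketch, not something taken from this bug: the function name find_failures and the regular expression are illustrative assumptions, and it only understands the common single-line strace format `name(args) = -1 ERRNO (message)`, optionally prefixed by a PID.

```python
import re

# Matches a failed call in the usual strace line format, for example:
#   name(args...) = -1 EINVAL (Invalid argument)
# An optional "[pid N]" or bare "N" prefix (as printed with -f) is skipped.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional PID prefix
    r"(?P<name>\w+)\((?P<args>.*)\)"   # syscall name and raw argument text
    r"\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\b"  # return value -1 and errno symbol
)


def find_failures(lines, errno="EINVAL"):
    """Return (line_number, syscall_name, raw_line) for calls that failed with errno.

    `lines` is any iterable of strings, such as an open trace file.
    Line numbers are 1-based so they can be looked up in the trace.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = FAILED_CALL.match(line.strip())
        if match and match.group("errno") == errno:
            hits.append((lineno, match.group("name"), line.rstrip("\n")))
    return hits
```

Assuming the trace was saved with something like `strace -f -o udpong.trace udpong -s roce-qe-04 -S 1024 -C 10240000` (the file name is hypothetical), calling `find_failures(open("udpong.trace"))` lists the candidate calls to inspect. Calls that strace splits across "unfinished" and "resumed" lines are not handled by this sketch.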

Comment 3 Jarod Wilson 2016-08-23 22:38:43 UTC
Okay, so I've traced this down to a failed call to ibv_create_ah() inside ds_add_qp_dest() in librdmacm/src/rsocket.c. I haven't looked into the libibverbs code yet to trace it further, but I'd guess this might jump from libibverbs over into libocrdma, since other hardware is working fine.

Comment 4 Jarod Wilson 2016-08-29 03:18:32 UTC
From ibv_create_ah, we get to ocrdma_create_ah, and in there, the call to ibv_cmd_create_ah is failing.

