Bug 1365750 - perftest test failed with XRC connection type
Summary: perftest test failed with XRC connection type
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: perftest
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jarod Wilson
QA Contact: zguo
URL:
Whiteboard:
Depends On:
Blocks: 1274397
 
Reported: 2016-08-10 07:15 UTC by zguo
Modified: 2017-06-01 19:07 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 03:27:41 UTC


Attachments


Links
System                 | ID             | Priority | Status       | Summary                                   | Last Updated
Red Hat Product Errata | RHEA-2016:2309 | normal   | SHIPPED_LIVE | RDMA stack bug fix and enhancement update | 2016-11-03 13:40:45 UTC

Description zguo 2016-08-10 07:15:26 UTC
Description of problem:
perftest tests fail with the XRC connection type on mlx5 hosts running RHEL-7.3.

Version-Release number of selected component (if applicable):
mlx5 (rdma-qe-06/07)
perftest-3.0-3.el7.x86_64

How reproducible:


Steps to Reproduce:
1. Test job: https://beaker.engineering.redhat.com/recipes/2945342#task44144413
2. Client log: https://beaker.engineering.redhat.com/recipes/2945342/tasks/44144413/results/218012874/logs/test_log--kernel-infiniband-perftest-client.log

Actual results:
echo -e 'rdma_lat\t\t |passed |passed |passed'
+ cat /tmp/result_matrix.txt
prog			 |RC	 |XRC	 |DC	
ib_atomic_bw		 |failed |failed |failed
ib_atomic_lat		 |failed |failed |failed
ib_read_bw		 |passed |failed |failed
ib_read_lat		 |passed |failed |failed
ib_send_bw		 |passed |failed |failed
ib_send_lat		 |passed |failed |failed
ib_write_bw		 |passed |failed |failed
ib_write_bw_postlist     |passed |passed |passed
ib_write_lat		 |passed |failed |failed
rdma_bw		         |passed |passed |passed
rdma_lat		 |passed |passed |passed

+ timeout 10m ib_read_bw ib0-qe-06 -a -c XRC -F
Unknown connection type 
 Unable to create QP.
Failed to create QP.
 Couldn't create IB resources
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF		Device         : mlx5_0
 Number of qps   : 1		Transport type : IB
 Connection type : XRC		Using SRQ      : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet

Expected results:
perftest tests pass with the XRC connection type, as they do on RHEL-7.2.

Additional info:
On the RHEL-7.2 release distro, the perftest tests passed with the XRC connection type:
https://beaker.engineering.redhat.com/jobs/1440994
https://beaker.engineering.redhat.com/recipes/2950689/tasks/44198059/results/218087251/logs/test_log--kernel-infiniband-perftest-client.log

Comment 2 Jarod Wilson 2016-08-12 18:29:48 UTC
Okay, so this is a missing feature in perftest. The qp_create functions lack a switch case for XRC, hence the "Unknown connection type" message, and no XRC queue pairs can be created. Some additional code was also missing an XRC-specific code path, which I've just added, and now:

[root@rdma-qe-07 perftest (master *)]$ ./ib_read_bw ib0-qe-06 -a -c XRC -F
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : XRC          Using SRQ      : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x34 QPN 0x0123 PSN 0x35a519 OUT 0x10 RKey 0x00fd4d VAddr 0x002b23febf4000 SRQn 0x000122
 remote address: LID 0x36 QPN 0x00f2 PSN 0x9ae18b OUT 0x10 RKey 0x00daf3 VAddr 0x002b92913eb000 SRQn 0x0000f1
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             15.51              14.05              7.366099
 4          1000             31.25              31.24              8.190245
 8          1000             62.51              62.50              8.191350
 16         1000             123.23             120.64             7.906333
 32         1000             248.82             248.26             8.134886
 64         1000             482.60             481.79             7.893613
 128        1000             947.58             945.85             7.748387
 256        1000             1820.35            1817.00            7.442447
 512        1000             3320.31            3313.77            6.786603
 1024       1000             5570.99            5521.33            5.653842
 2048       1000             6120.39            6119.02            3.132939
 4096       1000             6214.90            6214.54            1.590922
 8192       1000             6230.94            6229.90            0.797427
 16384      1000             6236.06            6235.92            0.399099
 32768      1000             6239.36            6239.31            0.199658
 65536      1000             6240.82            6240.71            0.099851
 131072     1000             6241.56            6241.55            0.049932
 262144     1000             6241.97            6241.97            0.024968
 524288     1000             6242.17            6242.17            0.012484
 1048576    1000             6242.29            6242.28            0.006242
 2097152    1000             6242.32            6242.31            0.003121
 4194304    1000             6242.35            6242.35            0.001561
 8388608    1000             6242.37            6242.37            0.000780
---------------------------------------------------------------------------------------

I'll send this patch upstream shortly for merge consideration.
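The missing piece described in this comment is a switch over the connection type that has no XRC branch. The following is a minimal C sketch of that shape, not perftest's actual code: the conn_type enum and the parse_conn_type(), verbs_qp_type() and check_qp_type() helpers are hypothetical, DC is left out, and only the IBV_QPT_* names are real libibverbs QP types. XRC differs from RC/UC/UD in needing distinct QP types on the requesting and responding sides, which is why it cannot simply reuse another case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical connection types, loosely modeled on perftest's -c option.
 * Not perftest's real definitions; DC is not modeled in this sketch. */
enum conn_type { CONN_RC, CONN_UC, CONN_UD, CONN_XRC };

/* Parse a -c style argument; returns -1 for names this sketch does not model. */
static int parse_conn_type(const char *name, enum conn_type *out)
{
    assert(name != NULL && out != NULL);
    if (strcmp(name, "RC") == 0)
        *out = CONN_RC;
    else if (strcmp(name, "UC") == 0)
        *out = CONN_UC;
    else if (strcmp(name, "UD") == 0)
        *out = CONN_UD;
    else if (strcmp(name, "XRC") == 0)
        *out = CONN_XRC;
    else
        return -1;
    return 0;
}

/* Map a connection type to the libibverbs QP type name it needs.
 * Returns NULL when no case matches, which is the path that printed
 * "Unknown connection type" before an XRC case existed. */
static const char *verbs_qp_type(enum conn_type t, int is_requester)
{
    switch (t) {
    case CONN_RC:
        return "IBV_QPT_RC";
    case CONN_UC:
        return "IBV_QPT_UC";
    case CONN_UD:
        return "IBV_QPT_UD";
    case CONN_XRC:
        /* XRC: the requester uses an XRC send QP, the responder an XRC
         * receive QP tied to an XRC domain and SRQ. */
        return is_requester ? "IBV_QPT_XRC_SEND" : "IBV_QPT_XRC_RECV";
    }
    return NULL;
}

/* Report the chosen QP type, or fail the way the original log did. */
static int check_qp_type(enum conn_type t, int is_requester)
{
    const char *name = verbs_qp_type(t, is_requester);
    if (name == NULL) {
        fprintf(stderr, "Unknown connection type\n");
        return -1;
    }
    printf("using %s\n", name);
    return 0;
}
```

For example, check_qp_type(CONN_XRC, 1) prints "using IBV_QPT_XRC_SEND", while an enum value with no matching case takes the "Unknown connection type" failure path.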

Comment 3 Jarod Wilson 2016-08-12 18:40:53 UTC
Patch sent upstream to linux-rdma@vger.kernel.org for review.

Comment 8 zguo 2016-08-15 06:36:15 UTC
1) With this patch applied, the kernel hits a call trace:

https://beaker.engineering.redhat.com/jobs/1448078
https://beaker.engineering.redhat.com/recipes/2964831/tasks/44401323/results/219055606/logs/test_log--kernel-infiniband-perftest-client-dmesg.log
===
Checking dmesg for specific failures!
[  195.358614] ------------[ cut here ]------------
[  195.380690] WARNING: at drivers/infiniband/core/rw.c:652 rdma_rw_init_qp+0x8b/0xa0 [ib_core]()
[  195.419736] Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm intel_powerclamp ib_ucm ib_uverbs coretemp ib_umad rdma_cm ib_cm iw_cm intel_rapl iosf_mbi mlx5_ib ib_core kvm_intel iTCO_wdt hpwdt iTCO_vendor_support gpio_ich hpilo kvm ipmi_ssif shpchp ie31200_edac irqbypass sg edac_core crc32_pclmul ghash_clmulni_intel ipmi_devintf aesni_intel lrw gf128mul pcspkr lpc_ich glue_helper pcc_cpufreq ablk_helper ipmi_si cryptd ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx5_core libahci tg3 drm libata crct10dif_pclmul
[  195.753647]  ptp crct10dif_common i2c_core serio_raw crc32c_intel pps_core fjes dm_mirror dm_region_hash dm_log dm_mod
[  195.797228] CPU: 5 PID: 13781 Comm: ib_atomic_lat Not tainted 3.10.0-489.el7.x86_64 #1
[  195.834338] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[  195.864736]  0000000000000000 00000000276d221b ffff8803d36ebbb0 ffffffff8168dcb5
[  195.899480]  ffff8803d36ebbe8 ffffffff81085570 ffff8803d36ebcd8 ffff880181a58000
[  195.934517]  0000000000000000 ffff880181a58000 ffff8803d36ebde8 ffff8803d36ebbf8
[  195.969693] Call Trace:
[  195.981764]  [<ffffffff8168dcb5>] dump_stack+0x19/0x1b
[  196.006819]  [<ffffffff81085570>] warn_slowpath_common+0x70/0xb0
[  196.034452]  [<ffffffff810856ba>] warn_slowpath_null+0x1a/0x20
[  196.062896]  [<ffffffffa06e260b>] rdma_rw_init_qp+0x8b/0xa0 [ib_core]
[  196.094467]  [<ffffffffa06e0b1b>] ib_create_qp+0x13b/0x250 [ib_core]
[  196.125671]  [<ffffffffa0627064>] create_qp.isra.13+0x814/0x860 [ib_uverbs]
[  196.159865]  [<ffffffffa06262e0>] ? copy_wc_to_user+0xc0/0xc0 [ib_uverbs]
[  196.193980]  [<ffffffffa0625e50>] ? ib_uverbs_cq_event_handler+0x40/0x40 [ib_uverbs]
[  196.232172]  [<ffffffffa0629ba2>] ib_uverbs_create_qp+0x152/0x190 [ib_uverbs]
[  196.265632]  [<ffffffffa0624400>] ib_uverbs_write+0x1a0/0x400 [ib_uverbs]
[  196.297595]  [<ffffffff812ae924>] ? selinux_file_permission+0xc4/0x120
[  196.328929]  [<ffffffff812a7223>] ? security_file_permission+0x23/0xa0
[  196.358997]  [<ffffffff811fbfed>] vfs_write+0xbd/0x1e0
[  196.383603]  [<ffffffff811fcb0f>] SyS_write+0x7f/0xe0
[  196.407697]  [<ffffffff8169e249>] system_call_fastpath+0x16/0x1b
[  196.437108] ---[ end trace ce0c4d31a68b2d14 ]---

End of log.
===
+ cat /tmp/result_matrix.txt
prog			 |RC	 |XRC	 |DC	
ib_atomic_bw		 |failed |failed |failed
ib_atomic_lat		 |failed |failed |failed
ib_read_bw		 |passed |passed |failed
ib_read_lat		 |passed |passed |failed
ib_send_bw		 |passed |passed |failed
ib_send_lat		 |passed |passed |failed
ib_write_bw		 |passed |passed |failed
ib_write_bw_postlist     |passed |passed |passed
ib_write_lat		 |passed |passed |failed
rdma_bw		         |passed |passed |passed
rdma_lat		 |passed |passed |passed

2) Without this patch, there is no call trace:
https://beaker.engineering.redhat.com/jobs/1448134

+ cat /tmp/result_matrix.txt
prog			 |RC	 |XRC	 |DC	
ib_atomic_bw		 |failed |failed |failed
ib_atomic_lat		 |failed |failed |failed
ib_read_bw		 |passed |failed |failed
ib_read_lat		 |passed |failed |failed
ib_send_bw		 |passed |failed |failed
ib_send_lat		 |passed |failed |failed
ib_write_bw		 |passed |failed |failed
ib_write_bw_postlist     |passed |passed |passed
ib_write_lat		 |passed |failed |failed
rdma_bw		         |passed |passed |passed
rdma_lat		 |passed |passed |passed

Comment 9 zguo 2016-08-15 06:40:57 UTC
Adding the Regression keyword, since this issue does not occur on RHEL-7.2.

Comment 11 zguo 2016-08-15 08:32:29 UTC
The tests in comment 8 were:

RHEL-7.3-20160811.0 + 3.10.0-489.el7 + patch, call trace,    XRC passed
RHEL-7.3-20160811.0 + 3.10.0-489.el7 - patch, no call trace, XRC failed.

Then I ran the following test:

RHEL-7.2 + 3.10.0-327.el7 + patch, no call trace, and XRC passed.
https://beaker.engineering.redhat.com/jobs/1448227
prog			 |RC	 |XRC	 |DC	
ib_atomic_bw		 |failed |failed |failed
ib_atomic_lat		 |failed |failed |failed
ib_read_bw		 |passed |passed |failed
ib_read_lat		 |passed |passed |failed
ib_send_bw		 |passed |passed |failed
ib_send_lat		 |passed |passed |failed
ib_write_bw		 |passed |passed |failed
ib_write_bw_postlist	 |passed |passed |passed
ib_write_lat		 |passed |passed |failed
rdma_bw		         |passed |passed |passed
rdma_lat		 |passed |passed |passed

Comment 12 Gil Rockah 2016-08-15 08:50:04 UTC
Hi, this is a known issue in this perftest package.
It was fixed in the latest version. Can you please verify?
http://www.openfabrics.org/downloads/perftest/perftest-3.0-3.1.gb36a595.tar.gz

Thanks,
Gil

Comment 13 zguo 2016-08-15 09:23:32 UTC
(In reply to Gil Rockah from comment #12)

Submitted test job: https://beaker.engineering.redhat.com/jobs/1448518

Comment 14 Jarod Wilson 2016-08-15 14:41:14 UTC
(In reply to zguo from comment #13)
> Submitted test job: https://beaker.engineering.redhat.com/jobs/1448518

From that job:

+ cat /tmp/result_matrix.txt
prog			 |RC	 |XRC	 |DC	
ib_atomic_bw		 |passed |passed |passed
ib_atomic_lat		 |passed |passed |passed
ib_read_bw		 |passed |passed |passed
ib_read_lat		 |passed |passed |passed
ib_send_bw		 |passed |passed |passed
ib_send_lat		 |passed |passed |passed
ib_write_bw		 |passed |passed |passed
ib_write_bw_postlist     |passed |passed |passed
ib_write_lat		 |passed |passed |passed
rdma_bw		         |passed |passed |passed
rdma_lat		 |passed |passed |passed
+ echo '--- client finishes.'
--- client finishes.
+ final_result
+ '[' '!' -f /root/perftest.txt ']'
+ result=PASS

I'll get our perftest package updated accordingly. Apologies for not looking at upstream changes prior to the patch I sent off last week! :)

Comment 16 Jarod Wilson 2016-08-15 15:01:10 UTC
(In reply to Jarod Wilson from comment #14)
> I'll get our perftest package updated accordingly.

Please hold, it seems there is some question as to whether or not the tests actually executed with the intended version of perftest. Going to manually re-run the test job as soon as I can get my hands on these systems.

Comment 18 Jarod Wilson 2016-08-15 16:31:54 UTC
(In reply to Jarod Wilson from comment #16)
> Please hold, it seems there is some question as to whether or not the tests
> actually executed with the intended version of perftest. Going to manually
> re-run the test job as soon as I can get my hands on these systems.

Hrm. Not so good with 3.0-3.1 and/or current git head of perftest:


server$ ./ib_read_bw -c XRC -F

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : XRC          Using SRQ      : ON
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x36 QPN 0x004d PSN 0xf5c74f OUT 0x10 RKey 0x008553 VAddr 0x002ac089ec6000 SRQn 0x00004c
 remote address: LID 0x34 QPN 0x0045 PSN 0xe614ef OUT 0x10 RKey 0x009b80 VAddr 0x002b4653e0f000 SRQn 0x000044
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdam_cm
 Failed to exchange data between server and clients



client$ ./ib_read_bw ib0-qe-06 -a -c XRC -F
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : XRC          Using SRQ      : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x34 QPN 0x0045 PSN 0xe614ef OUT 0x10 RKey 0x009b80 VAddr 0x002b4653e0f000 SRQn 0x000044
 remote address: LID 0x36 QPN 0x004d PSN 0xf5c74f OUT 0x10 RKey 0x008553 VAddr 0x002ac089ec6000 SRQn 0x00004c
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             15.44              13.85              7.263809
 4          1000             31.10              31.05              8.140242
 8          1000             62.36              62.23              8.156000
 16         1000             124.71             122.05             7.998551
 32         1000             247.05             246.99             8.093523
 64         1000             487.14             486.33             7.968071
 128        1000             943.27             942.05             7.717240
 256        1000             1820.35            1818.44            7.448333
 512        1000             3340.35            3337.55            6.835311
 1024       1000             5599.18            5531.08            5.663825
 2048       1000             6120.39            6119.48            3.133173
 4096       1000             6214.90            6214.34            1.590871
 8192       1000             6230.94            6230.25            0.797472
 16384      1000             6236.79            6236.08            0.399109
 32768      1000             6239.36            6239.35            0.199659
 65536      1000             6240.82            6240.74            0.099852
mlx5: rdma-qe-07: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00008813 10000045 46808fd0
Problems with warm up
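The columns of the client table above are internally consistent: MsgRate[Mpps] = BW average[MB/sec] × 2^20 / #bytes / 10^6. This relation is inferred from the logged rows (it implies the "MB" here is 2^20 bytes), not taken from perftest documentation. The short C sketch below checks it against three rows of the client output; msg_rate_mpps() and close_enough() are hypothetical helpers.

```c
#include <assert.h>

/* Inferred from the logged rows, not from perftest docs:
 * the MB in "BW average[MB/sec]" behaves like 2^20 bytes. */
#define BYTES_PER_MB 1048576.0

/* Message rate in Mpps implied by an average bandwidth and a message size. */
static double msg_rate_mpps(double bw_avg_mb_per_sec, double msg_bytes)
{
    assert(msg_bytes > 0.0);
    return bw_avg_mb_per_sec * BYTES_PER_MB / msg_bytes / 1e6;
}

/* Relative comparison; the logged bandwidth is rounded to two decimals,
 * so an exact match is not expected. */
static int close_enough(double got, double want, double rel_tol)
{
    double diff = got - want;
    double scale = want < 0 ? -want : want;
    if (diff < 0)
        diff = -diff;
    return diff <= rel_tol * scale;
}
```

Rows 2, 1024 and 65536 bytes all agree within 0.1%, whereas treating a MB as 10^6 bytes would be off by roughly 5% on the 2-byte row.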

Comment 19 Jarod Wilson 2016-08-15 19:02:20 UTC
Oops, my error: one side had -a and the other did not. All good with 3.0-3.1 after all, with XRC working for everything except ib_atomic_{bw,lat}.
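With -a, ib_read_bw sweeps message sizes from 2 bytes up to 2^23 (8388608) bytes, doubling each step, which matches the #bytes column in the client logs above; without it, a side runs a single size. The C sketch below models the mismatch diagnosed here by building each side's size schedule and comparing them. size_schedule(), schedules_match() and the single_size parameter are hypothetical, and the premise that both peers must walk identical schedules comes from this comment's diagnosis, not from perftest's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes observed in the -a runs above: 2 bytes up to 2^23 bytes, doubling. */
#define SWEEP_MIN_BYTES 2ULL
#define SWEEP_MAX_BYTES (1ULL << 23)

/* Fill out[] with the sizes one side runs and return the count. With -a
 * (all_sizes != 0) this is the full sweep; otherwise it is the single size
 * that side is configured with (single_size is a hypothetical stand-in). */
static size_t size_schedule(int all_sizes, uint64_t single_size,
                            uint64_t *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL || cap == 0);
    if (!all_sizes) {
        if (cap > 0)
            out[n++] = single_size;
        return n;
    }
    for (uint64_t s = SWEEP_MIN_BYTES; s <= SWEEP_MAX_BYTES && n < cap; s *= 2)
        out[n++] = s;
    return n;
}

/* Assumption from this comment: server and client must run the same
 * schedule, so any difference (e.g. -a on one side only) is a mismatch. */
static int schedules_match(int server_all, int client_all, uint64_t single_size)
{
    uint64_t a[32], b[32];
    size_t na = size_schedule(server_all, single_size, a, 32);
    size_t nb = size_schedule(client_all, single_size, b, 32);
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

The full sweep has 23 entries, one per row of the client tables; a server started without -a runs only one, so schedules_match(0, 1, ...) reports the mismatch.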

Comment 20 Jarod Wilson 2016-08-15 19:49:10 UTC
On our hosts, both ib_atomic_bw and ib_atomic_lat are still getting an error 38 (ENOSYS) back from libibverbs ibv_post_send() calls.
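The error value here comes straight from ibv_post_send(), which per its man page returns 0 on success or an errno value on failure, rather than returning -1 and setting errno. The hypothetical C helper below (describe_post_send_ret() is not part of perftest or libibverbs) turns such a value into a readable message; the "likely unsupported by the HCA" wording is an assumption drawn from the hardware discussion in comment 21, not something libibverbs reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format the return value of ibv_post_send(): 0 on success, otherwise an
 * errno value such as ENOSYS (38 on Linux). Hypothetical helper. */
static void describe_post_send_ret(int ret, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (ret == 0)
        snprintf(buf, len, "ok");
    else if (ret == ENOSYS)
        /* Seen for the atomic tests here; assumed to mean the operation
         * is not implemented for this device. */
        snprintf(buf, len, "ENOSYS (%d): %s; operation likely unsupported by the HCA",
                 ret, strerror(ret));
    else
        snprintf(buf, len, "error %d: %s", ret, strerror(ret));
}
```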

Comment 21 Jarod Wilson 2016-08-15 19:56:29 UTC
Per comments #22-24 on bug 1288821, our hardware doesn't support atomic mode. Our test hosts have:

05:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]

Only the ConnectX-4 and ConnectX-4LX support atomic mode.

Comment 25 zguo 2016-09-01 05:21:37 UTC
Perftest with the XRC connection type works now.
https://beaker.engineering.redhat.com/jobs/1482285

Comment 27 errata-xmlrpc 2016-11-04 03:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2309.html

