
Bug 236771

Summary: [RHEL5 RT][OPENIB]Stopping openSM process on RT kernel gives a kernel backtrace
Product: Red Hat Enterprise MRG
Reporter: Gurhan Ozen <gozen>
Component: realtime-kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 1.0
CC: dledford, jburke
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: -35
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-10-02 14:52:19 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Gurhan Ozen 2007-04-17 16:17:23 UTC
Description of problem:
When shutting down the opensm service on the RT kernel, I get the following kernel backtrace:

------------[ cut here ]------------
kernel BUG at kernel/rt.c:344!
invalid opcode: 0000 [1] PREEMPT SMP 
CPU 1 
Modules linked in: autofs4 hidp l2cap bluetooth nfs lockd nfs_acl sunrpc
iscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_ucm rdma_ucm ib_srp ib_sdp
rdma_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad
loop dm_multipath video sbs i2c_ec i2c_core dock button battery asus_acpi
backlight ac parport_pc lp parport sg pcspkr ib_ipath ata_generic ib_mthca
ib_mad ib_core shpchp bnx2 serio_raw ide_cd cdrom dm_snapshot dm_zero dm_mirror
dm_mod ata_piix libata megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
uhci_hcd
Pid: 4198, comm: opensm Not tainted 2.6.20-19.el5rt #1
RIP: 0010:[<ffffffff810b8e0a>]  [<ffffffff810b8e0a>] rt_downgrade_write+0x4/0x8
RSP: 0000:ffff81005d999c18  EFLAGS: 00010282
RAX: ffff81007cece828 RBX: ffff810076c907f8 RCX: ffff810076c90828
RDX: ffff81007cece828 RSI: 0000000000000000 RDI: ffff81007cece780
RBP: ffff81005d999c18 R08: 0000000000000000 R09: 0000000000000001
R10: ffff81005d96f6c0 R11: 0000000000000000 R12: ffff810076c90800
R13: ffff810076c907f8 R14: 0000000000000000 R15: ffff81007cece6c0
FS:  0000000000000000(0000) GS:ffff81000d510540(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003c40946ae8 CR3: 0000000001001000 CR4: 00000000000006e0
Process opensm (pid: 4198, threadinfo ffff81005d998000, task ffff81005d996700)
Stack:  ffff81005d999c58 ffffffff883614e3 ffff8100786b5d30 0000000000000008
 ffff8100786b57a0 ffff81005d96f6c0 ffff8100786b57a0 ffff8100019b1180
 ffff81005d999c98 ffffffff81012db7 ffff81007805a378 ffff81005d96f6c0
Call Trace:
 [<ffffffff883614e3>] :ib_umad:ib_umad_close+0xb7/0x10f
 [<ffffffff81012db7>] __fput+0xdd/0x1af
 [<ffffffff8102f10c>] fput+0x17/0x19
 [<ffffffff81025d81>] filp_close+0x6c/0x77
 [<ffffffff8103b01e>] put_files_struct+0x6d/0xc1
 [<ffffffff81015dbb>] do_exit+0x27f/0x8c5
 [<ffffffff8104d3b7>] cpuset_exit+0x0/0x6e
 [<ffffffff8102d772>] get_signal_to_deliver+0x432/0x483
 [<ffffffff8105fa88>] do_notify_resume+0xc2/0x7d3
 [<ffffffff81062667>] ptregscall_common+0x67/0xb0
 [<ffffffff810622d6>] sysret_signal+0x21/0x31
 [<0000003c406c48c6>]

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff81069e97>] .... __spin_trylock+0x16/0x71
.....[<ffffffff8106b0ea>] ..   ( <= oops_begin+0x28/0x77)


Code: 0f 0b eb fe 55 48 89 e5 53 48 8d 5f 08 48 83 ec 08 85 f6 89 
RIP  [<ffffffff810b8e0a>] rt_downgrade_write+0x4/0x8
 RSP <ffff81005d999c18>
 <1>Fixing recursive fault but reboot is needed!


Version-Release number of selected component (if applicable):
# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.20-19.el5rt #1 SMP PREEMPT Mon
Apr 16 12:14:21 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Every time.

Steps to Reproduce:
1. On a system with InfiniBand (IB) hardware, run: service opensmd start
2. Then run: service opensmd stop

Comment 1 Gurhan Ozen 2007-04-17 23:46:45 UTC
The same behavior can be observed with the ibping program: just run ibping.

Comment 2 Doug Ledford 2007-07-10 20:50:51 UTC
This was resolved by the final OFED 1.2 code and the updated RT port patch used to
build the kernel-rt-2.6.21-32.ofed.3.el5rt kernel (this was a scratch build, but
the updated patches were submitted to Clark Williams for inclusion in his RT
kernel).

Comment 3 Clark Williams 2007-07-26 14:48:46 UTC
Applied to -35; testing.

Comment 4 Gurhan Ozen 2007-08-13 20:57:40 UTC
Verified with -35:

[root@dell-pe1950-02 ~]# service opensmd start
Starting IB Subnet Manager                                 [  OK  ]
[root@dell-pe1950-02 ~]# service opensmd stop ; service opensmd start
Stopping IB Subnet Manager.......                          [  OK  ]
Starting IB Subnet Manager                                 [  OK  ]
[root@dell-pe1950-02 ~]# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-35.el5rt #1 SMP PREEMPT RT
Thu Jul 26 11:59:02 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux