Bug 1356615 - Kdump fails to start due to the customer using a tap device on openvpn.
Summary: Kdump fails to start due to the customer using a tap device on openvpn.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kexec-tools
Version: 6.7
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Xunlei Pang
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1375890
Reported: 2016-07-14 13:18 UTC by Billy Woods
Modified: 2017-03-21 09:12 UTC (History)

Fixed In Version: kexec-tools-2.0.0-301.el6
Clone Of:
: 1375890 (view as bug list)
Environment:
Last Closed: 2017-03-21 09:12:35 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0584 normal SHIPPED_LIVE kexec-tools bug fix and enhancement update 2017-03-21 12:24:25 UTC

Description Billy Woods 2016-07-14 13:18:18 UTC
Description of problem:

Customer is running kexec-tools-2.0.0-300.el6.x86_64. Kdump fails to start due to the customer using a tap device on openvpn. Case number 01620459

Version-Release number of selected component (if applicable):

kexec-tools-2.0.0-300.el6.x86_64

How reproducible:

Every time

Steps to reproduce:

 Created By: Robb Manes  (6/20/2016 3:48 PM)

// Working Notes - these notes are not intended as a meaningful communication
// but rather an indicator of current thought processes and reference.
// Please feel free to comment and ask questions concerning them.  

My steps to reproduce, which worked without problems:

Make the tap device manually:

	# tunctl -t tap0
	Set 'tap0' persistent and owned by uid 0

Set up SSH kdump:

	/etc/kdump.conf
	ssh kdump@waffle.usersys.redhat.com
	path /share/

Set up kdump keys:

	# service kdump propagate
	Using existing keys...
	kdump@waffle.usersys.redhat.com's password: 
	/root/.ssh/kdump_id_rsa has been added to ~kdump/.ssh/authorized_keys on waffle.usersys.redhat.com

Restart kdump:

	# service kdump restart
	Stopping kdump:                                            [  OK  ]
	Detected change(s) the following file(s):
	  
	  /etc/kdump.conf
	Rebuilding /boot/initrd-2.6.32-642.el6.x86_64kdump.img
	Starting kdump:                                            [  OK  ]

Ensure kdump kernel is loaded:

	# grep -i crash /proc/iomem 
	  03000000-0b0fffff : Crash kernel

Crash the system:

	# echo 'c' > /proc/sysrq-trigger

From the console, I can see:

	$ virsh console rhel6-kdump-test
	- - - - - - - - - 8< - - - - - - - - - 
	mapping eth0 to eth0
	udhcpc (v1.15.1) started
	Sending discover...
	Sending select for 10.12.212.85...
	Lease of 10.12.212.85 obtained, lease time 43200
	deleting routers
	adding dns 10.11.5.4
	adding dns 10.11.5.3
	Saving to remote location kdump@waffle.usersys.redhat.com
	Saving vmcore-dmesg.txt
	reverse mapping checking getaddrinfo for unused [10.12.213.189] failed - POSSIBLE BREAK-IN ATTEMPT!
	63+1 records in
	63+1 records out
	32270 bytes (32 kB) copied, 0.000106353 s, 303 MB/s
	Saved vmcore-dmesg.txt
	Free memory/Total memory (free %): 66724 / 114296 ( 58.3782 )
	Excluding unnecessary pages        : [100.0 %] |reverse mapping checking getaddrinfo for unused [10.12.213.189] failed - POSSIBLE BREAK-IN ATTEMPT!
	Copying data                       : [100.0 %] \
	59550+465 records in
	59566+1 records out
	30497992 bytes (30 MB) copied, 2.0862 s, 14.6 MB/s
	Saving core complete
	Restarting system.

From the SSH host, I see the core:

	$ file /share/10.12.212.85-2016-06-20-15\:29\:05/vmcore.flat 
	/share/10.12.212.85-2016-06-20-15:29:05/vmcore.flat: data

So, on my host it works as expected.  All I did was create a tunnel without configuration.

I note that in the last attempts provided to us, tap0 is missing a 'device' parameter in /sys:

	# service kdump restart
	Stopping kdump:                                            [  OK  ]
	No kdump initial ramdisk found.                            [WARNING]
	Rebuilding /boot/initrd-2.6.32-573.22.1.el6.x86_64kdump.img
	ls: cannot access /sys/class/net/tap0/device: No such file or directory
	Starting kdump:                                            [  OK  ]

My host does not have this either, but there was no issue or complaint when I rebuilt the ramdisk:

	# ls /sys/class/net/tap0/dev*
	/sys/class/net/tap0/dev_id  /sys/class/net/tap0/dev_port

	# mv /boot/initrd-2.6.32-573.22.1.el6.x86_64kdump.img /boot/initrd-2.6.32-573.22.1.el6.x86_64kdump.img.backup

	# service kdump restart
	Stopping kdump:                                            [  OK  ]
	No kdump initial ramdisk found.                            [WARNING]
	Rebuilding /boot/initrd-2.6.32-573.22.1.el6.x86_64kdump.img
	Starting kdump:                                            [  OK  ]

And I am running an identical kernel:

	# uname -a
	Linux rhel6-kdump-test 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

There is one major difference here, as Nitin pointed out.  The logic of mkdumprd passes the interface that routes to the remote host as the handlenetdev() argument, which, in this scenario, might be the VPN tunnel:

	- - - - - - - - - 8< - - - - - - - - - 
		    #find ethernet device used to route to remote host, ie eth0  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		    netdev=`/sbin/ip route get to $remoteip 2>&1`  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		    [ $? != 0 ] && echo "Bad kdump location: $config_val" && cleanup_and_exit 1
		    DUMP_TARGET=$config_val
		    #the field in the ip output changes if we go to another subnet
		    OFF_SUBNET=`echo $netdev | grep via`
		    if [ -n "$OFF_SUBNET" ]
		    then
		        # we are going to a different subnet
		        netdev=`echo $netdev|awk '{print $5;}'|head -n 1`
		    else
		        # we are on the same subnet
		        netdev=`echo $netdev|awk '{print $3}'|head -n 1`
		    fi

		    #add the ethernet device to the list of modules 
		    mkdir -p $MNTIMAGE/etc/network/
		    handlenetdev $netdev  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		        |
	      .---------'
	      v
	handlenetdev() {
	    local dev=$1
	    local ifcfg_file
	    local vnet_prefix

	    case " $handlednetdevices " in
		*" $dev "*)
		    return ;;
		*) handlednetdevices="$handlednetdevices $dev" ;;
	    esac

	    ifcfg_file=`find_ifcfg_by_devicename $dev`  --------.
	    if [ -z "${ifcfg_file}" ]; then                     |
		error "The ifcfg-$dev or ifcfg-xxx which contains DEVICE=$dev field doesn't exist."
		cleanup_and_exit 1                              |
	    fi                                                  |
		  .---------------------------------------------'
		  v
	find_ifcfg_by_devicename() {
	    local dev=$1
	- - - - - - - - - 8< - - - - - - - - - 
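
To make the device selection in the excerpt above easier to follow, here is a minimal sketch of the same idea. It is an illustration only, not the mkdumprd code and not the fix shipped in the erratum. The helper name route_dev is hypothetical, and the sample addresses are made up. Instead of picking field 3 or field 5 depending on whether "via" is present, it prints the token that follows "dev":

```shell
# Hypothetical helper (not part of mkdumprd): print the interface named
# after the "dev" keyword in one line of `ip route get` output.
route_dev() {
    # $1: text as printed by `/sbin/ip route get to <address>`
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Sample lines shaped like `ip route get` output; the addresses are invented.
route_dev "192.0.2.10 via 192.0.2.1 dev eth0  src 192.0.2.20"   # off-subnet form, prints eth0
route_dev "192.0.2.10 dev tap0  src 192.0.2.20"                 # same-subnet form, prints tap0
```

If the second form is what the customer's system reports for the dump target, then tap0 is the name that reaches handlenetdev, which would match the error they see. On a live system the argument could be the output of `/sbin/ip route get to` for the remote IP.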

In my example, the interface that routes to the dump target is not the tap device.  Let us see whether it is on the customer's system; unfortunately, I can't tell from the sosreport:

	$ cat sos_commands/networking/ip_route_show_table_all  | grep tap
	fe80::/64 dev tap0  proto kernel  metric 256  mtu 1500 advmss 1440 hoplimit 4294967295
	ff00::/8 dev tap0  table local  metric 256  mtu 1500 advmss 1440 hoplimit 4294967295



Actual results:


[e723nb@smsslpoc1a ~]$ rpm -qa | grep -i kexec
kexec-tools-2.0.0-300.el6.x86_64

[e723nb@smsslpoc1a ~]$ sudo /sbin/service kdump restart
Stopping kdump:                                            [  OK  ]
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.32-573.22.1.el6.x86_64kdump.img
The ifcfg-tap0 or ifcfg-xxx which contains DEVICE=tap0 field doesn't exist.
Failed to run mkdumprd


[e723nb@smsslpoc1a ~]$ sudo /sbin/service kdump restart
Stopping kdump:                                            [  OK  ]
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.32-573.22.1.el6.x86_64kdump.img
The ifcfg-tap0 or ifcfg-xxx which contains DEVICE=tap0 field doesn't exist.
Failed to run mkdumprd
 

Expected results:

For kdump to not look for the tap device

Additional info:


The customer can take down the network (tap device and openvpn) and then start kdump. However, they would have to do this every time they upgrade the kernel, which will impact production. They cannot create an ifcfg file for the tap device because it conflicts with openvpn during reboot.
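
For reference, the error message in the actual results names two things the lookup would accept: a file called ifcfg-tap0, or any ifcfg file that contains DEVICE=tap0. A rough sketch of a check with that shape follows. The function name find_ifcfg_sketch and the default directory are assumptions for illustration. This is not the find_ifcfg_by_devicename implementation from mkdumprd, whose body is not shown in this report.

```shell
# Hypothetical stand-in for the lookup described by the error message;
# this is NOT the find_ifcfg_by_devicename code from mkdumprd.
find_ifcfg_sketch() {
    local dev="$1"
    # Assumed default location of ifcfg files on RHEL 6.
    local dir="${2:-/etc/sysconfig/network-scripts}"
    local f

    # Case 1: a file literally named ifcfg-<dev>.
    if [ -f "$dir/ifcfg-$dev" ]; then
        echo "$dir/ifcfg-$dev"
        return 0
    fi

    # Case 2: any ifcfg-* file with a DEVICE=<dev> line (bare or double-quoted).
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        if grep -q -x -e "DEVICE=$dev" -e "DEVICE=\"$dev\"" "$f"; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Report whether any ifcfg file on this host names tap0.
find_ifcfg_sketch tap0 || echo "no ifcfg file names tap0"
```

On a host like the customer's, where no ifcfg file can be created for the tap device, a check of this kind finds nothing, which is the condition under which mkdumprd exits.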

Comment 2 Xunlei Pang 2016-07-15 01:56:06 UTC
Hi Billy,

Do you have any remote environment for me to have a look?

Thanks!

Comment 3 Billy Woods 2016-07-15 12:30:39 UTC
Hello,

I do not have any remote environment at this time. I can request this from the customer if needed?


Thank you!

Comment 6 Dave Young 2016-07-26 08:23:28 UTC
Please ensure the server for dumping can be accessed without vpn, or kdump will fail.

Comment 7 Billy Woods 2016-07-26 13:50:07 UTC
Xunlei and Dave,

Ack, I have a request for the customer to provide us with the info. Robb requested the "ip route get to" a few days ago and I have re-requested that along with the new info requested.


Regards,
Billy Woods

Comment 35 errata-xmlrpc 2017-03-21 09:12:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0584.html

