|Summary:||virt/install fails to submit logs and proceed with next task|
|Product:||[Community] Beaker||Reporter:||Jan Stancek <jstancek>|
|Component:||beah||Assignee:||Dan Callaghan <dcallagh>|
|Status:||CLOSED DUPLICATE||QA Contact:||tools-bugs <tools-bugs>|
|Version:||0.15||CC:||aigao, asaha, bpeck, dcallagh, jburke, llim, pbunyan, qwan, rmancy, skrishna|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2014-02-17 03:02:26 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
Description Jan Stancek 2014-02-08 10:04:56 UTC
Description of problem:
On some RHEL7 systems I see that /virt/install installs all guests, but it fails to proceed with next task. When I log into host I can see in guest console logs, that both guests installed fine.

This is what I see on host console logs:
======================================================================
2014-02-08 04:11:27,880 rhts_task checkin_finish: INFO resetting nohup
02/08/14 04:11:27 testID:19052514 finish:
2014-02-08 04:11:27,894 rhts_task task_exited: INFO task_exited([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. ])
2014-02-08 04:11:27,894 rhts_task on_exit: INFO quitting...
2014-02-08 04:11:27,895 rhts_task task_ended: INFO task_ended([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. ])
2014-02-08 04:11:28,918 beah processExited: INFO TaskStdoutProtocol:processExited([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. ])
2014-02-08 04:11:28,918 beah processEnded: INFO TaskStdoutProtocol:processEnded([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. ])
2014-02-08 04:11:28,938 beah task_finished: INFO Task 6062cbda-0a54-4dbd-aa34-922ed5fa7a17 has finished.
2014-02-08 04:11:28,939 backend async_proc: INFO Task 19052514 done. Submitting logs...
[-- MARK -- Sat Feb 8 09:15:00 2014]
[-- MARK -- Sat Feb 8 09:20:00 2014]
[-- MARK -- Sat Feb 8 09:25:01 2014]
[-- MARK -- Sat Feb 8 09:30:00 2014]
[-- MARK -- Sat Feb 8 09:35:00 2014]
[-- MARK -- Sat Feb 8 09:40:00 2014]
[-- MARK -- Sat Feb 8 09:45:00 2014]
[-- MARK -- Sat Feb 8 09:50:00 2014]
======================================================================

Looking at processes, harness doesn't have any child processes:
 3110 ?        Ss     0:00 /usr/sbin/crond -n
 3111 ?        Ss     0:00 /usr/bin/python /usr/bin/beah-srv
 3112 ?        Ss     0:00 /usr/bin/python /usr/bin/beah-beaker-backend
 3113 ?        Ss     0:00 /usr/bin/python /usr/bin/beah-fwd-backend
 3379 ?        Ssl    0:00 /usr/sbin/libvirtd
 3405 ?        Ssl    0:00 /usr/sbin/automount --pid-file /run/autofs.pid
 3583 ?        Sl     0:00 Xvfb :1 -screen 0 1600x1200x24 -fbdir /tmp
 3709 ?        Ss     0:00 /usr/local/bin/logguestconsoles --config /usr/local/etc/logguestconsoles.conf
 3865 ?        Ss     0:00 /usr/lib/systemd/systemd-machined
25077 ?        Ss     0:00 /usr/sbin/anacron -s

If I start guests manually with "virsh start", then sometimes they hit same issue. Console log reports "Submitting logs", but no logs get submitted to beaker and tasks eventually hit external watchdog.

Version-Release number of selected component (if applicable):
0.15.3

How reproducible:
high

Steps to Reproduce:
1. Install host using RHEL-7.0-20140206.0
2. Install 2 guests using RHEL-7.0-20140206.0

Actual results:
/virt/install fails to submit logs and proceed with next task

Expected results:
All logs get submitted and harness will proceed with next task

Additional info:
Comment 3 Jan Stancek 2014-02-08 17:15:12 UTC
(In reply to Jan Stancek from comment #0)
> If I start guests manually with "virsh start", then sometimes they hit same
> issue. Console log reports "Submitting logs", but no logs get submitted to
> beaker and tasks eventually hit external watchdog.

If I log in to the guest via ssh when this happens, all harness processes are running (with no child processes). If I run "systemctl restart beah-beaker-backend", logs get submitted to beaker immediately and the guest continues with the next task.
Comment 5 Amit Saha 2014-02-09 12:09:54 UTC
A wild guess is the option: net.ipv6.conf.all.forwarding being turned off on the host.
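
One way to check that guess on an affected host is to read the sysctl value directly from /proc. The sketch below is only an illustration; the helper name ipv6_forwarding_enabled is made up for this note and is not part of beah or Beaker.

```python
from pathlib import Path

# Standard procfs location of the net.ipv6.conf.all.forwarding sysctl.
FORWARDING_PATH = "/proc/sys/net/ipv6/conf/all/forwarding"


def ipv6_forwarding_enabled(path=FORWARDING_PATH):
    """Return True if the sysctl reads "1", False if it reads anything
    else, and None if the file cannot be read (e.g. IPv6 is disabled).

    Hypothetical helper for illustration only.
    """
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value == "1"
```

Calling ipv6_forwarding_enabled() before and after /distribution/virt/install runs would show whether the task changes the setting.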
Comment 6 Nick Coghlan 2014-02-10 00:57:00 UTC
In addition to attempting to fix this directly, we will also add the ability to opt in to using older versions of the harness (see #1063090)
Comment 7 Dan Callaghan 2014-02-12 02:02:25 UTC
(In reply to Jan Stancek from comment #4)
> It looks like guests are not getting any response from LC. For example, set
> filter to tcp.port==8000 and look at frame 130254. Both guests sent SYN to
> LC, but there appears to be no response.

I think this is actually from the logguestconsoles service created by /distribution/virt/install. I don't think it's related.

From what I can see, when the task ends beah just sits there waiting for... nothing. There are no open connections to the LC waiting for anything.
Comment 8 Dan Callaghan 2014-02-12 02:17:33 UTC
(In reply to Dan Callaghan from comment #7)

Scratch that, the problem is definitely that IPv6 connectivity goes bad on the host. I can see beah talking to the LC over IPv6 at the start of the recipe, but once /distribution/virt/install runs IPv6 packets suddenly get dropped on the floor. It looks like the default IPv6 routes are missing from the routing table.

Restarting beah-beaker-backend works, because it notices that IPv6 connections are timing out so it falls back to IPv4. Restarting the network service also works because it fixes up the IPv6 routing table to have default routes again.
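
The fallback behaviour described above can be illustrated with plain sockets. This is a minimal sketch, not beah's actual code (beah is built on Twisted and its real logic may differ); the function name connect_with_fallback and the 5 second default timeout are assumptions made for the example.

```python
import socket


def connect_with_fallback(host, port, timeout=5.0,
                          families=(socket.AF_INET6, socket.AF_INET)):
    """Try each address family in order and return the first socket
    that connects. A timeout or refusal on IPv6 moves on to IPv4.

    Hypothetical helper for illustration; not taken from beah.
    """
    last_error = None
    for family in families:
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            # No address of this family for the host; try the next one.
            last_error = exc
            continue
        for af, socktype, proto, _canonname, sockaddr in infos:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                return sock
            except OSError as exc:  # includes socket.timeout
                last_error = exc
                sock.close()
    raise last_error or OSError("no usable address for %s" % host)
```

A connection that was already established over IPv6 before the routes disappeared gets no such second chance, which would be consistent with a long-running backend hanging while a freshly restarted one works.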
Comment 9 Dan Callaghan 2014-02-12 06:54:51 UTC
Okay, I don't think it's just the routing entries. I noticed that "service network restart" fixes IPv6 connectivity to the LC, and if I keep packets flowing to the LC (using ping6 left running for example) it will keep working for several hours. But if I leave it alone for about 60 seconds or more, IPv6 packets to the LC suddenly start dropping on the floor again.

So I think there must be something going wrong with neighbour discovery/autoconfiguration, but I don't fully understand how that stuff works so I can't figure out what's going wrong.

We only seem to hit this problem with /distribution/virt/install, and one of the things it does is disable NetworkManager and enable the network initscript instead. So the problem might be due to some difference between those two.
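
To tell the "default route missing" case apart from a neighbour discovery problem, it may help to check whether a usable default route is present at the moment packets start dropping. The sketch below parses text in the format of /proc/net/ipv6_route. The function name is made up for this note, and the field layout (ten whitespace-separated columns, destination and prefix length first, flags ninth, device last) and the RTF_UP / RTF_REJECT flag values are stated from my understanding of the Linux kernel headers, so treat them as assumptions to verify.

```python
RTF_UP = 0x0001      # route is usable (assumed value from linux/route.h)
RTF_REJECT = 0x0200  # unreachable/reject route (assumed value)


def ipv6_default_route_devices(route_table_text):
    """Return the device names that carry a usable ::/0 route.

    Expects text in /proc/net/ipv6_route format. Hypothetical helper
    for illustration only.
    """
    devices = []
    for line in route_table_text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # skip blank or malformed lines
        dest, dest_len, flags, dev = fields[0], fields[1], fields[8], fields[9]
        if dest != "0" * 32 or dest_len != "00":
            continue  # not a default route
        flag_bits = int(flags, 16)
        if flag_bits & RTF_UP and not flag_bits & RTF_REJECT:
            devices.append(dev)
    return devices
```

On the host this would be called with the contents of /proc/net/ipv6_route. If it still lists a device while packets to the LC are being dropped, the routing table is not the whole story and the neighbour cache ("ip -6 neigh show") is the next thing to look at.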
Comment 14 Nick Coghlan 2014-02-14 04:43:09 UTC
Dropping this from Beaker's targets, as it appears to be failing due to a genuine issue in the kernel. We'll mark it as CLOSED/DUPLICATE once there's an appropriate BZ to reference.