Bug 1364286

Summary: The agent got stuck if the broker takes more than 30 seconds to reach the SMTP server
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-hosted-engine-ha
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: high
Version: 3.6.0
CC: bugs, cshao, dfediuck, gklein, gveitmic, lsurette, mavital, melewis, mgoldboi, mkalinin, msivak, nsednev, obockows, rs, sbonazzo, stirabos, ycui, ykaul
Target Milestone: ovirt-3.6.9
Keywords: EasyFix, Triaged, ZStream
Target Release: 3.6.9
Hardware: x86_64
OS: Linux
Whiteboard: sla
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the HA broker waited for a non-responsive SMTP server without timing out. This meant that the HA agent waited indefinitely for the HA broker. Now, a timeout has been added to the connection between the HA broker and the SMTP server. This means that the HA broker and the HA agent no longer wait indefinitely for an SMTP response.
Story Points: ---
Clone Of: 1359059
Environment:
Last Closed: 2016-09-21 17:54:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1359059
Bug Blocks:
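
The behaviour described in the Doc Text above (a timeout on the connection between the HA broker and the SMTP server) can be illustrated with a minimal Python sketch. This is not the actual ovirt-hosted-engine-ha code. Only the "smtp-timeout" key and the use of smtplib come from the traceback in this report; the function body, the other configuration keys and the logger name are assumptions made for illustration.

```python
import logging
import smtplib

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("notifications_sketch")


def send_email(cfg, message):
    """Try to send a notification; return True on success, False otherwise.

    cfg is a dict of strings. "smtp-timeout" appears in the traceback in
    this report; every other key used here is an assumed name.
    """
    server = None
    try:
        # The timeout bounds both the TCP connect and each later socket
        # read, so an unreachable or silent server cannot block forever.
        server = smtplib.SMTP(
            cfg["smtp-server"],
            int(cfg["smtp-port"]),
            timeout=float(cfg["smtp-timeout"]),
        )
        server.sendmail(
            cfg["source-email"],
            cfg["destination-emails"].split(","),
            message,
        )
        return True
    except OSError:
        # socket.timeout and smtplib.SMTPException are both OSError
        # subclasses in Python 3, so this covers timeouts and SMTP errors.
        log.exception("sending e-mail failed")
        return False
    finally:
        if server is not None:
            try:
                server.quit()
            except OSError:
                pass  # the connection is already unusable
```

With a bounded call like this, the caller gets a logged error and a False return after at most roughly the configured timeout per network operation, instead of hanging while the server stays silent.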

Comment 3 Martin Sivák 2016-08-05 08:10:40 UTC
This is an easy fix and I believe we should backport it to 3.6. I am setting all the right flags to ask for that.

Comment 8 Nikolai Sednev 2016-08-29 16:47:49 UTC
With the SMTP server blocked and therefore unreachable, the agent keeps running and is not restarted.
I don't see the original error messages and the agent is not being restarted by the broker, hence I'm closing this bug as verified.

In the broker's log I see this:
Thread-116::ERROR::2016-08-29 19:41:30,220::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) timed out
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
    timeout=float(cfg["smtp-timeout"]))
  File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
timeout: timed out


Works for me with these components:
Host:
libvirt-client-1.2.17-13.el7_2.5.x86_64
vdsm-4.17.34-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
sanlock-3.2.4-3.el7_2.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
rhevm-appliance-20160620.0-1.el7ev.noarch
mom-0.5.5-1.el7ev.noarch
rhevm-sdk-python-3.6.8.0-1.el7ev.noarch
rhev-release-3.6.9-1-001.noarch
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild@x86-037.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
rhevm-appliance-20160620.0-1.el7ev.noarch

Engine:
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-1.0.4-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.9-0.1.el6.noarch
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.9-0.1.el6.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-webadmin-portal-3.6.9-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-base-3.6.9-0.1.el6.noarch
rhevm-setup-3.6.9-0.1.el6.noarch
rhevm-tools-backup-3.6.9-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.9-0.1.el6.noarch
rhevm-tools-3.6.9-0.1.el6.noarch
rhevm-restapi-3.6.9-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-sdk-python-3.6.9.0-2.el6ev.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.9-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.9-0.1.el6.noarch
rhevm-backend-3.6.9-0.1.el6.noarch
rhevm-3.6.9-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhev-release-3.6.9-1-001.noarch
rhevm-lib-3.6.9-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.9-0.1.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
rhevm-extensions-api-impl-3.6.9-0.1.el6.noarch
rhevm-websocket-proxy-3.6.9-0.1.el6.noarch
rhevm-doc-3.6.8-1.el6eng.noarch
rhevm-userportal-3.6.9-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.9-0.1.el6.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-dbscripts-3.6.9-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
Linux version 2.6.32-642.el6.x86_64 (mockbuild@x86-033.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Wed Apr 13 00:51:26 EDT 2016
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)


I deployed a clean hosted engine over an iSCSI storage domain from the appliance, then upgraded the appliance's components to the latest bits.

Moving to verified.

Comment 10 errata-xmlrpc 2016-09-21 17:54:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1924.html