|Summary:||Curl error (16): Error in the HTTP2 framing layer against fedora servers|
|Product:||[Fedora] Fedora||Reporter:||Kevin Fenzi <kevin>|
|Component:||curl||Assignee:||Kamil Dudka <kdudka>|
|Status:||CLOSED ERRATA||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||30||CC:||jdieter, john.j5live, kdudka, msekleta, paul, pwhalen, rbarlow|
|Fixed In Version:||curl-7.64.0-6.fc31 curl-7.64.0-6.fc30||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|Last Closed:||2019-03-29 19:20:32 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Kevin Fenzi 2019-03-20 15:23:38 UTC
Our rawhide and fedora30 pre-beta composes have been failing off and on with "Curl error (16): Error in the HTTP2 framing layer". Now also a user has seen this against mirrors.fedoraproject.org: [07:50:15] <bowlofeggs> just saw this from dnf: [07:50:22] <bowlofeggs> Error: Error downloading packages: [07:50:24] <bowlofeggs> Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=updates-testing-f30&arch=x86_64  [07:50:45] <bowlofeggs> that was on fedora 30 On composes it always seems to happen on armv7 builds when it happens: https://koji.fedoraproject.org/koji/taskinfo?taskID=33629664 ... DEBUG util.py:439: repo: downloading from remote: koji-override-0 DEBUG util.py:439: error: Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os/repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck  (https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os/repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck). DEBUG util.py:439: koji-override-0 335 kB/s | 15 MB 00:45 DEBUG util.py:439: Cannot download 'https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os': Yum repo downloading error: Downloading error(s): repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck - Cannot download, all mirrors were already tried without success. DEBUG util.py:439: Failed to synchronize cache for repo 'koji-override-0' ... I'm not sure what additional info I can get from composes, but if we can duplicate it against mirrors perhaps we can figure out whats going on? I wonder if it's related to GOAWAY after 1000 connections, but that patch was merged and should be in the version in both f30 and rawhide. Also, the curl version in both rawhide and f30 is the same currently. Happy to try and gather more info...
Comment 1 Kamil Dudka 2019-03-20 16:53:53 UTC
Do we have output of `curl --verbose` in the failing case? Does the server use nghttp2? Which version?
Comment 2 Kevin Fenzi 2019-03-20 17:12:11 UTC
Alas, we do not. ;( Perhaps we can try hitting mirrors in a loop with --verbose until we get a failure? mirrors is our proxy network, which is all fedora 29 hosts running httpd-2.4.38-2.fc29.x86_64 with h2 enabled. kojipkgs (what the composes hit) is those same proxy servers with varnish behind the httpd.
Comment 3 Kevin Fenzi 2019-03-20 21:41:39 UTC
There's some more verbose logs in https://pagure.io/dusty/failed-composes/issue/1703
Comment 4 Kamil Dudka 2019-03-21 08:49:59 UTC
I think that the "Could not resolve host: kojipkgs.fedoraproject.org" messages appeared there by mistake. I have attempted to fix it in curl-7.64.0-5.fc31 with the following commit: https://src.fedoraproject.org/cgit/rpms/curl.git/commit/?id=902ddefe
Comment 5 Kevin Fenzi 2019-03-21 16:29:09 UTC
ok. It sounds like jdieter has tracked this to a librepo/zchunk issue now... https://pagure.io/dusty/failed-composes/issue/1703#comment-561822 "I've done some more probing and this is definitely zchunk-related. It seems to be an issue with reusing curl handles when downloading the second part of the file (the data), which should work, but seems to crap out occasionally. There are also some (possibly related) speed issues, so I'll see if I can get a patch ready for librepo within the next day or two." So, perhaps we can just close this? Adding jdieter here for comment...
Comment 6 Jonathan Dieter 2019-03-23 23:13:17 UTC
Ok, I've tracked this down, and it seems to be a combination of two bugs, one identified by Kamil and the other in zchunk. First, while zchunk-1.0.4 is downloading missing chunks, it finds the current chunk as it comes in and checks its digest against the full list of chunk digests in the target file. Needless to say this O(n log n) algorithm is less than efficient and makes the write process slow down the more of the file that gets downloaded. On armv7, this delay in the write callback seems to be hitting a timeout, causing the connection to be aborted. I've just released and built zchunk-1.1.0 which stores the target digest in the internal struct, reducing the lookup time to O(1), drastically reducing CPU usage, and avoiding the timeout. In testing, this dropped the likelihood of hitting the HTTP/2 framing error from roughly 50% to 15%. Further investigation showed that libcurl was complaining about being unable to lookup the hostname, even though it had already successfully done so (see Kamil's message in comment 4). This is a known bug (https://github.com/curl/curl/issues/3629) in libcurl-7.64 that supposedly has no effect on the actual download, but, due to some mix of architecture, timing and the aurora borealis, it seems to be killing the downloads roughly 1 in 24 times (which, when multiplied by three for a compose, means 1 in 8 would fail). When both zchunk-1.1.0 and Kamil's libcurl-7.64.0-5 are installed on the test system, I haven't hit any framing errors in between 50 and 100 attempts. To test, I'd like to request that we re-enable zchunk for Rawhide composes only. The latest libcurl is already there, and zchunk-1.1.0 should be in the next compose, so it should just work.
Comment 7 Kamil Dudka 2019-03-25 11:41:45 UTC
Thank you for tracking it down! I realized that all the "Expire in ..." messages appear in the verbose output also by mistake. The following commit will fix it: https://src.fedoraproject.org/cgit/rpms/curl.git/commit/?id=7594f15b
Comment 8 Fedora Update System 2019-03-25 12:53:18 UTC
curl-7.64.0-6.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-530dc1ff43
Comment 9 Fedora Update System 2019-03-25 18:54:37 UTC
curl-7.64.0-6.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-530dc1ff43
Comment 10 Fedora Update System 2019-03-25 21:28:45 UTC
zchunk-1.1.0-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-c83e45fbc8
Comment 11 Fedora Update System 2019-03-25 21:28:54 UTC
zchunk-1.1.0-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-291b0ea9e8
Comment 12 Fedora Update System 2019-03-27 00:45:07 UTC
zchunk-1.1.0-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-c83e45fbc8
Comment 13 Fedora Update System 2019-03-27 04:34:23 UTC
zchunk-1.1.0-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-291b0ea9e8
Comment 14 Fedora Update System 2019-03-29 19:20:32 UTC
curl-7.64.0-6.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
Comment 15 Fedora Update System 2019-03-29 19:20:41 UTC
zchunk-1.1.0-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
Comment 16 Fedora Update System 2019-04-03 03:31:10 UTC
zchunk-1.1.0-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.