Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1516990 - dnf 'sack setup' during installer initialization takes way longer on Modular images
Summary: dnf 'sack setup' during installer initialization takes way longer on Modular ...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: fedora-modular-release
Version: 27
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Martin Hatina
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-23 19:23 UTC by Adam Williamson
Modified: 2018-04-16 12:15 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-16 12:15:45 UTC


Attachments (Terms of Use)

Description Adam Williamson 2017-11-23 19:23:03 UTC
In the last couple of Modular Server composes, several openQA tests failed due to a timeout while waiting for the 'timbuktu screen' (the pre-release warning dialog) to appear.

Digging into this, it seems like anaconda will not proceed from the 'language selection' screen to displaying the timbuktu warning until a dnf operation has completed, and that dnf operation for some reason takes *much* longer on Modular Server composes.

Here's an affected test:

https://openqa.fedoraproject.org/tests/173644

If we look at its os-autoinst log:

https://openqa.fedoraproject.org/tests/173644/file/autoinst-log.txt

we see that the test clicks the 'continue' button on the language selection screen at 15:27:49.7007 (trust me, that's what it says :>), but by 15:28:20.0786 , the 'timbuktu warning' has not appeared. It *does* appear shortly thereafter, however. If we watch the video:

https://openqa.fedoraproject.org/tests/173644/file/video.ogv

we can see that after the test clicked 'continue', anaconda sat there "thinking" - with a spinning cursor - for quite a while before the 'timbuktu screen' appeared. Note the video is heavily sped-up, so although it only appears to take 2 seconds or so in the video, it was over 30 seconds in reality.

If we look at anaconda.log:

https://openqa.fedoraproject.org/tests/173644/file/_boot_to_anaconda-anaconda.log

we see this:

15:28:23,333 DBG payload: Updating payload thread state: 6
15:28:23,333 INF threading: Thread Done: AnaPayloadThread (140178653132544)

which rather suggests that whatever it was thinking about *finally* got done at 15:28:23, just after openQA gave up waiting for it and failed. The final piece of the puzzle seems to be in packaging.log - https://openqa.fedoraproject.org/tests/173644/file/_boot_to_anaconda-packaging.log :

15:27:36,832 INF packaging: configuring base repo
...
15:28:17,230 Level 8 dnf: timer: sack setup: 40008 ms
15:28:17,248 Level 8 dnf: Getting group metadata
15:28:17,249 Level 8 dnf: Adding group file from repository: anaconda
15:28:17,484 Level 8 dnf: timer: loading comps: 238 ms

note that in anaconda.log at 15:28:17 we get this:

15:28:17,487 INF payload: Refreshing environmentAddons

so that all ties in very nicely - this dnf 'sack setup' operation takes 40 seconds, then anaconda spends six seconds 'Refreshing environmentAddons', then it *finally* gets done and moves on to display the 'timbuktu screen' at 15:28:23, but that's too late. The dnf operation seems to be set up to just fire off during installer initialization, I don't think anything the user does triggers it.

Now let's compare these timings to a *non*-modular compose. Here's a random failed test with the relevant logs present from a Rawhide compose:

https://openqa.fedoraproject.org/tests/172138

Note that test ran with the Server DVD image, just like the modular compose tests. Now let's look at the timings in packaging.log - https://openqa.fedoraproject.org/tests/172138/file/_boot_to_anaconda-packaging.log :

22:52:46,666 INF packaging: configuring base repo
...
22:52:49,931 Level 8 dnf: timer: sack setup: 1448 ms
22:52:49,932 Level 8 dnf: Getting group metadata
22:52:49,932 Level 8 dnf: Adding group file from repository: anaconda
22:52:49,953 Level 8 dnf: timer: loading comps: 21 ms

it takes just 1.5 seconds to do its thing, and actually gets done before the test gets to the 'continue' button, so the test perceives no delay at all between clicking 'continue' and the 'timbuktu screen' appearing.

Finally, note this problem seems to be getting worse. In the last two composes - Fedora-Modular-27-20171123.n.1 and Fedora-Modular-27-20171123.n.0 - many tests failed due to this problem. In previous Modular composes, it didn't cause any tests to fail. This is because, while the 'sack setup' operation still took much longer than it does on non-modular images, it didn't take over 30 seconds. See a packaging.log from a job from Fedora-Modular-27-20171121.n.1 - https://openqa.fedoraproject.org/tests/172803/file/_do_install_and_reboot-packaging.log :

15:45:17,248 INF packaging: configuring base repo
...
15:45:37,885 Level 8 dnf: timer: sack setup: 20451 ms
15:45:37,894 Level 8 dnf: Getting group metadata
15:45:37,894 Level 8 dnf: Adding group file from repository: anaconda
15:45:38,070 Level 8 dnf: timer: loading comps: 176 ms

so it took 20 seconds. With the few seconds it takes for the test to select a language and click 'continue', it only had to wait about 13 seconds for the 'timbuktu screen' to appear, which is within the 30 second timeout, so the test didn't fail. Now the operation takes 40 seconds, openQA happens to be right on the cusp - if getting through language selection happens to take more than 10 seconds after the dnf operation starts, then the test will just barely be OK, if it happens to take a bit less than 10 seconds, the test will fail. So we get quite a lot of tests failing due to this problem, but not all of them.

For openQA purposes I can deal with this quite easily - just bump up the timeout when it's waiting for the timbuktu screen. But this is clearly a user-visible bug, and it seems to be getting *worse*. Something's obviously off here.

Comment 1 Adam Williamson 2017-11-23 23:54:54 UTC
This also seems to affect package installs after initial system install. When I go to install a package, the 'Last metadata expiration check' message appears promptly, then metadata downloads - if any need to be refreshed - happen promptly. Then there's a long pause - which appears to be the 'sack setup' time - then the "Dependencies resolved." message appears and the rest of the transaction proceeds promptly.

Checking /var/log/dnf.log shows a 20+ second 'sack setup' time happening.

Comment 2 Adam Williamson 2017-11-23 23:55:46 UTC
However, when *removing* a package, sack setup is almost instantaneous.

Comment 3 Adam Williamson 2017-11-24 00:19:58 UTC
In fact, on an installed system in openQA it seems to be far worse, for some reason - I just bumped the timeouts for various package install operations, but my retry still failed:

https://openqa.fedoraproject.org/tests/173997

because the sack setup times are absurd:

[adamw@adam tmp]$ grep "sack setup" var/log/dnf.log 
2017-11-24T00:11:51Z DDEBUG timer: sack setup: 238224 ms
2017-11-24T00:15:13Z DDEBUG timer: sack setup: 173985 ms
2017-11-24T00:15:45Z DDEBUG timer: sack setup: 33891 ms

it failed on that second one - I bumped the timeout for that install from 120 seconds to 160, but if 'sack setup' takes 173 seconds, that's not gonna cut it :/

Comment 4 Daniel Mach 2018-04-16 12:15:45 UTC
This was fixed by several performance improvements in libdnf already.


Note You need to log in before you can comment on or make changes to this bug.