Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 597288 - Install fails with "(enableNetwork)"
Summary: Install fails with "(enableNetwork)"
Keywords:
Status: CLOSED DUPLICATE of bug 608773
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: anaconda
Version: 6.0
Hardware: All
OS: Linux
low
high
Target Milestone: rc
: ---
Assignee: Radek Vykydal
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-05-28 15:15 UTC by Nate Straz
Modified: 2010-07-12 16:46 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-12 16:46:33 UTC
Target Upstream Version:


Attachments (Terms of Use)
Kickstart file used (deleted)
2010-05-28 15:16 UTC, Nate Straz
no flags Details
contents of /tmp during anaconda install failure (deleted)
2010-05-28 18:30 UTC, Nate Straz
no flags Details
contents of /tmp during anaconda install failure w/ updates.img (deleted)
2010-07-09 18:13 UTC, Nate Straz
no flags Details
contents of /tmp during anaconda install failure w/ updates.img (deleted)
2010-07-09 18:19 UTC, Nate Straz
no flags Details

Description Nate Straz 2010-05-28 15:15:50 UTC
Description of problem:

Looking for installation images on CD device /dev/sr0
Retrieving http://nsew.lab.bos.redhat.com/dist/rel-eng/RHEL6.0-20100526.2/6/Server/x86_64/os/images/updates.img...
Retrieving http://nsew.lab.bos.redhat.com/dist/rel-eng/RHEL6.0-20100526.2/6/Server/x86_64/os/images/product.img...
Retrieving http://nsew.lab.bos.redhat.com/dist/rel-eng/RHEL6.0-20100526.2/6/Server/x86_64/os/images/install.img...
Running anaconda 13.21.48, the Red Hat Enterprise Linux system
installer - please wait.
Finding storage devices
Creating device /dev/sda1
Creating ext4 filesystem on /dev/sda1
In progress

Creating device /dev/sda2
Creating physical volume (LVM) on /dev/sda2
In progress

Creating device /dev/mapper/vg_west01
In progress

Creating device /dev/mapper/vg_west01-lv_root
In progress

Creating ext4 filesystem on /dev/mapper/vg_west01-lv_root
In progress

Creating device /dev/mapper/vg_west01-lv_home
In progress

Creating ext4 filesystem on /dev/mapper/vg_west01-lv_home
In progress

Creating device /dev/mapper/vg_west01-lv_swap
In progress

Creating swap on /dev/mapper/vg_west01-lv_swap
In progress

Transferring install image to hard drive
Retrieving installation information for Red Hat Enterprise Linux.
Retrieving installation information for Server.
Retrieving installation information for Server-optional.
Retrieving installation information for Red Hat Enterprise Linux.
Retrieving installation information for Server.
Retrieving installation information for Server-optional.
Checking dependencies in packages selected for installation
In progress
Completed

Starting installation process
In progress
Can't have a question in command line mode!
(enableNetwork)


Version-Release number of selected component (if applicable):
RHEL6.0-20100526.2

How reproducible:
I'm hitting it 60% of the time

Steps to Reproduce:
1. Install w/ kickstart file and "cmdline" option
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Nate Straz 2010-05-28 15:16:53 UTC
Created attachment 417650 [details]
Kickstart file used

Comment 2 Chris Lumens 2010-05-28 15:23:55 UTC
Please attach /tmp/anaconda.log to this bug report.

Comment 4 Nate Straz 2010-05-28 15:27:25 UTC
The install never completed, how am I supposed to get that file off the system?

Comment 5 Chris Lumens 2010-05-28 15:43:14 UTC
Well, since the problem is with enabling the network, the usual methods won't work.  You could always hook a console up to the machine and sit directly in front of it.  You should also be able to add a kernel parameter to the command line to spit tty3 out on a serial console, right?  Or you can try the same install in virt and see if you hit the same problem there.  That should be a better environment to get data out of.

Comment 6 Nate Straz 2010-05-28 15:48:44 UTC
The network isn't the problem because I'm installing over PXE and it got all the way through retrieving install info for all the repos and checking deps.  I tried the "sshpw" option[1] but that doesn't seem to work at all in RHEL6.

[1] http://fedoraproject.org/wiki/Anaconda/Kickstart

Comment 7 Chris Lumens 2010-05-28 17:16:06 UTC
> The network isn't the problem

Clearly, the network is the problem.  We're trying to bring up the network for some reason here.  Either it's not up when it should be, or we have a test for whether the network is active that is incorrectly returning False.  Regardless, we are attempting to display a network-related dialog during an installation mode that does not support displaying dialogs.

I don't want to suggest you ssh in or use syslog= because that could alter the state of whether anaconda thinks the network is up or not.

> because I'm installing over PXE

Just because something PXE booted does not mean anything as far as networking in anaconda goes.  That's completely different code.

> and it got all
> the way through retrieving install info for all the repos and checking deps.

True, which lends some credence to my idea that we have a test returning False when it shouldn't be.

Comment 8 Nate Straz 2010-05-28 18:30:42 UTC
Created attachment 417717 [details]
contents of /tmp during anaconda install failure

I was able to reproduce this without cmdline, get in and rsync /tmp out.  The network was up and working perfectly.

Comment 9 RHEL Product and Program Management 2010-05-28 19:55:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 10 David Cantrell 2010-06-02 18:17:30 UTC
This looks like our hasActiveNetDev() function is no longer working as expected.  If you were able to get in and use the network, then NetworkManager is lying to us over D-Bus.

Reassigning to NetworkManager.  Dan, here is what our code looks like that's returning False.  If we need to be using new D-Bus magic, let us know.

NM_SERVICE = "org.freedesktop.NetworkManager"
NM_MANAGER_PATH = "/org/freedesktop/NetworkManager"
NM_MANAGER_IFACE = "org.freedesktop.NetworkManager"
NM_ACTIVE_CONNECTION_IFACE = "org.freedesktop.NetworkManager.Connection.Active"
NM_CONNECTION_IFACE = "org.freedesktop.NetworkManagerSettings.Connection"
NM_DEVICE_IFACE = "org.freedesktop.NetworkManager.Device"
NM_IP4CONFIG_IFACE = "org.freedesktop.NetworkManager.IP4Config"

def hasActiveNetDev():
    try:
        bus = dbus.SystemBus()
        nm = bus.get_object(isys.NM_SERVICE, isys.NM_MANAGER_PATH)
        props = dbus.Interface(nm, isys.DBUS_PROPS_IFACE)
        state = props.Get(isys.NM_SERVICE, "State")

        if int(state) == isys.NM_STATE_CONNECTED:
            return True
        else:
            return False
    except:
        return False

Comment 11 David Cantrell 2010-06-02 18:18:10 UTC
Oh, and we have these:

NM_STATE_UNKNOWN = 0
NM_STATE_ASLEEP = 1
NM_STATE_CONNECTING = 2
NM_STATE_CONNECTED = 3
NM_STATE_DISCONNECTED = 4
NM_DEVICE_STATE_ACTIVATED = 8

Comment 12 David Cantrell 2010-06-02 18:18:38 UTC
...and this:

DBUS_PROPS_IFACE = "org.freedesktop.DBus.Properties"

Comment 13 Nate Straz 2010-06-21 20:15:34 UTC
Adding blocker flag since this causes kickstart installs w/ "cmdline" option to fail 50% of the time.  That makes installing a cluster an almost certain failure case.

Comment 14 Nate Straz 2010-06-21 20:16:48 UTC
FYI, this also shows up as a failure to download a package during install.  If you're not using "cmdline" you can tell it to retry and the install continues.

Comment 15 Dan Williams 2010-06-22 19:54:52 UTC
David, any chance I can get you to check in this?

  except Exception, e:
    print "Error checking network state: %s" % e
    return False

or something like that?

Comment 16 Dan Williams 2010-06-22 20:04:22 UTC
The following appears to work correctly for me:

import dbus

NM_STATE_CONNECTED = 3

DBUS_PROPS_IFACE = "org.freedesktop.DBus.Properties"
NM_SERVICE = "org.freedesktop.NetworkManager"
NM_MANAGER_PATH = "/org/freedesktop/NetworkManager"
NM_MANAGER_IFACE = "org.freedesktop.NetworkManager"

def hasActiveNetDev():
    try:
        bus = dbus.SystemBus()
        nm = bus.get_object(NM_SERVICE, NM_MANAGER_PATH)
        props = dbus.Interface(nm, DBUS_PROPS_IFACE)
        state = props.Get(NM_SERVICE, "State")

        print state
        if int(state) == NM_STATE_CONNECTED:
            return True
        else:
            return False
    except Exception, e:
	print e
        return False

print hasActiveNetDev()

but of course that's not in the installer environment.

Comment 17 Nate Straz 2010-06-22 20:34:47 UTC
I'm willing to test this if I can get an updates.img to apply to a tree.

Comment 18 Dan Williams 2010-06-23 22:20:18 UTC
-> anaconda for that debugging change to ensure this doesn't get lost in the ether...  when we've got a version that has some debugging bits in it we'll flip back to NM.

Comment 19 Dan Williams 2010-06-23 22:23:38 UTC
*** Bug 605377 has been marked as a duplicate of this bug. ***

Comment 20 Kurt Keller 2010-07-08 06:41:47 UTC
I found that the uploaded anaconda.log file ends with the same WARNING we see on a similar install problem with RHEL6b2. Possibly related?

Our problem:

We encounter exactly one error on every kickstart install. This prevents us from performing the fully automated installs we require, as somebody has to manually select 'retry'.
This behaviour is consistent on both physical and virtual installs when installing RHEL6b2 and does not happen when installing RHEL4/5 or Fedora13, using the same webserver, network etc. Also when doing two simultaneous installs, one RHEL6b2 and one F13, RHEL6b2 does have problems and F13 does not, so we do not think it is a problem with the webserver or the like.

What happens is:

- we PXE boot
- kickstart and channels are accessed via http
- while in the stage "Starting installation process", there will always be exactly one rpm package (not always the same) with the following error: "The file xxxx.rpm cannot be opened. This is due to a missing file, a corrupt package or corrupt media..."
- selecting retry works fine and the installation continues
- 30 seconds before this error occurs, we see in the webserver logs, that last request for partial of this file (status code 206)
- at the time this occurrs we see syslog messages from the machine being installed saying it timed out on the file

The last relevant entry in the webserver log:

10.0.45.176 - - [07/Jul/2010:07:58:26 +0000] "GET /linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm HTTP/1.1" 206 192428 "-" "Red Hat Enterprise Linux (anaco nda)/6.0"

The relevant syslog messages from the machine to be installed:

Jul  7 07:58:57 lilx003 WARNING  Try 1/10 for http://lifibswww1.hag.hilti.com/linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm failed: [Errno 12] Timeout on http://lifibswww1.hag.hilti.com/linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm: (28, '')
Jul  7 07:58:57 lilx003 WARNING  Try 1/10 for http://lifibswww1.hag.hilti.com/linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm failed: [Errno 12] Timeout on http://lifibswww1.hag.hilti.com/linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm: (28, '')
Jul  7 07:58:57 lilx003 WARNING  Failed to get http://lifibswww1.hag.hilti.com/linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm from mirror 1/1, or downloaded file is corrupt
Jul  7 07:58:57 lilx003 WARNING  Failed to get http://lifibswww1.hag.hilti.com/linux/channels/rhel6b2/x86_64/h1/base/Packages/glibc-2.12-1.2.el6.x86_64.rpm from mirror 1/1, or downloaded file is corrupt

Comment 21 Simon Matter 2010-07-09 07:25:39 UTC
Just curious, how is NM configured by anaconda, does it survive something
like "link lost for 45 seconds" without any problems?

The reason why I ask is that I remember NM did immediately remove the interface on "link lost" and while it may be what you want on a desktop it may for sure not be desirable in any server environment.

Comment 22 Radek Vykydal 2010-07-09 16:06:36 UTC
(In reply to comment #17)
> I'm willing to test this if I can get an updates.img to apply to a tree.    

Here is updates.img with exception logging from comment #15
http://rvykydal.fedorapeople.org/updates.597288.img
for anaconda 13.21.48 (that from description).

Comment 23 Nate Straz 2010-07-09 18:13:01 UTC
Created attachment 430732 [details]
contents of /tmp during anaconda install failure w/ updates.img

Here is the contents of /tmp after reproducing this.  I hit the (enableNetwork) error, not the retry package download error.

Comment 24 Nate Straz 2010-07-09 18:19:56 UTC
Created attachment 430734 [details]
contents of /tmp during anaconda install failure w/ updates.img

This is a gzipped tarball.  The last one was double-gzipped.

Comment 25 Nate Straz 2010-07-09 20:22:55 UTC
The error from anaconda.log is:

14:05:24,448 INFO    : Preparing to install packages
14:05:25,655 WARNING : Try 1/10 for http://nsew.lab.bos.redhat.com/dist/rel-eng/RHEL6.0-20100526.2/6/Server/x86_64/os/Packages/libselinux-uti
ls-2.0.94-1.el6.x86_64.rpm failed: [Errno 14] PYCURL ERROR 7 - ""
14:05:25,906 WARNING : Failed to get http://nsew.lab.bos.redhat.com/dist/rel-eng/RHEL6.0-20100526.2/6/Server/x86_64/os/Packages/libselinux-ut
ils-2.0.94-1.el6.x86_64.rpm from mirror 1/1, or downloaded file is corrupt

In syslog I can see that the network was up before that, but there was an error just after it came up:

<30>Jul  9 18:03:48 NetworkManager[495]: <info> Activation (eth0) successful, device activated.
<30>Jul  9 18:03:48 NetworkManager[495]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.
<28>Jul  9 18:03:48 nm-dispatcher.action: nm_dispatcher_action: Invalid connection: '(null)' / 'connection setting not found' invalid: 1
<28>Jul  9 18:03:48 nm-dispatcher.action: _nm_object_get_property: Error getting 'Options' for /org/freedesktop/NetworkManager/DHCP4Config/0:
 (9) Interface "org.freedesktop.DBus.Properties" isn't exported (or may not exist), can't access property "Options"

Then about the time the packages are being downloaded the connection resets:

<29>Jul  9 18:05:25 NetworkManager[495]:    ifcfg-rh: removed /etc/sysconfig/network-scripts/ifcfg-eth0.
<30>Jul  9 18:05:25 NetworkManager[495]: <info> (eth0): device state change: 8 -> 3 (reason 38)
<30>Jul  9 18:05:25 NetworkManager[495]: <info> (eth0): deactivating device (reason: 38).
<30>Jul  9 18:05:25 NetworkManager[495]: <info> (eth0): canceled DHCP transaction, DHCP client pid 514
<27>Jul  9 18:05:25 NetworkManager[495]: <error> [1278698725.654355] [nm-system.c:1229] check_one_route(): (eth0): error -34 returned from rt
nl_route_del(): Netlink Error (errno = Numerical result out of range)


I didn't find the message "Error checking network state" anywhere in the logs.

Comment 27 Radek Vykydal 2010-07-12 11:57:11 UTC
Nate, thanks for retesting, you are right pinpointing the issue in comment #25. I believe it is fixed in bug #608773, that is in anaconda-13.21.55-1 which should be in RHEL6 Beta2 snapshot7 (RHEL6.0-20100707.4).

I prepared another updates.img with backport of the patch for anaconda 13.21.48 (that from description):
http://rvykydal.fedorapeople.org/updates.597288+.img
Could you please give it another re-testing shot? You can use the new updates image on the tree from description, or you can test with snap 7.

I'd rather mark this bug as dupe of #608773 after you confirm that it is fixed for you with the updates image or in snap7.

Comment 28 Nate Straz 2010-07-12 16:20:24 UTC
I did 15 installs using the updates.597288+.img and I didn't hit this bug at all. Previously I would have hit the bug several times.  Looks like a good fix to me.

Comment 29 Radek Vykydal 2010-07-12 16:46:33 UTC
Thanks for testing, I'm marking the bug as a dupe of #608773 based on comment #28.

*** This bug has been marked as a duplicate of bug 608773 ***


Note You need to log in before you can comment on or make changes to this bug.