Bug 507391 - qemu-kvm PXE boot with e1000 results in bogus packets
Summary: qemu-kvm PXE boot with e1000 results in bogus packets
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: etherboot
Version: 11
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Mark McLoughlin
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 494541
Depends On:
Blocks: F11VirtTarget
Reported: 2009-06-22 16:03 UTC by Gilboa Davara
Modified: 2009-07-02 05:41 UTC
CC List: 9 users

Fixed In Version: 5.4.4-16.fc11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-02 05:41:58 UTC


Attachments
DSL VM configuration (deleted)
2009-06-22 16:03 UTC, Gilboa Davara
no flags
Private bridge configuration. (Bridge running in promisc mode) (deleted)
2009-06-22 16:04 UTC, Gilboa Davara
no flags
tap42 wireshark recording. (deleted)
2009-06-22 16:05 UTC, Gilboa Davara
no flags

Description Gilboa Davara 2009-06-22 16:03:07 UTC
Created attachment 348934
DSL VM configuration

Description of problem:
I've upgraded my first KVM host to F11.
I'm trying to boot DSL (Damn Small Linux) using bootpxe.
This test works just fine under F9 and F10.

Version-Release number of selected component (if applicable):
qemu-0.10.4-4.fc11.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Setup a private bridge. (Configuration attached.)
2. Setup a qemu empty VM. (Configuration attached.)
3. Boot.

Actual results:
Client fails to receive an IP address. Host sees invalid packets. (pcap attached)

Expected results:
The VM boots.

Comment 1 Gilboa Davara 2009-06-22 16:04:02 UTC
Created attachment 348935
Private bridge configuration. (Bridge running in promisc mode)

Comment 2 Gilboa Davara 2009-06-22 16:05:37 UTC
Created attachment 348936
tap42 wireshark recording.

Comment 3 Gilboa Davara 2009-06-22 16:31:18 UTC
P.S. DHCP works just fine once the OS actually boots.

Comment 4 Mark McLoughlin 2009-06-22 16:49:22 UTC
What version of etherboot is this? Does etherboot-5.4.4-15.fc11 help?

  https://admin.fedoraproject.org/updates/etherboot-5.4.4-15.fc11

I doubt it - those frames are pretty messed up. Does it work with e.g. rtl8139, virtio, ne2k_pci or pcnet?

Comment 5 Gilboa Davara 2009-06-22 17:56:57 UTC
Works just fine with rtl8139 with etherboot-5.4.4-13.
I'm still getting trashed 0xff frames with etherboot-5.4.4-15.

- Gilboa

Comment 6 Mark McLoughlin 2009-06-23 11:40:12 UTC
Okay, so the packet dump shows that the type field in the Ethernet header is (incorrectly) zero.

Enabling debugging in etherboot-5.4.4/drivers/net/e1000.c made the problem go away, which was the first clue.

The code is as follows:

    struct eth_hdr {
        unsigned char dst_addr[ETH_ALEN];
        unsigned char src_addr[ETH_ALEN];
        unsigned short type;
    } hdr;
    ...
    hdr.type = htons (type);
    txhd = tx_base + tx_tail;
    tx_tail = (tx_tail + 1) % 8;
    ...
    txhd->buffer_addr = virt_to_bus (&hdr);
    ...
    E1000_WRITE_REG (&hw, TDT, tx_tail);

i.e. we set the type in the header on the stack, set up a tx descriptor to point to that header, and then write the new tail index to the TDT (transmit descriptor tail) register, which tells the device there is a new descriptor to process.

Looking at the assembly, I see:

     36d:       8b 4c 24 38             mov    0x38(%esp),%ecx
     371:       86 cd                   xchg   %cl,%ch
     ...
     3fb:       89 90 18 38 00 00       mov    %edx,0x3818(%eax)
     ...
     407:       66 89 4c 24 1e          mov    %cx,0x1e(%esp)

i.e. the result of the htons() isn't actually stored into the header on the stack until after we've written the TDT register (the store at 407 comes after the register write at 3fb). At that point the packet has already been sent.

The problem is that the compiler has no way of knowing that writing to the register causes the device to read this memory, so it is free to move the store to the header past the register write. So, if we do:

-       struct eth_hdr {
+       volatile struct eth_hdr {

we see:

     36c:       8b 44 24 38             mov    0x38(%esp),%eax
     370:       86 c4                   xchg   %al,%ah
     372:       66 89 44 24 1e          mov    %ax,0x1e(%esp)
     ...
     400:       89 90 18 38 00 00       mov    %edx,0x3818(%eax)

This fixes the problem.

Comment 7 Mark McLoughlin 2009-06-23 11:47:56 UTC
* Tue Jun 23 2009 Mark McLoughlin <markmc@redhat.com> - 5.4.4-16
- Fix e1000 PXE boot - caused by compiler optimization (bug #507391)

Comment 8 Mark McLoughlin 2009-06-23 11:57:41 UTC
*** Bug 494541 has been marked as a duplicate of this bug. ***

Comment 9 Fedora Update System 2009-06-23 11:59:01 UTC
etherboot-5.4.4-16.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/etherboot-5.4.4-16.fc11

Comment 11 Fedora Update System 2009-06-27 02:58:25 UTC
etherboot-5.4.4-16.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update etherboot'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7024

Comment 12 Gilboa Davara 2009-06-29 14:32:14 UTC
etherboot-5.4.4-16.fc11.noarch seems to solve the problem.

- Gilboa

Comment 13 Kari Hautio 2009-06-30 11:19:22 UTC
etherboot-5.4.4-16.fc11 works for me also and solves the no-IP problem (bug #494541)

Comment 14 Mark McLoughlin 2009-06-30 13:22:58 UTC
Gilboa and Kari, thanks for testing - I'll push to stable now

Note, in future, if you go to the update url:

  https://admin.fedoraproject.org/updates/F11/FEDORA-2009-7024

you can log in and add a comment; this increases the update's 'karma', and if enough people comment, the update gets pushed automatically

Comment 15 Gilboa Davara 2009-06-30 14:57:54 UTC
Thanks. Will do.

- Gilboa

Comment 16 Fedora Update System 2009-07-02 05:41:47 UTC
etherboot-5.4.4-16.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

