Bug 156608 - [RHEL3 U4] The system clock gains much time when netconsole is activated.
Summary: [RHEL3 U4] The system clock gains much time when netconsole is activated.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Anderson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 156321
Reported: 2005-05-02 14:01 UTC by Issue Tracker
Modified: 2010-10-22 02:57 UTC (History)

Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-28 15:01:29 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2005:663 qe-ready SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6 2005-09-28 04:00:00 UTC

Description Issue Tracker 2005-05-02 14:01:01 UTC
Escalated to Bugzilla from IssueTracker

Comment 9 Dave Anderson 2005-05-02 21:04:50 UTC
Adding Jeff Moyer to cc: list in hopes he can help explain this one.

Jeff, 
  
zap_completion_queue() is called in both the netconsole and netdump
code paths AFAICT.  This code then, seemingly could wreak havoc,
i.e., the jiffies bump below (and what happens if they set idle_timeout?):

        if (idle_timeout) {
                if (t0) {
                        if (((t1 - t0) >> 20) > mhz_cycles * (unsigned long long)idle_timeout) {
                                t0 = t1;
                                printk("netdump idle timeout - rebooting in 3 seconds.\n");
                                mdelay(3000);
                                machine_restart(NULL);
                        }
                }
        }
        /* maintain jiffies in a polling fashion, based on rdtsc. */
        {
                static unsigned long long prev_tick;

                if (t1 - prev_tick >= jiffy_cycles) {
                        prev_tick += jiffy_cycles;
                        jiffies++;
                }
        }

Since the code has always been like this, what am I missing?


Comment 10 Dave Anderson 2005-05-02 21:10:34 UTC
setting back to kernel...

Comment 11 Dave Anderson 2005-05-02 21:14:00 UTC
Jeff, looks like those two if statements need a netdump_mode check?

Comment 12 Dave Anderson 2005-05-03 12:50:40 UTC
The user-land mhz argument sent to the netconsole module is basically
ignored, unless, during module load, upon reading the tsc two successive
times with an mdelay() in between, it happens to have done so when the
tsc wrapped around:

        platform_timestamp(t0);
        mdelay(1);
        platform_timestamp(t1);

In other words, if t1 > 0, mhz is completely ignored.  So let's put
that issue out of the picture.

The question is whether netconsole should be doing anything at all
with jiffies during runtime.  Doing an alt-sysrq-t operation with
thousands of processes, or simply repeated keyboard-generated attempts
(instead of echoing to /proc/sysrq-trigger), is essentially one huge
interrupt handler.  I don't know what the author's intent was -- to
"help" jiffies along, or whether it was meant to only do so in a netdump
operation?

What would happen, say, if a 9600-baud serial console were hooked up,
without netconsole registered, where a single alt-sysrq-t on a system
with thousands of processes could consume several minutes?
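
As a rough check on the "several minutes" figure, the transmit time can be
estimated from the line rate alone.  This is only a sketch: it assumes 8N1
framing (10 bits on the wire per byte), and the helper name serial_seconds()
and the output size used below are made up for illustration, not measured.

```c
/* Seconds needed to push `bytes` through a serial line running at `baud`.
 * Assumption: 8N1 framing, i.e. start bit + 8 data bits + stop bit. */
double serial_seconds(unsigned long bytes, unsigned long baud)
{
    const unsigned long bits_per_byte = 10;

    return (double)bytes * bits_per_byte / (double)baud;
}
```

At 9600 baud that is 960 bytes per second.  If, hypothetically, 2000 tasks
each produced about 250 bytes of alt-sysrq-t output, serial_seconds(500000, 9600)
comes to roughly 520 seconds, which is well into the "several minutes" range.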

It should also be kept in mind that alt-sysrq-t is a debug strategy,
not something that should be done in the normal course of events.
Furthermore, using /proc/sysrq-trigger does the operation in process
context so clock interrupts wouldn't be blocked.






Comment 13 Dave Anderson 2005-05-05 13:28:42 UTC
Ok, the fix will be to make this simple change to zap_completion_queue():

        /* maintain jiffies in a polling fashion, based on rdtsc. */
-       {
+       if (netdump_mode) {
                 static unsigned long long prev_tick;
  
                 if (t1 - prev_tick >= jiffy_cycles) {
                         prev_tick += jiffy_cycles;
                         jiffies++;
                 }
         }

Note that there is no way the idle_timeout check above it can cause a problem,
because t0 can never be set until a netdump operation is set in motion.

netconsole.c has no business mucking around with jiffies during runtime.
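
To illustrate why the unconditional block makes the clock gain time, here is a
small user-space model, not the kernel code.  The struct, the function names
and the JIFFY_CYCLES value are hypothetical.  It also assumes the worst case
where the poll runs on every cycle, and that the regular timer interrupt only
fires outside netdump mode.

```c
#include <stdbool.h>

/* Assumed cycle count per timer tick; the real value depends on the CPU. */
#define JIFFY_CYCLES 1000ULL

/* Hypothetical stand-in for the kernel state involved; not the real code. */
struct clock_sim {
    unsigned long jiffies;
    unsigned long long prev_tick;
    bool netdump_mode;
};

/* Models the polling block in zap_completion_queue().
 * When `gated` is true the block runs only in netdump mode (the fix). */
void poll_jiffies(struct clock_sim *s, unsigned long long t1, bool gated)
{
    if (gated && !s->netdump_mode)
        return;
    if (t1 - s->prev_tick >= JIFFY_CYCLES) {
        s->prev_tick += JIFFY_CYCLES;
        s->jiffies++;
    }
}

/* Runs `ticks` timer periods and returns the resulting jiffies count.
 * Assumption: the timer interrupt fires only outside netdump mode. */
unsigned long simulate(bool gated, bool netdump_mode, unsigned long ticks)
{
    struct clock_sim s = { 0, 0, netdump_mode };
    unsigned long long tsc;

    for (tsc = 1; tsc <= ticks * JIFFY_CYCLES; tsc++) {
        if (!s.netdump_mode && tsc % JIFFY_CYCLES == 0)
            s.jiffies++;              /* regular timer interrupt */
        poll_jiffies(&s, tsc, gated); /* netconsole poll on every cycle */
    }
    return s.jiffies;
}
```

In this model, simulate(false, false, 100) counts 200 jiffies for 100 real
ticks because both the timer and the poll bump the counter.  With the
netdump_mode gate, simulate(true, false, 100) counts 100, and
simulate(true, true, 100) still counts 100 from polling alone.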


Comment 16 Dave Anderson 2005-05-13 18:54:44 UTC
Should be -- the patch will be posted in conjunction with BZ #157439,
which I'm waiting for IBM to test.

Comment 18 Ernie Petrides 2005-06-09 03:25:09 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.7.EL).


Comment 27 Red Hat Bugzilla 2005-09-28 15:01:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html


