
Bug 156608

Summary: [RHEL3 U4] The system clock gains much time when netconsole is activated.
Product: Red Hat Enterprise Linux 3 Reporter: Issue Tracker <tao>
Component: kernel Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: jmoyer, linville, petrides, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2005-663 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-28 15:01:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 156321    

Description Issue Tracker 2005-05-02 14:01:01 UTC
Escalated to Bugzilla from IssueTracker

Comment 9 Dave Anderson 2005-05-02 21:04:50 UTC
Adding Jeff Moyer to cc: list in hopes he can help explain this one.

Jeff, 
  
zap_completion_queue() is called in both the netconsole and netdump
code paths AFAICT.  This code, then, could seemingly wreak havoc,
i.e., the jiffies bump below (and what happens if they set idle_timeout?):

        if (idle_timeout) {
                if (t0) {
                        if (((t1 - t0) >> 20) > mhz_cycles * (unsigned long long)idle_timeout) {
                                t0 = t1;
                                printk("netdump idle timeout - rebooting in 3 seconds.\n");
                                mdelay(3000);
                                machine_restart(NULL);
                        }
                }
        }
        /* maintain jiffies in a polling fashion, based on rdtsc. */
        {
                static unsigned long long prev_tick;

                if (t1 - prev_tick >= jiffy_cycles) {
                        prev_tick += jiffy_cycles;
                        jiffies++;
                }
        }

Since the code has always been like this, what am I missing?


Comment 10 Dave Anderson 2005-05-02 21:10:34 UTC
setting back to kernel...

Comment 11 Dave Anderson 2005-05-02 21:14:00 UTC
Jeff, looks like those two if statements need a netdump_mode check?

Comment 12 Dave Anderson 2005-05-03 12:50:40 UTC
The user-land mhz argument sent to the netconsole module is basically
ignored, unless, during module load, upon reading the tsc two successive
times with an mdelay() in between, it happens to have done so when the
tsc wrapped around:

        platform_timestamp(t0);
        mdelay(1);
        platform_timestamp(t1);

In other words, if t1 > 0, mhz is completely ignored.  So let's put
that issue out of the picture.

The question is whether netconsole should be doing anything at all
with jiffies during runtime.  Doing an alt-sysrq-t operation with
thousands of processes, or simply repeated keyboard-generated attempts
(instead of echoing to /proc/sysrq-trigger), is essentially one huge
interrupt handler.  I don't know what the author's intent was -- to
"help" jiffies along, or whether it was meant to only do so in a netdump
operation?

What would happen, say, if a 9600-baud serial console were hooked up,
without netconsole registered, where a single alt-sysrq-t on a system
with thousands of processes could consume several minutes?

It should also be kept in mind that alt-sysrq-t is a debug strategy,
not something that should be done in the normal course of events.
Furthermore, using /proc/sysrq-trigger does the operation in process
context so clock interrupts wouldn't be blocked.

Comment 13 Dave Anderson 2005-05-05 13:28:42 UTC
Ok, the fix will be to make this simple change to zap_completion_queue():

        /* maintain jiffies in a polling fashion, based on rdtsc. */
-       {
+       if (netdump_mode) {
                 static unsigned long long prev_tick;
  
                 if (t1 - prev_tick >= jiffy_cycles) {
                         prev_tick += jiffy_cycles;
                         jiffies++;
                 }
         }

Note that there is no way the idle_timeout check above it can cause a problem,
because t0 can never be set until a netdump operation is set in motion.

netconsole.c has no business mucking around with jiffies during runtime.
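
To illustrate why the unguarded block makes the clock gain time, here is a
minimal userspace sketch, not kernel code. It assumes a made-up
jiffy_cycles of 1000, a counter that starts at zero, a timer interrupt that
fires once per jiffy during normal runtime, and a netconsole poll every 10
cycles. poll_jiffies(), simulate() and the "guarded" flag are hypothetical
names introduced only for this model; jiffies, prev_tick, jiffy_cycles and
netdump_mode mirror the names quoted above.

```c
/* Userspace model only: all constants below are assumptions. */
static unsigned long jiffies;
static int netdump_mode;
static unsigned long long prev_tick;
static const unsigned long long jiffy_cycles = 1000; /* assumed cycles per tick */

/*
 * Models the tail of zap_completion_queue().
 * guarded == 0: original behaviour, always maintain jiffies by polling.
 * guarded == 1: patched behaviour, only do so when netdump_mode is set.
 */
static void poll_jiffies(unsigned long long t1, int guarded)
{
    if (guarded && !netdump_mode)
        return;
    if (t1 - prev_tick >= jiffy_cycles) {
        prev_tick += jiffy_cycles;
        jiffies++;
    }
}

/* Runs 100 jiffies' worth of cycles and returns the resulting tick count. */
static unsigned long simulate(int guarded, int in_netdump)
{
    unsigned long long t;

    jiffies = 0;
    prev_tick = 0;
    netdump_mode = in_netdump;

    for (t = 1; t <= 100 * jiffy_cycles; t++) {
        /* The timer interrupt only runs during normal operation. */
        if (!in_netdump && t % jiffy_cycles == 0)
            jiffies++;
        /* The netconsole/netdump path polls every 10 cycles. */
        if (t % 10 == 0)
            poll_jiffies(t, guarded);
    }
    return jiffies;
}
```

Under these assumptions, simulate(0, 0) returns 200: the timer interrupt
and the polling block each count the same 100 ticks, so the clock runs at
double speed. simulate(1, 0) returns 100 because the guard leaves jiffies to
the timer interrupt, and simulate(1, 1) also returns 100 because polling
takes over only once a netdump is in progress. The model is optimistic in
one respect: a zero-initialized prev_tick compared against an
already-running TSC would presumably let every poll add a tick until
prev_tick catches up.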


Comment 16 Dave Anderson 2005-05-13 18:54:44 UTC
Should be -- the patch will be posted in conjunction with BZ #157439,
which I'm waiting for IBM to test.

Comment 18 Ernie Petrides 2005-06-09 03:25:09 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.7.EL).


Comment 27 Red Hat Bugzilla 2005-09-28 15:01:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html