Bug 589660 - QMF: Job status stats incorrect on scheduler and submitter objects
Summary: QMF: Job status stats incorrect on scheduler and submitter objects
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 1.3
: ---
Assignee: Pete MacKinnon
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-06 15:57 UTC by Pete MacKinnon
Modified: 2010-07-22 17:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-22 17:18:47 UTC
Target Upstream Version:



Description Pete MacKinnon 2010-05-06 15:57:49 UTC
The src/management/qmfprobe.py script, which queries all plugin objects, does not report the expected job counts from the scheduler and submitter objects.

Comment 1 Pete MacKinnon 2010-05-19 14:28:39 UTC
Seems to be a problem with:
a) idle counts - we can get -1 to start from the schedd-set attr after a py submit
b) submitter thinks there is still 1 job running after all have completed

Need to retest this with a restart followed by condor_submit instead of a py submit.

Comment 2 Pete MacKinnon 2010-06-07 22:11:53 UTC
These counts are actually coming from the UPDATE_SCHEDD_ADS and UPDATE_SUBMITTOR_ADS. QMF plugin just directly updates whatever it gets from the schedd. The counts are off by 1 at both ends...

~/personal-condor/log  $ grep -e "IdleJobs" -e "RunningJobs" SchedLog 
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 3
TotalRunningJobs = 0
06/07 17:42:04 Changed attribute: RunningJobs = 0
06/07 17:42:04 Changed attribute: IdleJobs = 3
RunningJobs = 0
IdleJobs = 3
TotalIdleJobs = 1
TotalRunningJobs = 2
06/07 17:47:05 Changed attribute: RunningJobs = 2
06/07 17:47:05 Changed attribute: IdleJobs = 1
RunningJobs = 2
IdleJobs = 1
TotalIdleJobs = 0
TotalRunningJobs = 1
06/07 17:47:25 Changed attribute: RunningJobs = 1
06/07 17:47:25 Changed attribute: IdleJobs = 0
RunningJobs = 1
IdleJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 1
06/07 17:52:25 Changed attribute: RunningJobs = 1
06/07 17:52:25 Changed attribute: IdleJobs = 0
RunningJobs = 1
IdleJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0

When we are really at 2 running / 1 idle (2R/1I), the update still shows 3I. Then for a period of time all 3 jobs are completed (3C) and it still thinks 1R.
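
To line the reported counts up against their timestamps, the "Changed attribute" lines in the grep output above can be grouped per update. This is only a rough sketch: it assumes the log line layout is exactly as pasted above, and the names changed_counts and CHANGED are made up for illustration and are not part of Condor or the QMF plugin.

```python
import re

# Matches lines such as "06/07 17:47:05 Changed attribute: RunningJobs = 2".
# The layout is assumed from the SchedLog excerpt pasted in this comment.
CHANGED = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Changed attribute: "
    r"(?P<name>RunningJobs|IdleJobs) = (?P<value>-?\d+)$"
)


def changed_counts(lines):
    """Group 'Changed attribute' lines by timestamp, preserving log order.

    Returns a list of (timestamp, {attribute: value}) tuples. Lines that do
    not match the pattern (e.g. "TotalIdleJobs = 0") are ignored.
    """
    updates = []
    for line in lines:
        match = CHANGED.match(line.strip())
        if match is None:
            continue
        stamp = match.group("stamp")
        # Start a new group whenever the timestamp changes.
        if not updates or updates[-1][0] != stamp:
            updates.append((stamp, {}))
        updates[-1][1][match.group("name")] = int(match.group("value"))
    return updates


# A few lines copied from the grep output above.
sample = [
    "TotalIdleJobs = 1",
    "TotalRunningJobs = 2",
    "06/07 17:47:05 Changed attribute: RunningJobs = 2",
    "06/07 17:47:05 Changed attribute: IdleJobs = 1",
    "RunningJobs = 2",
    "IdleJobs = 1",
    "06/07 17:47:25 Changed attribute: RunningJobs = 1",
    "06/07 17:47:25 Changed attribute: IdleJobs = 0",
]

for stamp, counts in changed_counts(sample):
    print(stamp, counts)
```

Run against the full SchedLog, this makes the gaps between updates (17:42:04 to 17:47:05, then 17:47:25 to 17:52:25) easier to see next to the counts they carried.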

Matt, thoughts?

Comment 3 Matthew Farrellee 2010-06-08 14:58:52 UTC
Thought -
 You didn't wait long enough for an update that showed 0R,0I. Does condor_status -sched already report the 1R after all are complete (shown via condor_q | tail -n1)? The SCHEDD&SUBMITTER updates may be delayed when there are no jobs to report, which may be the wrong semantics, e.g. it could skip reporting only when nothing has changed instead.

Comment 4 Pete MacKinnon 2010-06-08 21:58:08 UTC
Lowering the SCHEDD_INTERVAL from the 5 min default certainly improved this. However, we never see a final updated submitter ad (i.e., 0 jobs running). The last one claims there is 1 job running and that is what we are left with.

Comment 5 Matthew Farrellee 2010-06-09 10:29:19 UTC
If that can be verified by looking at condor_status -submitter then it's a candidate for fixing. IIRC, submitter ads are generated from jobs in the queue. If there are no jobs for a submitter (all completed) I could imagine the Schedd just wouldn't know to send a final update (an invalidate!).

Comment 6 Pete MacKinnon 2010-06-17 16:22:25 UTC
Fixed the incorrect idle job stats on the scheduler and submitter objects (needed to augment the inbound classad a bit). Now we need a solution for the missing UPDATE_SUBMITTER_AD to update the submitter objects. Also I see this:

~/personal-condor/log  $ condor_status -submitter

Name                 Machine      Running IdleJobs HeldJobs

nobody@redhat.com    localhost.         0        0 [???????]
                           RunningJobs           IdleJobs           HeldJobs

   nobody@redhat.com                 0                  0                  0

               Total                 0                  0                  0

                    (Omitted 1 malformed ads in computed attribute totals)

Comment 7 Pete MacKinnon 2010-06-18 02:15:51 UTC
We were missing a plugin update when we walk the owner list and there are no jobs. The collectors were getting this submitter update already - just needed to do the same for the schedd plugins also.
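
The shape of the problem can be shown with a toy model. This is not the schedd code (which is C++) and not the actual change in the commit below. Every name here (walk_owners, collector_ads, plugin_view, notify_plugin_when_empty) is hypothetical and only illustrates "collector gets the zero-job update, plugin did not".

```python
def walk_owners(owners, job_counts, collector_ads, plugin_view,
                notify_plugin_when_empty):
    """Toy model of walking the owner list and publishing submitter counts.

    owners:         submitter names the scheduler knows about
    job_counts:     dict of name -> (running, idle); missing means no jobs
    collector_ads:  list standing in for ads sent to the collector
    plugin_view:    dict standing in for the plugin's last-seen submitter state
    notify_plugin_when_empty: False models the old behaviour, True the fix
    """
    for owner in owners:
        running, idle = job_counts.get(owner, (0, 0))
        ad = {"Name": owner, "RunningJobs": running, "IdleJobs": idle}
        # The collector always received the update, even with zero jobs.
        collector_ads.append(ad)
        # The plugin only saw it when jobs existed, unless the fix is on.
        if running + idle > 0 or notify_plugin_when_empty:
            plugin_view[owner] = ad


owners = ["nobody@redhat.com"]
# Last state the plugin saw before all jobs completed: 1 running.
plugin_view = {
    "nobody@redhat.com": {
        "Name": "nobody@redhat.com", "RunningJobs": 1, "IdleJobs": 0,
    }
}
collector_ads = []

# Old behaviour: queue is empty, plugin is never told, so it stays at 1R.
walk_owners(owners, {}, collector_ads, plugin_view,
            notify_plugin_when_empty=False)
stale = dict(plugin_view["nobody@redhat.com"])

# With the extra plugin update, the submitter object drops to 0R.
walk_owners(owners, {}, collector_ads, plugin_view,
            notify_plugin_when_empty=True)
fresh = dict(plugin_view["nobody@redhat.com"])

print(stale["RunningJobs"], fresh["RunningJobs"])
```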

FH 29c3f20c2ea