Bug 1064563 - Separate metrics compression and OOB from actual data purge tasks
Summary: Separate metrics compression and OOB from actual data purge tasks
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-12 20:30 UTC by Elias Ross
Modified: 2014-02-12 20:59 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Elias Ross 2014-02-12 20:30:12 UTC
Description of problem:

On my large system, metrics compression can take 45 minutes (depending on Cassandra, which can run compression), and the rest of the purge process can then take more than 15 minutes. Together that is more than an hour, so RHQ gets behind on processing.

It seems that some of the purge processes could be run daily rather than hourly, since most of them run full table queries. They also touch data that is quite old, so running them every hour doesn't seem necessary.

My suggestion is for the following:

        LOG.info("Data Purge Job STARTING");

 (1)        Iterable<AggregateNumericMetric> oneHourAggregates = compressMeasurementData();
 (2)        purgeEverything(systemConfig);
 (2)        performDatabaseMaintenance(LookupUtil.getSystemManager(), systemConfig);
 (1)        calculateAutoBaselines(LookupUtil.getMeasurementBaselineManager());
 (1)        calculateOOBs(oneHourAggregates);

The tasks marked (1) would be run in a separate Quartz job, and the tasks marked (2) could then be run just once a day.
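
A minimal sketch of that split, in plain Java with no RHQ or Quartz dependency. The class name, the two `run...Job` methods, and the cron strings are assumptions made for illustration; the task methods are stand-in stubs that only record that they ran. The real DataPurgeJob code would need the actual manager lookups and Quartz job registration.

```java
import java.util.ArrayList;
import java.util.List;

class PurgeSplitSketch {
    // Quartz-style cron expressions (assumed schedules, not taken from RHQ).
    static final String HOURLY_CRON = "0 0 * * * ?"; // top of every hour
    static final String DAILY_CRON = "0 0 0 * * ?";  // once a day at midnight

    // Records which stand-in task ran, in order.
    static final List<String> ran = new ArrayList<>();

    // Group (1): compression, baselines and OOBs, intended for the hourly job.
    static void runMetricsJob() {
        List<Double> oneHourAggregates = compressMeasurementData();
        calculateAutoBaselines();
        calculateOOBs(oneHourAggregates);
    }

    // Group (2): purge and database maintenance, intended for the daily job.
    static void runPurgeJob() {
        purgeEverything();
        performDatabaseMaintenance();
    }

    // Stubs below stand in for the real RHQ calls named in this report.
    static List<Double> compressMeasurementData() {
        ran.add("compressMeasurementData");
        return new ArrayList<>();
    }

    static void calculateAutoBaselines() {
        ran.add("calculateAutoBaselines");
    }

    static void calculateOOBs(List<Double> oneHourAggregates) {
        ran.add("calculateOOBs");
    }

    static void purgeEverything() {
        ran.add("purgeEverything");
    }

    static void performDatabaseMaintenance() {
        ran.add("performDatabaseMaintenance");
    }

    public static void main(String[] args) {
        System.out.println("hourly (" + HOURLY_CRON + "):");
        runMetricsJob();
        System.out.println("daily (" + DAILY_CRON + "):");
        runPurgeJob();
        System.out.println(ran);
    }
}
```

The point of the grouping is that calculateOOBs still receives the aggregates from compressMeasurementData within the same job, while nothing in group (2) depends on that result.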

See also Bug 1010418


Version-Release number of selected component (if applicable): 4.9

Comment 1 Elias Ross 2014-02-12 20:59:04 UTC
As an example:

14:06:54,103 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Traits data purged [8590] - completed in [555740]ms

That is a little over 9 minutes.

14:06:54,104 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Availability data purge starting at Wed Feb 12 14:06:54 UTC 2014
14:06:54,104 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Purging availablities that are older than Tue Feb 12 14:06:54 UTC 2013
14:10:35,307 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Availability data purged [519] - completed in [221203]ms

That is about 3 minutes 40 seconds.

These numbers could possibly also be improved with better queries, but I'm not sure.

