Bug 1512605 - Satellite backup procedure needs size estimations
Summary: Satellite backup procedure needs size estimations
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Documentation
Version: 6.3.0
Hardware: x86_64
OS: Linux
medium vote
Target Milestone: Unspecified
Assignee: Sergei Petrosian
QA Contact: Stephen Wadeley
Depends On:
Blocks: 1122832 1533259
Reported: 2017-11-13 15:58 UTC by Lukas Zapletal
Modified: 2018-06-15 22:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-05-29 13:10:38 UTC


| System | ID | Priority | Status | Summary | Last Updated |
| Red Hat Bugzilla | 1583257 | None | NEW | [RFE] Save space by compressing data on the fly | 2019-04-10 01:49:36 UTC |
| Red Hat Bugzilla | 1583534 | None | NEW | [RFE] Evaluate if there is enough space available to store a backup | 2019-04-10 01:49:36 UTC |

Description Lukas Zapletal 2017-11-13 15:58:18 UTC
Document URL:

Ensure your backup location has enough disk space to contain a copy of the following directories

We need to explain this in more detail. Provide information on how to calculate the source sizes, the total size of the input data, the estimated compression ratio and thus the estimated target size.

Comment 2 2017-11-13 16:47:07 UTC
I think this needs to be built into the katello-backup tool itself, and the backup should not start if the storage is not sufficient?

Comment 3 Lukas Zapletal 2017-11-14 07:58:41 UTC
I tend to prefer documentation; backup is a procedure where you want to be sure you are doing it right. Let's have this documented first and then talk about integrating it.

Comment 6 Lukas Zapletal 2017-11-29 08:45:00 UTC
Please work this into the Satellite backup and clone documentation:

Backup size estimations

Before performing the backup, calculate the required free space for the backup target directory. To do that, add up the used space of the following directories: /var/lib/pgsql/data, /var/lib/mongodb and /var/lib/pulp.

WARNING: Running the "du" utility on Pulp content can take a long time. It is recommended to have this directory on a separate volume; in that case, "df" can be used to quickly determine the used space.
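The measurement described above can be sketched as two small shell helpers. The names used_kib and used_kib_df are hypothetical and are not part of katello-backup; the sketch only assumes standard du, df and awk.

```shell
# Sketch: report used space in KiB for one directory.
# used_kib and used_kib_df are illustrative helpers, not katello-backup commands.

# Walk the directory tree (slow on large Pulp content).
# -s: single summary line, -k: KiB units, -x: stay on one file system.
used_kib() {
    du -skx "$1" 2>/dev/null | awk '{print $1}'
}

# Fast alternative when the directory is the mount point of its own volume:
# df reads the file system counters instead of walking the tree.
# -k: KiB units, -P: POSIX output format; column 3 of line 2 is "Used".
used_kib_df() {
    df -kP "$1" | awk 'NR==2 {print $3}'
}
```

For example, `used_kib /var/lib/pgsql/data` and `used_kib /var/lib/mongodb` give the database sizes, while `used_kib_df /var/lib/pulp` is only meaningful if /var/lib/pulp is a separate volume, because df reports usage for the whole file system.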

The backup tool also copies some configuration files, so it is a good idea to calculate the total size of the following directories:

du -shc /etc /root/ssl-build /var/lib/candlepin /opt/puppetlabs /var/www/html/pub

For simplicity, the command above calculates the total size of the whole /etc directory, while the backup tool backs up only about a dozen subdirectories from this path. Usually /etc is small enough, but if needed, see the katello-backup script source to get the list of directories.

WARNING: The public www directory contains only configuration files and certificates by default, but some users publish unrelated content there, such as custom RPMs or ISO files. These files will end up in the backup.

Expected compression ratio

The following table lists the expected compression ratio for each type of data included in the backup.

| Type | Directory | Ratio (compressed size / original size) | Example results |
| PostgreSQL database files | /var/lib/pgsql/data | 15-20 % | 105 GB -> 20 GB |
| MongoDB database files | /var/lib/mongodb | 10-15 % | 483 GB -> 53 GB |
| Pulp RPM files | /var/lib/pulp | - | (not compressed) |
| Configuration files | /etc /root/ssl-build ... | 5-10 % | 50 MB -> 4 MB |

Add 20 % of extra space to the sum of the compressed sizes; the result is the estimated total backup size.
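The calculation can be sketched as a shell function. This is only an illustration: estimate_backup_size is a hypothetical name, and the choice of the upper end of each ratio range from the table (20 %, 15 %, 10 %) is an assumption made to stay on the safe side.

```shell
# Sketch: estimate the backup size from the four source sizes.
# Usage: estimate_backup_size PGSQL MONGODB PULP CONFIG
# All four arguments are whole numbers in the same unit (for example GB or KiB).
# estimate_backup_size is an illustrative helper, not part of katello-backup.
estimate_backup_size() {
    # Percentages: upper end of each range in the table (an assumption);
    # Pulp content is stored uncompressed, so it counts at 100 %.
    sum=$(( $1 * 20 + $2 * 15 + $3 * 100 + $4 * 10 ))
    # Add 20 % extra space, convert back from percent, and round up.
    echo $(( (sum * 120 + 9999) / 10000 ))
}
```

With the 105 GB PostgreSQL and 483 GB MongoDB figures from the table and a purely hypothetical 200 GB of Pulp content, `estimate_backup_size 105 483 200 0` prints 353.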

NOTE: The backup tool uses gzip with its default compression level (6).

Backup of Pulp content (RPM files and repositories) can be skipped with the --skip-pulp-content option. This content is never compressed, because RPM files are already compressed and would still take up more than 95 % of their original size. The backup tool uses a simple method based on the GNU tar utility; when using alternative tools (e.g. rsync), make sure SELinux labels are also carried over.

NOTE: Backing up Pulp content can take a long time. It is recommended to use a snapshot of the underlying storage (or LVM) so that the maintenance window can be shortened and the data can be safely copied while the system is operational.

Comment 8 Lukas Zapletal 2017-12-13 08:14:28 UTC
Thanks, I will add to that:

These numbers were calculated from an offline backup.

For online backup, extra space equal to the total size of the PostgreSQL and MongoDB databases must be allocated, because online backup first copies the data out of the databases and then compresses it.
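As a rough sketch of that rule, the peak space for an online backup is the offline estimate plus the uncompressed database sizes. The helper name online_backup_size is hypothetical, and using the on-disk size of the data directories as a stand-in for the size of the copied-out data is an assumption.

```shell
# Sketch: peak space needed on the target during an online backup.
# Usage: online_backup_size OFFLINE_ESTIMATE PGSQL MONGODB
# All arguments are whole numbers in the same unit.
# online_backup_size is an illustrative helper, not part of katello-backup.
online_backup_size() {
    # The databases are first copied out uncompressed, then compressed,
    # so their full size is needed on top of the offline estimate.
    echo $(( $1 + $2 + $3 ))
}
```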

The estimation does not include incremental backups. Their size depends heavily on how often new Red Hat content (RPMs) is added and how many sync/promote/publish operations are performed.

Comment 9 Peter Vreman 2017-12-13 13:00:31 UTC
Also add a remark that for a weekly full + incremental schedule you need 2x the size of a full backup, because the old full backup is removed only after the next full backup succeeds.

At least this guarantees the ability to restore.
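A minimal sketch of that sizing rule, assuming the incremental backups taken between two full backups are kept alongside them; rotation_space is a hypothetical name and not a katello-backup feature.

```shell
# Sketch: space needed while the old and the new full backup coexist.
# Usage: rotation_space FULL_BACKUP INCREMENTALS_TOTAL
# Both arguments are whole numbers in the same unit.
# rotation_space is an illustrative helper, not part of katello-backup.
rotation_space() {
    # Two full backups exist at the same time until the new one succeeds;
    # the incrementals of the current cycle come on top of that.
    echo $(( 2 * $1 + $2 ))
}
```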

Maybe katello-backup could get an option to reduce this by deleting the old full backup before creating a new one.

Of course, this all also relies on an external backup tool that takes things off-site. If the backup exists only on the sat6 server, you are still at risk from disk corruption.
