Bug 1515185 - blkdiscard takes unreasonably long
Summary: blkdiscard takes unreasonably long
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kmod-kvdo
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Bryan Gurney
QA Contact: vdo-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-20 11:05 UTC by Marius Vollmer
Modified: 2017-11-22 13:09 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1516041
Environment:
Last Closed: 2017-11-21 21:57:59 UTC


Attachments: none

Description Marius Vollmer 2017-11-20 11:05:46 UTC
Description of problem:

blkdiscard of a VDO volume takes unreasonably long. The time also grows with the logical size of the volume, not with the amount of data actually stored.

Version-Release number of selected component (if applicable):
kmod-kvdo-6.1.0.0-5.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
# time vdo create --name vdo0 --device /dev/sda2 --vdoLogicalSize 128G
real	0m1.519s
user	0m0.101s
sys	0m0.302s

# time blkdiscard /dev/mapper/vdo0 
real	1m3.015s
user	0m0.001s
sys	0m37.569s

# vdo remove --name vdo0
# time vdo create --name vdo0 --device /dev/sda2 --vdoLogicalSize 256G
real	0m1.999s
user	0m0.110s
sys	0m0.291s

# time blkdiscard /dev/mapper/vdo0
real	2m5.612s
user	0m0.000s
sys	1m14.823s
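
For reference, the scaling can be demonstrated with a loop (a sketch; this destroys any data on /dev/sda2):

for size in 64G 128G 256G; do
    vdo create --name vdo0 --device /dev/sda2 --vdoLogicalSize $size
    time blkdiscard /dev/mapper/vdo0
    vdo remove --name vdo0
done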


Expected results:
blkdiscard is at least as fast as "vdo create".


Additional info:
The actual use case I am concerned about is "mkfs".  Some filesystems do the equivalent of blkdiscard in their mkfs, and that is usually not a problem.  With VDO it is, and every consumer has to take that into account, Cockpit for example.
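
A possible mitigation on the mkfs side is to skip the discard explicitly; a minimal sketch (device name reused from the reproduction steps above):

mkfs.xfs -K /dev/mapper/vdo0              # -K: do not discard blocks at mkfs time
mkfs.ext4 -E nodiscard /dev/mapper/vdo0   # likewise for ext4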

Comment 2 Louis Imershein 2017-11-20 18:27:09 UTC
Discard performance is certainly a known issue with VDO, and it won't be resolved in the RHEL 7.5 timeframe due to architectural constraints. I will look at adding this to Confluence for RHEL 8, though it likely will not make it in until RHEL 8.1 unless we have some new revelations from engineering.

In the meantime, documentation and the user interface should be used as much as possible to mitigate the issue, because even with improvements, blkdiscard will never be as fast as a vdo create operation.  To function correctly, VDO has to go through and update the metadata for every 4K logical block covered by a blkdiscard request. By comparison, vdo create simply lays down a fixed amount of metadata proportional to the physical storage available. The correct goal is for VDO discards to be no slower than the operations sent down to the flash translation layer of an SSD.
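
To put rough numbers on that (back-of-the-envelope arithmetic, using the 4K logical block size above):

echo $(( 128 * 1024 * 1024 * 1024 / 4096 ))   # 33554432 blocks to update for a 128G logical size
echo $(( 256 * 1024 * 1024 * 1024 / 4096 ))   # 67108864 blocks for 256G, matching the roughly doubled runtime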

So, for example, for a comparably sized (logical) device:

blkdiscard /dev/mapper/vdo1

should take no more time than:

blkdiscard /dev/ssd1

Comment 3 Bryan Gurney 2017-11-21 21:57:59 UTC
We will re-evaluate this issue at a later date.

Comment 5 Marius Vollmer 2017-11-22 08:24:14 UTC
> To function correctly, VDO has to go through and update the metadata
> for every 4K logical block covered by a blkdiscard request. By
> comparison, vdo create simply lays down a fixed amount of metadata
> proportional to the physical storage available.

I am not convinced by this.  VDO could instead go through the metadata and update it only for the recorded areas covered by the discard request.  If the metadata is small (as it is right after a create), this should be a huge win.  Also, a discard request that covers all of the recorded areas (such as a discard of the whole logical volume) could be handled as a special case that reinitializes the metadata to represent a completely empty logical volume (just like the create operation does).

I have had a superficial look at the kvdo source, and I am starting to understand a bit better what VDO is actually doing; I can appreciate that improving discard performance is not easy.  For example, VDO asks the kernel to split discard requests into 4K chunks and then handles them individually (see the queue limits below).  This makes it hard to apply the optimizations outlined in the previous paragraph.
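
The splitting policy is visible from userspace in the device's queue limits (a sketch; dm-N is a placeholder for the VDO device's dm node):

ls -l /dev/mapper/vdo0                         # resolves to ../dm-N
cat /sys/block/dm-N/queue/discard_granularity
cat /sys/block/dm-N/queue/discard_max_bytes    # 4096 here would explain the 4K splitting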


I know I am being a pain in the ass here, and I apologize.  But I hope you appreciate that keeping a mysterious "[ ] Don't be stupid" checkbox out of the UI is worth some effort.  I'd rather spend three weeks fiddling with kernel code than two hours adding a checkbox. :-)

