Bug 698394

Summary: GFS2: rlist code should be removed once it is no longer used
Product: [Fedora] Fedora Reporter: Steve Whitehouse <swhiteho>
Component: kernel Assignee: Steve Whitehouse <swhiteho>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: medium    
Version: rawhide CC: adas, anprice, bmarzins, collura, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, rpeterso
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 236102    
Bug Blocks: 790188    

Description Steve Whitehouse 2011-04-20 19:31:54 UTC
Ideally we'd like to alloc GFP_NOFAIL completely, but it's not obvious how. We should at least allow lists of pages to be used here rather than a contiguous allocation.

------------[ cut here ]------------                                            
WARNING: at mm/page_alloc.c:1331 get_page_from_freelist+0x8a1/0x910()           
Hardware name: PowerEdge R710                                                   
Modules linked in: gfs2 ebtable_nat ebtables x_tables bridge stp dlm af_packet ]
Pid: 3632, comm: bonnie++ Not tainted 2.6.39-rc4+ #214                          
Call Trace:                                                                     
 [<ffffffff8108fada>] warn_slowpath_common+0x7a/0xb0                            
 [<ffffffff8108fb25>] warn_slowpath_null+0x15/0x20                              
 [<ffffffff8112cd51>] get_page_from_freelist+0x8a1/0x910                        
 [<ffffffff81682af9>] ? sub_preempt_count+0xa9/0xe0                             
 [<ffffffff8112d0de>] __alloc_pages_nodemask+0x13e/0x980                        
 [<ffffffff8116d8ea>] kmem_getpages+0x5a/0x170                                  
 [<ffffffff8116f4b7>] cache_grow+0x2e7/0x310                                    
 [<ffffffff8116f731>] cache_alloc_refill+0x251/0x290                            
 [<ffffffff81170569>] __kmalloc+0x239/0x280                                     
 [<ffffffffa0229d73>] ? gfs2_rlist_alloc+0x23/0x80 [gfs2]                       
 [<ffffffffa0229d73>] gfs2_rlist_alloc+0x23/0x80 [gfs2]                         
 [<ffffffffa0203e7a>] do_strip+0x20a/0x490 [gfs2]                               
 [<ffffffffa0218ca0>] ? gfs2_meta_read+0xd0/0x160 [gfs2]                        
 [<ffffffff810b253a>] ? wake_up_bit+0x2a/0x40                                   
 [<ffffffffa02041c7>] recursive_scan.clone.23+0xc7/0x1d0 [gfs2]                 
 [<ffffffffa0204223>] recursive_scan.clone.23+0x123/0x1d0 [gfs2]                
 [<ffffffff811707ad>] ? kmem_cache_alloc_trace+0x1fd/0x230                      
 [<ffffffffa02043d8>] trunc_dealloc+0x108/0x150 [gfs2]                          
 [<ffffffff810b2590>] ? autoremove_wake_function+0x40/0x40                      
 [<ffffffffa0210652>] ? gfs2_glock_wait+0x42/0x50 [gfs2]                        
 [<ffffffffa0212070>] ? gfs2_glock_nq+0x320/0x480 [gfs2]                        
 [<ffffffff810b2590>] ? autoremove_wake_function+0x40/0x40                      
 [<ffffffffa0205feb>] gfs2_file_dealloc+0xb/0x10 [gfs2]                         
 [<ffffffffa022a85d>] gfs2_evict_inode+0x22d/0x510 [gfs2]                       
 [<ffffffffa022a715>] ? gfs2_evict_inode+0xe5/0x510 [gfs2]                      
 [<ffffffff8119e6e1>] evict+0x81/0x180                                          
 [<ffffffff8119e944>] iput+0x104/0x1f0                                          
 [<ffffffff8119302c>] do_unlinkat+0x10c/0x1b0                                   
 [<ffffffff810ee432>] ? audit_syscall_entry+0x1c2/0x1f0                         
 [<ffffffff813d1dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f                       
 [<ffffffff81194651>] sys_unlink+0x11/0x20                                      
 [<ffffffff81686612>] system_call_fastpath+0x16/0x1b                            
---[ end trace 2035b90cd25b602e ]---                      

page_alloc.c line 1331 reads:

                if (unlikely(gfp_flags & __GFP_NOFAIL)) {
                        /*
                         * __GFP_NOFAIL is not to be used in new code.
                         *
                         * All __GFP_NOFAIL callers should be fixed so that they
                         * properly detect and handle allocation failures.
                         *
                         * We most definitely don't want callers attempting to
                         * allocate greater than order-1 page units with
                         * __GFP_NOFAIL.
                         */
                        WARN_ON_ONCE(order > 1);
                }
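
To see when a request size trips this check, it helps to work out the page order by hand. The following is a userspace sketch only, assuming 4 KiB pages; SKETCH_PAGE_SIZE, order_for_size() and nofail_would_warn() are hypothetical names made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    /* Precondition: keeps the doubling below from overflowing. */
    assert(size <= SIZE_MAX / 2 + 1);

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/*
 * Non-zero if a __GFP_NOFAIL request of this many bytes would reach
 * the WARN_ON_ONCE(order > 1) check quoted above.
 */
int nofail_would_warn(size_t size)
{
    return order_for_size(size) > 1;
}
```

Under these assumptions, anything up to 8192 bytes stays at order 1 or below, while a contiguous array of, say, 1025 eight-byte elements (8200 bytes) needs order 2 and would trigger the warning.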

Comment 1 Steve Whitehouse 2011-09-22 10:24:40 UTC
One solution to this is to get rid of the rlist code entirely. This should be possible if we can remove the three users of this code. These are:

 o Deallocation of dir leaf blocks
 o Deallocation of indirect block tree
 o Deallocation of indirect xattr tree blocks

It should be possible to redesign the code to work in the opposite way to the allocation code in order to only lock a single rgrp at a time and thus resolve the locking order issue.

Comment 2 Fedora End Of Life 2013-04-03 18:22:41 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 3 Josh Boyer 2013-09-18 20:28:24 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 4 Justin M. Forbes 2014-01-03 22:06:51 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 5 Steve Whitehouse 2017-07-04 10:42:22 UTC
Indirect block deallocations no longer use the rlist code. That leaves two callers: gfs2_dir_exhash_dealloc() and ea_dealloc_indirect().

The directory code can be easily updated to be more efficient, although it is very likely to be dealing with many scattered blocks. The algorithm should be something along the lines of:

 1. Find first leaf block to deallocate
 2. Lock that rgrp
 3. See how many other leaf blocks are in the same rgrp, by (a) scanning down the chain until we hit a block in another rgrp and (b) repeating that for every hash chain
 4. Deallocate the leaf blocks in question
 5. Update the hash chain headers
 6. Repeat until all hash chains are empty

That way each transaction (one per rgrp) ensures that everything remains consistent while the hash chains are deallocated. The iteration means that we no longer need to use the rlist code.
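
The steps above can be sketched as a small userspace simulation. This is only an illustration of the iteration order, not GFS2 code: struct leaf, NO_BLOCK and dealloc_chains() are hypothetical names, "locking" an rgrp is reduced to counting a pass, and "deallocating" a block just sets a flag.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BLOCK ((size_t)-1) /* hypothetical end-of-chain marker */

/* Hypothetical stand-in for a directory leaf block. */
struct leaf {
    unsigned int rgrp; /* resource group the block lives in */
    size_t next;       /* index of the next leaf in the chain, or NO_BLOCK */
    int freed;         /* set once the block has been "deallocated" */
};

/*
 * Free every leaf reachable from heads[0..nchains), handling a single
 * rgrp per pass. Returns the number of passes, i.e. the number of
 * rgrp locks (and transactions) the sketch would have taken.
 */
unsigned int dealloc_chains(struct leaf *leaves, size_t *heads, size_t nchains)
{
    unsigned int passes = 0;

    assert(leaves != NULL && heads != NULL);

    for (;;) {
        size_t c, first = NO_BLOCK;
        unsigned int rgrp;

        /* Step 1: find the first leaf block still to deallocate. */
        for (c = 0; c < nchains; c++) {
            if (heads[c] != NO_BLOCK) {
                first = heads[c];
                break;
            }
        }
        if (first == NO_BLOCK)
            break; /* Step 6: all hash chains are empty. */

        /* Step 2: "lock" that rgrp (only counted here). */
        rgrp = leaves[first].rgrp;
        passes++;

        /*
         * Steps 3-5: for every chain, walk down from the head until a
         * block in another rgrp is hit, free what was found, and move
         * the chain header past the freed blocks.
         */
        for (c = 0; c < nchains; c++) {
            size_t b = heads[c];

            while (b != NO_BLOCK && leaves[b].rgrp == rgrp) {
                leaves[b].freed = 1;
                b = leaves[b].next;
            }
            heads[c] = b;
        }
    }
    return passes;
}
```

With two chains, 0 -> 1 -> 2 and 3 -> 4, where blocks 0, 1 and 4 sit in rgrp 0 and blocks 2 and 3 in rgrp 1, the sketch takes three passes: rgrp 0, then rgrp 1, then rgrp 0 again. An rgrp can therefore be visited more than once when its blocks are interleaved with others along a chain, but only one rgrp is ever held at a time.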

A similar approach could be used for the EA blocks too, and then the rlist code can be removed entirely.