|Summary:||GFS2: statfs code clean-up|
|Product:||Red Hat Enterprise Linux 5||Reporter:||Wendy Cheng <wcheng>|
|Component:||gfs2-utils||Assignee:||Wendy Cheng <wcheng>|
|Status:||CLOSED WORKSFORME||QA Contact:||GFS Bugs <gfs-bugs>|
|Version:||5.1||CC:||kanderso, rkenna, swhiteho|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2007-04-13 18:01:56 UTC||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
|Bug Depends On:|
|Bug Blocks:||204760, 208514|
Description Wendy Cheng 2007-03-06 22:14:25 UTC
Description of problem: In the middle of porting GFS2 fast statfs implementation back into GFS1. While working thru the code, there are obvious issues found in the existing GFS2 code. Open this bugzilla to track them so it won't get lost. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Comment 1 Wendy Cheng 2007-03-07 04:43:31 UTC
* GFS2 stores statfs statistics info in two files. Upon node crash, the data left behind in page cache could get lost (?). Can't find the logic where these data can be recovered. Need to look into this. * There is a "slow" path statfs call (identical to GFS1 behavior) but doesn't seem to have external interfaces to invoke it (?). With the above recovery issue, we may need this slow path to re-calculate the disk utilization data and correct the master file accordingly. * The fields in struct gfs_statfs_change_host is defined as __u64 (unsigned long long) but is used as s64 (signed long long) thru out the code. Not sure how compiler handles this but some of the math looks very wrong to me (gfs2_statfs_i(), for example). * There are probably more - will update the bugzilla when I find them.
Comment 2 Wendy Cheng 2007-03-07 04:58:47 UTC
* Some of the fields in struct kstatfs (buf) is not properly set in gfs2_statfs().
Comment 3 Wendy Cheng 2007-03-07 21:33:04 UTC
This is not a problem (maybe it is ?) but can well describe the difficult aspect of this approach. The statfs per node delta is synced to master file every 30 (default) seconds. This implies if anyone floods huge amount of data into GFS2 partitions quickly from one node, the "df" output from other node will be far-off from the actual value. We need to improve this a little bit (say if the delta is above 10%, it should get unconditionally flushed).
Comment 4 Steve Whitehouse 2007-03-12 15:21:49 UTC
I must admit that I thought that it was using the same algorithm as the quota system of increasing the frequency of syncing as we near the limit. Have you come to any conclusions about what needs to be done in GFS2 yet? Probably the reason that statfs_change_host has a __u64 is that it was copied directly from the statfs_change on-disk structure which would be a __be64 and thus sign isn't really taken into account. So long as all the manipulations are done using the s64, then there should be no problem. It is a long term goal to rid ourselves of the _host structures, and I think that should be trivial in this case just by passing the struct kstatfs to gfs2_statfs_slow and gfs2_statfs_i instead. I presume the two fields you are referring to in comment #2 are f_fsid and f_frsize ?
Comment 5 Wendy Cheng 2007-03-12 15:57:18 UTC
Not sure how RHEL5 compiler handles this - but on RHEL4, I got funny data. When blocks count should be subtracted, it immediately becomes a huge number (apparently the compiler does an "add-using-unsigned", instead of two signed long long math). However, it is possible that I have header file issues though. And yes, the f_frsize, etc are not initialized. These are not critical issues - so don't worry about them. Will fix them when I'm done with RHEl4 port (and using the same test suites to unit test GFS2).