Bug 156872 - lt_high_locks setting
Summary: lt_high_locks setting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gulm
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: michael conrad tadpol tilstra
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-05-04 20:27 UTC by Wendy Cheng
Modified: 2009-04-16 20:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-25 16:41:14 UTC


Attachments
kludge around ccs's lack of find_css_long (deleted)
2005-05-09 15:38 UTC, michael conrad tadpol tilstra
celera files 4-1 (deleted)
2005-05-19 19:42 UTC, Wendy Cheng


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2005:466 normal SHIPPED_LIVE GFS bug fix update 2005-05-25 04:00:00 UTC

Description Wendy Cheng 2005-05-04 20:27:15 UTC
Description of problem:
The problem is reported as the lt_high_locks setting in cluster.ccs being
ignored. However, after a few quick greps/finds, the issue looks to me to be
caused by the compiler's implicit cast in the bound_to_ulong() call, since both
ccs and gulm know about this tunable and have code to work with it.

Note that the customer is running GFS-6.0.2.12 on AMD64 and this is affecting
their production environment: when the maximum number of locks is reached,
performance is drastically reduced while the lock server requests that nodes
drop their unnecessary locks.  Adjusting this setting higher would be a
work-around to that problem if this bug can be fixed.

Version-Release number of selected component (if applicable): 
GFS-6.0.2.12

How reproducible:
Always


Comment 2 Wendy Cheng 2005-05-04 20:34:48 UTC
Symptoms:

1) Numerous messages show up in /var/log/messages on the lock_gulm MASTER:

Apr 28 10:51:51 icla2g lock_gulmd_LT000[11621]: Lock count is at 2127373 which
is more than the max 2097152. Sending Drop all req to clients
..............

2) Performance is drastically reduced since the lock server keeps requesting
that nodes drop their unnecessary locks.

Comment 3 Wendy Cheng 2005-05-04 20:37:20 UTC
The culprit seems to be this routine, where val should have been declared as
(or cast to) ulong?

unsigned long bound_to_ulong(int val, unsigned long min, unsigned long max)
{
   if( val < min ) return min;
   if( val > max ) return max;
   return val;
}
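
For reference, here is a small sketch of how the comparisons in a routine of this shape behave under C's usual arithmetic conversions. The name bound_to_ulong_sketch is hypothetical and only the function body is taken from the snippet above. Because min and max are unsigned long, the int val is converted to unsigned long before each comparison, so a negative val turns into a very large unsigned number rather than comparing as "less than min".

```c
#include <limits.h>

/* Hypothetical copy of the routine quoted above, renamed so it can be
 * compiled on its own. The body is unchanged. */
unsigned long bound_to_ulong_sketch(int val, unsigned long min, unsigned long max)
{
   /* val is converted to unsigned long here, so -1 becomes ULONG_MAX. */
   if( val < min ) return min;
   if( val > max ) return max;
   /* The return conversion also maps a negative val to a large value. */
   return val;
}
```

With this shape, bound_to_ulong_sketch(-1, 10, 100) returns 100 (the max), not 10 (the min). Note also that val is an int, so a setting larger than INT_MAX cannot arrive here intact no matter how the comparisons are written.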



Comment 4 michael conrad tadpol tilstra 2005-05-05 13:12:44 UTC
It's not there, actually.  ccs doesn't have a function to find a long, only int
or float values.  So ccs is actually not reading the number correctly.

For fixing the code, I think libccs will need to add a find long function.


Another temporary thing the customer can do is decrease the rate at which the
drop lock reqs are sent.  This is lt_drop_req_rate; set it to the number of
seconds between each drop req.

Comment 5 michael conrad tadpol tilstra 2005-05-05 13:57:13 UTC
ccs in 6.0 can only find int, float, or string.
So to get a long from ccs, either we need to have a string passed in and parse
it ourselves, or we need to change the ccs libs.


Comment 8 Wendy Cheng 2005-05-06 16:01:50 UTC
Second report on GFS-6.0.2-25-i686 2.4.21-27.0.2.ELhugemem kernel. The problem
causes failover to occur. 

Comment 9 michael conrad tadpol tilstra 2005-05-09 13:26:47 UTC
Also, setting lt_high_locks to -1 will max it out.  (Although if you dump the
config with either -C or SIGUSR1, it will show -1 instead of 4294967295.  One
more thing to fix. wheeeee.)
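
A minimal sketch of the display mismatch described here, assuming the value is held in an int and that int is 32 bits wide. The function name format_setting is hypothetical and is not taken from the gulm source. The same stored bits print as -1 with %d and as 4294967295 once converted to unsigned int and printed with %u.

```c
#include <stdio.h>

/* Hypothetical helper: writes the signed and unsigned renderings of the
 * same stored value into two caller-supplied buffers of size len. */
void format_setting(int stored, char *as_signed, char *as_unsigned, size_t len)
{
   /* %d renders the value as a signed int, so -1 prints as "-1". */
   snprintf(as_signed, len, "%d", stored);
   /* Converting to unsigned int first makes -1 become UINT_MAX, which
    * %u then prints without a sign. */
   snprintf(as_unsigned, len, "%u", (unsigned int)stored);
}
```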

Comment 10 michael conrad tadpol tilstra 2005-05-09 15:38:13 UTC
Created attachment 114165 [details]
kludge around ccs's lack of find_css_long

This is a quick patch that can fix this bug.  It kludges around things by
letting users specify unsigned longs as a string, i.e. lt_high_locks =
"4294967295" (which would be the maximum value).

This also changes a bunch of %d to %u in the config dump function.
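
The attached patch itself is no longer available, so the following is only a rough sketch of the general idea of accepting an unsigned long as a string and parsing it locally. The helper name parse_ulong_setting and its return convention are assumptions made for illustration, not the code that was checked in. It uses the standard strtoul() and rejects empty input, trailing junk, and out-of-range values.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not the code from the attachment.
 * Returns 0 and stores the parsed value in *out on success,
 * or -1 if text is not a complete, in-range decimal number. */
int parse_ulong_setting(const char *text, unsigned long *out)
{
   char *end = NULL;
   unsigned long parsed;

   if( text == NULL || *text == '\0' ) return -1;

   errno = 0;
   parsed = strtoul(text, &end, 10);

   if( end == text ) return -1;       /* no digits were consumed */
   if( *end != '\0' ) return -1;      /* trailing characters after the number */
   if( errno == ERANGE ) return -1;   /* value does not fit in unsigned long */

   *out = parsed;
   return 0;
}
```

One thing to be aware of with this approach: strtoul() accepts a leading minus sign and negates the result in unsigned arithmetic, so the string "-1" parses successfully as ULONG_MAX rather than being rejected.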

Comment 11 michael conrad tadpol tilstra 2005-05-11 15:59:54 UTC
checked this into cvs.

Comment 12 michael conrad tadpol tilstra 2005-05-11 16:01:09 UTC
Oh, you can still use numbers to set lt_high_locks, just in case that wasn't clear.

Comment 13 michael conrad tadpol tilstra 2005-05-16 15:00:05 UTC
Without the patch, the following two settings effectively turn the HighWater
lock drop request off.

 lt_high_locks = -1
 lt_drop_req_rate = -1

This actually sets both values to the maximum of an unsigned integer.


Comment 21 Jay Turner 2005-05-25 16:41:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-466.html


