Bug 452169 - Crash in indexing code under heavy memberOf load with replication
Summary: Crash in indexing code under heavy memberOf load with replication
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: 389
Classification: Retired
Component: Database - Indexes/Searches
Version: 1.1.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nathan Kinder
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks: FDS112
Reported: 2008-06-19 19:24 UTC by Nathan Kinder
Modified: 2015-01-04 23:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-27 20:39:29 UTC


Attachments
CVS Diffs (deleted)
2008-06-19 20:32 UTC, Nathan Kinder
tarball with scripts (deleted)
2008-08-19 22:48 UTC, Chandrasekar Kannan


Links
System: Red Hat Product Errata
ID: RHSA-2008:0602
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: redhat-ds-base and redhat-ds-admin security and bug fix update
Last Updated: 2008-08-27 20:38:30 UTC

Description Nathan Kinder 2008-06-19 19:24:45 UTC
I encountered a crash in the indexing code while doing some memberOf stress
testing.  I was using a local build with the current ldapserver code from CVS
and the current freeIPA memberOf plug-in code.

My test procedure consists of setting up 2 masters replicating to each other
with a fractional agreement that excludes the memberOf attribute.  I run some
load scripts against both masters at the same time which do various memberOf
operations on the same 3 entries.  These entries are created, deleted, and
modified very, very often.  After some time under this load, one of the masters
crashed with this stack trace:

(gdb) bt
#0  0x00136962 in slapi_attr_value_cmp (a=0x0, v1=0x9dd50e8, v2=0xa002718)
    at ../threadlocal/ldap/servers/slapd/attr.c:526
#1  0x001a8a17 in slapi_value_compare (a=0x0, v1=0x9dd50e8, v2=0xa002718)
    at ../threadlocal/ldap/servers/slapd/value.c:486
#2  0x001a91b2 in valuearray_find (a=0x0, va=0x9dd5188, v=0x9dd50e8)
    at ../threadlocal/ldap/servers/slapd/valueset.c:364
#3  0x009a5bd3 in index_add_mods (be=0x99db7c8, mods=0x9e9d970, olde=0x98841fa0, newe=0x9ca6180, txn=0xa853a0f4)
    at ../threadlocal/ldap/servers/slapd/back-ldbm/index.c:657
#4  0x009ba3bc in ldbm_back_modify (pb=0xa00f468)
    at ../threadlocal/ldap/servers/slapd/back-ldbm/ldbm_modify.c:401
#5  0x0017138d in op_shared_modify (pb=0xa00f468, pw_change=0, old_pw=0x0)
    at ../threadlocal/ldap/servers/slapd/modify.c:789
#6  0x00170424 in do_modify (pb=0xa00f468)
    at ../threadlocal/ldap/servers/slapd/modify.c:341
#7  0x0805678d in connection_dispatch_operation (conn=0xadf4ccb8, op=0x9e84440, pb=0xa00f468)
    at ../threadlocal/ldap/servers/slapd/connection.c:504
#8  0x08057dc2 in connection_threadmain ()
    at ../threadlocal/ldap/servers/slapd/connection.c:2163
#9  0x024eaf51 in ?? () from /lib/libnspr4.so
#10 0x008e332f in start_thread () from /lib/libpthread.so.0
#11 0x0081e27e in clone () from /lib/libc.so.6

The code from frame 3 shows that we are not checking if the call to
slapi_entry_attr_find() was successful before attempting to use the Slapi_Attr
it returns upon success.  It seems that we are assuming that the old copy of the
entry will contain the attribute we are looking for.

Inspection of the mod we are processing and the old entry produces some
interesting findings.  The attribute that we are looking for is the "member"
attribute.  The operation is deleting a specific value from the entry.  The old
copy of the entry doesn't have a "member" attribute present; it does, however,
have the "member" attribute in its deleted attributes list.  Another
interesting thing is that the old copy of the entry has a "nsds5ReplConflict"
attribute value present that indicates that there is a namingConflict.

Comment 1 Nathan Kinder 2008-06-19 20:13:59 UTC
I took a look at the new entry copy in the code where this crashes, and it's a
conflict entry (dn: nsuniqueid=<uuid>, <olddn>).  It seems that the indexing code
shouldn't assume that the attribute will be present since the replication URP
code may find conflicts.

The code where this fails in index_add_mods() is specific to a modify operation
where an attribute value to delete is specified.  We want to check if the value
being deleted is present in the entry for a subtype of the attribute whose value
is being deleted (for example, we're trying to delete "member: foo", but we want
to see if something like "member;blah: foo" exists).  We do this check so we know
whether or not we should delete the equality index for this value.  Here is the
code I'm referring to:

 /* If the same value doesn't exist in a subtype, set
  * BE_INDEX_EQUALITY flag so the equality index is
  * removed.
  */
 slapi_entry_attr_find( olde->ep_entry, mods[i]->mod_type, &curr_attr);
 for (j = 0; mods_valueArray[j] != NULL; j++ ) {
     if ( valuearray_find(curr_attr, evals, mods_valueArray[j]) == -1 ) {
         if (!(flags & BE_INDEX_EQUALITY)) {
             flags |= BE_INDEX_EQUALITY;
         }
     }
 }

I see two things wrong with this code.  The first is that we need to check if
curr_attr is NULL before diving into the for loop and attempting to pass it to
valuearray_find.  If the attribute doesn't exist in the entry, we can assume
that we should be getting rid of the equality index.

The second thing that seems wrong is that we are performing this check against
the copy of the old entry (olde).  It seems to me that the proper thing to do
would be to perform the check against the new copy of the entry (newe).
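
As a rough illustration of the two changes proposed above (guard against a
missing attribute, and look in the new copy of the entry rather than the old
one), here is a minimal, self-contained sketch.  It is not the committed patch.
The demo_* types and functions are hypothetical stand-ins for the Slapi_Attr,
entry, and valuearray APIs, the flag value is a placeholder, and the subtype
handling done through evals in the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder value for illustration only; not the real flag definition. */
#define BE_INDEX_EQUALITY 0x1

/* Hypothetical stand-ins for an attribute and an entry. */
typedef struct {
    const char *type;      /* attribute type, e.g. "member" */
    const char **values;   /* NULL-terminated array of values */
} demo_attr;

typedef struct {
    const demo_attr *attrs;
    size_t nattrs;
} demo_entry;

/* Return the attribute with the given type, or NULL if it is not present. */
static const demo_attr *demo_attr_find(const demo_entry *e, const char *type)
{
    for (size_t i = 0; i < e->nattrs; i++) {
        if (strcmp(e->attrs[i].type, type) == 0) {
            return &e->attrs[i];
        }
    }
    return NULL;
}

/* Return the index of v in the attribute's values, or -1 if not found. */
static int demo_value_find(const demo_attr *a, const char *v)
{
    for (size_t i = 0; a->values[i] != NULL; i++) {
        if (strcmp(a->values[i], v) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/*
 * Decide whether the equality index should be removed when del_values are
 * deleted from attribute `type`.  The lookup is done against the NEW copy
 * of the entry, and a missing attribute is handled before the loop instead
 * of being passed on as a NULL pointer.
 */
static int demo_delete_flags(const demo_entry *newe, const char *type,
                             const char **del_values)
{
    int flags = 0;
    const demo_attr *curr_attr;

    assert(newe != NULL && type != NULL && del_values != NULL);

    curr_attr = demo_attr_find(newe, type);
    if (curr_attr == NULL) {
        /* Attribute is gone entirely, so the equality index goes too. */
        return flags | BE_INDEX_EQUALITY;
    }

    for (size_t j = 0; del_values[j] != NULL; j++) {
        if (demo_value_find(curr_attr, del_values[j]) == -1) {
            flags |= BE_INDEX_EQUALITY;
        }
    }
    return flags;
}
```

With an entry that has no "member" attribute at all (as in the conflict entry
seen in this crash), the sketch returns with BE_INDEX_EQUALITY set and never
dereferences a NULL attribute pointer.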

Comment 2 Nathan Kinder 2008-06-19 20:32:39 UTC
Created attachment 309879
CVS Diffs

Comment 3 Noriko Hosoi 2008-06-19 20:38:54 UTC
Your fixes look good!

Comment 4 Nathan Kinder 2008-06-20 15:10:39 UTC
Checked into ldapserver (HEAD).  Thanks to Noriko for her review!

Checking in index.c;
/cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/index.c,v  <--  index.c
new revision: 1.14; previous revision: 1.13
done

Comment 5 Nathan Kinder 2008-07-09 17:04:03 UTC
Checked into Directory71RtmBranch.

Checking in slapd/back-ldbm/index.c;
/cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/index.c,v  <--  index.c
new revision: 1.5.2.3; previous revision: 1.5.2.2
done

Comment 6 Nathan Kinder 2008-07-10 22:47:49 UTC
Checked into Directory_Server_8_0_Branch.

Checking in ldap/servers/slapd/back-ldbm/index.c;
/cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/index.c,v  <--  index.c
new revision: 1.13.2.1; previous revision: 1.13
done

Comment 10 Jenny Galipeau 2008-08-18 18:43:05 UTC
With the DS80 errata build, I am unable to reproduce the crash with Nathan's scripts against RHEL or with Chandra's simple scripts against Sol9 and HPUX.

Comment 11 Chandrasekar Kannan 2008-08-19 22:48:04 UTC
Created attachment 314595
tarball with scripts

Test tarball attached.  It has simple scripts that do add, modrdn, and del operations on a 2-master MMR setup.

Comment 12 Jenny Galipeau 2008-08-21 17:54:38 UTC
Fix verified for 7.1 and 8.0 on RHEL4, RHEL5, SOLARIS and HPUX.

Comment 13 Chandrasekar Kannan 2008-08-21 18:19:23 UTC
With ds71sp7, using the test attached in comment #11, I did see crashes on RHEL4 and SunOS9, but not on HPUX.  That bug has been reported separately as bug 459433.

I was not able to observe the crash reported in this bug with ds71sp7.
Hence verified.

Comment 15 errata-xmlrpc 2008-08-27 20:39:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0602.html

