Bug 215180 - rpm segfaults on an attempt to rebuild database
Summary: rpm segfaults on an attempt to rebuild database
Keywords:
Status: CLOSED DUPLICATE of bug 213963
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Paul Nasrat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-11-12 00:35 UTC by Michal Jaegermann
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-17 12:44:25 UTC



Description Michal Jaegermann 2006-11-12 00:35:58 UTC
Description of problem:

After 'rpm --rebuilddb --verbose' I got a segfault and a core (I had just
turned core dumps on).  With data from rpm-debuginfo gdb produces the following

Core was generated by `/usr/lib/rpm/rpmd --rebuilddb --verbose'.
Program terminated with signal 11, Segmentation fault.
#0  __memp_fget_rpmdb (dbmfp=0x9c55788, pgnoaddr=0xbfefbbac, flags=0,
    addrp=0xbfefbb88) at ../db/dist/../mp/mp_fget.c:190
190     ../db/dist/../mp/mp_fget.c: No such file or directory.
        in ../db/dist/../mp/mp_fget.c
(gdb) where
#0  __memp_fget_rpmdb (dbmfp=0x9c55788, pgnoaddr=0xbfefbbac, flags=0,
    addrp=0xbfefbb88) at ../db/dist/../mp/mp_fget.c:190
#1  0x003c8510 in __db_goff_rpmdb (dbp=0x9c55488, dbt=0x9c5899c, tlen=12052,
    pgno=6916, bpp=0x9c55914, bpsz=0x9c5591c)
    at ../db/dist/../db/db_overflow.c:147
#2  0x003cfe4d in __db_ret_rpmdb (dbp=0x9c55488, h=0xb7c935c4, indx=11,
    dbt=0x9c5899c, memp=0x9c55914, memsize=0x9c5591c)
    at ../db/dist/../db/db_ret.c:50
#3  0x003bb115 in __db_c_get_rpmdb (dbc_arg=0x9c558c8, key=0x9c58984,
    data=0x9c5899c, flags=<value optimized out>)
    at ../db/dist/../db/db_cam.c:778
#4  0x003c15f6 in __db_c_get_pp_rpmdb (dbc=0x9c558c8, key=0x9c58984,
    data=0x9c5899c, flags=18) at ../db/dist/../db/db_iface.c:1741
#5  0x00351706 in db3cget (dbi=0x9c54f30, dbcursor=0x5704db86, key=0x9c58984,
    data=0x9c5899c, flags=1459936134) at db3.c:612
#6  0x0034d333 in rpmdbNextIterator (mi=0x9c58968) at rpmdb.h:591
#7  0x0034ee04 in rpmdbRebuild (prefix=0x9c41f30 "/", ts=0x9c53cd8,
    hdrchk=0x160830 <headerCheck>) at rpmdb.c:3854
#8  0x00184af6 in rpmtsRebuildDB (ts=0x9c53cd8) at rpmts.c:209
#9  0x08049822 in main (argc=3, argv=Cannot access memory at address 0x5704db8a
) at ./rpmqv.c:633
#10 0x00a3cf2c in __libc_start_main () from /lib/libc.so.6
#11 0x080490c1 in _start ()
(gdb)

Locations like "../db/dist/../mp/mp_fget.c:190" are somewhat nasty
to look at but it is possible to find the file outside of gdb.
The code in question looks like this:

	/* Search the hash chain for the page. */
retry:	st_hsearch = 0;
	MUTEX_LOCK(dbenv, &hp->hash_mutex);
	for (bhp = SH_TAILQ_FIRST(&hp->hash_bucket, __bh);
	    bhp != NULL; bhp = SH_TAILQ_NEXT(bhp, hq, __bh)) {
		++st_hsearch;
-- bomb! -->	if (bhp->pgno != *pgnoaddr || bhp->mf_offset != mf_offset)
			continue;

and gdb prints

(gdb) p bhp
$1 = (BH *) 0x5704db86
(gdb) p *bhp
Cannot access memory at address 0x5704db86
(gdb) p pgnoaddr
$2 = (db_pgno_t *) 0xbfefbbac
(gdb) p bhp->pgno
Cannot access memory at address 0x5704dbfa

Trying to access memory which was already freed?
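
As a side check on the numbers above, here is a small sketch that uses only
values copied from the trace; it does not establish the cause.  Plain shell
arithmetic shows that the failing read is 0x74 bytes past bhp, and that the
decimal flags value printed for db3cget in frame #5 is the same bit pattern
as bhp and dbcursor, which may mean gdb is showing stale values in that
frame rather than the real arguments.

```shell
#!/bin/sh
# Values copied from the gdb session above; nothing here touches the rpmdb.
bhp=0x5704db86          # result of "p bhp"
pgno_addr=0x5704dbfa    # address gdb could not read for bhp->pgno
flags=1459936134        # flags argument shown for db3cget in frame #5

# Offset of the failing read from bhp, in bytes.
pgno_offset=$(( $pgno_addr - $bhp ))
echo "$pgno_offset"     # prints 116 (0x74)

# The decimal flags value rendered as hex.
flags_hex=$(printf '0x%x' "$flags")
echo "$flags_hex"       # prints 0x5704db86
```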

Version-Release number of selected component (if applicable):
rpm-4.4.2-32

How reproducible:
Not reliably: the next attempt at --rebuilddb succeeded.  I ran it in
the first place because I got a segfault from yum during an installation,
so maybe that was really an rpm fault?

Comment 1 Jeff Johnson 2006-11-12 04:57:06 UTC
The segfault is likely the result of bad data, which is likely corrected by --rebuilddb.



Comment 2 Michal Jaegermann 2006-11-12 05:15:27 UTC
> The segfault is likely the result of bad data ...
These "bad data" were produced by nothing else but rpm and
an attempt to correct that resulted in a segfault.  Luckily
the condition did not persist.

Comment 3 Michal Jaegermann 2006-11-12 05:21:22 UTC
BTW - the segfault in 'yum update' mentioned in the report is now
bug 215184.  Not much information there, I am afraid, beyond the nasty
result. It happened when all the new packages were already installed
and yum was supposed to do its cleanups, so it left me with
a pile of duplicates.

Comment 4 Jeff Johnson 2006-11-12 07:24:49 UTC
rpm (and Berkeley DB) relies on shared POSIX mutexes for locking to ensure data integrity.

There's a rash of recent rpmdb problems, dunno the cause .... blame rpm which has not changed for
over a year, certainly not the rpmdb code. YMMV.

A --dupes option can be added to rpm with this line in /etc/popt:

    rpm     alias --dupes   --qf '%|SOURCERPM?{%{name}.%{arch}}:{%|ARCH?{%{name}}:{%{name}-%{version}}|}|\n' --pipe "sort | uniq -d" \
        --POPTdesc=$"list duplicated packages"

Invoke as rpm -qa --dupes.
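
For a quick check without editing /etc/popt, roughly the same listing can be
had with an ordinary pipeline.  This is a simplified sketch, not the alias
itself: it always prints name.arch and skips the SOURCERPM/ARCH conditionals,
so entries without an arch tag (gpg-pubkey keys, for instance) may show up as
false duplicates.  list_dupes is a helper name made up here.

```shell
#!/bin/sh
# list_dupes: read one package label per line on stdin and print each label
# that occurs more than once (the same "sort | uniq -d" step the alias uses).
list_dupes() {
    sort | uniq -d
}

# Intended use on a live system:
#   rpm -qa --qf '%{NAME}.%{ARCH}\n' | list_dupes
```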

Comment 5 Jeff Johnson 2006-11-12 07:34:39 UTC
BTW, doing
    rm -f /var/lib/rpm/__db*
before --rebuilddb --verbose would have eliminated a corrupt cache.
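
The sequence in this comment could be scripted roughly as follows.  A sketch
only: purge_rpmdb_cache is a helper name made up here, the optional directory
argument is there so it can be tried on a copy first, and running it only
when no other rpm or yum process has the database open is my assumption,
since the __db* files are the shared Berkeley DB environment.

```shell
#!/bin/sh
# purge_rpmdb_cache [DIR]: remove the Berkeley DB environment files (__db*)
# from DIR, leaving Packages and the index files alone.
# DIR defaults to /var/lib/rpm, the path used in the comment above.
purge_rpmdb_cache() {
    dbdir=${1:-/var/lib/rpm}
    rm -f "$dbdir"/__db*
}

# Intended use, as root, with no other rpm or yum process running:
#   purge_rpmdb_cache
#   rpm --rebuilddb --verbose
```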

Comment 6 Ian Collier 2006-11-18 16:12:17 UTC
In common with a few users, it seems, I'm finding rpm and yum very unstable
under FC6.  Just now:

# rpm -ivh /home/imc/rpmbuild/RPMS/i386/xli-1.17.0-6.fc6.i386.rpm
Preparing...                Segmentation fault (core dumped)

But where's my core file?

# ls -l core
ls: core: No such file or directory
# ulimit -c
unlimited


Comment 7 Michal Jaegermann 2006-11-18 17:14:53 UTC
> But where's my core file?

If you have 'ulimit -c' set to 'unlimited' then your core file will
really have a name like core.<process_id> so try 'ls -l core*'.
Also a process which dumped core may be a child with a different
context and a core is somewhere else (maybe /?).  To look for all
possible core files try, with a current updatedb, the following

   locate -r '/core\.[1-9]'

This may have a few wrong hits but not too many.
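
If the locate database is stale, a direct search does the same job more
slowly.  A sketch, assuming core files follow the core.<process_id> naming
described above; find_cores is a helper name made up here.

```shell
#!/bin/sh
# find_cores [DIR]: list regular files under DIR whose name is "core."
# followed by a digit 1-9, mirroring the locate pattern '/core\.[1-9]'.
# -xdev keeps the search on one filesystem, so /proc and network mounts
# are skipped.  DIR defaults to /.
find_cores() {
    find "${1:-/}" -xdev -type f -name 'core.[1-9]*' 2>/dev/null
}

# Intended use:
#   find_cores /          # whole root filesystem
#   find_cores "$HOME"    # just one tree
```

Like the locate pattern, this can produce a few wrong hits, since any file
whose name starts with "core." and a digit will match.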

Comment 8 Jeff Johnson 2006-11-18 18:22:13 UTC
If you give me a ptr to a core using -ivv and the packages involved, I'll diagnose
the segfault.

Be forewarned: almost all segfaults in rpm are caused by bad data.

Comment 9 Jeff Johnson 2006-12-03 18:35:15 UTC
Segfaults and loss of data are likely due to removing an rpmdb environment
without correcting other problems in the rpmdb.

FYI: Most rpmdb "hangs" are now definitely fixed by purging stale read locks when opening
a database environment in rpm-4.4.8-0.4. There's more to do, but I'm quite sure that a
large class of problems with symptoms of "hang" are now corrected.

Detecting damage by verifying when needed is well automated in rpm-4.4.8-0.4. Automatically
correcting all possible damage is going to take more work, but a large class of problems is likely
already fixed in rpm-4.4.8-0.8 as well.

UPSTREAM

Comment 10 Panu Matilainen 2007-07-17 12:44:25 UTC

*** This bug has been marked as a duplicate of 213963 ***

