Bug 452629 - getpwuid_r sometimes hangs
Summary: getpwuid_r sometimes hangs
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 8
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-06-24 06:43 UTC by Ian Kent
Modified: 2008-07-07 05:32 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-07 05:32:12 UTC


Attachments
gdb stack trace of hung process (deleted)
2008-06-24 06:43 UTC, Ian Kent

Description Ian Kent 2008-06-24 06:43:43 UTC
Description of problem:
getpwuid_r sometimes hangs when called.
It is likely that the problem occurs when two processes
concurrently call the function, but I have been unable to
get clear evidence of this.

Version-Release number of selected component (if applicable):
glibc-2.7-2.i686

How reproducible:
Occasionally, when called frequently from multiple pthread threads.

Steps to Reproduce:
This is a bit hard as it happens during testing of autofs
when using the Connectathon suite.

I'll need to try to develop something simpler to replicate
the problem.

Comment 1 Ian Kent 2008-06-24 06:43:43 UTC
Created attachment 310108 [details]
gdb stace trace of hung process

Comment 2 Ian Kent 2008-06-24 06:46:00 UTC
Also, I found that adding a mutex to bracket the get*_r calls
I make causes the problem to go away.

Comment 3 Ian Kent 2008-06-24 13:56:04 UTC
(In reply to comment #2)
> Also, I found that adding a mutex to bracket the get*_r calls
> I make causes the problem to go away.

Oh .. now I've seen this with the mutex in the autofs code
as well.

Is this something I'm doing wrong, or is there something amiss
with the pthread locking code?

Ian


Comment 4 Ulrich Drepper 2008-07-06 15:08:55 UTC
There are no known problems with the locking code, and I rate the chance
that there is one as extremely unlikely.  The code is used by hundreds of
thousands of programs.

The libc-internal locking can potentially be thrown off if the program doesn't
know that more than one thread is running.  This can really only happen if
you're using clone() directly (as opposed to pthread_create) or if the
appropriate memory location storing that information is corrupted.  The effect
would be that instead of using atomic operations for various locks non-atomic
operations are used.

This does not affect the pthread_* functions themselves, though.  If you see
problems with them as well it is something else.

Comment 5 Ian Kent 2008-07-07 05:32:12 UTC
(In reply to comment #4)
> There are no known problems with the locking code, and I rate the chance
> that there is one as extremely unlikely.  The code is used by hundreds of
> thousands of programs.

Yes, I agree.

> 
> The libc-internal locking can potentially be thrown off if the program doesn't
> know that more than one thread is running.  This can really only happen if
> you're using clone() directly (as opposed to pthread_create) or if the
> appropriate memory location storing that information is corrupted.  The effect
> would be that instead of using atomic operations for various locks non-atomic
> operations are used.
> 
> This does not affect the pthread_* functions themselves, though.  If you see
> problems with them as well it is something else.

I'm only using pthread_* functions, and I do see problems
occasionally, but I'm unable to duplicate them in any
reasonably simple way.

Oddly enough, after a yum update this particular problem
seems to have gone away. But that could also be due to
corrections that I've made.

Ian

