Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1516394 - glibc: sigwait can fail with EINTR
Summary: glibc: sigwait can fail with EINTR
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-22 14:30 UTC by Yanko Kaneti
Modified: 2017-11-23 13:24 UTC (History)
11 users (show)

Fixed In Version: glibc-2.26.9000-28.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-23 13:24:45 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Sourceware 22478 None None None 2019-02-12 15:49:28 UTC

Description Yanko Kaneti 2017-11-22 14:30:42 UTC
Description of problem:
vlc's http access input module no longer works after update to glibc-2.26.9000-26.fc28


Version-Release number of selected component (if applicable):
glibc-2.26.9000-26.fc28

How reproducible:
Always.  Works with glibc-2.26.9000-25.fc28, doesn't work with glibc-2.26.9000-26.fc28
Both prebuilt vlc rpms and handbuilt vlc git.

Steps to Reproduce:
vlc http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4

Comment 1 Florian Weimer 2017-11-22 14:34:41 UTC
I don't think Fedora ships a vlc package.  Whose build are you using?

Would you please clarify what exactly breaks?  Does vlc crash?  Does it report a name resolution failure?

Does the issue reproduce with wget or curl and the same URL?

Comment 2 Yanko Kaneti 2017-11-22 14:47:48 UTC
(In reply to Florian Weimer from comment #1)
> I don't think Fedora ships a vlc package.  Whose build are you using?

As I said its probably not related to the specific build. Happens both with rpmfusion packages and hadnbuilt vlc git master.


> Would you please clarify what exactly breaks?  Does vlc crash?  Does it
> report a name resolution failure?

The http connection fails and no video is played (you can get that detail with -vvvvv) the program is otherwise still working. Other networking accesses eg udp://@....  work ok.


> Does the issue reproduce with wget or curl and the same URL?

The file is there and is http gettable.

Comment 3 Yanko Kaneti 2017-11-22 14:59:28 UTC
Just tested an just-IP http host (can't share here) with the same result, so maybe not resolver related.   vlc 'udp://@192.168.2.30:5000' for example works..

Comment 4 Florian Weimer 2017-11-22 15:03:19 UTC
(In reply to Yanko Kaneti from comment #3)
> Just tested an just-IP http host (can't share here) with the same result, so
> maybe not resolver related.   vlc 'udp://@192.168.2.30:5000' for example
> works..

Thanks, that is helpful.  The resolver changes happened in -27, not -26, so I was wondering if the reported glibc version was off.  I'll try to reproduce this.

Comment 5 Carlos O'Donell 2017-11-22 16:21:56 UTC
(In reply to Florian Weimer from comment #4)
> (In reply to Yanko Kaneti from comment #3)
> > Just tested an just-IP http host (can't share here) with the same result, so
> > maybe not resolver related.   vlc 'udp://@192.168.2.30:5000' for example
> > works..
> 
> Thanks, that is helpful.  The resolver changes happened in -27, not -26, so
> I was wondering if the reported glibc version was off.  I'll try to
> reproduce this.

Would an `strace -ff -ttt -o vlc.log` help identify any issues on the users side configuration?

Comment 6 Florian Weimer 2017-11-22 16:29:56 UTC
strace shows that the HTTP module bails out on EINPROGRESS without ever calling poll.

It appears that vlc_killed returns true, and indeed, something causes vlc_interrupt_kill to be called.  I don't have debug symbols yet, so I'm a bit stuck here.  If anyone can reproduce this (calling vlc with “-I dummy” helps to make GDB happy), I would like to see a backtrace with symbols of the vlc_interrupt_kill call.

(GDB does not show any signals delivered to the process.)

Comment 7 Yanko Kaneti 2017-11-22 16:52:03 UTC
Maybe this? breakpoint on vlc_interrupt_kill. with some vlc debug input in front
......
.....
[00000000006ef360] dummy interface: using the dummy interface module...
[00000000006ef360] main interface debug: using interface module "dummy"
[00000000006b6030] main playlist debug: processing request item: null, node: Playlist, skip: 0
[00000000006b6030] main playlist debug: rebuilding array of current - root Playlist
[00000000006b6030] main playlist debug: rebuild done - 1 items, index -1
[00000000006b6030] main playlist debug: starting playback of new item
[00000000006b6030] main playlist debug: resyncing on BigBuckBunny_320x180.mp4
[00000000006b6030] main playlist debug: BigBuckBunny_320x180.mp4 is at 0
[00000000006b6030] main playlist debug: creating new input thread
[00007fffd0000c40] main input debug: Creating an input for 'BigBuckBunny_320x180.mp4'
[New Thread 0x7fffeac63700 (LWP 32599)]
[00000000006b6030] main playlist debug: requesting art for new input thread
[00007fffd0000c40] main input debug: using timeshift granularity of 50 MiB
[00007fffd0000c40] [New Thread 0x7fffeab62700 (LWP 32600)]
main input debug: using default timeshift path
[00007fffd0000c40] main input debug: `http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4' gives access `http' demux `any' path `download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4'
[00007fffc4000f60] main input source debug: creating demux: access='http' demux='any' location='download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4' file='(null)'
[00007fffc4001120] main demux debug: looking for access_demux module matching "http": 21 candidates
[00007fffc4001120] main demux debug: no access_demux modules matched
[00007fffc4001610] main stream debug: creating access: http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
[00007fffc4001610] main stream debug: looking for access module matching "http": 26 candidates
[New Thread 0x7fffeaa61700 (LWP 32601)]
[00007fffbc000be0] main meta fetcher debug: looking for meta fetcher module matching "any": 1 candidates
[00007fffbc000be0] lua meta fetcher debug: Trying Lua scripts in /home/yaneti/.local/share/vlc/lua/meta/fetcher
[00007fffbc000be0] lua meta fetcher debug: Trying Lua scripts in /home/yaneti/cvs/videolan/vlc/share/lua/meta/fetcher
[00007fffbc000be0] main meta fetcher debug: no meta fetcher modules matched
[00007fffbc000be0] main art finder debug: looking for art finder module matching "any": 2 candidates
Detaching after vfork from child process 32602.
[00007fffbc000be0] lua art finder debug: Trying Lua scripts in /home/yaneti/.local/share/vlc/lua/meta/art
[00007fffbc000be0] lua art finder debug: Trying Lua scripts in /home/yaneti/cvs/videolan/vlc/share/lua/meta/art
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/00_musicbrainz.lua
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/00_musicbrainz.lua
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/00_musicbrainz.luac
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/00_musicbrainz.luac
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/01_googleimage.lua
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/01_googleimage.lua
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/01_googleimage.luac
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/01_googleimage.luac
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/02_frenchtv.lua
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/02_frenchtv.lua
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/02_frenchtv.luac
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/02_frenchtv.luac
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/03_lastfm.lua
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/03_lastfm.lua
[00007fffbc000be0] lua art finder debug: Trying Lua playlist script /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/03_lastfm.luac
[00007fffbc000be0] lua art finder debug: skipping script (unmatched scope) /home/yaneti/cvs/videolan/vlc/share/lua/meta/art/03_lastfm.luac
[00007fffbc000be0] main art finder debug: no art finder modules matched
[Thread 0x7fffeaa61700 (LWP 32601) exited]
[00007fffc4001610] http stream debug: resolving download.blender.org ...
[New Thread 0x7ffff7fe28c0 (LWP 32603)]
[Switching to Thread 0x7fffeab62700 (LWP 32600)]

Thread 6 "vlc" hit Breakpoint 1, vlc_interrupt_kill (ctx=ctx@entry=0x7fffc8000b40) at misc/interrupt.c:182
182	    atomic_store(&ctx->killed, true);
Missing separate debuginfos, use: dnf debuginfo-install libdvdread-5.0.3-7.fc27.x86_64
(gdb) bt
#0  vlc_interrupt_kill (ctx=ctx@entry=0x7fffc8000b40) at misc/interrupt.c:182
#1  0x00007ffff78d8145 in CloseWorker (fetcher_=<optimized out>, th_=0x7fffc8000b20) at playlist/fetcher.c:387
#2  0x00007ffff7946463 in Thread (data=0x6b44c0) at misc/background_worker.c:134
#3  0x00007ffff6c6f55b in start_thread (arg=0x7fffeab62700) at pthread_create.c:463
#4  0x00007ffff67a156f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) bt full
#0  vlc_interrupt_kill (ctx=ctx@entry=0x7fffc8000b40) at misc/interrupt.c:182
        __atomic_store_ptr = 0x7fffc8000b69
        __atomic_store_tmp = true
#1  0x00007ffff78d8145 in CloseWorker (fetcher_=<optimized out>, th_=0x7fffc8000b20) at playlist/fetcher.c:387
        th = 0x7fffc8000b20
#2  0x00007ffff7946463 in Thread (data=0x6b44c0) at misc/background_worker.c:134
        b_timeout = false
        item = <optimized out>
        handle = 0x7fffc8000b20
        worker = 0x6b44c0
#3  0x00007ffff6c6f55b in start_thread (arg=0x7fffeab62700) at pthread_create.c:463
        pd = 0x7fffeab62700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737131194112, 6704462533039208939, 140737245956686, 140737245956687, 140737131194112, 0, -6704491731360433685, -6704482522124535317}, mask_was_saved = 0}}, priv = {
            pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4  0x00007ffff67a156f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
(gdb)

Comment 8 Florian Weimer 2017-11-22 17:29:13 UTC
I'm wondering if the backtrace is for something earlier than the HTTP download cancellation.

I'm looking at the posix_spawn call to launch /usr/bin/proxy now.  We did patch posix_spawn between -25 and -26:

commit aa95a2414e4f664ca740ad5f4a72d9145abbd426
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Sat Oct 21 11:33:27 2017 -0200

    posix: Do not use WNOHANG in waitpid call for Linux posix_spawn

Comment 9 Yanko Kaneti 2017-11-22 17:43:11 UTC
FWIW I just removed libproxy-bin , which gets rid of /usr/bin/proxy , without any noticable change in behavior.

One more thing. 
Some of the failed runs actually don't end normaly but glibc aborts with "The futex facility returned an unexpected error code."

Here is a backtrace of one such crashdump collected by coredumpctl
...

Core was generated by `./vlc -vvvvvvv http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBun'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	}
[Current thread is 1 (Thread 0x7f1412bef700 (LWP 5458))]
Missing separate debuginfos, use: dnf debuginfo-install libdvdread-5.0.3-7.fc27.x86_64 libedit-3.1-20.20170329cvs.fc27.x86_64 llvm-libs-5.0.0-5.fc28.x86_64 qt5-qtbase-5.9.2-5.fc28.x86_64 qt5-qtbase-gui-5.9.2-5.fc28.x86_64
(gdb) bt full
#0  0x00007f146e7ac54b in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0 <repeats 16 times>}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007f146e7ad8e1 in __GI_abort () at abort.c:79
        save_stage = 1
        act = 
          {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 14 times>, 139724190575520, 139724190575744}}, sa_flags = 314502048, sa_restorer = 0x1000}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f146e7eee27 in __libc_message (action=action@entry=(do_abort | do_backtrace), fmt=fmt@entry=0x7f146e8f5e13 "%s")
    at ../sysdeps/posix/libc_fatal.c:181
        ap = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7f1412beec90, reg_save_area = 0x7f1412beec20}}
        fd = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007f146e7eeec2 in __GI___libc_fatal (message=message@entry=0x7f146ed4b8c0 "The futex facility returned an unexpected error code.")
    at ../sysdeps/posix/libc_fatal.c:191
#4  0x00007f146ed4835a in futex_fatal_error () at ../sysdeps/nptl/futex-internal.h:200
        res = <optimized out>
        isem = <optimized out>
        private = <optimized out>
        d = <optimized out>
#5  0x00007f146ed4835a in futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:247
        res = <optimized out>
        isem = <optimized out>
        private = <optimized out>
        d = <optimized out>
#6  0x00007f146ed4835a in __new_sem_post (sem=<optimized out>) at sem_post.c:57
        isem = <optimized out>
        private = <optimized out>
        d = <optimized out>
#7  0x00007f146fa43049 in vlc_sem_post (sem=<optimized out>) at posix/thread.c:318
        val = 0
#8  0x00007f146f541703 in notify_func_wrapper (arg=0x7f13e4001650) at gai_notify.c:39
        n = 0x7f13e4001650
        func = 0x7f146fa43f30 <vlc_getaddrinfo_notify>
        value = {sival_int = 409445408, sival_ptr = 0x7f141867a420}
#9  0x00007f146ed3f55b in start_thread (arg=0x7f1412bef700) at pthread_create.c:463
        pd = 0x7f1412bef700
        now = <optimized out>
        unwind_buf = 
              {cancel_jmp_buf = {{jmp_buf = {139724190578432, -3216130511296136405, 139724882804110, 139724882804111, 139724190578432, 0, 3275848845164411691, 3276052005417777963}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#10 0x00007f146e86d56f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment 10 Yanko Kaneti 2017-11-22 17:50:00 UTC
Hm.. should note that that last observation was with glibc-2.26.9000-27.fc28.x86_64

Comment 11 Florian Weimer 2017-11-22 18:31:37 UTC
(In reply to Yanko Kaneti from comment #10)
> Hm.. should note that that last observation was with
> glibc-2.26.9000-27.fc28.x86_64

I see that as well.  My futex instrumentation shows that ENOSYS is returned, and I finally could reproduce it under strace.  The futex system call is passed a garbage futex operation.  I don't know yet if it is a different corruption issue, but I'll assume for now that it is related.

Comment 12 Florian Weimer 2017-11-22 18:40:44 UTC
I looked at the disassembly of __new_sem_post, and the error is basically completely impossible.  %eax is loaded right before the syscall.  The call stack is also consistent, and it does not look at all we called into the middle of the function, before the system call (GOT pointer points to the start of __new_sem_post).

Everything looks like we _exit from vfork on the right thread (in the subprocess), and the thread which called vfork remains blocked until we exit from the subprocess.

I need to sleep over this.  It looks really bizarre.

Comment 13 Florian Weimer 2017-11-22 19:16:26 UTC
I cheated and used git bisect.  The culprit is this commit:

commit 8b0e795aaa445e9167aa07b282c5720b35342c07
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Wed Nov 1 11:49:05 2017 -0200

    Simplify Linux sig{timed}wait{info} implementations

This causes sigwait in the main thread to fail with EINTR, something vlc does not expect.  The following corruption happens because something vlc exit and things go quite horribly wrong after that.

I think EINTR for sigwait is wrong, and glibc needs to be fixed.


Note You need to log in before you can comment on or make changes to this bug.