Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1060126 - libguests appliance creation (with 'direct' backend) hangs on a Btrfs machine
Summary: libguests appliance creation (with 'direct' backend) hangs on a Btrfs machine
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-01-31 10:55 UTC by Kashyap Chamarthy
Modified: 2014-02-09 17:25 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-09 17:25:03 UTC


Attachments (Terms of Use)
Complete stdout of libguestfs-test-tool (deleted)
2014-01-31 10:55 UTC, Kashyap Chamarthy
no flags Details
Complete stderr of libguestfs test tool with direct backend and TCG.txt (deleted)
2014-02-01 20:43 UTC, Kashyap Chamarthy
no flags Details
Complete stderr of libguestfs-test-tool with direct backend and TCG (deleted)
2014-02-01 20:58 UTC, Kashyap Chamarthy
no flags Details
Successful stderr of libguestfs-test-tool with direct backend (deleted)
2014-02-09 17:20 UTC, Kashyap Chamarthy
no flags Details
Successful stderr of libguestfs-test-tool with 'libvirt' backend (deleted)
2014-02-09 17:23 UTC, Kashyap Chamarthy
no flags Details

Description Kashyap Chamarthy 2014-01-31 10:55:32 UTC
Created attachment 857793 [details]
Complete stdout of libguestfs-test-tool

Description of problem
----------------------

Running libguestfs-test-tool ( with 'direct' backend) on a machine with
btrfs just hangs. 

    # NOTE: Full stdout, refer attachment

    $ libguestfs-test-tool
    [. . .]
    guestfsd: main_loop: new request, len 0x34
    guestfsd: main_loop: proc 3 (touch) took 0.06 seconds
    guestfsd: main_loop: new request, len 0x28
    umount /sysroot
    fsync /dev/sda
    libguestfs: sending SIGTERM to process 48146

Contextual `strace` of qemu-system-x86 further below. (Full strace is
37MB)


Version
-------

  $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm \
    libguestfs btrfs-progs
  3.14.0-0.rc0.git12.1.fc21.x86_64
  qemu-system-x86-1.7.0-4.fc21.x86_64
  libvirt-daemon-kvm-1.2.1-2.fc21.x86_64
  libguestfs-1.25.29-1.fc21.x86_64
  btrfs-progs-0.20.rc1.20131114git9f0c53f-1.fc20.x86_64


How reproducible: Consistently


Steps to reproduce
------------------

- Make a qumu wrapper script to strace qemu process, set libguestfs
  backend as 'direct' (use QEMU)

- Run libguestfs-test-tool with the above qemu wrapper to capture strace
  results of QEMU binary:
 
      $ cat qemu-strace.sh 
      #!/bin/bash -
      exec strace -o /tmp/qemu-strace.log -f -s 1024 /usr/bin/qemu-system-x86_64 "$@"
      
      $ chmod +x qemu-strace.sh
      
      $ export LIBGUESTFS_QEMU=/root/qemu-strace.sh 
      
      $ export LIBGUESTFS_BACKEND=direct
    
      $ guestfish get_backend
      direct
    
      $ libguestfs-test-tool


Actual results
--------------

libguestfs-test-tool


Expected results
----------------

Appliance creation should be successful on a Btrfs machine.


Additional info
---------------

Contextual `strace` of qemu-system-x86

===========
[. . .]
48170 +++ exited with 0 +++
48168 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
48168 madvise(0x7f7c14ffa000, 8368128, MADV_DONTNEED) = 0
48168 munmap(0x7f7c15ffc000, 8392704)   = 0
48168 _exit(0)                          = ?
48168 +++ exited with 0 +++
48161 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
48161 madvise(0x7f7c485fc000, 8368128, MADV_DONTNEED) = 0
48161 munmap(0x7f7c147f9000, 8392704)   = 0
48161 _exit(0)                          = ?
48161 +++ exited with 0 +++
48165 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
48165 madvise(0x7f7c167fd000, 8368128, MADV_DONTNEED) = 0
48165 munmap(0x7f7c243ff000, 8392704)   = 0
48165 _exit(0)                          = ?
48165 +++ exited with 0 +++
48154 <... ioctl resumed> , 0)          = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN <unfinished ...>
48152 <... ppoll resumed> )             = ? ERESTARTNOHAND (To be restarted if no handler)

48152 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
48152 ppoll([{fd=80, events=POLLIN|POLLERR|POLLHUP}, {fd=79, events=POLLIN|POLLERR|POLLHUP}, {fd=78, events=POLLIN|POLLERR|POLLHUP}, {fd=77, events=POLLIN|POLLERR|POLLHUP}, {fd=76, events=POLLIN|POLLERR|POLLHUP}, {fd=75, events=POLLIN|POLLERR|POLLHUP}, {fd=74, events=POLLIN|POLLERR|POLLHUP}, {fd=73, events=POLLIN|POLLERR|POLLHUP}, {fd=72, events=POLLIN|POLLERR|POLLHUP}, {fd=71, events=POLLIN|POLLERR|POLLHUP}, {fd=70, events=POLLIN|POLLERR|POLLHUP}, {fd=69, events=POLLIN|POLLERR|POLLHUP}, {fd=68, events=POLLIN|POLLERR|POLLHUP}, {fd=67, events=POLLIN|POLLERR|POLLHUP}, {fd=66, events=POLLIN|POLLERR|POLLHUP}, {fd=65, events=POLLIN|POLLERR|POLLHUP}, {fd=64, events=POLLIN|POLLERR|POLLHUP}, {fd=63, events=POLLIN|POLLERR|POLLHUP}, {fd=62, events=POLLIN|POLLERR|POLLHUP}, {fd=61, events=POLLIN|POLLERR|POLLHUP}, {fd=60, events=POLLIN|POLLERR|POLLHUP}, {fd=59, events=POLLIN|POLLERR|POLLHUP}, {fd=58, events=POLLIN|POLLERR|POLLHUP}, {fd=57, events=POLLIN|POLLERR|POLLHUP}, {fd=56, events=POLLIN|POLLERR|POLLHUP}, {fd=55, events=POLLIN|POLLERR|POLLHUP}, {fd=54, events=POLLIN|POLLERR|POLLHUP}, {fd=53, events=POLLIN|POLLERR|POLLHUP}, {fd=52, events=POLLIN|POLLERR|POLLHUP}, {fd=51, events=POLLIN|POLLERR|POLLHUP}, {fd=50, events=POLLIN|POLLERR|POLLHUP}, {fd=49, events=POLLIN|POLLERR|POLLHUP}, {fd=48, events=POLLIN|POLLERR|POLLHUP}, {fd=47, events=POLLIN|POLLERR|POLLHUP}, {fd=46, events=POLLIN|POLLERR|POLLHUP}, {fd=45, events=POLLIN|POLLERR|POLLHUP}, {fd=44, events=POLLIN|POLLERR|POLLHUP}, {fd=43, events=POLLIN|POLLERR|POLLHUP}, {fd=42, events=POLLIN|POLLERR|POLLHUP}, {fd=41, events=POLLIN|POLLERR|POLLHUP}, {fd=40, events=POLLIN|POLLERR|POLLHUP}, {fd=39, events=POLLIN|POLLERR|POLLHUP}, {fd=38, events=POLLIN|POLLERR|POLLHUP}, {fd=37, events=POLLIN|POLLERR|POLLHUP}, {fd=36, events=POLLIN|POLLERR|POLLHUP}, {fd=35, events=POLLIN|POLLERR|POLLHUP}, {fd=34, events=POLLIN|POLLERR|POLLHUP}, {fd=33, events=POLLIN|POLLERR|POLLHUP}, {fd=32, events=POLLIN|POLLERR|POLLHUP}, {fd=31, events=POLLIN|POLLERR|POLLHUP}, {fd=30, events=POLLIN|POLLERR|POLLHUP}, {fd=29, events=POLLIN|POLLERR|POLLHUP}, {fd=28, events=POLLIN|POLLERR|POLLHUP}, {fd=27, events=POLLIN|POLLERR|POLLHUP}, {fd=26, events=POLLIN|POLLERR|POLLHUP}, {fd=25, events=POLLIN|POLLERR|POLLHUP}, {fd=24, events=POLLIN|POLLERR|POLLHUP}, {fd=23, events=POLLIN|POLLERR|POLLHUP}, {fd=22, events=POLLIN|POLLERR|POLLHUP}, {fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=20, events=POLLIN|POLLERR|POLLHUP}, {fd=19, events=POLLIN|POLLERR|POLLHUP}, {fd=18, events=POLLIN|POLLERR|POLLHUP}, {fd=17, events=POLLIN|POLLERR|POLLHUP}, {fd=16, events=POLLIN|POLLERR|POLLHUP}, {fd=15, events=POLLIN|POLLERR|POLLHUP}, {fd=14, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=5, events=POLLIN}, {fd=0, events=POLLIN}, {fd=6, events=POLLIN}, {fd=11, events=POLLIN}, {fd=4, events=POLLIN}], 73, NULL, NULL, 8
(END)
===========


Just for reference, Btrfs partitioning scheme on the host:

  $ df -hT
  Filesystem     Type      Size  Used Avail Use% Mounted on
  /dev/sda3      btrfs     123G   15G  107G  13% /
  devtmpfs       devtmpfs   32G     0   32G   0% /dev
  tmpfs          tmpfs      32G     0   32G   0% /dev/shm
  tmpfs          tmpfs      32G  1.4M   32G   1% /run
  tmpfs          tmpfs      32G     0   32G   0% /sys/fs/cgroup
  tmpfs          tmpfs      32G   42M   32G   1% /tmp
  /dev/sda3      btrfs     123G   15G  107G  13% /home

Comment 1 Kashyap Chamarthy 2014-01-31 10:59:03 UTC
(In reply to Kashyap Chamarthy from comment #0)
> Created attachment 857793 [details]
> Complete stdout of libguestfs-test-tool
> 
> Description of problem
> ----------------------
> 
> Running libguestfs-test-tool ( with 'direct' backend) on a machine with
> btrfs just hangs. 
> 
>     # NOTE: Full stdout, refer attachment
> 
>     $ libguestfs-test-tool
>     [. . .]
>     guestfsd: main_loop: new request, len 0x34
>     guestfsd: main_loop: proc 3 (touch) took 0.06 seconds
>     guestfsd: main_loop: new request, len 0x28
>     umount /sysroot
>     fsync /dev/sda
>     libguestfs: sending SIGTERM to process 48146
> 
> Contextual `strace` of qemu-system-x86 further below. (Full strace is
> 37MB)
> 
> 
> Version
> -------
> 
>   $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm \
>     libguestfs btrfs-progs
>   3.14.0-0.rc0.git12.1.fc21.x86_64
>   qemu-system-x86-1.7.0-4.fc21.x86_64
>   libvirt-daemon-kvm-1.2.1-2.fc21.x86_64
>   libguestfs-1.25.29-1.fc21.x86_64
>   btrfs-progs-0.20.rc1.20131114git9f0c53f-1.fc20.x86_64
> 
> 
> How reproducible: Consistently
> 
> 
> Steps to reproduce
> ------------------
> 
> - Make a qumu wrapper script to strace qemu process, set libguestfs
>   backend as 'direct' (use QEMU)
> 
> - Run libguestfs-test-tool with the above qemu wrapper to capture strace
>   results of QEMU binary:
>  
>       $ cat qemu-strace.sh 
>       #!/bin/bash -
>       exec strace -o /tmp/qemu-strace.log -f -s 1024
> /usr/bin/qemu-system-x86_64 "$@"
>       
>       $ chmod +x qemu-strace.sh
>       
>       $ export LIBGUESTFS_QEMU=/root/qemu-strace.sh 
>       
>       $ export LIBGUESTFS_BACKEND=direct
>     
>       $ guestfish get_backend
>       direct
>     
>       $ libguestfs-test-tool
> 
> 
> Actual results
> --------------
> 
> libguestfs-test-tool

s/libguestfs-test-tool/Appliance creation hangs as noted in the attachment

    https://bugzilla.redhat.com/attachment.cgi?id=857793

Comment 2 Kashyap Chamarthy 2014-01-31 11:00:30 UTC
Complete `strace`: http://kashyapc.fedorapeople.org/temp/qemu-strace.log

Comment 3 Richard W.M. Jones 2014-02-01 07:55:54 UTC
I suspect this isn't btrfs, but just a general hang.  I've seen
Rawhide hanging quite a lot when running the tests, but it is
frustratingly unreproducible.  It indicates some kind of kernel/
qemu emulation bug.

Firstly, I'm assuming because of previous conversations that this
is running on your baremetal 48 core monster machine?  So we're
not talking about nested KVM.

'clocksource: kvm-clock' which indicates that it is using KVM.
Also KVM_RUN ioctls appear in the strace.

I notice that you've set the backend to 'direct'.  Does it happen
with the libvirt backend?

Comment 4 Richard W.M. Jones 2014-02-01 07:56:38 UTC
Also you could try:

  export LIBGUESTFS_BACKEND_SETTINGS=force_tcg

(in combination with backend direct & libvirt).

Comment 5 Kashyap Chamarthy 2014-02-01 17:55:34 UTC
(In reply to Richard W.M. Jones from comment #3)
> I suspect this isn't btrfs, but just a general hang.  I've seen
> Rawhide hanging quite a lot when running the tests, but it is
> frustratingly unreproducible.  It indicates some kind of kernel/
> qemu emulation bug.

I see. Even, I noticed some compiles were hanging in a unreliable way, but I haven't done a systematic analysis. Next time I compile, will try to keep notes.

> 
> Firstly, I'm assuming because of previous conversations that this
> is running on your baremetal 48 core monster machine? So we're
> not talking about nested KVM.

Correct, just bare-metal with: F20, with Kernel, QEMU, Libvirt from Rawhide.

> 
> 'clocksource: kvm-clock' which indicates that it is using KVM.
> Also KVM_RUN ioctls appear in the strace.
> 
> I notice that you've set the backend to 'direct'.  Does it happen
> with the libvirt backend?

IIRC, it did hang with libvirt back-end, let me double confirm.

Yes, will try the 'force_tcg' back-end setting with both 'direct' and 'libvirt' back-end combinations.

Comment 6 Kashyap Chamarthy 2014-02-01 19:50:22 UTC
With 'libvirt' backend (and KVM)
--------------------------------

    $ guestfish get_backend
    libvirt
  
       $ time libguestfs-test-tool |& tee stderr-libvirt-kvm.txt
        ************************************************************
        *                    IMPORTANT NOTICE
        *
        * When reporting bugs, include the COMPLETE, UNEDITED
        * output below in your bug report.
        *
        ************************************************************
   LIBGUESTFS_QEMU=/usr/local/bin/qemu-strace.sh
   LIBGUESTFS_BACKEND=libvirt
   PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin
   SELinux: Permissive
   guestfs_get_append: (null)
   guestfs_get_autosync: 1
   guestfs_get_backend: libvirt
   guestfs_get_backend_settings: []
   guestfs_get_cachedir: /var/tmp
   guestfs_get_direct: 0
   guestfs_get_hv: /usr/local/bin/qemu-strace.sh
   guestfs_get_memsize: 500
   guestfs_get_network: 0
   guestfs_get_path: /usr/lib64/guestfs
   guestfs_get_pgroup: 0
   guestfs_get_program: libguestfs-test-tool
   guestfs_get_recovery_proc: 1
   guestfs_get_selinux: 0
   guestfs_get_smp: 1
   guestfs_get_tmpdir: /tmp
   guestfs_get_trace: 0
   guestfs_get_verbose: 1
   host_cpu: x86_64
   Launching appliance, timeout set to 600 seconds.
   libguestfs: launch: program=libguestfs-test-tool
   libguestfs: launch: version=1.25.29fedora=21,release=1.fc21,libvirt
   libguestfs: launch: backend registered: unix
   libguestfs: launch: backend registered: uml
   libguestfs: launch: backend registered: libvirt
   libguestfs: launch: backend registered: direct
   libguestfs: launch: backend=libvirt
   libguestfs: launch: tmpdir=/tmp/libguestfsHhzoO1
   libguestfs: launch: umask=0022
   libguestfs: launch: euid=0
   libguestfs: libvirt version = 1002001 (1.2.1)
   libguestfs: guest random name = guestfs-ehitwgt8t5q9o250
   libguestfs: [00000ms] connect to libvirt
   libguestfs: opening libvirt handle: URI = qemu:///system, auth = default+wrapper, flags = 0
   libguestfs: successfully opened libvirt handle: conn = 0x7f7547ee4880
   libguestfs: [00015ms] get libvirt capabilities    
       real    10m3.155s
       user    0m0.020s
       sys     0m0.322s
     
       $ echo $?
       0

Note
----

The reason for 10 minutes to run w/ wrapper (from IRC discussion with
Rich:

  From IRC discussion with Rich: I's basically because when you use a
  qemu wrapper w/ strace libvirt has trouble running qemu in order to
  query its capabilities.

Comment 7 Kashyap Chamarthy 2014-02-01 20:43:01 UTC
Created attachment 858116 [details]
Complete stderr of libguestfs test tool with direct backend and TCG.txt

So, even with setting 'force, /direct/ backend, the appliance seems to be hung just as previously:

------
.
.
.
[   40.500245] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[   40.669298] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
guestfsd: main_loop: proc 1 (mount) took 0.76 seconds
guestfsd: main_loop: new request, len 0x34
guestfsd: main_loop: proc 3 (touch) took 0.02 seconds
guestfsd: main_loop: new request, len 0x28
umount /sysroot
fsync /dev/sda
libguestfs: sending SIGTERM to process 40190
------

Comment 8 Kashyap Chamarthy 2014-02-01 20:58:58 UTC
Created attachment 858129 [details]
Complete stderr of libguestfs-test-tool with direct backend and TCG

Now for real, test with 'direct' backend and force_tcg.

With TCG, it's hung here:

    $ export LIBGUESTFS_BACKEND=direct

    $ export LIBGUESTFS_BACKEND_SETTINGS=force_tcg

    $ guestfish get_backend_settings
    force_tcg

    $ libguestfs-test-tool |& tee stderr-direct-tcg.txt
    .
    .
    .
    [    0.000000] \tRCU lockdep checking is enabled.
    [    0.000000] \tRCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=1.
    [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
    [    0.000000] NR_IRQS:524544 nr_irqs:256 16
    [    0.000000] Console: colour *CGA 80x25
    [    0.000000] console [ttyS0] enabled
    [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
    [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
    [    0.000000] ... MAX_LOCK_DEPTH:          48
    [    0.000000] ... MAX_LOCKDEP_KEYS:        8191
    [    0.000000] ... CLASSHASH_SIZE:          4096
    [    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
    [    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
    [    0.000000] ... CHAINHASH_SIZE:          16384
    [    0.000000]  memory used by lock dependency info: 6367 kB
    [    0.000000]  per task-struct memory footprint: 2688 bytes
    [    0.000000] kmemleak: Kernel memory leak detector disabled
    main-loop: WARNING: I/O thread spun for 1000 iterations

Comment 9 Kashyap Chamarthy 2014-02-02 06:05:58 UTC
(In reply to Kashyap Chamarthy from comment #8)
> Created attachment 858129 [details]
> Complete stderr of libguestfs-test-tool with direct backend and TCG
> 
> Now for real, test with 'direct' backend and force_tcg.

Here's the complete strace with 'direct' backend and force_tcg:

   http://kashyapc.fedorapeople.org/temp/qemu-strace-direct-backend-with-tcg.log


PS: Forgot to note in the previous comment -- disregard comment #7]

Comment 10 Kashyap Chamarthy 2014-02-09 17:20:40 UTC
Created attachment 861085 [details]
Successful stderr of libguestfs-test-tool with direct backend

Ok, I re-ran the test with "direct" and "libvirt" backends with (KVM), both are successful on this Btrfs machine with 3.14 rc1 Kernel -- 3.14.0-0.rc1.git3.1.fc21.x86_64

  $ rpm -q libvirt-daemon-kvm qemu-system-x86 libguestfs
  libvirt-daemon-kvm-1.2.1-2.fc21.x86_64
  qemu-system-x86-1.7.0-4.fc21.x86_64
  libguestfs-1.25.29-1.fc21.x86_64

Comment 11 Kashyap Chamarthy 2014-02-09 17:23:56 UTC
Created attachment 861097 [details]
Successful stderr of libguestfs-test-tool with 'libvirt' backend

Comment 12 Kashyap Chamarthy 2014-02-09 17:25:03 UTC
Since a mysterious rawhide Kernel fixed it, closing it with rationale: RAWHIDE

I noted the fixed Kernel version in Comment #10


Note You need to log in before you can comment on or make changes to this bug.