Bug 590073 - Memory leak in libvirtd
Summary: Memory leak in libvirtd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Laine Stump
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 593339 606919 619711
 
Reported: 2010-05-07 16:42 UTC by Nandini Chandra
Modified: 2018-10-27 13:42 UTC
CC: 20 users

Fixed In Version: libvirt-0.8.2-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned To: 606919
Environment:
Last Closed: 2011-01-13 23:12:06 UTC
Target Upstream Version:


Attachments (Terms of Use)
output of Valgrind when libvirtd was run under Valgrind (deleted)
2010-05-07 16:42 UTC, Nandini Chandra


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:0060 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-01-12 17:22:30 UTC

Description Nandini Chandra 2010-05-07 16:42:02 UTC
Created attachment 412398 [details]
output of Valgrind when libvirtd was run under Valgrind

Description of problem:
The leak is pretty slow, but becomes problematic for systems that have been running a long time. For example, on one customer system that had been running for ~2 months with ~25 VMs, libvirtd was using ~2.3 GiB of memory (over half of the total memory allocated to the dom0).

Snippet from the output of Valgrind when libvirtd was run under Valgrind:
valgrind -v --leak-check=full --show-reachable=yes --log-file=libvirtd.memcheck /usr/sbin/libvirtd --daemon
==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss record 417 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo  (xen_unified.c:1688)
==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
==3876==    by 0x384F0775FD: xenStoreWatchEvent       (xs_internal.c:1303)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)

Version-Release number of selected component (if applicable):
libvirt-0.6.3-20.1.el5_4


How reproducible:
Consistently


Steps to Reproduce:
1. Run libvirtd on a dom0 long enough (at least a week).
2. Make sure the dom0 has numerous guests.
3. Check the memory usage of libvirtd using:
ps auxwww | grep libvirtd
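The check in step 3 can also be scripted. The sketch below samples libvirtd's resident set size a few times; the sample count, interval, and log path are arbitrary choices, and it falls back to the current shell's PID if libvirtd isn't running, so it's illustrative only:

```shell
# Periodically record libvirtd's RSS (in KiB) to spot steady growth.
# Falls back to this shell's own PID if libvirtd is not running.
pid=$(pidof libvirtd) || pid=$$
for i in 1 2 3; do
    ps -o rss= -p "$pid" >> /tmp/libvirtd-rss.log
    sleep 1
done
# An RSS that grows monotonically across samples indicates the leak.
```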
  
Actual results:
libvirtd slowly leaks memory

Expected results:
libvirtd should not leak memory

Additional info:
1) There are quite a few fixes for memory leaks in the upstream code which aren't included in RHEL 5.
For example:
(upstream fix: 7be5c26d746643b5ba889d62212a615050fed772)
virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
    virDomainPtr ret;
-    virDomainPtr olddomain;
<snip>
-        /* XXX wtf.com is this line for - it appears to be a memory leak */
-        if (!(olddomain = virGetDomain(conn, def->name, entry->def->uuid)))
-            goto error;

virGetDomain() allocates the domain.

2) I've also attached the output of Valgrind when libvirtd was run under Valgrind (libvirtd.memcheck).

Comment 5 Laine Stump 2010-06-22 16:58:17 UTC
Here are Dan Berrange's comments on the two biggest offenders in the valgrind output.

On 06/08/2010 04:53 AM, Daniel P. Berrange wrote:

> On Mon, Jun 07, 2010 at 11:02:21PM -0400, Laine Stump wrote:
>> Before I dig into this, do either of the following memory leaks look 
>> familiar to any libvirt guys? (as subject says, they're from 0.6.3 in 
>> RHEL 5.5)
>>
>> ==3876== 357,840 bytes in 8,946 blocks are definitely lost in loss 
>> record 416 of 417
>> ==3876==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
>> ==3876==    by 0x346A2019D9: read_message (xs.c:768)
>> ==3876==    by 0x346A201B4B: read_thread (xs.c:824)
>> ==3876==    by 0x346760673C: start_thread (pthread_create.c:301)
>> ==3876==    by 0x3466ED3D1C: clone (in /lib64/libc-2.5.so)
>
> That's a XenStore bug and I'm not sure it's easily fixable. When you
> register for a watch notification with xenstore it spawns a background
> thread for that. When you close your xenstore handle it uses the pure
> evil pthread_cancel() to kill that thread. Memory cleanup? What's
> that? We would need to do something with cancellation handlers or rewrite
> the code to not use pthread_cancel().
>
> You'll leak one record for each libvirt Xen connection you open &
> close
>
>> ==3876==
>> ==3876== 789,432 bytes in 8,938 blocks are definitely lost in loss 
>> record 417 of 417
>> ==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
>> ==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
>> ==3876==    by 0x384F06166A: xenUnifiedAddDomainInfo (xen_unified.c:1688)
>> ==3876==    by 0x384F076A01: xenStoreDomainIntroduced (xs_internal.c:1373)
>> ==3876==    by 0x384F0775FD: xenStoreWatchEvent (xs_internal.c:1303)
>> ==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
>> ==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
>> ==3876==    by 0x413DE8: main (qemud.c:2956)
>
> I'm not sure what this is caused by offhand. I might say that it was a
> result of the virConnectPtr ref counting not being right, but if that
> were the case I'd expect to see valgrind report that 'virConnectPtr' was
> leaked too, but it doesn't. So it must be some other issue.

Comment 6 Paolo Bonzini 2010-06-23 11:39:52 UTC
These ones are also noticeable:

==3876== 262,168 bytes in 1 blocks are indirectly lost in loss record 414 of 417
==3876==    at 0x4A05140: calloc (vg_replace_malloc.c:418)
==3876==    by 0x384F01921D: virAlloc (memory.c:100)
==3876==    by 0x410B20: qemudDispatchClientEvent (qemud.c:1741)
==3876==    by 0x40E3FE: virEventRunOnce (event.c:451)
==3876==    by 0x40F7DE: qemudRunLoop (qemud.c:2079)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876== 

==3876== 262,496 (8 direct, 262,488 indirect) bytes in 1 blocks are definitely lost in loss record 415 of 417
==3876==    at 0x4A05F1D: realloc (vg_replace_malloc.c:476)
==3876==    by 0x384F0191AE: virReallocN (memory.c:160)
==3876==    by 0x40FA16: qemudRunLoop (qemud.c:2206)
==3876==    by 0x413DE8: main (qemud.c:2956)
==3876==

Comment 7 Laine Stump 2010-07-19 16:01:01 UTC
Note that Bug 606919, which was cloned from this bug, is in MODIFIED state. The modified xen userspace package xen-3.0.3-114.el5 will eliminate the first of the leaks in comment 5.

I'm looking into the other leak in comment 5 now.

The two leaks noted in comment 6 don't seem as concerning, since each only occurs once.

Comment 8 Laine Stump 2010-07-23 20:30:29 UTC
I'm unable to reproduce the second of the 2 leaks in Comment 5 on my RHEL5 system (with libvirt-0.6.3-33.el5_5.1 and xen-3.0.3-105.el5_5.4). Can you provide more information on what you're doing on the system to produce the leak?

(On my setup, I run libvirtd under valgrind as described above, then start up a few guests, and leave virt-manager running overnight; it's calling libvirt several times per second.)

Comment 10 Paolo Bonzini 2010-07-29 11:59:48 UTC
Maybe I'm missing something obvious, but here:

void
xenUnifiedDomainInfoListFree(xenUnifiedDomainInfoListPtr list)
{
    int i;

    if (list == NULL)
        return;

    for (i=0; i<list->count; i++) {
        VIR_FREE(list->doms[i]->name);
        VIR_FREE(list->doms[i]);
    }
    VIR_FREE(list);
}

isn't a VIR_FREE(list->doms); missing??

Comment 11 Laine Stump 2010-07-29 13:07:38 UTC
I guess my time would have been better spent examining the code rather than trying to reproduce first.

That is definitely the problem. Thanks, Paolo!

Comment 13 Jiri Denemark 2010-07-29 18:55:45 UTC
Fix built into libvirt-0.6.3-37.el5

Comment 15 Jiri Denemark 2010-09-02 11:58:40 UTC
Fixed in libvirt-0.8.2-1.el5

Comment 18 yanbing du 2010-10-27 08:21:42 UTC
Verified the bug on RHEL5.6-Server-x86_64-Xen, RHEL5.6-Client-x86_64-Xen and  RHEL5.6-Server-ia64-Xen.
# rpm -q libvirt 
libvirt-0.8.2-8.el5
Steps:
1. Open two terminals.
2. In one terminal, run "while true; do echo connect; done | virsh",
which repeatedly connects to and disconnects from libvirtd.
The connect/disconnect loop is not the best way to trigger this leak, but it seems
to be the easiest one, and it shows up quite fast.
3. In the other terminal, run "top -d1 -p $(pidof libvirtd)" to watch the memory
consumption of libvirtd, taking note of the value in the RES column.

With the new package the memory remains more or less steady; it sometimes stays
unchanged for a while or even goes down. So this bug is fixed. Moving to VERIFIED.

Comment 19 xhu 2010-10-29 06:59:32 UTC
Verified on RHEL5u6-Client-i386-xen and it passed:
kernel-2.6.18-228.el5xen
libvirt-0.8.2-9.el5
xen-3.0.3-117.el5

Comment 21 errata-xmlrpc 2011-01-13 23:12:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html

