Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1688958 - Memory leak: libcurl leaks 120 bytes on each connection
Summary: Memory leak: libcurl leaks 120 bytes on each connection
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nss
Version: 7.6
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: nss-nspr-maint
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-14 19:03 UTC by Dmitry Oksenchuk
Modified: 2019-04-03 10:44 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments (Terms of Use)
curl_test.cpp: small example that reproduces the issue (deleted)
2019-03-16 10:08 UTC, Dmitry Oksenchuk
no flags Details
massif.out.3203.txt: profiling information gathered by Massif for 10000 requests (deleted)
2019-03-16 10:13 UTC, Dmitry Oksenchuk
no flags Details

Description Dmitry Oksenchuk 2019-03-14 19:03:16 UTC
Description of problem:
libcurl uses libnss3 which uses libnsspem which allocates objects in an arena which is unable to free memory until the arena is destroyed at program exit or until some other event.

Strictly speaking this is not a memory leak, Memcheck doesn't detect it but for a long running daemon memory growth is unlimited. Massif correctly displays call stacks allocated memory. See call stacks below.

Version-Release number of selected component (if applicable):
libcurl.x86_64 7.29.0-51.el7 @base
nss.x86_64 3.36.0-7.1.el7_6 @updates
nss-pem.x86_64 1.0.3-5.el7 @base  

How reproducible:
I may try to write a simple program that reproduces the issue, if necessary.

Steps to Reproduce:
Do thousands https requests using libcurl.

Actual results:
Unlimited memory growth.

Expected results:
Stable memory usage.

Additional info:

Call stack allocated the object:
nss_zalloc_arena_locked (arena.c:750)
nss_ZAlloc (arena.c:872)
nssCKFWMutex_Create (mutex.c:85)
nssCKFWInstance_CreateMutex (instance.c:465)
nssCKFWObject_Create (object.c:160)
nssCKFWSession_CreateObject (session.c:1274)
NSSCKFWC_CreateObject (wrap.c:1921)
pemC_CreateObject (nssck.api:530)
PK11_CreateNewObject (pk11obj.c:404)
pk11_CreateGenericObjectHelper (pk11obj.c:1642)
PK11_CreateManagedGenericObject (pk11obj.c:1689)
nss_create_object (nss.c:405)
nss_load_cert (nss.c:457)
nss_load_ca_certificates (nss.c:1243)
nss_setup_connect (nss.c:1499)
nss_connect_common (nss.c:1672)
Curl_nss_connect_nonblocking (nss.c:1725)
Curl_ssl_connect_nonblocking (sslgen.c:229)
https_connecting (http.c:1352)
Curl_http_connect (http.c:1322)
Curl_protocol_connect (url.c:3328)
multi_runsingle (multi.c:1159)
curl_multi_perform (multi.c:1757)
curl_easy_perform (easy.c:480)

Call stack "freed" the object:
nss_ZFreeIf (arena.c:955)
nssCKFWMutex_Destroy (mutex.c:143)
nssCKFWObject_Destroy (object.c:248)
NSSCKFWC_DestroyObject (wrap.c:2093)
pemC_DestroyObject (nssck.api:582)
PK11_DestroyObject (pk11obj.c:55)
PK11_DestroyGenericObject (pk11obj.c:1590)
Curl_llist_remove (llist.c:132)
Curl_llist_destroy (llist.c:149)
nss_fail_connect (nss.c:1362)
nss_do_connect (nss.c:1654)
nss_connect_common (nss.c:1685)
Curl_nss_connect_nonblocking (nss.c:1725)
Curl_ssl_connect_nonblocking (sslgen.c:229)
https_connecting (http.c:1352)
Curl_http_connect (http.c:1322)
Curl_protocol_connect (url.c:3328)
multi_runsingle (multi.c:1159)
curl_multi_perform (multi.c:1757)
curl_easy_perform (easy.c:480)

Me trying to dig deeper:

nss_create_object for every request uses the same 'slot' object found by name "PEM Token #0" (comments in the code are mine):

  const int slot_id = (cacert) ? 0 : 1;                  // slot_id = 0
  char *slot_name = aprintf("PEM Token #%d", slot_id);   // slot_name = "PEM Token #0"
  slot = nss_find_slot_by_name(slot_name);

pk11_CreateGenericObjectHelper takes session number from the slot object:

    crv = PK11_CreateNewObject(slot, slot->session, pTemplate, count,   // slot->session = 1
                               token, &objectID);

PK11_CreateNewObject forwards session number:

    crv = PK11_GETTAB(slot)->C_CreateObject(rwsession,                  // rwsession = 1
                                            /* cast away const :-( */ (CK_ATTRIBUTE_PTR)theTemplate,
                                            count, objectID);

pemC_CreateObject provides fwInstance and forwards session number:

  return NSSCKFWC_CreateObject(fwInstance, hSession, pTemplate, ulCount, phObject);    // fwInstance - static variable, hSession = 1

NSSCKFWC_CreateObject finds session object by session number:

   fwSession = nssCKFWInstance_ResolveSessionHandle(fwInstance, hSession);    // = nssCKFWHash_Lookup(fwInstance->sessionHandleHash, (const void *)hSession)

nssCKFWSession_CreateObject gets arena from session token object:

   arena = nssCKFWToken_GetArena(fwSession->fwToken, pError);     // = fwSession->fwToken->arena

nssCKFWObject_Create uses arena to allocate memory for NSSCKFWObject:
   fwObject = nss_ZNEW(arena, NSSCKFWObject);
   fwObject->mutex = nssCKFWInstance_CreateMutex(fwInstance, arena, pError);

nssCKFWMutex_Create uses arena to allocate memory for NSSCKFWMutex:
   mutex = nss_ZNEW(arena, NSSCKFWMutex);

sizeof(NSSCKFWObject) == 80, sizeof(NSSCKFWMutex) == 8, sizeof(arena pointer_header) == 16, in total 80 + 8 + 16*2 = 120 leaks on each request because nss_ZFreeIf doesn't free memory.

Extract from nss_ZFreeIf:
        PR_Lock(h->arena->lock);
        (void)nsslibc_memset(pointer, 0, h->size);
        /* No way to "free" it within an NSPR arena. */
        PR_Unlock(h->arena->lock);

That's strange because there is PL_ARENA_RELEASE as opposite to PL_ARENA_ALLOCATE used in nss_zalloc_arena_locked.

As workaround, is it possible to re-create the arena or re-initialize curl/nss?

Maybe there is something wrong in how we use libcurl? I haven't found any bug reports for the problem. We use Google API C++ Client that uses libcurl: https://github.com/google/google-api-cpp-client/blob/master/src/googleapis/client/transport/curl_http_transport.cc

I would be grateful for any information that helps to resolve the problem.

Thank you.

Comment 2 Kamil Dudka 2019-03-15 09:10:40 UTC
Thanks for the analysis!  I have not looked at all the details you provided and the corresponding source code.  However you are using libcurl-7.29.0-51.el7 together with nss-pem-1.0.3-5.el7.  This combination of builds is known to cause severe performance regression:

    bug #1659108

Could you please try to upgrade to nss-pem-1.0.3-5.el7_6.1, which went out 3 days ago?

    https://access.redhat.com/errata/RHBA-2019:0500

Comment 3 Dmitry Oksenchuk 2019-03-15 12:54:41 UTC
Thank you Kamil!

nss-pem-1.0.3-5.el7_6.1 is not available in CentOS yet. I have tried nss-pem-1.0.4.20181221.153303.gbc177c0.gl_list-1.el7 mentioned in bug #1659108 and it still leaks.

I will try to write a small example that reproduces the problem.

Comment 4 Dmitry Oksenchuk 2019-03-16 10:08:29 UTC
Created attachment 1544742 [details]
curl_test.cpp: small example that reproduces the issue

Comment 5 Dmitry Oksenchuk 2019-03-16 10:13:52 UTC
Created attachment 1544755 [details]
massif.out.3203.txt: profiling information gathered by Massif for 10000 requests

Comment 6 Dmitry Oksenchuk 2019-03-16 10:32:30 UTC
I attached a small example that reproduces the problem. The code is based on curl_http_transport.cc: https://github.com/google/google-api-cpp-client/blob/master/src/googleapis/client/transport/curl_http_transport.cc. 

To compile and run the example execute:
   $ g++ -o curl_test -lcurl -O1 -ggdb -std=c++11 curl_test.cpp
   $ ./curl_test 10000 # number of https requests

Also I attached memory profiling information gathered by Massif. If you compare snapshots at the beginning of the program execution and at the end, you will notice that more and more bytes are allocated by the following functions:

->42.14% (1,337,344B) 0x6AB610B: PL_ArenaAllocate (plarena.c:127)
| ->40.07% (1,271,808B) 0xB89A47C: nss_zalloc_arena_locked (in /usr/lib64/libnsspem.so)
| | ->40.07% (1,271,808B) 0xB89A56E: nss_ZAlloc (in /usr/lib64/libnsspem.so)
| |   ->19.36% (614,400B) 0xB88DD5B: nssCKFWMutex_Create (in /usr/lib64/libnsspem.so)
| |   | ->19.36% (614,400B) 0xB88D567: nssCKFWInstance_CreateMutex (in /usr/lib64/libnsspem.so)
| |   |   ->19.29% (612,352B) 0xB88DF4B: nssCKFWObject_Create (in /usr/lib64/libnsspem.so)
| |   |   | ->19.29% (612,352B) 0xB88F4FE: nssCKFWSession_CreateObject (in /usr/lib64/libnsspem.so)
| |   |   |   ->19.29% (612,352B) 0xB894359: NSSCKFWC_CreateObject (in /usr/lib64/libnsspem.so)
| |   |   |     ->19.29% (612,352B) 0x65A2210: PK11_CreateNewObject (pk11obj.c:404)
| |   |   |       ->19.29% (612,352B) 0x65A40E6: pk11_CreateGenericObjectHelper (pk11obj.c:1642)
| |   |   |         ->19.29% (612,352B) 0x4E375DB: nss_create_object.isra.10 (nss.c:405)
| |   |   |           ->19.29% (612,352B) 0x4E37697: nss_load_cert (nss.c:457)
| |   |   |             ->19.29% (612,352B) 0x4E705BB: nss_connect_common (nss.c:1243)
| |   |   |               ->19.29% (612,352B) 0x4E66AEC: Curl_ssl_connect_nonblocking (sslgen.c:229)
| |   |   |                 ->19.29% (612,352B) 0x4E3DDDB: Curl_http_connect (http.c:1352)
| |   |   |                   ->19.29% (612,352B) 0x4E4D733: Curl_protocol_connect (url.c:3328)
| |   |   |                     ->19.29% (612,352B) 0x4E60E59: multi_runsingle (multi.c:1159)
| |   |   |                       ->19.29% (612,352B) 0x4E613AF: curl_multi_perform (multi.c:1757)
| |   |   |                         ->19.29% (612,352B) 0x4E58621: curl_easy_perform (easy.c:480)
| |   |   |                           ->19.29% (612,352B) 0x4016D2: main (curl_test.cpp:60)
| |   |   
| |   ->19.36% (614,400B) 0xB88DEEE: nssCKFWObject_Create (in /usr/lib64/libnsspem.so)
| |   | ->19.36% (614,400B) 0xB88F4FE: nssCKFWSession_CreateObject (in /usr/lib64/libnsspem.so)
| |   |   ->19.36% (614,400B) 0xB894359: NSSCKFWC_CreateObject (in /usr/lib64/libnsspem.so)
| |   |     ->19.36% (614,400B) 0x65A2210: PK11_CreateNewObject (pk11obj.c:404)
| |   |       ->19.36% (614,400B) 0x65A40E6: pk11_CreateGenericObjectHelper (pk11obj.c:1642)
| |   |         ->19.36% (614,400B) 0x4E375DB: nss_create_object.isra.10 (nss.c:405)
| |   |           ->19.36% (614,400B) 0x4E37697: nss_load_cert (nss.c:457)
| |   |             ->19.36% (614,400B) 0x4E705BB: nss_connect_common (nss.c:1243)
| |   |               ->19.36% (614,400B) 0x4E66AEC: Curl_ssl_connect_nonblocking (sslgen.c:229)
| |   |                 ->19.36% (614,400B) 0x4E3DDDB: Curl_http_connect (http.c:1352)
| |   |                   ->19.36% (614,400B) 0x4E4D733: Curl_protocol_connect (url.c:3328)
| |   |                     ->19.36% (614,400B) 0x4E60E59: multi_runsingle (multi.c:1159)
| |   |                       ->19.36% (614,400B) 0x4E613AF: curl_multi_perform (multi.c:1757)
| |   |                         ->19.36% (614,400B) 0x4E58621: curl_easy_perform (easy.c:480)
| |   |                           ->19.36% (614,400B) 0x4016D2: main (curl_test.cpp:60)

At the end of the program execution 1,271,808 bytes were allocated for 10,000 requests - about 127 bytes per request.

Comment 7 Dmitry Oksenchuk 2019-03-19 11:59:57 UTC
I haven't found any workaround.

There are three possible fixes in my opinion:
1. Switch to openssl hoping it doesn't leak.
2. Don't use arena in nss-pem. But it may bring a performance degradation.
3. Try to find object before creating one in nss_load_cert(). Every time the object is created for system certificate file, so we definitely can use previously created one.

Let me know if you need any additional information regarding this issue.

Comment 8 Kamil Dudka 2019-03-20 15:48:16 UTC
Sorry for the delay!  I have been busy with security issues elsewhere...

(In reply to Dmitry Oksenchuk from comment #7)
> There are three possible fixes in my opinion:
> 1. Switch to openssl hoping it doesn't leak.

This is now implemented in Fedora 27+ per the following change:

    https://fedoraproject.org/wiki/Changes/libcurlBackToOpenSSL

Unfortunately, this is not an option for RHEL-7.

> 2. Don't use arena in nss-pem. But it may bring a performance degradation.

Do you have any patch in mind?

As I understand it, the leaked memory is not allocated in nss-pem code directly.  It is nss code what allocates the mutex, isn't it?

> 3. Try to find object before creating one in nss_load_cert(). Every time the
> object is created for system certificate file, so we definitely can use
> previously created one.

This is actually what nss-pem does internally -- see the AddObjectIfNeeded() function, introduced while fixing the following bug:

    bug #509705

Unfortunately, nss breaks when pem_CreateObject() returns a single address multiple times, because CKFW hashes the addresses returned out of the C_CreateObject() callback.  So, to work around this limitation, nss-pem pretends object creation even when it internally re-uses an already existing object.  This is tracked as the following bug:

    bug #733685

Comment 9 Kamil Dudka 2019-03-20 16:05:40 UTC
(In reply to Kamil Dudka from comment #8)
> As I understand it, the leaked memory is not allocated in nss-pem code
> directly.  It is nss code what allocates the mutex, isn't it?

Note that, despite /usr/lib64/libnsspem.so appears in the backtrace next to nssCKFW* symbols, the actual code is not maintained in nss-pem.  It is nss code that nss-pem links _statically_ via -lnssckfw.

Comment 10 Dmitry Oksenchuk 2019-03-21 13:50:22 UTC
(In reply to Kamil Dudka from comment #8)
> > 2. Don't use arena in nss-pem. But it may bring a performance degradation.
> 
> Do you have any patch in mind?
> 
> As I understand it, the leaked memory is not allocated in nss-pem code
> directly.  It is nss code what allocates the mutex, isn't it?

I was thinking about passing NULL instead of arena to nssCKFWObject_Create() in nssCKFWSession_CreateObject(). nss_ZNEW() creates object on the heap if arena is NULL. However, this looks like a dirty hack because nssCKFWSession_CreateObject() returns CKR_GENERAL_ERROR if arena is NULL.

(In reply to Kamil Dudka from comment #8)
> > 3. Try to find object before creating one in nss_load_cert(). Every time the
> > object is created for system certificate file, so we definitely can use
> > previously created one.
> 
> This is actually what nss-pem does internally -- see the AddObjectIfNeeded()
> function, introduced while fixing the following bug:
> 
>     bug #509705
> 
> Unfortunately, nss breaks when pem_CreateObject() returns a single address
> multiple times, because CKFW hashes the addresses returned out of the
> C_CreateObject() callback.  So, to work around this limitation, nss-pem
> pretends object creation even when it internally re-uses an already existing
> object.  This is tracked as the following bug:
> 
>     bug #733685

Maybe PK11_FindGenericObjects() / PK11_GetNextGenericObject() could be used in nss_load_cert() to find previously created object. nss_create_object() in this case should use PK11_CreateGenericObject() instead of PK11_CreateManagedGenericObject() to leave the object alive for future search. This brings back leak fixed by the Managed function but it will be one leak per certificate file instead of one leak per https request.

Comment for PK11_CreateGenericObject() says that Mozilla uses the interface this way:
 * The old interface is preserved because applications like Mozilla purposefully
 * leak the reference to be found later with PK11_FindGenericObjects. New 
 * applications should use the new interface PK11_CreateManagedGenericObject.

Comment 11 Kamil Dudka 2019-03-21 15:26:33 UTC
(In reply to Dmitry Oksenchuk from comment #10)
> I was thinking about passing NULL instead of arena to nssCKFWObject_Create()
> in nssCKFWSession_CreateObject(). nss_ZNEW() creates object on the heap if
> arena is NULL.

I do not understand how allocating the objects on heap instead of arena could ever help here.  Unless you explicitly free the allocated objects somewhere (which you can do for arena-allocated objects, too), it would only turn the "run-time" memory leak into regular memory leak observable by valgrind.  On of the advantages of using arenas is that all the allocated memory is released once the arena is destroyed.  In this case, the arena is created in nssCKFWSession_Create() and destroyed in nssCKFWSession_Destroy().  The arena is likely not destroyed soon enough in the mentioned scenario, so the allocated memory grows.  Still it is better to free the memory later than never.

> However, this looks like a dirty hack because nssCKFWSession_CreateObject()
> returns CKR_GENERAL_ERROR if arena is NULL.

As already explained, this code is not part of nss-pem.  It would need to be discussed with nss maintainers.
 
> Maybe PK11_FindGenericObjects() / PK11_GetNextGenericObject() could be used
> in nss_load_cert() to find previously created object.

This interface would not work for curl.  As it works now, the PK11_CreateGenericObject() interface is kind of abused.  curl just passes in a file name, nss-pem opens the file, parses it, and checks (using brute force) whether an object with the same DER contents is already loaded.  In order to check anything at curl's side, it would be necessary to let curl process the certificate by itself first before passing it to nss-pem.  I am afraid it would kill the only advantage of using nss-pem in curl.

Comment 12 Dmitry Oksenchuk 2019-03-21 19:26:29 UTC
(In reply to Kamil Dudka from comment #11)
> I do not understand how allocating the objects on heap instead of arena
> could ever help here.  Unless you explicitly free the allocated objects
> somewhere (which you can do for arena-allocated objects, too), it would only
> turn the "run-time" memory leak into regular memory leak observable by
> valgrind.

It is explicitly freed by Curl_llist_destroy():
nss_ZFreeIf (arena.c:955)
nssCKFWMutex_Destroy (mutex.c:143)
nssCKFWObject_Destroy (object.c:248)
NSSCKFWC_DestroyObject (wrap.c:2093)
pemC_DestroyObject (nssck.api:582)
PK11_DestroyObject (pk11obj.c:55)
PK11_DestroyGenericObject (pk11obj.c:1590)
Curl_llist_remove (llist.c:132)
Curl_llist_destroy (llist.c:149)

The problem is that nss_ZFreeIf() just zeros memory without freeing it if arena is not NULL: https://github.com/servo/nss/blob/master/lib/base/arena.c#L1018

I don't know when the arena is supposed to be destroyed but it seems like it's destroyed only on exit.

Comment 13 Kamil Dudka 2019-03-22 08:01:52 UTC
(In reply to Dmitry Oksenchuk from comment #12)
> The problem is that nss_ZFreeIf() just zeros memory without freeing it if 
> arena is not NULL:
> https://github.com/servo/nss/blob/master/lib/base/arena.c#L1018

Indeed.  Sorry for missing this important fact.  Then the correct fix would be to make nss_ZFreeIf() work as documented:

/*
 * NSS_ZFreeIf
 *
 * If the specified pointer is non-null, then the region of memory
 * to which it points -- which must have been allocated with
 * nss_ZAlloc -- will be zeroed and released.  This routine
 * returns a PRStatus value; if successful, it will return PR_SUCCESS.
 * If unsuccessful, it will set an error on the error stack and return
 * PR_FAILURE.
 *
 * The error may be one of the following values:
 *  NSS_ERROR_INVALID_POINTER
 *
 * Return value:
 *  PR_SUCCESS
 *  PR_FAILURE
 */

Returning PR_SUCCESS while keeping the memory allocated violates the contract.


Note You need to log in before you can comment on or make changes to this bug.