Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.

Bug 1364566

Summary: After adding generic/orchestration catalog item infinispinner and 502 error(appliance crashed)
Product: Red Hat CloudForms Management Engine Reporter: Matouš Mojžíš <mmojzis>
Component: AutomateAssignee: Martin Povolny <mpovolny>
Status: CLOSED NOTABUG QA Contact: Matouš Mojžíš <mmojzis>
Severity: high Docs Contact:
Priority: high    
Version: 5.6.0CC: abellott, cpelland, dajohnso, jhardy, jrafanie, mfalesni, mkanoor, mmojzis, obarenbo, tfitzger
Target Milestone: GAKeywords: ZStream
Target Release: 5.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: service:catalog:appliance:perf
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1371174 (view as bug list) Environment:
Last Closed: 2016-10-12 08:58:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1371174    

Description Matouš Mojžíš 2016-08-05 17:59:28 UTC
Description of problem:
I tried to add generic catalog item according to one of my test case.
Then infinispinner appeared, so I tried to open cfme in another window. I was able to brwose through pages until I opened catalog items page again. Then it freezed on this window too. So I tried it on another appliance, where I added generic catalog item and then infinispinner appeared again, so I was able to browse other pages for like three minutes until whole appliance crashed and 502 proxy error appeared.

The worst thing is, that there are no errors in logs. It looks like appliance is still working.

When I added orchestration catalog item the only difference was, that infinispinner didn't appear and it saved that item.

Version-Release number of selected component (if applicable):
5.6.1.0

How reproducible:
Always

Steps to Reproduce:
1. Services -> Catalogs -> Catalog items
2. Add a generic/orchestration item
3.

Actual results:
Appliance crash and 502 proxy error

Expected results:


Additional info:

Comment 2 Joe Rafaniello 2016-08-05 18:30:07 UTC
This sounds like multiple requests are being processed concurrently in puma threads and are possibly deadlocking.  This sounds similar to what happened when automate's mutex and rails activesupport dependency mutex deadlocked each other [1] and was fixed in [2].

Note, I don't know if catalog items uses automate in the same way as the custom dialog form pulldown items.  Re-assigning to automate for now... Sorry, I'm not sure what category catalog items are.

A quick hack to see if it's puma threads running concurrently is to change your config/puma.rb from 'threads 5, 5' to 'threads 1, 1' and restart your appliance processes.

Note, The entire automate mutex goes away on master/5.7 via [3]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1354054
[2] https://github.com/ManageIQ/manageiq/pull/9903
[3] https://github.com/ManageIQ/manageiq/pull/10135

Comment 3 Matouš Mojžíš 2016-08-05 18:33:10 UTC
I just figured out that it recovers after some time, so it might be the deadlock thing.

Comment 4 Greg McCullough 2016-08-08 14:24:05 UTC
Matous - Were you able to test with the config/puma.rb changes suggested in comment #2?

Please test with that configuration and provide feedback.

Comment 5 Matouš Mojžíš 2016-08-08 16:24:00 UTC
With 1, 1 still with infinispinner and in second panel it doesn't let me to connect immediately. After five minutes I can connect and create new catalog items without problems and also the first one is created.

Comment 6 mkanoor 2016-08-08 18:13:06 UTC
Matous,
The deadlock in 5.6 was fixed via this PR https://github.com/ManageIQ/manageiq/pull/10038
Can you make sure that this PR is in your environment.

You should be seeing this line in lib/miq_automation_engine/engine/miq_ae_engine.rb

def self.deliver(*args)
    # This was added because in RAILS 5 PUMA sends multiple requests
    # to the same process and our DRb server implementation for Automate
    # Methods is not thread safe. The permit_concurrent_loads allows the
    # DRb requests which run in different threads to load constants.
    ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
      @deliver_mutex.synchronize { deliver_block(*args) }
    end
end

Thanks,
Madhu

Comment 7 Matouš Mojžíš 2016-08-09 09:20:14 UTC
Yes, I can see this in miq_ae_engine.rb.

Comment 8 mkanoor 2016-08-09 16:01:20 UTC
Hi,
I see the delay the first time I create a catalog item, after that it seems to create the subsequent items quickly. Is this the same behavior? Or is it slow for you every time? Also if you can provide your appliance IP I would like to test it there.
Thanks,
Madhu

Comment 9 Matouš Mojžíš 2016-08-10 10:16:12 UTC
Hi,
same for me with threads 1,1.
First is created and then it crashes for some time like five minutes and then it's okay and catalog items are created immediately.

Comment 10 mkanoor 2016-08-10 13:47:54 UTC
Hi,
So this is happening only when the Appliance uses Catalog Items for the first time, subsequent creations of Catalog Items work properly. So if I were to use your appliance  it won't exhibit this issue correct?

Thanks,
Madhu

Comment 12 Martin Povolny 2016-10-12 08:58:48 UTC
This was never on 5.7 as the fix was in master before the fork (and then backported to Darga). So closing this.

Comment 13 Matouš Mojžíš 2016-10-27 13:10:00 UTC
Removing needinfo flag as we discussed everything in BJ session with Madhu.