Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1688317 - sporadic timeouts in opening TCP connection prevents Satellite upgrades
Summary: sporadic timeouts in opening TCP connection prevents Satellite upgrades
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Upgrades
Version: 6.4.2
Hardware: x86_64
OS: Linux
high
high vote
Target Milestone: 6.4.3
Assignee: satellite6-bugs
QA Contact: Nikhil Kathole
URL:
Whiteboard:
Depends On: 1687190
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-13 14:28 UTC by sthirugn@redhat.com
Modified: 2019-04-11 09:22 UTC (History)
8 users (show)

Fixed In Version: foreman-1.18.0.41-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1687190
Environment:
Last Closed:
Target Upstream Version:


Attachments (Terms of Use)

Description sthirugn@redhat.com 2019-03-13 14:28:21 UTC
+++ This bug was initially created as a clone of Bug #1687190 +++

Description of problem:
Upgrading Satellite (e.g. from 6.3 to 6.4) can fail at random rake script with backtrace like:

RestClient::Exceptions::OpenTimeout: Timed out connecting to server
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:731:in `rescue in transmit'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:642:in `transmit'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:145:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:52:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/resource.rb:67:in `post'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/base.rb:92:in `get_response'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/base.rb:72:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/resources/repository.rb:147:in `unit_search'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/extensions/repository.rb:199:in `distributions'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:620:in `import_distribution_data'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:615:in `block in import_distributions'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'

(now it depends per use case / rake script failed, e.g. few times:

/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:614:in `import_distributions'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/lib/katello/tasks/upgrades/3.7/make_all_ks_repos_bootable.rake:7:in `block (4 levels) in <top (required)>'
/opt/rh/rh-ruby24/root/usr/share/gems/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'

)

Usually it happens in a rake script that calls REST requests in a tight loop. Very random occurrences, more often on 6.5.


inecas++ ++ found out a fix:

diff --git a/app/models/setting.rb b/app/models/setting.rb
index 72a094024aa135fa41e0f6454d07c1d79fa3ac0a..e19ad55b24afb74c83f9e3033db2756541830fab 100644
--- a/app/models/setting.rb
+++ b/app/models/setting.rb
@@ -93,12 +93,8 @@ class Setting < ApplicationRecord
   def self.[](name)
     name = name.to_s
     cache_value = Setting.cache.read(name)
-    if cache_value.nil?
-      value = find_by(:name => name).try(:value)
-      Setting.cache.write(name, value)
-      return value
-    else
-      return cache_value
+    cache.fetch(name) do
+      find_by(:name => name).try(:value)
     end
   end

that really works.

This BZ is rather a tracker to ensure we fix it in downstream / to decide where all to fix it.


Version-Release number of selected component (if applicable):
Sat 6.4.2


How reproducible:
very sporadicaly during upgrade. Bit artificial reproducer found by Ivan is provided below.


Steps to Reproduce:
1. Decrease katello rest timeout to see the timeouts much sooner than after 1 hour:
/etc/foreman/plugins/katello.yaml :

  :rest_client_timeout: 30

2. Run either (or all) scripts that fires REST requests in a tight loop:

a) requests to 127.0.0.1/pub :
mkdir -p ~/repro
cat <<EOF | foreman-rake console | tee ~/repro/pub-$(date -Isec).log
   conf.return_format = ""
   log = Logger.new(STDOUT)
   log.info "=== process pid #{\$\$}"
   url = URI.parse('http://127.0.0.1/pub/')
   threads = 1.times.map do |i|
     Thread.new do
       log.info "Start thread no. #{i+1}"
         iteration = 0
         cycle = 0
         while true
         begin
           cycle += 1
           if cycle == 1000
             iteration += 1
             cycle = 0
             log.info "=== Thread #{i+1}: iteration #{iteration}k"
           end
           Setting['http_cache']
           Net::HTTP.get_response(url)
         rescue Exception => e
           log.info "Thread #{i+1}: error at iteration #{iteration * 1000 + cycle}"
           log.info "#{e.class} #{e.message}"
           log.info e.backtrace.join("\n")
           sleep(1)
         end
       end
     end
   end
   threads.each(&:join)
EOF

b) requests to pulp repo details:
cat <<EOF | foreman-rake console | tee ~/repro/pulp-$(date -Isec).log
   conf.return_format = ""
   log = Logger.new(STDOUT)
   pulp_id = Katello::Repository.first.pulp_id
   Katello.pulp_server.extensions.repository
   threads = 4.times.map do |i|
     Thread.new do
       log.info "Start thread no. #{i+1}"
       iteration = 0
       cycle = 0
       while true        
         begin
           cycle += 1
           if cycle == 1000
             iteration += 1
             cycle = 0
             log.info "=== iteration #{iteration}k"
           end
           Katello.pulp_server.extensions.repository.distributions(pulp_id)
         rescue Exception => e
           log.info "Error at iteration #{iteration * 1000 + cycle}"
           log.info "#{e.class} #{e.message}"
           log.info e.backtrace.join("\n")
         end
       end
     end
   end
   threads.each(&:join)
EOF

c) optionally, run requests against candlepin like (in rake console):

  4.times do
    Thread.new do
      begin
      org = Organization.first
      label = org.label
      iteration = 0
      cycle = 0
      while true        
        cycle += 1
        if cycle == 1000
          iteration += 1
          cycle = 0
          puts "=== iteration #{iteration}k"
        end
        Katello::Resources::Candlepin::Owner.get_ueber_cert(label)
      end
      rescue Exception => e
        puts e.message
        puts e.backtrace.join("\n")
        raise e
      end
    end
  end

3. Wait few hours and count number of timeouts


Actual results:
3. VERY depending on system, few to many timeouts happen


Expected results:
3. no timeouts


Additional info:

--- Additional comment from RHEL Product and Program Management on 2019-03-10 19:18:08 UTC ---

Since this bug report was entered in Red Hat Bugzilla, the 'sat-backlog' flag has been set to ? to ensure that it is properly evaluated for release.

--- Additional comment from RHEL Product and Program Management on 2019-03-10 19:18:08 UTC ---

Since this issue was entered in Red Hat Bugzilla, the pm_ack has been set to + automatically for the next planned release.

Comment 3 Nikhil Kathole 2019-04-11 09:22:47 UTC
VERIFIED

Version tested:
# rpm -q satellite
satellite-6.4.3-1.el7sat.noarch

No timeout seen with provided reproducer

I, [2019-04-11T05:06:38.925736 #19214]  INFO -- : === Thread 1: iteration 100k
I, [2019-04-11T05:06:41.913713 #19214]  INFO -- : === Thread 1: iteration 101k
I, [2019-04-11T05:06:45.022606 #19214]  INFO -- : === Thread 1: iteration 102k
I, [2019-04-11T05:06:48.103804 #19214]  INFO -- : === Thread 1: iteration 103k
I, [2019-04-11T05:06:51.326653 #19214]  INFO -- : === Thread 1: iteration 104k


I, [2019-04-11T05:19:39.623569 #1336]  INFO -- : === iteration 16k
I, [2019-04-11T05:19:40.514961 #1336]  INFO -- : === iteration 16k
I, [2019-04-11T05:20:22.603960 #1336]  INFO -- : === iteration 17k
I, [2019-04-11T05:20:24.864712 #1336]  INFO -- : === iteration 17k
I, [2019-04-11T05:20:25.063272 #1336]  INFO -- : === iteration 17k
I, [2019-04-11T05:20:26.603654 #1336]  INFO -- : === iteration 17k
I, [2019-04-11T05:21:08.424344 #1336]  INFO -- : === iteration 18k


Note You need to log in before you can comment on or make changes to this bug.