|Summary:||[abrt] [faf] sos: _write_checksum(): /usr/lib/python2.7/site-packages/sos/sosreport.py killed by TypeError|
|Product:||Red Hat Enterprise Linux 7||Reporter:||Vladimir Benes <vbenes>|
|Component:||sos||Assignee:||Pavel Moravec <pmoravec>|
|Status:||CLOSED ERRATA||QA Contact:||Miroslav Hradílek <mhradile>|
|Version:||7.5||CC:||agk, bmr, fkrska, gavin, mhradile, plambri, pportant, sbradley, vbenes|
|Fixed In Version:||sos-3.6-1.el7||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|Last Closed:||2018-10-30 10:31:19 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Vladimir Benes 2017-11-09 12:56:34 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer. Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/baf7294f2b22418a8ccc03af719d971eed80ab0c/
Comment 1 Bryn M. Reeves 2017-11-09 13:11:05 UTC
Are the logs from the run available?
Comment 4 Pavel Moravec 2017-11-15 07:49:38 UTC
Simple reproducer, a bit more generic:

    mkdir /tmp/tmp; sosreport --batch --tmp-dir=/tmp/tmp

In another terminal, after a random delay, run:

    rm -rf /tmp/tmp

Possible backtraces:

Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1637, in main
    sos.execute()
OSError: [Errno 2] No such file or directory: '/tmp/tmp/sos.X05jdV/sosreport-pmoravec-rhel74.gsslab.brq2.redhat.com-20171115084447'
> /usr/lib64/python2.7/shutil.py(237)rmtree()
-> names = os.listdir(path)

or the reported one:

1 _write_checksum /usr/lib/python2.7/site-packages/sos/sosreport.py /usr/lib/python2.7/site-packages/sos/sosreport.py 1471
2 final_work /usr/lib/python2.7/site-packages/sos/sosreport.py /usr/lib/python2.7/site-packages/sos/sosreport.py 1526
3 execute /usr/lib/python2.7/site-packages/sos/sosreport.py /usr/lib/python2.7/site-packages/sos/sosreport.py 1613
4 main /usr/lib/python2.7/site-packages/sos/sosreport.py /usr/lib/python2.7/site-packages/sos/sosreport.py 1634
5 <module> /usr/sbin/sosreport /usr/sbin/sosreport 25

The code fix is "obvious" (catch and react to exceptions when trying to delete the tmp dir or write the checksum), but the proper reaction might be tricky. I don't want to create a potential regression in 7.5, so deferring to 7.6.
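As a rough illustration of the "catch and react" idea for the temporary-directory cleanup, here is a minimal Python 3 sketch. It is not the sos code: `remove_tmpdir` is a hypothetical helper, and treating an already-missing directory as a non-error is only one possible reaction.

```python
import errno
import os
import shutil
import tempfile


def remove_tmpdir(path):
    """Remove a temporary directory tree (hypothetical helper).

    Returns True if this call removed the tree and False if the tree
    was already gone. Any other OSError is re-raised unchanged.
    """
    try:
        shutil.rmtree(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # Something else removed the directory before we got here.
            return False
        raise
    return True


tmp = tempfile.mkdtemp()
first = remove_tmpdir(tmp)   # the directory exists, so it is removed
second = remove_tmpdir(tmp)  # the directory is gone, ENOENT is tolerated
```

Whether silently tolerating the missing directory is the right reaction is exactly the open question raised above; a real fix might prefer to log the condition or exit with an error instead.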
Comment 5 Pavel Moravec 2018-03-03 16:52:30 UTC
*** Bug 1548199 has been marked as a duplicate of this bug. ***
Comment 6 Pavel Moravec 2018-04-23 14:26:07 UTC
Much more probable cause: https://github.com/sosreport/sos/pull/1273 that is due to bz1548199.
Comment 7 Pavel Moravec 2018-04-23 14:27:49 UTC
(In reply to Pavel Moravec from comment #6)
> Much more probable cause:
>
> https://github.com/sosreport/sos/pull/1273
>
> that is due to bz1548199.

Please ignore, I am wrong (again).
Comment 8 Bryn M. Reeves 2018-04-23 15:08:21 UTC
The argument `archive` is passed None:

sosreport.py:1471:_write_checksum:TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1634, in main
    sos.execute()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1613, in execute
    return self.final_work()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1526, in final_work
    self._write_checksum(archive, hash_name, checksum)
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1471, in _write_checksum
    fp = open(archive + "." + hash_name, "w")
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Local variables in innermost frame:
checksum: False
self: <sos.sosreport.SoSReport object at 0x7f7bbcc97910>
hash_name: 'md5'
archive: None

Which is possible due to a buglet in _create_checksum():

    def _create_checksum(self, archive, hash_name):
        if not archive:
            return False        <------

        archive_fp = open(archive, 'rb')
        digest = hashlib.new(hash_name)
        digest.update(archive_fp.read())
        archive_fp.close()
        return digest.hexdigest()

We should not even be attempting to call these functions if we have no valid archive path. The real bug is here:

        # compression could fail for a number of reasons
        try:
            archive = self.archive.finalize(
                self.opts.compression_type)
        except (OSError, IOError) as e:
            if e.errno in fatal_fs_errors:
                print("")
                print(_(" %s while finalizing archive" % e.strerror))
                print("")
                self._exit(1)
        except:
            if self.opts.debug:
                raise
            else:
                return False

We must have returned from self.archive.finalize() with archive==None: nothing ever checks this, _create_checksum() happily ignores it, and then _write_checksum() dies in the spaghetti...
There's a straightforward fix if this is needed promptly, but really, all of these functions would benefit from a thorough review and more robust error handling.
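To make the "don't call these functions without a valid archive path" point concrete, here is a minimal standalone Python 3 sketch. It is an assumption-laden illustration, not the patch that was merged: `create_checksum`, `write_checksum` and `checksum_archive` are hypothetical free functions standing in for the SoSReport methods, and the demo uses sha256 rather than the md5 seen in the traceback.

```python
import hashlib
import os
import tempfile


def create_checksum(archive, hash_name):
    """Return the hex digest of the file at `archive`."""
    digest = hashlib.new(hash_name)
    with open(archive, "rb") as archive_fp:
        digest.update(archive_fp.read())
    return digest.hexdigest()


def write_checksum(archive, hash_name, checksum):
    """Write `checksum` to '<archive>.<hash_name>' and return that path."""
    checksum_path = archive + "." + hash_name
    with open(checksum_path, "w") as fp:
        fp.write(checksum + "\n")
    return checksum_path


def checksum_archive(archive, hash_name="sha256"):
    """Checksum a finalized archive; return None when there is no archive."""
    if not archive:
        # finalize() produced no path: skip both checksum steps instead
        # of passing None (and a False "checksum") further down.
        return None
    checksum = create_checksum(archive, hash_name)
    return write_checksum(archive, hash_name, checksum)


with tempfile.TemporaryDirectory() as tmpdir:
    archive_path = os.path.join(tmpdir, "example.tar")
    with open(archive_path, "wb") as f:
        f.write(b"example archive contents")
    written = checksum_archive(archive_path)
    with open(written) as f:
        stored = f.read().strip()

skipped = checksum_archive(None)  # no TypeError, the step is skipped
```

The design choice here is to make the missing archive visible at a single call site, so neither helper needs to invent a sentinel return value such as False.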
Comment 9 Pavel Moravec 2018-05-30 10:48:30 UTC
Filip, I know you collected various reproducers of "sosreport fails on disk full" scenarios. What is their status? Are any patches available?
Comment 10 Filip Krska 2018-06-01 13:46:27 UTC
Hi Pavel, you probably mean the pull/1273 mentioned above (related to bug 1548199 rather than to this "TypeError: unsupported operand" bug). And yes, the latest patch there attempts to address all the "disk full" scenarios I encountered when testing.
Comment 11 Pavel Moravec 2018-06-05 06:56:13 UTC
(In reply to Filip Krska from comment #10)
> Hi Pavel, you probably mean the mentioned pull/1273/ (related rather to
> bug 1548199 than this "TypeError: unsupported operand" bug). And yes, latest
> patch there attempts to address all "disk full" scenarios I encountered when
> testing.

Three times is enough :) for confusing this BZ with another, independent one.

https://github.com/sosreport/sos/pull/1329 has a fix for this one.
Comment 12 Pavel Moravec 2018-06-21 10:44:39 UTC
Reproducer steps (deterministic, but they require a code change in sos to get the timing right):

1) In /usr/lib/python2.7/site-packages/sos/sosreport.py, around line 1493 (depends on the sos version), find:

        try:
            archive = self.archive.finalize(
                self.opts.compression_type)
        except (OSError, IOError) as e:

and add a sleep there:

        try:
            import time
            time.sleep(10)
            archive = self.archive.finalize(
                self.opts.compression_type)
        except (OSError, IOError) as e:

2) Then run e.g.:

    sosreport --batch

(you can limit it to a few plugins only, but don't use --build!)

3) Check the tmp dir from:

    An archive containing the collected information will be generated in /var/tmp/sos.U68EOp and may be provided to a Red Hat support representative.

4) Once you see:

    Creating compressed archive...

delete the temp dir:

    rm -rf /var/tmp/sos.U68EOp

5) Wait 10s for the sosreport run to complete.
Comment 13 Bryn M. Reeves 2018-06-21 11:20:30 UTC
We don't support an external process removing the temporary directory out from under sos while it is running, and never have. The correct procedure is:

1. Stop the sosreport process (*which will clean up its temporary directory anyway*)
2. Remove the containing directory if you wish to

We can harden the exception handling to print a better error in this case, but just like every other time this has been discussed, the best answer is "don't do it".

ABRT has had this problem for years: it sets up the directory, it starts the sosreport process (in the correct order!), yet it cannot stop the process before it unlinks the temporary directory...?
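One way the "print a better error" hardening could look is sketched below in Python 3. It is only an illustration under stated assumptions: `finalize_or_report` is a hypothetical wrapper, `finalize` stands for any callable that returns an archive path (playing the role of self.archive.finalize()), and the message wording is invented.

```python
import errno
import io
import os
import sys
import tempfile


def finalize_or_report(finalize, stream=sys.stderr):
    """Call finalize(); return the archive path, or None after reporting why."""
    try:
        archive = finalize()
    except (OSError, IOError) as e:
        if e.errno == errno.ENOENT:
            # Typical symptom of the temporary directory being removed
            # by another process while the archive was being built.
            stream.write("temporary directory disappeared while "
                         "finalizing archive: %s\n" % e.filename)
        else:
            stream.write("%s while finalizing archive\n" % e.strerror)
        return None
    if not archive:
        stream.write("no archive was produced\n")
        return None
    return archive


# Demo: simulate the directory vanishing before finalization.
gone = tempfile.mkdtemp()
os.rmdir(gone)


def failing_finalize():
    # Listing a directory that no longer exists raises ENOENT.
    return os.listdir(gone)[0]


buf = io.StringIO()
result = finalize_or_report(failing_finalize, stream=buf)
```

The caller still has to decide what to do with a None result; the sketch only ensures that the user sees a message naming the likely cause rather than a TypeError further down the call chain.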
Comment 17 errata-xmlrpc 2018-10-30 10:31:19 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3144