Bug 1511487 - [abrt] [faf] sos: _write_checksum(): /usr/lib/python2.7/site-packages/sos/sosreport.py killed by TypeError
Summary: [abrt] [faf] sos: _write_checksum(): /usr/lib/python2.7/site-packages/sos/sos...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Pavel Moravec
QA Contact: Miroslav Hradílek
URL: http://faf.lab.eng.brq.redhat.com/faf...
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-09 12:56 UTC by Vladimir Benes
Modified: 2018-10-30 10:33 UTC (History)

Fixed In Version: sos-3.6-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-30 10:31:19 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:3144 None None None 2018-10-30 10:33:21 UTC
Github sosreport sos pull 1329 None None None 2018-06-05 06:56:13 UTC

Description Vladimir Benes 2017-11-09 12:56:34 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/baf7294f2b22418a8ccc03af719d971eed80ab0c/

Comment 1 Bryn M. Reeves 2017-11-09 13:11:05 UTC
Are the logs from the run available?

Comment 4 Pavel Moravec 2017-11-15 07:49:38 UTC
Simple reproducer, a bit more generic:

mkdir /tmp/tmp; sosreport --batch --tmp-dir=/tmp/tmp

In another terminal, after a random delay, run:

rm -rf /tmp/tmp

Possible backtraces:

Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1637, in main
    sos.execute()
OSError: [Errno 2] No such file or directory: '/tmp/tmp/sos.X05jdV/sosreport-pmoravec-rhel74.gsslab.brq2.redhat.com-20171115084447'

> /usr/lib64/python2.7/shutil.py(237)rmtree()
-> names = os.listdir(path)

or the reported one:

1 	_write_checksum 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	1471
2 	final_work 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	1526
3 	execute 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	1613
4 	main 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	/usr/lib/python2.7/site-packages/sos/sosreport.py 	1634
5 	<module> 	/usr/sbin/sosreport 	/usr/sbin/sosreport 	25



The code fix is "obvious" (catch and react to exceptions when trying to delete the tmp dir or write the checksum), but the proper reaction might be tricky. I don't want to create a potential regression in 7.5, so deferring to 7.6.
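A minimal sketch of the "catch and react" idea for the temp dir removal, assuming the only failure we want to tolerate is the directory having already been removed by another process. The helper name `remove_tmpdir` is hypothetical and is not part of sos:

```python
import errno
import os
import shutil
import tempfile


def remove_tmpdir(path):
    """Remove path recursively; return False if it was already gone.

    Hypothetical helper for illustration only, not sos code.
    """
    try:
        shutil.rmtree(path)
        return True
    except OSError as e:
        # ENOENT: something else removed the directory before we did.
        if e.errno == errno.ENOENT:
            return False
        # Any other error (permissions, I/O) is still reported.
        raise


tmpdir = tempfile.mkdtemp()
first = remove_tmpdir(tmpdir)   # directory exists and is removed
second = remove_tmpdir(tmpdir)  # directory already gone, no traceback
```

Whether silently continuing is the right reaction is exactly the tricky part noted above; the sketch only shows how to avoid the raw traceback.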

Comment 5 Pavel Moravec 2018-03-03 16:52:30 UTC
*** Bug 1548199 has been marked as a duplicate of this bug. ***

Comment 6 Pavel Moravec 2018-04-23 14:26:07 UTC
Much more probable cause:

https://github.com/sosreport/sos/pull/1273

that is due to bz1548199.

Comment 7 Pavel Moravec 2018-04-23 14:27:49 UTC
(In reply to Pavel Moravec from comment #6)
> Much more probable cause:
> 
> https://github.com/sosreport/sos/pull/1273
> 
> that is due to bz1548199.

Please ignore, I am wrong (again).

Comment 8 Bryn M. Reeves 2018-04-23 15:08:21 UTC
The `archive` argument is passed as None:

sosreport.py:1471:_write_checksum:TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1634, in main
    sos.execute()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1613, in execute
    return self.final_work()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1526, in final_work
    self._write_checksum(archive, hash_name, checksum)
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1471, in _write_checksum
    fp = open(archive + "." + hash_name, "w")
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Local variables in innermost frame:
checksum: False
self: <sos.sosreport.SoSReport object at 0x7f7bbcc97910>
hash_name: 'md5'
archive: None

Which is possible due to a buglet in _create_checksum():

1443     def _create_checksum(self, archive, hash_name):
1444         if not archive:
1445             return False <------
1446 
1447         archive_fp = open(archive, 'rb')
1448         digest = hashlib.new(hash_name)
1449         digest.update(archive_fp.read())
1450         archive_fp.close()
1451         return digest.hexdigest()

We should not even be attempting to call these functions if we have no valid archive path.

The real bug is here:

1475             # compression could fail for a number of reasons
1476             try:
1477                 archive = self.archive.finalize(
1478                     self.opts.compression_type)
1479             except (OSError, IOError) as e:
1480                 if e.errno in fatal_fs_errors:
1481                     print("")
1482                     print(_(" %s while finalizing archive" % e.strerror))
1483                     print("")
1484                     self._exit(1)
1485             except:
1486                 if self.opts.debug:
1487                     raise
1488                 else:
1489                     return False

We must have returned from self.archive.finalize() with archive==None: nothing ever checks this, _create_checksum() happily ignores it, and then _write_checksum() dies in the spaghetti...

There's a straightforward fix if this is needed promptly, but really, all of these functions would benefit from a thorough review and more robust error handling.
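As a rough illustration of the "straightforward fix" idea (not the actual patch in the sos tree), the checksum steps can be skipped entirely when finalizing did not yield an archive path. The function names below are hypothetical stand-ins for the methods quoted above, and `sha256` is used as the default only so the sketch runs on systems where `md5` is disabled:

```python
import hashlib
import os
import tempfile


def create_checksum(archive, hash_name):
    """Return the hex digest of the archive file."""
    digest = hashlib.new(hash_name)
    with open(archive, "rb") as archive_fp:
        digest.update(archive_fp.read())
    return digest.hexdigest()


def write_checksum(archive, hash_name, checksum):
    """Write the checksum next to the archive; return the path written."""
    path = archive + "." + hash_name
    with open(path, "w") as fp:
        fp.write(checksum + "\n")
    return path


def checksum_if_archived(archive, hash_name="sha256"):
    """Run both checksum steps only when there is a valid archive path."""
    if not archive:
        # finalize() gave us nothing: do not call the checksum helpers.
        return None
    checksum = create_checksum(archive, hash_name)
    return write_checksum(archive, hash_name, checksum)
```

With this shape, `checksum_if_archived(None)` returns None instead of reaching the `archive + "." + hash_name` concatenation that raised the TypeError.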

Comment 9 Pavel Moravec 2018-05-30 10:48:30 UTC
Filip,
I know you collected various reproducers of "sosreport fails on disk full" scenarios. What is their status? Are any patches available?

Comment 10 Filip Krska 2018-06-01 13:46:27 UTC
Hi Pavel, you probably mean the pull/1273/ mentioned above (related to bug 1548199 rather than to this "TypeError: unsupported operand" bug). And yes, the latest patch there attempts to address all the "disk full" scenarios I encountered when testing.

Comment 11 Pavel Moravec 2018-06-05 06:56:13 UTC
(In reply to Filip Krska from comment #10)
> Hi Pavel, you probably mean the pull/1273/ mentioned above (related to
> bug 1548199 rather than to this "TypeError: unsupported operand" bug). And
> yes, the latest patch there attempts to address all the "disk full"
> scenarios I encountered when testing.

Three times is enough to confuse this BZ with another, independent one! :)

https://github.com/sosreport/sos/pull/1329 has a fix for this one.

Comment 12 Pavel Moravec 2018-06-21 10:44:39 UTC
Reproducer steps (deterministic, but they require a code change in sos to get the timing right):

1) In /usr/lib/python2.7/site-packages/sos/sosreport.py around line 1493 (depends on sos version), find:

            try:
                archive = self.archive.finalize(
                    self.opts.compression_type)
            except (OSError, IOError) as e:

and add a sleep there:

            try:
                import time
                time.sleep(10)
                archive = self.archive.finalize(
                    self.opts.compression_type)
            except (OSError, IOError) as e:


2) Then run e.g.:

sosreport --batch

(you can limit it to a few plugins only, but don't use --build!)


3) Note the tmp dir from the output:

An archive containing the collected information will be generated in
/var/tmp/sos.U68EOp and may be provided to a Red Hat support
representative.


4) Once you see:

Creating compressed archive...

delete the temp dir:

rm -rf /var/tmp/sos.U68EOp


5) Wait 10s for the sosreport run to complete.

Comment 13 Bryn M. Reeves 2018-06-21 11:20:30 UTC
We don't support an external process removing the temporary directory out from under sos while it is running and never have. The correct procedure is:

1. Stop the sosreport process (*which will clean up its temporary directory anyway*)

2. Remove the containing directory if you wish to

We can harden the exception handling to print a better error in this case but just like every other time this has been discussed the best answer is "don't do it".

ABRT has had this problem for years: it sets up the directory, it starts the sosreport process (in the correct order!), yet it cannot stop the process before it unlinks the temporary directory...?
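The "print a better error" hardening from comment 13 could look roughly like the sketch below, assuming a check of the temporary directory just before the archive is finalized. The function name and the message wording are hypothetical and are not taken from sos:

```python
import os
import tempfile


def missing_tmpdir_message(tmpdir):
    """Return an error message if tmpdir has vanished, else None.

    Hypothetical helper for illustration only, not sos code.
    """
    if os.path.isdir(tmpdir):
        return None
    return (
        "Temporary directory %s was removed while the report was being "
        "generated; no archive was created." % tmpdir
    )
```

A caller would print the returned message and exit with a non-zero status instead of continuing into the checksum code. This only improves the diagnostics; removing the directory under a running sosreport remains unsupported.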

Comment 17 errata-xmlrpc 2018-10-30 10:31:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3144

