Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data, so any changes you make will not be reflected in the production Bugzilla.
Bug 1358030 - restraintd crashes after erroneously trying to run a job multiple times
Summary: restraintd crashes after erroneously trying to run a job multiple times
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Restraint
Classification: Community
Component: general
Version: master
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 0.1.25
Assignee: Artem Savkov
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-19 21:09 UTC by Jon Orris
Modified: 2016-08-26 04:45 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-03 09:04:31 UTC


Attachments (Terms of Use)
Crash log running restraintd in gdb (deleted)
2016-07-19 21:10 UTC, Jon Orris
restraint client log (deleted)
2016-07-19 21:12 UTC, Jon Orris
Patch that prevents crash (deleted)
2016-07-19 21:16 UTC, Jon Orris

Description Jon Orris 2016-07-19 21:09:34 UTC
Description of problem:

Under certain circumstances involving dependencies, restraintd crashes after repeatedly trying to run the final job in the set.

I've narrowed the problem down to the following example:

https://github.com/jorris/restraint_crash_test

This example is pared down from https://gerrit.beaker-project.org/#/c/5059/

Adding guard clauses against empty task lists prevents the crash, but the final job is still run repeatedly.
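For illustration, here is a minimal sketch of the kind of guard clause described above, written in C against a hypothetical singly linked task list. `task_node` and `pop_task()` are invented names, not restraintd's API, and the actual patch attachment has since been deleted. The guard turns a NULL dereference into a harmless "no task" result, but it only masks the symptom: whatever is invoking the code too many times is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task list node; restraintd's real structures differ. */
typedef struct task_node {
    const char *name;
    struct task_node *next;
} task_node;

/* Hypothetical helper: remove and return the next task's name.
 * Without the guard, calling this on an empty list dereferences NULL. */
static const char *pop_task(task_node **tasks)
{
    if (tasks == NULL || *tasks == NULL) {
        /* Guard clause: nothing left to run, so report "no task"
         * instead of crashing. A caller that got here one time too
         * many is still wrong; the guard just hides it. */
        return NULL;
    }
    task_node *head = *tasks;
    *tasks = head->next;
    return head->name;
}
```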

Comment 1 Jon Orris 2016-07-19 21:10:29 UTC
Created attachment 1181834 [details]
Crash log running restraintd in gdb

Comment 2 Jon Orris 2016-07-19 21:12:38 UTC
Created attachment 1181835 [details]
restraint client log

Comment 3 Jon Orris 2016-07-19 21:16:02 UTC
Created attachment 1181836 [details]
Patch that prevents crash

This patch prevents the crash but doesn't address the underlying problem: the final job is still run multiple times.

Comment 4 Artem Savkov 2016-07-21 10:17:26 UTC
Thanks for the report and a great reproducer. It turned out that restraint_task_result() corrupted the event loop after the first task, resulting in duplicate calls to task_handler() and, consequently, multiple fetches and runs of the following tasks.
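The failure mode described here, where a handler ends up scheduled more than once so that later tasks are fetched and run twice, can be illustrated with a toy event queue in plain C. This is a simplified model under stated assumptions: `toy_loop`, `schedule()`, `task_result()` and `run_loop()` are hypothetical stand-ins, not GLib or restraintd code, and the real defect was in how restraint_task_result() interacted with restraintd's event loop (the gerrit change linked below has the actual fix).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 64
#define NUM_TASKS 3

/* Toy single-threaded event loop: a FIFO of pending handler calls.
 * Hypothetical model, not GLib's GMainLoop. */
typedef struct {
    int pending[MAX_EVENTS];  /* task index each queued call will run */
    size_t head, tail;        /* FIFO read and write positions */
    int runs[NUM_TASKS];      /* how many times each task was run */
    int buggy;                /* 1 = completion path schedules twice */
} toy_loop;

static void schedule(toy_loop *loop, int task)
{
    assert(loop->tail < MAX_EVENTS);
    loop->pending[loop->tail++] = task;
}

/* Hypothetical completion path: once a task finishes, queue exactly one
 * handler call for the next task. In buggy mode, finishing the first task
 * queues the next one twice, mimicking a duplicated handler invocation. */
static void task_result(toy_loop *loop, int finished)
{
    int next = finished + 1;
    if (next >= NUM_TASKS)
        return;
    schedule(loop, next);
    if (loop->buggy && finished == 0)
        schedule(loop, next);  /* the extra, erroneous event */
}

/* Drain the queue: each dequeued entry "fetches and runs" one task,
 * then reports its result, which may queue more work. */
static void run_loop(toy_loop *loop)
{
    while (loop->head < loop->tail) {
        int task = loop->pending[loop->head++];
        loop->runs[task]++;
        task_result(loop, task);
    }
}
```

Starting either loop with `schedule(&loop, 0); run_loop(&loop);`, the correct variant runs each of the three tasks once, while the buggy variant runs task 0 once and tasks 1 and 2 twice each: a single extra event after the first task is enough to duplicate every task that follows it.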

I've posted a fix for this issue to gerrit: https://gerrit.beaker-project.org/#/c/5082

I'm intentionally leaving the NULL checks you proposed out as this situation should never happen and I'd prefer restraint to crash in this case, so that we can catch it early.

Comment 5 Jon Orris 2016-07-21 21:36:34 UTC
(In reply to Artem Savkov from comment #4)
> I've posted a fix for this issue to gerrit:
> https://gerrit.beaker-project.org/#/c/5082

Thanks for the quick fix! I've been running it on multiple jobs today without any problems.
 
> I'm intentionally leaving the NULL checks you proposed out as this situation
> should never happen and I'd prefer restraint to crash in this case, so that
> we can catch it early.

That's fine. It wasn't meant as a proposed patch, just an attempt to find the root problem. That said, perhaps a g_assert(app_data->tasks != NULL) would make the function's preconditions clear?
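The suggested precondition check can be sketched as follows. `app_data_t`, `task_node` and `current_task_name()` are hypothetical stand-ins for restraintd's types and functions; plain `assert()` from `<assert.h>` replaces GLib's `g_assert()` only so that the sketch builds without GLib.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for restraintd's types. */
typedef struct task_node {
    const char *name;
    struct task_node *next;
} task_node;

typedef struct {
    task_node *tasks;  /* remaining tasks for this recipe */
} app_data_t;

/* Hypothetical handler. The assertions document the precondition that
 * callers must never invoke it with an empty task list. In restraintd this
 * would read g_assert(app_data->tasks != NULL); assert() is used here so
 * the sketch compiles with only the C standard library. */
static const char *current_task_name(const app_data_t *app_data)
{
    assert(app_data != NULL);
    assert(app_data->tasks != NULL);
    return app_data->tasks->name;
}
```

Unlike a guard clause, a failed assertion still aborts the process, which fits the preference stated in comment 4 to crash early, but it reports the violated condition instead of failing with an anonymous segfault. Standard `assert()` is compiled out when `NDEBUG` is defined, and `g_assert()` when `G_DISABLE_ASSERT` is defined.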

