Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1065455 - repo delete method has a race condition
Summary: repo delete method has a race condition
Alias: None
Product: Pulp
Classification: Retired
Component: API/integration
Version: Master
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
: ---
Assignee: pulp-bugs
QA Contact: pulp-qe-list
URL: https://pulp-dev-guide.readthedocs.or...
Depends On:
TreeView+ depends on / blocked
Reported: 2014-02-14 16:59 UTC by mkovacik
Modified: 2014-02-17 15:09 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2014-02-17 15:09:39 UTC

Attachments (Terms of Use)

Description mkovacik 2014-02-14 16:59:05 UTC
Description of problem:
Since [deleting] a repository includes major steps, the DELETE method should always return a task-response instead of custom-data response in case the repo doesn't exist anymore. That's because not doing so introduces a race-condition:
- create a repo
- send delete method request
- pulp checks for the presence of the repo
- pulp sleeps in the thread currently checking the repo presence
- send delete method request
- pulp checks for the presence of the repo in a fresh thread
- pulp returns a task
- previously sleeping pulp-thread wakes
- previously sleeping thread gives an "invalid" task

Additional info:

Comment 1 Michael Hrivnak 2014-02-14 17:39:08 UTC
I don't think the scenario you've described accurately represents how pulp behaves. The web response handler will do one of two things:

- queue a task to delete the repo
- return an error that the repo doesn't exist

Either is perfectly safe. Queueing the task makes no guarantees except that at some point in the future, pulp will attempt to delete the repo.

When a task to delete a repo actually runs, as with any task, it will validate the state of any data it needs. This process makes no assumptions about any previous validation done when the web request was made. If any of that validation fails, the task will gracefully fail.

Having multiple delete tasks queued is also fine. They lock the repo, so only one will run at a time.

Given the above, no value would be gained from returning a delete task if the resource does not currently exist.

Does this satisfy you that there is not any unsafe behavior? Or should we discuss further?

Comment 2 mkovacik 2014-02-17 10:12:04 UTC
I'm fine, thanks for the explanation. As far as the race goes, the pulp response behavior seen for 2 concurrent requests:
 - 202:
   - new task created for the delete method
   - list of pending tasks in the body 
 - 409:
   - duplicated task, I guess, when there are distributors/importers to be removed together with the task
   - custom status data in body
 - 404:
  - no such repo (anymore)
  - happens also when the repo is deleted synchronously (no associated importers/distributors)
  - custom status data in body

I'd still vote for the repo delete behaved asynchronously all the time as I think the API would be simpler in that case --- conditions checking required neither on server nor on client (response status code).
Moreover, the server already doesn't distinguish the async/sync processing in the status code i.e. response status code is 202 no matter whether a task is necessary or not. It could give 200 if the delete was performed right away.


Comment 3 Michael Hrivnak 2014-02-17 15:09:39 UTC
The delete is never performed right away, which is why we never return a 200.

Your feedback that the REST API would be simpler with a different design is noted and appreciated, but for now we are going to stick with the design we have on this feature. I certainly agree that we have lots of opportunities to simplify the API.

FWIW, starting with 2.4, the 409 will no longer be returned. It will only be 202 or 404. This change is not reflected in the current alpha build though.

Note You need to log in before you can comment on or make changes to this bug.