Bug 1690066 - [3.11] Evicted builds don't have a specific status reason, instead are GenericBuildFailure
Summary: [3.11] Evicted builds don't have a specific status reason, instead are GenericBuildFailure
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Adam Kaplan
QA Contact: Hongkai Liu
URL:
Whiteboard:
Depends On: 1689061
Blocks:
Reported: 2019-03-18 17:30 UTC by Adam Kaplan
Modified: 2019-04-15 18:37 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If a build pod was evicted, the build reported a GenericBuildFailure reason.
Consequence: Cluster administrators could not determine why builds failed when the node was under resource pressure.
Fix: A new failure reason, `BuildPodEvicted`, was added.
Result: Builds that fail due to pod eviction report `BuildPodEvicted` in their status reason.
Clone Of: 1689061
Environment:
Last Closed:
Target Upstream Version:



Description Adam Kaplan 2019-03-18 17:30:10 UTC
+++ This bug was initially created as a clone of Bug #1689061 +++

Builds whose pods are evicted should get a specific failure reason instead of GenericBuildFailure. On 3.11 api.ci we see evictions frequently, and they are hard to debug. Builds should report eviction as the reason.

Request backport to origin 3.11 so we can get this on api.ci.

---

apiVersion: build.openshift.io/v1
kind: Build
metadata:
  annotations:
    ci.openshift.io/job-spec: '{"type":"postsubmit","job":"branch-ci-openshift-release-controller-master-images","buildid":"53","prowjobid":"bc04d182-46d5-11e9-b760-0a58ac10b13f","refs":{"org":"openshift","repo":"release-controller","base_ref":"master","base_sha":"253573b4cccb254de4bdd621499bdc30c2769c29","base_link":"https://github.com/openshift/release-controller/compare/4375b4ac2e8d...253573b4cccb"}}'
    openshift.io/build.pod-name: release-controller-build
  creationTimestamp: 2019-03-15T03:53:29Z
  labels:
    build-id: "53"
    created-by-ci: "true"
    creates: release-controller
    job: branch-ci-openshift-release-controller-master-images
    persists-between-builds: "false"
    prow.k8s.io/id: bc04d182-46d5-11e9-b760-0a58ac10b13f
  name: release-controller
  namespace: ci-op-7g4vf063
  ownerReferences:
  - apiVersion: image.openshift.io/v1
    controller: true
    kind: ImageStream
    name: pipeline
    uid: d435bb30-46d5-11e9-9b95-42010a8e0003
  resourceVersion: "90888834"
  selfLink: /apis/build.openshift.io/v1/namespaces/ci-op-7g4vf063/builds/release-controller
  uid: e5043c52-46d5-11e9-9b95-42010a8e0003
spec:
  nodeSelector: null
  output:
    imageLabels:
    - name: vcs-type
      value: git
    - name: vcs-url
      value: https://github.com/openshift/release-controller
    - name: io.openshift.build.name
    - name: io.openshift.build.namespace
    - name: io.openshift.build.commit.ref
      value: master
    - name: io.openshift.build.source-location
      value: https://github.com/openshift/release-controller
    - name: vcs-ref
      value: 253573b4cccb254de4bdd621499bdc30c2769c29
    - name: io.openshift.build.commit.id
      value: 253573b4cccb254de4bdd621499bdc30c2769c29
    - name: io.openshift.build.commit.message
    - name: io.openshift.build.commit.author
    - name: io.openshift.build.commit.date
    - name: io.openshift.build.source-context-dir
    pushSecret:
      name: builder-dockercfg-k4g2k
    to:
      kind: ImageStreamTag
      name: pipeline:release-controller
      namespace: ci-op-7g4vf063
  postCommit: {}
  resources:
    limits:
      memory: 6Gi
    requests:
      cpu: 100m
      memory: 200Mi
  serviceAccount: builder
  source:
    images:
    - as:
      - "0"
      from:
        kind: ImageStreamTag
        name: pipeline:root
      paths: null
    - as: null
      from:
        kind: ImageStreamTag
        name: pipeline:src
      paths:
      - destinationDir: .
        sourcePath: /go/src/github.com/openshift/release-controller///.
    type: Image
  strategy:
    dockerStrategy:
      forcePull: true
      from:
        kind: ImageStreamTag
        name: pipeline:os
        namespace: ci-op-7g4vf063
      imageOptimizationPolicy: SkipLayers
      noCache: true
    type: Docker
  triggeredBy: null
status:
  completionTimestamp: 2019-03-15T03:53:53Z
  message: Generic Build failure - check logs for details.
  output: {}
  outputDockerImageReference: docker-registry.default.svc:5000/ci-op-7g4vf063/pipeline:release-controller
  phase: Failed
  reason: GenericBuildFailed
  startTimestamp: 2019-03-15T03:53:53Z


---

status:
  message: 'Pod The node was low on resource: [DiskPressure]. '
  phase: Failed
  reason: Evicted
  startTime: 2019-03-15T03:53:53Z

--- Additional comment from Clayton Coleman on 2019-03-15 04:17:58 UTC ---

Also note that's *ALL* the status the pod has, so that may be causing other failures in the build controller.

Comment 1 Adam Kaplan 2019-03-18 18:07:23 UTC
API PR: https://github.com/openshift/api/pull/256

Comment 2 Adam Kaplan 2019-04-02 15:00:38 UTC
Origin PR: https://github.com/openshift/origin/pull/22346

Comment 5 Hongkai Liu 2019-04-11 12:40:10 UTC
Let me give it a shot tomorrow.

Comment 6 Hongkai Liu 2019-04-12 13:28:41 UTC
$ git tag  --contains 29cde93
[origin]$ git log --oneline 29cde93..HEAD
9b1e77773a (HEAD -> release-3.11, origin/release-3.11) Merge pull request #22443 from danwinship/sync-inuse-vnids-on-restart-3.11
c137ed0d25 Merge pull request #22397 from jcantrill/1676720
6f59b4eb4c Fix reinitialization of NetworkPolicy state on restart
a2aa67a169 Initialize NetworkPolicy which-namespaces-are-in-use properly on restart
a8f6aec707 Clean up NetworkPolicies on NetNamespace deletion
03b5b9e76a bug 1676720. Check clusterlogging curator for cronjob instead of DC

No 3.11 puddle contains the fix yet.

Comment 7 Hongkai Liu 2019-04-12 13:41:12 UTC
Sorry my bad ... checking ose repo now

Comment 8 Hongkai Liu 2019-04-12 13:42:10 UTC
[hongkliu@MiWiFi-R1CM-srv ose]$ git tag  --contains 29cde93
v3.11.104-1
v3.11.105-1

Comment 9 Hongkai Liu 2019-04-12 16:02:41 UTC
Still saw `GenericBuildFailed`

Every 6.0s: oc get build -n testproject    Fri Apr 12 16:01:47 2019

NAME           TYPE      FROM          STATUS                        STARTED             DURATION
django-ex-7    Source    Git@0905223   Complete                      About an hour ago   1m16s
django-ex-8    Source    Git@0905223   Complete                      About an hour ago   1m12s
django-ex-9    Source    Git@0905223   Complete                      44 minutes ago      1m38s
django-ex-10   Source    Git@0905223   Failed (GenericBuildFailed)   41 minutes ago      2m6s
django-ex-12   Source    Git@0905223   Complete                      31 minutes ago      1m16s
django-ex-14   Source    Git           Failed (GenericBuildFailed)   22 minutes ago      40s
django-ex-15   Source    Git@0905223   Complete                      19 minutes ago      1m1s
django-ex-16   Source    Git@0905223   Failed (GenericBuildFailed)   18 minutes ago      53s

Comment 11 Hongkai Liu 2019-04-12 16:12:02 UTC
Only django-ex-10 and django-ex-16 are relevant to disk pressure.
django-ex-14 is something else.

Comment 12 Clayton Coleman 2019-04-15 14:23:56 UTC
Not all evictions are reported to the pod (which is what the build controller uses). When reproducing eviction-related issues, always include the yaml of the build pod.

Comment 13 Hongkai Liu 2019-04-15 18:37:31 UTC
Sorry ... I did not know the pod yaml was required.

A. If it is for the pod definition, then the build is triggered by the BuildConfig created by `oc new-app centos/python-35-centos7~https://github.com/sclorg/django-ex`.
B. If it is for the pod status, then I have to redo the test.

@Clayton, let me know if it is Case B above. Thanks.

