Bug 1517605 - [RFE] logging Support crio [NEEDINFO]
Summary: [RFE] logging Support crio
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jan Wozniak
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-27 04:39 UTC by Anping Li
Modified: 2018-03-28 14:13 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Added a parser for cri-o formatted logs.
Clone Of:
Environment:
Last Closed: 2018-03-28 14:13:03 UTC
jcantril: needinfo? (pweil)


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0489 | None | None | None | 2018-03-28 14:13:48 UTC
Github openshift openshift-ansible | pull 7102 | None | None | None | 2018-02-16 18:16:42 UTC
Github openshift origin-aggregated-logging | pull 949 | None | None | None | 2018-02-13 18:26:54 UTC

Description Anping Li 2017-11-27 04:39:43 UTC
Description of problem:
The logging system does not seem to support cri-o. There are three main issues with cri-o:
1) Container logs are written to /var/log/containers/. With the Docker daemon, they are in JSON-file format. With cri-o, they use a plain-text, rsyslog-like format.

For example:

2017-11-26T20:29:53.149616443-05:00 stderr I1127 01:29:53.149602       1 round_trippers.go:436] POST https://172.30.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 0 milliseconds
2017-11-26T20:29:53.149668334-05:00 stderr I1127 01:29:53.149661       1 round_trippers.go:442] Response Headers:
2017-11-26T20:29:53.149694943-05:00 stderr I1127 01:29:53.149688       1 round_trippers.go:445]     Cache-Control: no-store
2017-11-26T20:29:53.149722042-05:00 stderr I1127 01:29:53.149710       1 round_trippers.go:445]     Content-Type: application/json
2017-11-26T20:29:53.149749783-05:00 stderr I1127 01:29:53.149743       1 round_trippers.go:445]     Content-Length: 538
2017-11-26T20:29:53.149770970-05:00 stderr I1127 01:29:53.149765       1 round_trippers.go:445]     Date: Mon, 27 Nov 2017 01:29:53 GMT
2017-11-26T20:29:53.149804101-05:00 stderr I1127 01:29:53.149796       1 request.go:836] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"resourceAttributes":{"verb":"update","group":"servicecatalog.k8s.io","version":"v1beta1","resource":"clusterserviceclasses","name":"5247e02c-d30c-11e7-aaad-fa163e4d160c"},"user":"system:serviceaccount:kube-service-catalog:service-catalog-controller","group":["system:serviceaccounts","system:serviceaccounts:kube-service-catalog","system:authenticated"]},"status":{"allowed":true,"reason":"allowed by cluster rule"}}


2) Kibana fails to connect to Elasticsearch; it reports "Unable to connect to Elasticsearch at https://localhost:9200."
3) Curator restarts many times; I think it also cannot connect to Elasticsearch.
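The cri-o lines quoted in issue 1) appear to follow a simple layout: an RFC 3339 timestamp with nanoseconds and a UTC offset, the stream name, then the raw message. A minimal Go sketch of splitting such a line, assuming exactly that three-field layout (`CRIOEntry` and `parseCRIOLine` are hypothetical names, not part of cri-o or fluentd):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// CRIOEntry holds the three fields seen in the cri-o sample lines.
// The type and field names are illustrative only.
type CRIOEntry struct {
	Time    time.Time
	Stream  string // "stdout" or "stderr"
	Message string
}

// parseCRIOLine splits "<RFC3339Nano timestamp> <stream> <message>".
// It assumes exactly this layout and rejects anything else.
func parseCRIOLine(line string) (CRIOEntry, error) {
	// At most three parts, so spaces inside the message are preserved.
	parts := strings.SplitN(line, " ", 3)
	if len(parts) != 3 {
		return CRIOEntry{}, errors.New("expected timestamp, stream and message")
	}
	ts, err := time.Parse(time.RFC3339Nano, parts[0])
	if err != nil {
		return CRIOEntry{}, fmt.Errorf("bad timestamp: %w", err)
	}
	if parts[1] != "stdout" && parts[1] != "stderr" {
		return CRIOEntry{}, fmt.Errorf("unknown stream %q", parts[1])
	}
	return CRIOEntry{Time: ts, Stream: parts[1], Message: parts[2]}, nil
}

func main() {
	line := "2017-11-26T20:29:53.149668334-05:00 stderr I1127 01:29:53.149661       1 round_trippers.go:442] Response Headers:"
	e, err := parseCRIOLine(line)
	if err != nil {
		panic(err)
	}
	// prints: 2017-11-27T01:29:53.149668334Z stderr
	fmt.Println(e.Time.UTC().Format(time.RFC3339Nano), e.Stream)
	fmt.Println(e.Message)
}
```

Converted to UTC, the timestamp lines up with the I1127 01:29:53 header that the container itself logged in the message.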


Version-Release number of selected component (if applicable):
openshift-ansible-3.7.9-1.git.4.d445616.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.7 with cri-o using these inventory variables:
openshift_use_crio=true
openshift_crio_systemcontainer_image_override=registry.access.xxx.redhat.com/openshift3/cri-o:v3.7

2. Deploy logging.

3. Check Fluentd, Elasticsearch and Kibana.

Actual results:
 1) Fluentd prints noisy messages.
 2) Fluentd uses the Docker configuration file.
 3) Container logs cannot be collected.
 4) Kibana cannot connect to Elasticsearch.
 5) Curator restarts many times.
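Actual result 2) suggests fluentd kept applying its Docker parsing rules to cri-o files. As a hedged illustration of why the collector has to pick a parser per format, the sketch below guesses the format of one line: the Docker json-file driver writes a JSON object with a `log` key, while the cri-o sample lines start with a timestamp and a stream name. `detectFormat` and its return labels are hypothetical, and the regular expression is an assumption based on the sample in the description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// criPrefix matches the "<timestamp> <stream> " start of the cri-o sample
// lines; the exact pattern is an assumption, not taken from fluentd.
var criPrefix = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\S+ (stdout|stderr) `)

// detectFormat guesses which parser one log line needs. The labels
// "docker-json", "crio" and "unknown" are hypothetical.
func detectFormat(line string) string {
	var rec map[string]json.RawMessage
	if err := json.Unmarshal([]byte(line), &rec); err == nil {
		if _, ok := rec["log"]; ok {
			return "docker-json" // Docker json-file driver record
		}
	}
	if criPrefix.MatchString(line) {
		return "crio" // plain-text cri-o line
	}
	return "unknown"
}

func main() {
	lines := []string{
		// hypothetical json-file record for the same message
		`{"log":"Response Headers:\n","stream":"stderr","time":"2017-11-27T01:29:53.149661Z"}`,
		"2017-11-26T20:29:53.149668334-05:00 stderr I1127 01:29:53.149661       1 round_trippers.go:442] Response Headers:",
	}
	for _, l := range lines {
		fmt.Println(detectFormat(l))
	}
}
```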

Expected results:
Both system and container logs are collected.

Additional info:

Comment 1 Jan Wozniak 2017-12-01 14:24:36 UTC
I am able to reproduce this and did some more troubleshooting. As QE correctly observed, there appear to be two issues:

1) Image environment variables are not overridden from Kubernetes
- the image sets a default for the environment variable "ES_HOST=localhost" [1]
- the Kubernetes DC overrides the value [2]
- the override is not correctly propagated to the container's environment

When I removed the default "ES_HOST" from the image, Kubernetes set the environment variable correctly through the DC.
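The expected precedence is that values set in the DC replace defaults baked into the image. A minimal sketch of that merge over `KEY=value` lists, with later entries winning (`mergeEnv` is a hypothetical function, not the cri-o code involved, and `logging-es` is an invented example value):

```go
package main

import (
	"fmt"
	"strings"
)

// mergeEnv returns the environment a container would be expected to see:
// image defaults first, then values from the pod spec (here, the DC).
// A later KEY=value replaces an earlier one with the same key.
func mergeEnv(imageEnv, specEnv []string) []string {
	index := map[string]int{} // key -> position in out
	var out []string
	all := append(append([]string{}, imageEnv...), specEnv...)
	for _, kv := range all {
		key := kv
		if i := strings.IndexByte(kv, '='); i >= 0 {
			key = kv[:i]
		}
		if pos, seen := index[key]; seen {
			out[pos] = kv // override in place, keeping the original order
			continue
		}
		index[key] = len(out)
		out = append(out, kv)
	}
	return out
}

func main() {
	image := []string{"PATH=/usr/bin", "ES_HOST=localhost"}
	dc := []string{"ES_HOST=logging-es"} // hypothetical DC value
	// prints: [PATH=/usr/bin ES_HOST=logging-es]
	fmt.Println(mergeEnv(image, dc))
}
```

The bug described above corresponds to the container ending up with `ES_HOST=localhost`, as if the image default had won.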

2) The default logging format is 'text', and the cri-o system container does not make it configurable
- the --log-format option (https://www.mankier.com/8/crio#--log-format) can set the log format to 'json'
- but it appears to apply only to the crio daemon logs, not to the container logs

I think both are potentially cri-o issues rather than logging issues, since they go beyond logging.


[1] https://github.com/openshift/origin-aggregated-logging/blob/master/curator/Dockerfile.centos7#L8

[2] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_curator/templates/curator.j2#L68-L69

Comment 2 Frantisek Kluknavsky 2017-12-01 15:37:51 UTC
From grepping the cri-o code in RHEL 7.4, cri-o does not seem to pass the --log-format flag to runc.

The only code touching log-format is:
switch c.GlobalString("log-format") {
case "text":
    // retain logrus's default.
case "json":
    logrus.SetFormatter(new(logrus.JSONFormatter))
default:
    return fmt.Errorf("unknown log-format %q", c.GlobalString("log-format"))
}

Comment 3 Jan Wozniak 2017-12-04 16:34:41 UTC
Our fluentd pipeline can be made to parse cri-o logs. A workaround, until cri-o container logs respect the --log-format command-line option, is described in:

https://trello.com/c/ktGIxQGf/585-5-online-crio-fluentd-understands-the-cri-log-format-loggingepic-crio

https://github.com/kubernetes/kubernetes/pull/54777
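Doc Text for this bug says the fix added a parser for cri-o formatted logs. Purely as an illustration of the normalization idea, and not the change made in the linked pull requests, the sketch below rewrites one cri-o line into the `log`/`stream`/`time` record shape that the Docker json-file driver produces, so a Docker-oriented pipeline could consume it unchanged. The regular expression assumes the three-field layout from the description; `toDockerJSON` and `dockerRecord` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"time"
)

// criLine captures timestamp, stream and message from one cri-o line,
// assuming the three-field layout seen in the description's sample.
var criLine = regexp.MustCompile(`^(\S+) (stdout|stderr) (.*)$`)

// dockerRecord has the keys the Docker json-file driver writes per line.
type dockerRecord struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

// toDockerJSON rewrites a cri-o line as a json-file style record.
func toDockerJSON(line string) (string, error) {
	m := criLine.FindStringSubmatch(line)
	if m == nil {
		return "", fmt.Errorf("not a cri-o log line: %q", line)
	}
	ts, err := time.Parse(time.RFC3339Nano, m[1])
	if err != nil {
		return "", err
	}
	rec := dockerRecord{
		Log:    m[3] + "\n", // json-file keeps the trailing newline in "log"
		Stream: m[2],
		Time:   ts.UTC().Format(time.RFC3339Nano), // normalize to UTC
	}
	b, err := json.Marshal(rec)
	return string(b), err
}

func main() {
	out, err := toDockerJSON(
		"2017-11-26T20:29:53.149749783-05:00 stderr I1127 01:29:53.149743       1 round_trippers.go:445]     Content-Length: 538")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```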

Comment 6 Jan Wozniak 2018-01-31 08:14:30 UTC
The cri-o team is investigating why environment variables are not propagated:
https://github.com/kubernetes-incubator/cri-o/issues/1293

Comment 7 Jeff Cantrill 2018-01-31 18:24:20 UTC
Ref changes from upstream: https://github.com/kubernetes/kubernetes/commit/70a0cdfa8e05ac47d7dd04b032ceb79bead3fb5f

Comment 9 Anping Li 2018-02-28 11:10:10 UTC
cri-o works with logging-fluentd/images/v3.9.1.

Comment 11 Anping Li 2018-03-05 02:00:36 UTC
Moved to VERIFIED. The cri-o container logs can be collected, so logging no longer blocks our testing/release.

Comment 14 errata-xmlrpc 2018-03-28 14:13:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

