Bug 1515915 - Fluentd unable to get logs from container using Storage=volatile
Summary: Fluentd unable to get logs from container using Storage=volatile
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.6.z
Assignee: Rich Megginson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-21 15:33 UTC by Ruben Romero Montes
Modified: 2019-03-30 07:07 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Ruben Romero Montes 2017-11-21 15:33:15 UTC
Description of problem:
Following the documentation for `Systemd-journald and rsyslog`[1] causes container logs not to be forwarded to ES by the fluentd pods.
It seems the use of `Storage=volatile` is preventing Fluentd from gathering logs from the right place.

[1] https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging_sizing.html#install-config-aggregate-logging-sizing-guidelines-rate-limiting

Version-Release number of selected component (if applicable):
openshift3-logging-fluentd-v3.6.173.0.49-4

How reproducible:
Only on customer side

Steps to Reproduce:
1. Customize rsyslog.conf and journald.conf following documentation
2. Restart services
3.

Actual results:
`oc logs $pod` -> shows container logs but logs are not forwarded to ES

Expected results:
Logs to be also forwarded to ES by Fluentd

Additional info:
Working example:

$ cat /etc/systemd/journald.conf.d/journald-openshift.conf

[Journal]
RateLimitInterval=1s
RateLimitBurst=10000
Compress=no
MaxRetentionSec=5s



Non-working example:
$ cat /etc/systemd/journald.conf.d/journald-openshift.conf

[Journal]
RateLimitInterval=1s
RateLimitBurst=10000
Storage=volatile
Compress=no
MaxRetentionSec=5s

Comment 2 Rich Megginson 2017-11-21 16:11:11 UTC
please provide the following:

ls -alrtF /var/log/journal

ls -alrtF /run/log/journal

Comment 3 Ruben Romero Montes 2017-11-22 08:35:49 UTC
The prod cluster suffers from a log suppression issue because it has more projects and pods. These servers are for testing, so that settings can be tried there before being implemented in prod.

Removing Storage=volatile from the suggested configuration solves the problem of the "suppressed messages", but we need to know why it is not working with this flag.

Will provide the output of the commands as soon as they're available.

Comment 5 Rich Megginson 2017-11-27 15:40:49 UTC
It should not matter to fluentd. fluentd looks in /var/log/journal if present, or /run/log/journal if not.
Is docker configured with --log-driver=journald?
If so, take a look at the /var/log/journal.pos file that fluentd creates to track its position in the journal.
Check whether that file exists and whether it is being updated.
Another thing you can try: oc set env ds/logging-fluentd DEBUG=true VERBOSE=true
That will allow you to trace the fluentd run.sh script to see whether it is looking for the journal in the right place.
That will also dump the fluentd log to /var/log/fluentd.log instead of to the default pod output, so you will need to look at both the output of oc logs $fluentd_pod and the fluentd.log file.
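
The position-file check suggested above can be scripted on the node. A minimal sketch, assuming GNU coreutils is available; the helper name `file_updated_within` and the 60-second window are illustrative and are not part of fluentd or OpenShift:

```shell
#!/bin/bash
# Hypothetical helper (not part of fluentd): succeed if FILE exists and
# was modified within the last SECONDS seconds.
# Assumes GNU stat (-c %Y prints the mtime as seconds since the epoch).
file_updated_within() {
    local file=$1 seconds=$2 now mtime
    [ -f "$file" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 1
    [ $(( now - mtime )) -le "$seconds" ]
}

# Example, run on the node:
#   file_updated_within /var/log/journal.pos 60 && echo "pos file is being updated"
```

If the helper fails while containers are actively logging, that would suggest fluentd is not reading the journal it was pointed at.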

Comment 6 Rich Megginson 2017-11-27 16:29:37 UTC
>man journald.conf

       Storage=
           Controls where to store journal data. One of "volatile",
           "persistent", "auto" and "none". If "volatile", journal log data
           will be stored only in memory, i.e. below the /run/log/journal
           hierarchy (which is created if needed).

So they have configured Storage=volatile, but there is no data in /run/log/journal.  That's wrong.

Comment 7 Rich Megginson 2017-11-27 16:30:36 UTC
(In reply to Rich Megginson from comment #6)
> >man journald.conf
> 
>        Storage=
>            Controls where to store journal data. One of "volatile",
>            "persistent", "auto" and "none". If "volatile", journal log data
>            will be stored only in memory, i.e. below the /run/log/journal
>            hierarchy (which is created if needed).
> 
> So they have configured Storage=volatile, but there is no data in
> /run/log/journal.  That's wrong.

But it shouldn't matter to fluentd - fluentd does not look at this setting, it only looks at /var/log/journal and /run/log/journal

Comment 10 Peter Portante 2017-12-18 18:46:44 UTC
Don't the journald APIs handle reading from /var/log/journal vs /run/log/journal?  See https://www.freedesktop.org/software/systemd/man/sd_journal_open.html

Comment 11 Rich Megginson 2017-12-18 23:34:41 UTC
(In reply to Peter Portante from comment #10)
> Don't the journald APIs handle reading from /var/log/journal vs
> /run/log/journal?  See
> https://www.freedesktop.org/software/systemd/man/sd_journal_open.html

Yes, but take a look at fluent-plugin-systemd and underlying ruby code :P

Comment 12 Rich Megginson 2018-01-04 03:18:58 UTC
(In reply to Rich Megginson from comment #11)
> (In reply to Peter Portante from comment #10)
> > Don't the journald APIs handle reading from /var/log/journal vs
> > /run/log/journal?  See
> > https://www.freedesktop.org/software/systemd/man/sd_journal_open.html
> 
> Yes, but take a look at fluent-plugin-systemd and underlying ruby code :P

Specifically: https://github.com/reevoo/fluent-plugin-systemd/blob/master/lib/fluent/plugin/in_systemd.rb#L66

 config_param :path, :string, default: "/var/log/journal"
 ...
 @journal = Systemd::Journal.new(path: @path)

It always uses the `path` argument to the constructor, and `path` always has a value.  This means it calls this: https://github.com/ledbettj/systemd-journal/blob/master/lib/systemd/journal.rb#L33


    # @option opts [String] :path if provided, open the journal files living
    #   in the provided directory only.  Any provided flags will be ignored
    #   since sd_journal_open_directory does not currently accept any flags.

We need to change fluent-plugin-systemd:
- path is an optional argument
- if no path is provided, call Systemd::Journal.new() which will in turn call sd_journal_open instead of sd_journal_open_directory

Comment 17 Marc Jadoul 2018-10-02 09:55:23 UTC
With this info I finally found the cause of one of our own issues.
We have a large node.
We did indeed configure "volatile" as storage for journald, but it was persistent initially.

Thus the /var/log/journal directory existed, and therefore fluentd was not looking in /run/log/journal.

After deleting the /var/log/journal directory, logs started arriving in Elasticsearch.

It might be useful to clarify this in the docs (https://docs.openshift.com/container-platform/3.7/install_config/aggregate_logging_sizing.html)
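
The behaviour reported here matches the selection rule described in comment 5. A minimal sketch of that rule, to show why a leftover directory hides the volatile journal; the function name `pick_journal_dir` is hypothetical and this is not the actual run.sh code:

```shell
#!/bin/bash
# Sketch of the selection behaviour described in comment 5; this is NOT the
# actual run.sh code. Arguments default to the two standard journal locations.
pick_journal_dir() {
    local persistent=${1:-/var/log/journal} volatile=${2:-/run/log/journal}
    if [ -d "$persistent" ]; then
        # The directory only has to exist; it is not checked for journal files.
        echo "$persistent"
    else
        echo "$volatile"
    fi
}
```

Under this rule, a stale /var/log/journal left over from an earlier persistent configuration is chosen even when journald is writing only to /run/log/journal.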

Comment 18 Jeff Cantrill 2018-10-02 12:21:11 UTC
Rich, should we move this to a docs bug?  Or maybe we should just mount and follow both dirs?  Or do our recent changes mitigate this issue?

Comment 19 Rich Megginson 2018-10-02 14:59:26 UTC
(In reply to Jeff Cantrill from comment #18)
> Rich should we move this to a docs bug?

We should open a docs bug and keep this one.  We also need to document that you can change the JOURNAL_SOURCE env. var. in the logging-fluentd daemonset to force fluentd to use /var/log/journal or /run/log/journal.
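
As a rough illustration of the JOURNAL_SOURCE override, the sketch below only prints the command so it can be reviewed before being run. The wrapper `journal_source_cmd` is hypothetical. The daemonset name is taken from comment 5. How fluentd interprets the variable has not been verified here.

```shell
#!/bin/bash
# Hypothetical wrapper: print the command that pins the journal location.
# Only the two directories named in this bug are accepted.
journal_source_cmd() {
    case $1 in
        /var/log/journal|/run/log/journal)
            echo "oc set env ds/logging-fluentd JOURNAL_SOURCE=$1" ;;
        *)
            echo "unsupported journal path: $1" >&2
            return 1 ;;
    esac
}

# For a node using Storage=volatile:
journal_source_cmd /run/log/journal
```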

>Or maybe we should just mount and
> follow both dirs?

Unfortunately, we can't.  The fluent-plugin-systemd is not smart enough because it uses the wrong systemd journald api which requires 1 and only 1 explicit path.  Contrast with rsyslog imjournal which uses the correct api which looks in both /var/log/journal and /run/log/journal and works no matter if using Volatile or Persistent or whatever.  We really need a fix to fluent-plugin-systemd.

> Or do our recent changes mitigate this issue?

One way we could attempt to mitigate this issue is to change the fluentd run.sh script to add some logic like this:

if [ -d /var/log/journal ] && is_not_empty(/var/log/journal) ; then use /var/log/journal

But the problem there is that
- the logic to determine is_not_empty() will be tricky
- the directory may be empty if new or recently purged

So the only way to really mitigate would be to mount /etc/systemd/journald.conf into the container and parse the config file . . . which is a lot of work to do something which should really be done in fluent-plugin-systemd
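
For reference, one possible shape for the is_not_empty() test is sketched below. It assumes GNU find and that journal files carry a `.journal` suffix. The name `journal_dir_has_data` is hypothetical. The sketch does not solve the second problem listed above: a new or recently purged directory still looks unused.

```shell
#!/bin/bash
# Hypothetical is_not_empty() stand-in: succeed if DIR contains at least one
# file whose name ends in .journal. Assumes GNU find (-quit stops at the
# first match).
journal_dir_has_data() {
    local dir=$1
    [ -d "$dir" ] || return 1
    find "$dir" -type f -name '*.journal' -print -quit 2>/dev/null | grep -q .
}

# Example:
#   journal_dir_has_data /var/log/journal && echo "use /var/log/journal"
```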

