Bug 1366496 - 1 mkdir generates tons of log messages from dht xlator
Summary: 1 mkdir generates tons of log messages from dht xlator
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard: dht-log, dht-qe-3.2, dht-3.2.0-proposed
Depends On: 1215816 1258267 1366495
Blocks:
 
Reported: 2016-08-12 06:45 UTC by Nithya Balachandran
Modified: 2016-09-16 18:27 UTC
CC List: 20 users

Fixed In Version: glusterfs-3.8.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1366495
Environment:
Last Closed: 2016-09-16 18:27:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments: (none)

Description Nithya Balachandran 2016-08-12 06:45:56 UTC
+++ This bug was initially created as a clone of Bug #1366495 +++

+++ This bug was initially created as a clone of Bug #1258267 +++

+++ This bug was initially created as a clone of Bug #1215816 +++

Description of problem:

A single mkdir command on a Gluster mountpoint generates all of the log messages shown below; just think of what an untar would do!

This should be assigned to the disperse translator, but I can't find it in the component list.

Version-Release number of selected component (if applicable):

RHS 3.1 - glusterfs-3.7dev-0.1017.git7fb85e3.el6.x86_64
RHEL6.6 - kernel-2.6.32-504.el6.x86_64

How reproducible:

Every time.

Steps to Reproduce:
1. create the distributed-disperse volume shown below (a sketch of the commands follows these steps)
2. mount it and run "tail -f /var/log/glusterfs/your-mountpoint.log"
3. mkdir /your/mountpoint/d
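
A minimal sketch of the reproduction commands, assuming the hostnames and brick paths from the volume info below (disperse 6 redundancy 2 over 72 bricks gives the 12 x (4 + 2) layout shown there); the client log file name follows the usual convention of the mount path with slashes turned into dashes:

    # build the 72-brick list in the order shown in the volume info below:
    # brick dirs b01..b12, each spread across the six 10GbE hosts
    bricks=""
    for b in b01 b02 b03 b04 b05 b06 b07 b08 b09 b10 b11 b12; do
        for h in gprfs022 gprfs024 gprfs037 gprfs038 gprfs039 gprfs040; do
            bricks="$bricks ${h}-10ge:/bricks/${b}/g"
        done
    done

    # create and start a 12 x (4 + 2) distributed-disperse volume
    gluster volume create ec42 disperse 6 redundancy 2 $bricks
    gluster volume start ec42

    # mount it and watch the client log
    mount -t glusterfs gprfs022-10ge:/ec42 /mnt/ec42
    tail -f /var/log/glusterfs/mnt-ec42.log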

Actual results:

See the log messages below, and imagine untarring a large tarball. Excessive logging slows Gluster down and can fill the system disk, leading to damage to the volume.

Why is the default log level INFO instead of WARNING?

ec translator:
- Why do we get "Mismatching xdata in answers of 'LOOKUP'"?
- Why do we get "Operation failed on some subvolumes"? Why isn't this an ERROR?
- Why do we get "Mismatching dictionary in answers of 'GF_FOP_XATTROP'"? Why isn't this a WARNING? What is log level 'N'?

dht translator:
- Why does it have to tell us the directory layout? Anyone who is interested can read it from the xattr (see the example just below).
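
For example, a sketch of reading the layout directly, assuming the brick paths from the volume info below (trusted.glusterfs.dht is the on-disk layout xattr; trusted.* xattrs are only visible on the brick itself, as root):

    # on a brick server, dump the dht layout range for a directory
    getfattr -n trusted.glusterfs.dht -e hex /bricks/b01/g/your-dir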

Expected results:

NOTHING! Gluster should not normally log events that are commonplace and do not represent a significant failure or state change, unless the user has requested it to do so by raising the log level.
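
For what it's worth, a user who actually wants this verbosity can raise the client log level per volume with the standard diagnostics option (volume name taken from this report), rather than having INFO be the default:

    gluster volume set ec42 diagnostics.client-log-level DEBUG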

Additional info:

The log messages generated by a single mkdir command are:

[2015-04-27 19:45:14.177042] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.177282] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)
[2015-04-27 19:45:14.177324] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.177281] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.177373] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=1F, bad=20)
[2015-04-27 19:45:14.177406] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.178187] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178316] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178657] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178745] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-11: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)
[2015-04-27 19:45:14.178787] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.179487] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.234873] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-ec42-dht: chunk size = 0xffffffff / 43428960 = 0x62
[2015-04-27 19:45:14.234906] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-9
[2015-04-27 19:45:14.234921] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-0
[2015-04-27 19:45:14.234934] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-1
[2015-04-27 19:45:14.234948] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-10
[2015-04-27 19:45:14.234963] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-11
[2015-04-27 19:45:14.234970] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-2
[2015-04-27 19:45:14.234978] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-3
[2015-04-27 19:45:14.234986] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-4
[2015-04-27 19:45:14.234993] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-5
[2015-04-27 19:45:14.235002] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-6
[2015-04-27 19:45:14.235009] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-7
[2015-04-27 19:45:14.235016] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-8
[2015-04-27 19:45:15.718230] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-0: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:15.718811] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:15.718846] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-11: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:14.241533] I [MSGID: 109036] [dht-common.c:6476:dht_log_new_layout_for_dir_selfheal] 0-ec42-dht: Setting layout of /ben with [Subvol_name: ec42-disperse-0, Err: -1 , Start: 354669840 , Stop: 709339679 ], [Subvol_name: ec42-disperse-1, Err: -1 , Start: 709339680 , Stop: 1064009519 ], [Subvol_name: ec42-disperse-10, Err: -1 , Start: 1064009520 , Stop: 1418679359 ], [Subvol_name: ec42-disperse-11, Err: -1 , Start: 1418679360 , Stop: 1773349199 ], [Subvol_name: ec42-disperse-2, Err: -1 , Start: 1773349200 , Stop: 2128019039 ], [Subvol_name: ec42-disperse-3, Err: -1 , Start: 2128019040 , Stop: 2482688879 ], [Subvol_name: ec42-disperse-4, Err: -1 , Start: 2482688880 , Stop: 2837358719 ], [Subvol_name: ec42-disperse-5, Err: -1 , Start: 2837358720 , Stop: 3192028559 ], [Subvol_name: ec42-disperse-6, Err: -1 , Start: 3192028560 , Stop: 3546698399 ], [Subvol_name: ec42-disperse-7, Err: -1 , Start: 3546698400 , Stop: 3901368239 ], [Subvol_name: ec42-disperse-8, Err: -1 , Start: 3901368240 , Stop: 4294967295 ], [Subvol_name: ec42-disperse-9, Err: -1 , Start: 0 , Stop: 354669839 ], 
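
As a sanity check on the layout arithmetic above (my own working, not from the log): chunk size = 0xffffffff / 43428960 = 98 = 0x62, each of the 12 subvolumes carries weight 43428960 / 12 = 3619080, and 98 * 3619080 = 354669840 = 0x1523d510, which matches both the "assigning range size" lines and the Start/Stop spans (e.g. ec42-disperse-9 covers 0 through 354669839; the last range absorbs the rounding remainder up to 4294967295).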

The mountpoint process is:

root     28619  0.0  0.3 1155156 193956 ?      Ssl  Apr26   0:09 /usr/sbin/glusterfs --volfile-server=gprfs022-10ge --volfile-id=/ec42 /mnt/ec42

The volume is:

[root@gprfs024 ~]# gluster v i
 
Volume Name: ec42
Type: Distributed-Disperse
Volume ID: 61ffe960-77b7-4f4d-9988-2fc3ca65b561
Status: Started
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gprfs022-10ge:/bricks/b01/g
Brick2: gprfs024-10ge:/bricks/b01/g
Brick3: gprfs037-10ge:/bricks/b01/g
Brick4: gprfs038-10ge:/bricks/b01/g
Brick5: gprfs039-10ge:/bricks/b01/g
Brick6: gprfs040-10ge:/bricks/b01/g
Brick7: gprfs022-10ge:/bricks/b02/g
Brick8: gprfs024-10ge:/bricks/b02/g
Brick9: gprfs037-10ge:/bricks/b02/g
Brick10: gprfs038-10ge:/bricks/b02/g
Brick11: gprfs039-10ge:/bricks/b02/g
Brick12: gprfs040-10ge:/bricks/b02/g
Brick13: gprfs022-10ge:/bricks/b03/g
Brick14: gprfs024-10ge:/bricks/b03/g
Brick15: gprfs037-10ge:/bricks/b03/g
Brick16: gprfs038-10ge:/bricks/b03/g
Brick17: gprfs039-10ge:/bricks/b03/g
Brick18: gprfs040-10ge:/bricks/b03/g
Brick19: gprfs022-10ge:/bricks/b04/g
Brick20: gprfs024-10ge:/bricks/b04/g
Brick21: gprfs037-10ge:/bricks/b04/g
Brick22: gprfs038-10ge:/bricks/b04/g
Brick23: gprfs039-10ge:/bricks/b04/g
Brick24: gprfs040-10ge:/bricks/b04/g
Brick25: gprfs022-10ge:/bricks/b05/g
Brick26: gprfs024-10ge:/bricks/b05/g
Brick27: gprfs037-10ge:/bricks/b05/g
Brick28: gprfs038-10ge:/bricks/b05/g
Brick29: gprfs039-10ge:/bricks/b05/g
Brick30: gprfs040-10ge:/bricks/b05/g
Brick31: gprfs022-10ge:/bricks/b06/g
Brick32: gprfs024-10ge:/bricks/b06/g
Brick33: gprfs037-10ge:/bricks/b06/g
Brick34: gprfs038-10ge:/bricks/b06/g
Brick35: gprfs039-10ge:/bricks/b06/g
Brick36: gprfs040-10ge:/bricks/b06/g
Brick37: gprfs022-10ge:/bricks/b07/g
Brick38: gprfs024-10ge:/bricks/b07/g
Brick39: gprfs037-10ge:/bricks/b07/g
Brick40: gprfs038-10ge:/bricks/b07/g
Brick41: gprfs039-10ge:/bricks/b07/g
Brick42: gprfs040-10ge:/bricks/b07/g
Brick43: gprfs022-10ge:/bricks/b08/g
Brick44: gprfs024-10ge:/bricks/b08/g
Brick45: gprfs037-10ge:/bricks/b08/g
Brick46: gprfs038-10ge:/bricks/b08/g
Brick47: gprfs039-10ge:/bricks/b08/g
Brick48: gprfs040-10ge:/bricks/b08/g
Brick49: gprfs022-10ge:/bricks/b09/g
Brick50: gprfs024-10ge:/bricks/b09/g
Brick51: gprfs037-10ge:/bricks/b09/g
Brick52: gprfs038-10ge:/bricks/b09/g
Brick53: gprfs039-10ge:/bricks/b09/g
Brick54: gprfs040-10ge:/bricks/b09/g
Brick55: gprfs022-10ge:/bricks/b10/g
Brick56: gprfs024-10ge:/bricks/b10/g
Brick57: gprfs037-10ge:/bricks/b10/g
Brick58: gprfs038-10ge:/bricks/b10/g
Brick59: gprfs039-10ge:/bricks/b10/g
Brick60: gprfs040-10ge:/bricks/b10/g
Brick61: gprfs022-10ge:/bricks/b11/g
Brick62: gprfs024-10ge:/bricks/b11/g
Brick63: gprfs037-10ge:/bricks/b11/g
Brick64: gprfs038-10ge:/bricks/b11/g
Brick65: gprfs039-10ge:/bricks/b11/g
Brick66: gprfs040-10ge:/bricks/b11/g
Brick67: gprfs022-10ge:/bricks/b12/g
Brick68: gprfs024-10ge:/bricks/b12/g
Brick69: gprfs037-10ge:/bricks/b12/g
Brick70: gprfs038-10ge:/bricks/b12/g
Brick71: gprfs039-10ge:/bricks/b12/g
Brick72: gprfs040-10ge:/bricks/b12/g
Options Reconfigured:
cluster.lookup-unhashed: off
client.event-threads: 8
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on

--- Additional comment from RHEL Product and Program Management on 2015-04-27 16:22:40 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Rejy M Cyriac on 2015-08-07 02:58:48 EDT ---

Since this bug has been approved for the z-stream release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for RHGS 3.1 Update 1 release through the Internal Whiteboard entry of '3.1.1', the Target Release is being set to 'RHGS 3.1.1'

--- Additional comment from Rejy M Cyriac on 2015-08-10 00:12:30 EDT ---

Attached to RHGS 3.1 Update 1 (z-stream) Tracker BZ

--- Additional comment from Ashish Pandey on 2015-08-19 06:14:42 EDT ---

The fix for this bug is present in the following patch:

https://code.engineering.redhat.com/gerrit/55582

--- Additional comment from Pranith Kumar K on 2015-08-30 14:48:55 EDT ---

The ec logs in the common code path no longer appear. Raising a new bug for dht.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-08-30 14:49:27 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-08-30 14:49:27 EDT ---

Since this bug does not have release flag 'rhgs-3.1.z+', the Target Release is being automatically reset to '---'

--- Additional comment from Sakshi on 2015-10-30 03:41:04 EDT ---

Of the 2 dht-related logs mentioned above:
1) The log in dht_log_new_layout_for_dir_selfheal is important: if the directory has been healed, its new layout must be logged.
2) The log in dht_selfheal_layout_new_directory has been at DEBUG level since 3.1 and, I suppose, should not be removed.

--- Additional comment from Pranith Kumar K on 2016-06-03 03:21:58 EDT ---

Sakshi,
        Ashish worked on the logging patches and I believe they are merged now, so this shouldn't happen.

Pranith

--- Additional comment from Nithya Balachandran on 2016-08-01 04:44:26 EDT ---

The following message should have its log level changed to DEBUG:

In dht_fix_layout_of_directory ():

    gf_msg (this->name, GF_LOG_INFO, 0,
            DHT_MSG_SUBVOL_INFO,
            "subvolume %d (%s): %u chunks", i,
            priv->subvolumes[i]->name,
            priv->du_stats[i].chunks);


This message alone took up about 2GB of a 2.9GB rebalance log file.
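
A minimal sketch of the requested change, assuming the standard gf_msg_debug() convenience macro from libglusterfs (which logs at GF_LOG_DEBUG); the committed patch may differ:

    gf_msg_debug (this->name, 0,
                  "subvolume %d (%s): %u chunks", i,
                  priv->subvolumes[i]->name,
                  priv->du_stats[i].chunks);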

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-08-09 07:17:39 EDT ---

Since this bug has been approved for the RHGS 3.2.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.2.0+', and through the Internal Whiteboard entry of '3.2.0', the Target Release is being automatically set to 'RHGS 3.2.0'

Comment 1 Worker Ant 2016-09-07 03:54:48 UTC
REVIEW: http://review.gluster.org/15413 (cluster/dht: move layout logs to DEBUG level) posted (#1) for review on release-3.8 by Susant Palai (spalai@redhat.com)

Comment 2 Worker Ant 2016-09-07 14:57:20 UTC
COMMIT: http://review.gluster.org/15413 committed in release-3.8 by Niels de Vos (ndevos@redhat.com) 
------
commit 2fdfe1a6dcde741af1215b738689b2fcf822f235
Author: Susant Palai <spalai@redhat.com>
Date:   Wed Sep 7 09:21:12 2016 +0530

    cluster/dht: move layout logs to DEBUG level
    
    > Reviewed-on: http://review.gluster.org/15343
    > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: N Balachandran <nbalacha@redhat.com>
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
    > Signed-off-by: Susant Palai <spalai@redhat.com>
    (cherry picked from commit 15c790b502ba92caa17f2d1870c3d75d547e6bad)
    
    Change-Id: Iad96256218be643b272762b5638a3f6837aff28d
    BUG: 1366496
    Signed-off-by: Susant Palai <spalai@redhat.com>
    Reviewed-on: http://review.gluster.org/15413
    Reviewed-by: N Balachandran <nbalacha@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

Comment 3 Niels de Vos 2016-09-12 05:37:48 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 4 Niels de Vos 2016-09-16 18:27:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.4, please open a new bug report.

glusterfs-3.8.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/announce/2016-September/000060.html
[2] https://www.gluster.org/pipermail/gluster-users/

