Bug 1694201 - cifs repeatedly tries to open a file using smb v1 on an smb2 mount after receiving STATUS_SHARING_VIOLATION
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ronnie Sahlberg
QA Contact: xiaoli feng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-29 18:40 UTC by Frank Sorenson
Modified: 2019-04-15 17:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-11 05:03:21 UTC
Target Upstream Version:


Attachments
packet capture (deleted), 2019-03-29 18:40 UTC, Frank Sorenson
patch to prevent fallback to smb (deleted), 2019-04-15 17:45 UTC, Frank Sorenson

Description Frank Sorenson 2019-03-29 18:40:31 UTC
Created attachment 1549585
packet capture

Description of problem:

On an smb2 mount, if an attempt to open a file fails with STATUS_SHARING_VIOLATION, the client will retry the open operation with an smb v1 'NT Create AndX'.

This results in the server closing the TCP connection.  The client then re-establishes the tcp connection, renegotiates smb2, sets up a new session, connects to the tree, and retries the smb v1 operation.  The server again closes the connection, and the client loops continually.

The application initiating the access is blocked.



Version-Release number of selected component (if applicable):

tested with several RHEL7 kernels, including:
  3.10.0-862.9.1.el7.x86_64
  3.10.0-1006.el7.x86_64 (nightly)


How reproducible:

easy, see below

Steps to Reproduce:

in terminal 1:
    # mount -ocredentials=/root/.user1_smb_creds,vers=2.0 //vm3/user1 /mnt/vm3
    # touch /mnt/vm3/testfile

    # smbclient -A /root/.user1_smb_creds //vm3/user1
    Try "help" to get a list of possible commands.
    smb: \> open testfile
    open file \testfile: for read/write fnum 1

in terminal 2:
    # mv /mnt/vm3/testfile /mnt/vm3/testfile.bak


Actual results:

    the client repeatedly attempts an smb v1 operation over smb2
    the 'mv' will never complete, either successfully or with an error


Expected results:

    the client only uses valid operations (not smb calls over smb2)
    the 'mv' returns an error (presumably EBUSY)


Additional info:

from the packet capture:

  vm3 - server
  vm7 - client

hosts: 
192.168.122.71  vm3
192.168.122.60  vm7

tshark -H hosts -2 -r trace.pcap.gz

smb2 open fails with STATUS_SHARING_VIOLATION
  120 52.420169670          vm7 → vm3          SMB2 214 Create Request File: testfile
  121 52.420736516          vm3 → vm7          SMB2 143 Create Response, Error: STATUS_SHARING_VIOLATION

client retries opening with smb 'NT Create AndX'
  122 52.421352572          vm7 → vm3          SMB 163 NT Create AndX Request, Path: \testfile

server disconnects the client
  123 52.435305763          vm3 → vm7          TCP 66 445 → 39224 [FIN, ACK] Seq=346440821 Ack=2786851610 Win=44032 Len=0 TSval=2420834511 TSecr=1547066566
  124 52.435789451          vm7 → vm3          TCP 66 39224 → 445 [FIN, ACK] Seq=2786851610 Ack=346440822 Win=48512 Len=0 TSval=1547066581 TSecr=2420834511
  125 52.435812384          vm3 → vm7          TCP 66 445 → 39224 [ACK] Seq=346440822 Ack=2786851611 Win=44032 Len=0 TSval=2420834512 TSecr=1547066581


client does tcp reconnect, negotiate smb2, session setup, tree connect

  126 52.436150145          vm7 → vm3          TCP 74 39228 → 445 [SYN] Seq=3370372604 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1547066581 TSecr=0 WS=128
  127 52.436193928          vm3 → vm7          TCP 74 445 → 39228 [SYN, ACK] Seq=71852149 Ack=3370372605 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=2420834512 TSecr=1547066581 WS=128
  128 52.436588837          vm7 → vm3          TCP 66 39228 → 445 [ACK] Seq=3370372605 Ack=71852150 Win=29312 Len=0 TSval=1547066581 TSecr=2420834512
  129 52.436792226          vm7 → vm3          SMB2 172 Negotiate Protocol Request 
  130 52.436807047          vm3 → vm7          TCP 66 445 → 39228 [ACK] Seq=71852150 Ack=3370372711 Win=29056 Len=0 TSval=2420834513 TSecr=1547066582 
  131 52.441860471          vm3 → vm7          SMB2 272 Negotiate Protocol Response 
  132 52.442298599          vm7 → vm3          TCP 66 39228 → 445 [ACK] Seq=3370372711 Ack=71852356 Win=30336 Len=0 TSval=1547066587 TSecr=2420834518 
  133 52.446443896          vm7 → vm3          SMB2 190 Session Setup Request, NTLMSSP_NEGOTIATE 
  134 52.447394014          vm3 → vm7          SMB2 332 Session Setup Response, Error: STATUS_MORE_PROCESSING_REQUIRED, NTLMSSP_CHALLENGE 
  135 52.448873796          vm7 → vm3          SMB2 422 Session Setup Request, NTLMSSP_AUTH, User: \user1 
  136 52.454091885          vm3 → vm7          SMB2 142 Session Setup Response 
  137 52.454961286          vm7 → vm3          SMB2 166 Tree Connect Request Tree: \\vm3\user1 
  138 52.456747268          vm3 → vm7          SMB2 150 Tree Connect Response 
  139 52.457521077          vm7 → vm3          SMB2 164 Tree Connect Request Tree: \\vm3\IPC$ 
  140 52.458883197          vm3 → vm7          SMB2 150 Tree Connect Response 
  141 52.498909139          vm7 → vm3          TCP 66 39228 → 445 [ACK] Seq=3370373389 Ack=71852866 Win=31360 Len=0 TSval=1547066644 TSecr=2420834535 
  142 54.913057435          vm7 → vm3          SMB2 138 KeepAlive Request 
  143 54.913315594          vm3 → vm7          SMB2 138 KeepAlive Response 
  144 54.913891376          vm7 → vm3          TCP 66 39226 → 445 [ACK] Seq=2374753979 Ack=651874105 Win=33536 Len=0 TSval=1547069059 TSecr=2420836989
  145 59.925659973          vm7 → vm3          SMB2 138 KeepAlive Request
  146 59.925992968          vm3 → vm7          SMB2 138 KeepAlive Response
  147 59.926747160          vm7 → vm3          TCP 66 39226 → 445 [ACK] Seq=2374754051 Ack=651874177 Win=33536 Len=0 TSval=1547074071 TSecr=2420842002


10 seconds after initiating the new tcp connection, the client again tries to open with smb 'NT Create AndX'

  148 62.436158004          vm7 → vm3          SMB 163 NT Create AndX Request, Path: \testfile

server again disconnects
  149 62.444862128          vm3 → vm7          TCP 66 445 → 39228 [FIN, ACK] Seq=71852866 Ack=3370373486 Win=30080 Len=0 TSval=2420844521 TSecr=1547076581
  150 62.445651520          vm7 → vm3          TCP 66 39228 → 445 [FIN, ACK] Seq=3370373486 Ack=71852867 Win=31360 Len=0 TSval=1547076590 TSecr=2420844521

Comment 2 Frank Sorenson 2019-03-29 19:28:32 UTC
the upstream client is also broken, but in a different way.  Instead of an smb request, it appears to send some invalid bytes, followed by an smb request (which may or may not be the same as what rhel 7 sends).  Instead of closing the connection with FIN, the server sends RST.

However, the upstream client still loops forever, with the 'mv' operation hanging (just with different invalid communication).

Comment 3 Frank Sorenson 2019-04-01 22:13:59 UTC
I can confirm that the upstream client does send the same 'NT Create AndX Request'; however, the NBSS header is repeated, making the payload invalid.

Here is the tcp payload sent by the upstream client:

0000   00 00 00 5d 00 00 00 5d ff 53 4d 42 a2 00 00 00  ...]...].SMB....
0010   00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020   5e 15 77 09 df 6a 21 03 18 ff 00 00 00 00 0a 00  ^.w..j!.........
0030   00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00  ................
0040   00 00 00 00 80 00 00 00 07 00 00 00 01 00 00 00  ................
0050   40 00 00 00 02 00 00 00 03 0a 00 5c 74 65 73 74  @..........\test
0060   66 69 6c 65 00                                   file.

the first 4 bytes decode as nbss:
NetBIOS Session Service
    Message Type: Session message (0x00)
    Flags: 0x00
    Length: 0x005d

However, note that these 4 bytes are then repeated:
0000   00 00 00 5d 00 00 00 5d

Also, due to the addition of these 4 bytes, the nbss encapsulates 4 fewer bytes at the end, so the filename ends with 'testf', and there are 4 extra bytes of tcp payload at the end.

I modified the frame to get rid of the additional 4 bytes of header, and it decodes as valid SMB again:
3231 97.377091804 192.168.122.150 → 192.168.122.71 SMB 167 NT Create AndX Request, Path: \testfile 

Sending SMB over the SMB2 session is still invalid; however, this does explain the difference between RHEL 7 and upstream behavior.  For some reason, in addition to switching from smb2 to smb, the upstream client adds a second NBSS header.
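
The framing check described above can be sketched in a few lines of userspace C. This is only an illustration, not code from the kernel or from wireshark: the names classify_frame(), trailing_bytes() and enum frame_kind are made up for this example. It assumes direct-TCP framing (a zero type byte followed by a 24-bit big-endian length), which for this frame yields the same 0x5d length as the decode shown above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of one TCP payload carrying an SMB message. */
enum frame_kind {
    FRAME_TOO_SHORT,
    FRAME_SMB1,        /* NBSS header, then 0xFF 'S' 'M' 'B' */
    FRAME_SMB2,        /* NBSS header, then 0xFE 'S' 'M' 'B' */
    FRAME_DUP_HEADER,  /* NBSS header repeated before the SMB magic */
    FRAME_UNKNOWN
};

/* Length declared by the 4-byte header (assumed 24-bit big-endian). */
static uint32_t nbss_length(const uint8_t *p)
{
    return ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

enum frame_kind classify_frame(const uint8_t *buf, size_t len)
{
    if (len < 8)
        return FRAME_TOO_SHORT;
    if (memcmp(buf + 4, "\xffSMB", 4) == 0)
        return FRAME_SMB1;
    if (memcmp(buf + 4, "\xfeSMB", 4) == 0)
        return FRAME_SMB2;
    /* Same 4 header bytes twice, then an SMB magic: the upstream symptom. */
    if (len >= 12 && memcmp(buf, buf + 4, 4) == 0 &&
        (memcmp(buf + 8, "\xffSMB", 4) == 0 ||
         memcmp(buf + 8, "\xfeSMB", 4) == 0))
        return FRAME_DUP_HEADER;
    return FRAME_UNKNOWN;
}

/* TCP payload bytes left over after the message the header declares. */
size_t trailing_bytes(const uint8_t *buf, size_t len)
{
    size_t framed;

    if (len < 4)
        return 0;
    framed = 4 + (size_t)nbss_length(buf);
    return len > framed ? len - framed : 0;
}
```

Fed the 101-byte payload from the dump, classify_frame() reports the repeated header and trailing_bytes() reports the 4 bytes that fall outside the declared length, which is why the decoded filename stops at 'testf'.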

Comment 4 Frank Sorenson 2019-04-02 12:35:18 UTC
in cifs_do_rename(), we first try to call the smb-version-specific path-based rename function, then later try to open the file using the function CIFS_open()


static int
cifs_do_rename(const unsigned int xid, struct dentry *from_dentry,
               const char *from_path, struct dentry *to_dentry,
               const char *to_path)
{
...

        /* try path-based rename first */
        rc = server->ops->rename(xid, tcon, from_path, to_path, cifs_sb);
...
        /* open the file to be renamed -- we need DELETE perms */
        oparms.desired_access = DELETE;
        oparms.create_options = CREATE_NOT_DIR;
        oparms.disposition = FILE_OPEN;
        oparms.path = from_path;
        oparms.fid = &fid;
        oparms.reconnect = false;

...
        rc = CIFS_open(xid, &oparms, &oplock, NULL);
        if (rc == 0) {
                rc = CIFSSMBRenameOpenFile(xid, tcon, fid.netfid,
                                (const char *) to_dentry->d_name.name,
                                cifs_sb->local_nls, cifs_remap(cifs_sb));
                CIFSSMBClose(xid, tcon, fid.netfid);

However, CIFS_open, CIFSSMBRenameOpenFile, and CIFSSMBClose are all smb1-specific


in CIFS_open():
        rc = smb_init(SMB_COM_NT_CREATE_ANDX, 24, tcon, (void **)&req,

in CIFSSMBRenameOpenFile():
        rc = smb_init(SMB_COM_TRANSACTION2, 15, pTcon, (void **) &pSMB,

in CIFSSMBClose():
        rc = small_smb_init(SMB_COM_CLOSE, 3, tcon, (void **) &pSMB);


So if the 'path-based rename' fails while using a higher smb version, it falls back to using smb v1 calls, which is clearly wrong.



smb1ops.c:struct smb_version_operations smb1_operations = {
smb1ops.c:	.rename_pending_delete = cifs_rename_pending_delete,
smb1ops.c:	.rename = CIFSSMBRename,
smb1ops.c:	.open = cifs_open_file,
smb1ops.c:	.close = cifs_close_file,
smb1ops.c:	.close_dir = cifs_close_dir,

smb2ops.c:struct smb_version_operations smb20_operations = {
smb2ops.c:	.rename = smb2_rename_path,
smb2ops.c:	.open = smb2_open_file,
smb2ops.c:	.close = smb2_close_file,
smb2ops.c:	.close_dir = smb2_close_dir,

smb2ops.c:struct smb_version_operations smb21_operations = {
smb2ops.c:	.rename = smb2_rename_path,
smb2ops.c:	.open = smb2_open_file,
smb2ops.c:	.close = smb2_close_file,
smb2ops.c:	.close_dir = smb2_close_dir,

smb2ops.c:struct smb_version_operations smb30_operations = {
smb2ops.c:	.rename = smb2_rename_path,
smb2ops.c:	.open = smb2_open_file,
smb2ops.c:	.close = smb2_close_file,
smb2ops.c:	.close_dir = smb2_close_dir,

smb2ops.c:struct smb_version_operations smb311_operations = {
smb2ops.c:	.rename = smb2_rename_path,
smb2ops.c:	.open = smb2_open_file,
smb2ops.c:	.close = smb2_close_file,
smb2ops.c:	.close_dir = smb2_close_dir,


so it seems like the open in cifs_do_rename should really look like this:

    rc = server->ops->open(xid, &oparms, &oplock, NULL);

and the CIFSSMBClose replaced:
    server->ops->close(xid, tcon, fid.netfid);

there is no replacement for CIFSSMBRenameOpenFile ... can this be done using smb2 semantics?
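
To make the difference between the two call styles concrete, here is a small userspace model. It is a sketch only: struct model_ops, struct model_server and the fallback_open_*() helpers are invented for illustration and are far simpler than the real struct smb_version_operations. Each "open" just reports which protocol family its request would be encoded in.

```c
/* Which protocol family a request would be encoded in. */
enum wire { WIRE_SMB1, WIRE_SMB2 };

/* Hypothetical, much-reduced stand-in for struct smb_version_operations. */
struct model_ops {
    enum wire (*open)(const char *path);
};

/* Hypothetical stand-in for the server state: dialect plus its ops table. */
struct model_server {
    enum wire negotiated;
    const struct model_ops *ops;
};

static enum wire model_smb1_open(const char *path)
{
    (void)path;
    return WIRE_SMB1;   /* would build an 'NT Create AndX' request */
}

static enum wire model_smb2_open(const char *path)
{
    (void)path;
    return WIRE_SMB2;   /* would build an SMB2 Create request */
}

const struct model_ops model_smb1_ops  = { .open = model_smb1_open };
const struct model_ops model_smb20_ops = { .open = model_smb2_open };

/* Models the current fallback: the SMB1 helper is called directly. */
enum wire fallback_open_hardcoded(const struct model_server *srv,
                                  const char *path)
{
    (void)srv;
    return model_smb1_open(path);
}

/* Models the suggested change: dispatch through the per-dialect table. */
enum wire fallback_open_dispatched(const struct model_server *srv,
                                   const char *path)
{
    return srv->ops->open(path);
}

/* A request is only valid if it matches what the connection negotiated. */
int wire_matches(const struct model_server *srv, enum wire w)
{
    return srv->negotiated == w;
}
```

With a server set up as { WIRE_SMB2, &model_smb20_ops }, the hardcoded variant produces a request that does not match the negotiated dialect, while the dispatched variant does. The model says nothing about the rename step itself, for which no ops entry exists.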

*****

and finally, when trying to rename the open file on an smb1 mount, the operation fails, which seems like it may be the correct response anyway:
11358 07:06:54.545544 rename("/mnt/vm3_a/testfile", "/mnt/vm3_a/testfile.bak") = -1 EBUSY (Device or resource busy) <1.925719>

the 1-2 second delay is expected: the samba code (for smb1) waits before returning a sharing violation (I believe there may be 2 waits):

source3/include/local.h:
/* Number of microseconds to wait before a sharing violation. */
#define SHARING_VIOLATION_USEC_WAIT 950000

Comment 6 Ronnie Sahlberg 2019-04-11 05:03:21 UTC
During the recent big credits cleanup in the smb2 codebase we fixed a handful of similar issues.
However, these changes cannot easily be backported, as they build on and depend on a lot of other unrelated changes.

There is a workaround, though: I suggest that customers who hit this switch back to vers=1.0
until they get a chance to upgrade to rhel8.

Comment 7 Frank Sorenson 2019-04-15 17:45:08 UTC
Created attachment 1555290
patch to prevent fallback to smb

A path-based rename returning EBUSY will incorrectly try opening
the file with a cifs (NT Create AndX) operation on an smb2+ mount,
which causes the server to force a session close.

If the mount is smb2+, skip the fallback.
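
Since the attachment itself is no longer available, the following is only a userspace sketch of the control flow the description implies, not the patch. All names (struct model_mount, struct model_result, model_do_rename()) are hypothetical, and the is_smb1 flag stands in for whatever test the real patch uses to detect an smb2+ mount.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount state visible to the rename path. */
struct model_mount {
    bool is_smb1;             /* true only for a vers=1.0 mount */
};

/* Outcome of one modelled rename attempt. */
struct model_result {
    int rc;                   /* errno-style return code */
    bool used_smb1_fallback;  /* whether the SMB1 open/rename/close path ran */
};

struct model_result model_do_rename(const struct model_mount *mnt,
                                    int path_rename_rc,
                                    int smb1_fallback_rc)
{
    struct model_result res = { path_rename_rc, false };

    /* Only a busy source file triggers the open-based fallback. */
    if (path_rename_rc != -EBUSY)
        return res;

    /* On smb2+ there is no valid SMB1 request to send: report the error. */
    if (!mnt->is_smb1)
        return res;

    /* SMB1 mount: the existing open/rename/close fallback is still used. */
    res.used_smb1_fallback = true;
    res.rc = smb1_fallback_rc;
    return res;
}
```

In this model an smb2+ mount whose path-based rename returns -EBUSY hands that error straight back to the caller, which matches the expected result in the description ('mv' fails, presumably with EBUSY) rather than looping.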

Comment 8 Frank Sorenson 2019-04-15 17:49:18 UTC
The credits cleanup did not touch this, and does not fix it.

The attached patch fixes this bug, and applies cleanly to both upstream and RHEL7.

