Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1059778 - [engine] Engine sends DisconnectStorageServer with empty connectionList
Summary: [engine] Engine sends DisconnectStorageServer with empty connectionList
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: All
OS: Linux
unspecified
high vote
Target Milestone: ovirt-4.0.0-alpha
: ---
Assignee: Daniel Erez
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-01-30 16:04 UTC by Gadi Ickowicz
Modified: 2016-03-28 14:33 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-28 14:33:11 UTC
oVirt Team: Storage
tnisan: ovirt-4.0.0?
ylavi: planning_ack?
amureini: devel_ack+
ylavi: testing_ack?


Attachments (Terms of Use)
engine logs (deleted)
2014-01-30 16:04 UTC, Gadi Ickowicz
no flags Details


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 47544 master ABANDONED core: Manage iscsi sessions 2016-02-24 09:55:38 UTC

Description Gadi Ickowicz 2014-01-30 16:04:01 UTC
Created attachment 857550 [details]
engine logs

Description of problem:
Engine sends DisconnectStorageServer with empty connectionList during remove data center flow:

2014-01-30 15:28:04,600 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-25) [43035533] START, DisconnectStorageServerVDSCommand(HostName = gold-v
2014-01-30 15:28:05,248 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-25) [43035533] FINISH, DisconnectStorageServerVDSCommand, return: {}, log

Version-Release number of selected component (if applicable):
ovirt-engine-3.4.0-0.2.master.20140112020439.git9ad8529.el6.noarch

How reproducible:
100%

Steps to Reproduce:
DC with 1 storage domain, 1 host
1. Move domain to maintenance
2. Remove datacenter

Actual results:
Engine logs show DisconnectStorageServer with empty connectionList sent

Expected results:
If there are no servers to disconnect, there is no need to send DisconnectStorageServer

Additional info:

Comment 1 Allon Mureinik 2014-02-03 07:37:59 UTC
Federico, does this seem related to your recent refactoring?

Comment 2 Liron Aravot 2014-06-22 12:39:21 UTC
When executing disconnectStorageServer, storage cache refresh is triggered regardless of the passed connections list.
Federico, it it safe to remove that at least in some of the cases? or do we want to leave it?

Comment 3 Federico Simoncelli 2014-08-12 12:49:07 UTC
(In reply to Allon Mureinik from comment #1)
> Federico, does this seem related to your recent refactoring?

What I've been able to reproduce is a broader issue that is not related to any DisconnectStorageServer refactoring.

It seems that the logic in filterConnectionsUsedByOthers (ISCSIStorageHelper.java:~108) is fairly simple and if you have more than one domain on the same connection (on different luns) it decides that we should not disconnect from the server.

It doesn't take in account any of the followings (even if it should):

- consider only the storage domains of the pool(s) that the vds is connected to
- consider only the direct luns that are currently in use by a VM
- if all the remaining relevant storage domains are in maintenance then don't filter (=> allow disconnection)

Once we solved these issues that are the pre-requisites we'll see if there's something more to address in the specific DestroyStoragePool flow.

Comment 4 Liron Aravot 2014-08-12 14:05:10 UTC
(In reply to Federico Simoncelli from comment #3)
> (In reply to Allon Mureinik from comment #1)
> > Federico, does this seem related to your recent refactoring?
> 
> What I've been able to reproduce is a broader issue that is not related to
> any DisconnectStorageServer refactoring.
> 
> It seems that the logic in filterConnectionsUsedByOthers
> (ISCSIStorageHelper.java:~108) is fairly simple and if you have more than
> one domain on the same connection (on different luns) it decides that we
> should not disconnect from the server.
> 
> It doesn't take in account any of the followings (even if it should):
> 
> - consider only the storage domains of the pool(s) that the vds is connected
> to
> - consider only the direct luns that are currently in use by a VM
> - if all the remaining relevant storage domains are in maintenance then
> don't filter (=> allow disconnection)
> 
> Once we solved these issues that are the pre-requisites we'll see if there's
> something more to address in the specific DestroyStoragePool flow.

I agree, there are issues on that helper that should addressed, i assume that the reason that it wasn't done before is to avoid synchronization between the different flows.

Fede, regardless of that - can you please reply to https://bugzilla.redhat.com/show_bug.cgi?id=1059778#c2 ?

Comment 5 Federico Simoncelli 2014-08-12 14:44:12 UTC
(In reply to Liron Aravot from comment #2)
> When executing disconnectStorageServer, storage cache refresh is triggered
> regardless of the passed connections list.
> Federico, it it safe to remove that at least in some of the cases? or do we
> want to leave it?

If I understand correctly you're suggesting to skip sdCache.refreshStorage() when connList for disconnectStorageServer is empty.

Can you explain how is that relevant to the bz?

My take on this is that engine shouldn't send connect/disconnectStorageServer with an empty connList and on the other side vdsm should raise an exception in that case (we would have caught this much earlier).

Comment 6 Liron Aravot 2014-08-12 14:52:28 UTC
(In reply to Federico Simoncelli from comment #5)
> (In reply to Liron Aravot from comment #2)
> > When executing disconnectStorageServer, storage cache refresh is triggered
> > regardless of the passed connections list.
> > Federico, it it safe to remove that at least in some of the cases? or do we
> > want to leave it?
> 
> If I understand correctly you're suggesting to skip sdCache.refreshStorage()
> when connList for disconnectStorageServer is empty.
> 
> 
> My take on this is that engine shouldn't send
> connect/disconnectStorageServer with an empty connList and on the other side
> vdsm should raise an exception in that case (we would have caught this much
> earlier).

That's exactly what I've asked, if we can avoid executing it when the connection list is empty. iirc we needed it because of the refreshStorage - can it be safely not be executed for all dc versions?

thanks.

Comment 7 Federico Simoncelli 2014-08-12 15:53:49 UTC
(In reply to Liron Aravot from comment #6)
> (In reply to Federico Simoncelli from comment #5)
> > (In reply to Liron Aravot from comment #2)
> > > When executing disconnectStorageServer, storage cache refresh is triggered
> > > regardless of the passed connections list.
> > > Federico, it it safe to remove that at least in some of the cases? or do we
> > > want to leave it?
> > 
> > If I understand correctly you're suggesting to skip sdCache.refreshStorage()
> > when connList for disconnectStorageServer is empty.
> > 
> > 
> > My take on this is that engine shouldn't send
> > connect/disconnectStorageServer with an empty connList and on the other side
> > vdsm should raise an exception in that case (we would have caught this much
> > earlier).
> 
> That's exactly what I've asked, if we can avoid executing it when the
> connection list is empty.

As I said I expect vdsm to raise an exception early if connList is empty (on master, not 3.5 material).
(Unrelated to the solution for this bug)

> iirc we needed it because of the refreshStorage -

What is "it"? (Previous "it" was "refreshStorage" but here it doesn't make sense)

> can it be safely not be executed for all dc versions?

This is something that you're probably more qualified to answer, did anyone ever use disconnectStorageServer to trigger a refreshStorage as an undocumented side-effect?

Anyway that is not relevant to this bz, this bug is about engine sending disconnectStorageServer with an empty connList where instead it should contain the connections.

The discussion about an undocumented use of disconnectStorageServer and refreshStorage should take place somewhere else.

Comment 8 Liron Aravot 2014-08-13 15:26:08 UTC
(In reply to Federico Simoncelli from comment #7)
> (In reply to Liron Aravot from comment #6)
> > (In reply to Federico Simoncelli from comment #5)
> > > (In reply to Liron Aravot from comment #2)
> > > > When executing disconnectStorageServer, storage cache refresh is triggered
> > > > regardless of the passed connections list.
> > > > Federico, it it safe to remove that at least in some of the cases? or do we
> > > > want to leave it?
> > > 
> > > If I understand correctly you're suggesting to skip sdCache.refreshStorage()
> > > when connList for disconnectStorageServer is empty.
> > > 
> > > 
> > > My take on this is that engine shouldn't send
> > > connect/disconnectStorageServer with an empty connList and on the other side
> > > vdsm should raise an exception in that case (we would have caught this much
> > > earlier).
> > 
> > That's exactly what I've asked, if we can avoid executing it when the
> > connection list is empty.
> 
> As I said I expect vdsm to raise an exception early if connList is empty (on
> master, not 3.5 material).
> (Unrelated to the solution for this bug)
> 
> > iirc we needed it because of the refreshStorage -
> 
> What is "it"? (Previous "it" was "refreshStorage" but here it doesn't make
> sense)
> 
> > can it be safely not be executed for all dc versions?
> 
> This is something that you're probably more qualified to answer, did anyone
> ever use disconnectStorageServer to trigger a refreshStorage as an
> undocumented side-effect?
> 
> Anyway that is not relevant to this bz, this bug is about engine sending
> disconnectStorageServer with an empty connList where instead it should
> contain the connections.
> 
> The discussion about an undocumented use of disconnectStorageServer and
> refreshStorage should take place somewhere else.

Today disconnectStorageServer/connectStorageServer is executed even without connection and triggers refreshStorage on the vdsm side, my question was to clarify whether we can safely remove that without risk on all versions, as refreshStorage() is executed we won't be able to do that.
The second issue to inspect is how we choose which connections to pass on the disconnectStorageServer call and if we do pass all needed connections (which we'll probably won't change for 3.5).

Comment 9 Liron Aravot 2014-08-13 15:31:29 UTC
> Today disconnectStorageServer/connectStorageServer is executed even without
> connection and triggers refreshStorage on the vdsm side, my question was to
> clarify whether we can safely remove that without risk on all versions, as
> refreshStorage() is executed we won't be able to do that.

bad phrasing here, what i meant to ask is whether removing it can cause to regression, as it might. we won't remove the calls at least for now.

Comment 10 Allon Mureinik 2014-08-21 11:37:05 UTC
Seems like a low impact bug, with a fix the involves high risk (probably need a synchronization mechanism around connections) - deferring.

Comment 12 Sandro Bonazzola 2015-09-04 09:01:20 UTC
This is an automated message.
This Bugzilla report has been opened on a version which is not maintained anymore.
Please check if this bug is still relevant in oVirt 3.5.4.
If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution)
If it's an RFE please update the version to 4.0 if still relevant.

Comment 13 Red Hat Bugzilla Rules Engine 2015-10-19 10:57:05 UTC
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 14 Yaniv Lavi 2015-10-29 12:04:27 UTC
In oVirt testing is done on single stream by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.

Comment 15 Ala Hino 2015-11-23 15:25:33 UTC
Fixing this bug involves major code changes at the engine side and potentially at vdsm side.

Approaches that we need to evaluate:
* vdsm "new" conn mgmt api
* deamon maintenance
* ref counting at vdsm side

All approaches involves major code changes hence, the decision to move this bug to 4.0.0.

Comment 16 Yaniv Kaul 2016-03-10 10:34:47 UTC
(In reply to Ala Hino from comment #15)
> Fixing this bug involves major code changes at the engine side and
> potentially at vdsm side.
> 
> Approaches that we need to evaluate:
> * vdsm "new" conn mgmt api
> * deamon maintenance
> * ref counting at vdsm side
> 
> All approaches involves major code changes hence, the decision to move this
> bug to 4.0.0.

So why do it all? What's the REAL value in fixing this?

I'd CLOSE-WONTFIX it.

Comment 17 Daniel Erez 2016-03-28 13:35:20 UTC
(In reply to Yaniv Kaul from comment #16)
> (In reply to Ala Hino from comment #15)
> > Fixing this bug involves major code changes at the engine side and
> > potentially at vdsm side.
> > 
> > Approaches that we need to evaluate:
> > * vdsm "new" conn mgmt api
> > * deamon maintenance
> > * ref counting at vdsm side
> > 
> > All approaches involves major code changes hence, the decision to move this
> > bug to 4.0.0.
> 
> So why do it all? What's the REAL value in fixing this?
> 
> I'd CLOSE-WONTFIX it.

@Allon - can we close this? (IIUC, it's merely a performance issue at best?)


Note You need to log in before you can comment on or make changes to this bug.