Bug 1023145 - Storage and dc are up although there is no host
Summary: Storage and dc are up although there is no host
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Martin Perina
QA Contact: Tareq Alayan
URL:
Whiteboard: infra
Duplicates: 1032972
Depends On:
Blocks: 1051890 rhev3.4beta 1142926
Reported: 2013-10-24 17:27 UTC by Ohad Basan
Modified: 2016-02-10 19:05 UTC (History)
14 users (show)

Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1023730 1051890
Environment:
Last Closed: 2013-11-03 13:12:18 UTC
oVirt Team: Infra
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
oVirt gerrit 21231 None None None Never

Description Ohad Basan 2013-10-24 17:27:44 UTC
Description of problem:
My engine is in a state where an iSCSI DC is UP and an iSCSI storage is UP, but there is no active host.
I set up a full iSCSI DC (DC, host, storage, VM), started the VM, created a snapshot and exported a template. I then moved the host to maintenance and moved it to a different DC (NFS). The host is now up in the new DC, but the previous iSCSI DC and storage are still up, which should be an impossible state.

Comment 3 Ohad Basan 2013-10-24 17:54:59 UTC
In addition, even though the storage was up in the new DC (NFS), when I tried to connect an NFS storage to it I encountered a failure.

Comment 4 Liron Aravot 2013-10-27 13:10:26 UTC
In bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 there are tasks that are never polled. The host is being moved to maintenance, but because there are tasks on the host, spm stop (the vdsm verb, not the engine command) isn't called. The host still moves to maintenance, while in the engine cache it is still marked as the SPM, which causes the pool to remain in status "UP".

Comment 5 Allon Mureinik 2013-10-27 16:33:12 UTC
(In reply to Liron Aravot from comment #4)
> in bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 , there are never
> polled tasks, the host is being moved to maintenance - although spm stop
> (vdsm verb, not the engine command) isn't being called  as there are tasks
> on the host the host still moves to maintenance, while in the engine cache
> the host is still marked as the spm (which cause the pool to remain in
> status "UP").
So what's the action item here? fail maintenance?

Comment 6 Liron Aravot 2013-10-28 07:49:11 UTC
yep - host shouldn't move to maintenance while it's spm

Comment 7 Ayal Baron 2013-10-28 13:59:05 UTC
This is a host life cycle issue.

Comment 8 Barak 2013-11-03 13:12:18 UTC

*** This bug has been marked as a duplicate of bug 975742 ***

Comment 9 Barak 2013-11-06 06:44:55 UTC
(In reply to Liron Aravot from comment #4)
> in bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 , there are never
> polled tasks, the host is being moved to maintenance - although spm stop
> (vdsm verb, not the engine command) isn't being called  as there are tasks
> on the host the host still moves to maintenance, while in the engine cache
> the host is still marked as the spm (which cause the pool to remain in
> status "UP").

Why doesn't the engine detect that the host is not SPM anymore?
Shouldn't the engine be polling for spmStatus?

Comment 10 Liron Aravot 2013-11-06 06:59:01 UTC
That's exactly the issue: the engine cache that records which host is the SPM isn't cleared because spm stop hasn't been performed, yet the host still moves to maintenance / prepare for maintenance. If spm stop wasn't executed for some reason, the host shouldn't move to maintenance.

Comment 11 Barak 2013-11-07 18:59:03 UTC
Please refresh my memory,
Shouldn't stopSPM be called on the move to maintenance ?

Comment 12 Liron Aravot 2013-11-10 07:48:40 UTC
It is called, but since there are tasks running on the host, spm stop doesn't do anything.
The problem is that even though the spm stop vdsm verb wasn't executed on the host and the engine "cache" wasn't cleared, the host still moves to maintenance.
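
The failure mode described in this comment can be modelled with a small sketch. This is not oVirt engine or vdsm code: the `Host` and `Engine` classes, the `spm_cache` field and the "Up"/"Down" pool labels are hypothetical names chosen for illustration. It only shows how ignoring a refused spm stop leaves a stale SPM entry behind.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    status: str = "Up"
    is_spm: bool = False
    running_tasks: int = 0

    def spm_stop(self) -> bool:
        """Hypothetical stand-in for the vdsm verb: does nothing while tasks run."""
        if self.running_tasks > 0:
            return False
        self.is_spm = False
        return True


@dataclass
class Engine:
    # Name of the host the engine believes is SPM (the "cache" in this thread).
    spm_cache: Optional[str] = None

    def pool_status(self) -> str:
        # Simplified assumption: the pool looks "Up" as long as an SPM is cached.
        return "Up" if self.spm_cache is not None else "Down"

    def move_to_maintenance_buggy(self, host: Host) -> None:
        stopped = host.spm_stop()
        if stopped and self.spm_cache == host.name:
            self.spm_cache = None
        # The reported behaviour: the transition continues even if spm stop
        # was refused, so the cache entry above is never cleared.
        host.status = "Maintenance"


host = Host(name="host1", is_spm=True, running_tasks=2)
engine = Engine(spm_cache="host1")
engine.move_to_maintenance_buggy(host)
print(host.status, engine.pool_status())
```

In this model the host ends up in maintenance while the pool still reports "Up", which is the inconsistent state from the original report.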

Comment 13 Liron Aravot 2013-11-10 12:46:40 UTC
Something similar happened to me upstream:


2013-11-10 14:43:07,128 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-8) Host encounter a problem moving to maintenance mode, probably error during disconnecting it from pool VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DisconnectStoragePoolVDS, error = Operation not allowed while SPM is active: ('69676911-8344-42c2-adc5-033676b88a09',) (Failed with error IsSpm and code 656). The Host will stay in Maintenance

Comment 14 Barak 2013-11-10 18:58:31 UTC
(In reply to Liron Aravot from comment #13)
> Something similar happened to me upstream:
> 
> 
> 2013-11-10 14:43:07,128 ERROR
> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
> (DefaultQuartzScheduler_Worker-8) Host encounter a problem moving to
> maintenance mode, probably error during disconnecting it from pool
> VdcBLLException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to DisconnectStoragePoolVDS,
> error = Operation not allowed while SPM is active:
> ('69676911-8344-42c2-adc5-033676b88a09',) (Failed with error IsSpm and code
> 656). The Host will stay in Maintenance

I don't understand: the move to maintenance should fail, and that should be reflected in the host status. Above you stated "The Host will stay in Maintenance"?

Comment 15 Liron Aravot 2013-11-11 07:22:25 UTC
Barak, that's exactly the issue, the host still moves to maintenance.
That quote isn't something I stated; it's also copied from the log.

Comment 16 Barak 2013-11-11 12:12:04 UTC
Summary of the problem:

- When one tries to move a host to maintenance, the engine starts by moving the
  host to prepare for maintenance.
- During that phase a stopSPM call is made to vdsm (in case the host is SPM), but
  vdsm will fail if it has existing async tasks running.
- In such a case (move to maintenance) the engine will ignore the error and
  continue with the state transition, and will not even clear the irsBroker
  reference in use.
- Hence this enables the user to move the host (SPM of DC X) to another DC (Y)
  while it is still the SPM of DC X.
 
Either we:
1. fail the move to maintenance in such a case,
or
2. fail the move to another DC.

I personally prefer number 1 as it is clearer to the user.
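
Option 1 above could look roughly like the following sketch. It is an illustration under assumptions, not the actual engine change: `Host`, `stop_spm`, `move_to_maintenance`, `change_data_center`, `MaintenanceRefused` and the status strings are all hypothetical, and the rule that a host must be in maintenance before changing DC is assumed here for the model.

```python
from dataclasses import dataclass


class MaintenanceRefused(Exception):
    """Raised when a host cannot safely complete the requested transition."""


@dataclass
class Host:
    name: str
    data_center: str
    status: str = "Up"
    is_spm: bool = False
    async_tasks: int = 0


def stop_spm(host: Host) -> bool:
    """Hypothetical stand-in for stopSPM: fails while async tasks are running."""
    if host.async_tasks > 0:
        return False
    host.is_spm = False
    return True


def move_to_maintenance(host: Host) -> None:
    """Option 1: abort the transition if the SPM role could not be released."""
    host.status = "PreparingForMaintenance"
    if host.is_spm and not stop_spm(host):
        host.status = "Up"  # roll back instead of ignoring the error
        raise MaintenanceRefused(
            f"{host.name} is still SPM of {host.data_center}"
        )
    host.status = "Maintenance"


def change_data_center(host: Host, new_dc: str) -> None:
    """Assumed rule for this model: only a host in maintenance may change DC."""
    if host.status != "Maintenance":
        raise MaintenanceRefused(f"{host.name} must be in maintenance first")
    host.data_center = new_dc


host = Host(name="host1", data_center="iscsi-dc", is_spm=True, async_tasks=1)
try:
    move_to_maintenance(host)
except MaintenanceRefused as err:
    print("refused:", err)
```

Because the guard sits on the maintenance transition, the later move to another DC is blocked as a side effect, so the user sees the failure at the first action they take rather than at the DC change.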

Comment 17 Barak 2013-11-11 12:15:30 UTC
This is

Comment 18 Barak 2013-11-11 12:16:51 UTC
Ayal, Arthur - what do you think ?

Comment 19 Ayal Baron 2013-11-21 13:05:39 UTC
(In reply to Barak from comment #16)
> Summary of the problem:
> 
> - when one tries to move a host to maintenance the engine start by moving
> the 
>   host to prepare for maintenance.
> - during that phase a stopSPM call is made to the vdsm (in case it is SPM),
> but 
>   vdsm will fail if it has existing async tasks running.
> - The engine in such a case (move to maintenance) will ignore the error and 
>   continue with the state transition, and will not even clear the irsBroker 
>   reference in use.
> - hence this enables the user to move the host (SPM of DC X) to another DC
> (Y) 
>   while being the SPM of DC X.
>  
> Either we:
> 1. fail the move to maintenance in such a case,
> or
> 2. fail the move to another DC.
> 
> I personally prefer number 1 as it is clearer to the user.

Ack, but user should have a way to fence the host so move to maintenance would work.

Comment 20 Ayal Baron 2013-11-21 13:06:44 UTC
*** Bug 1032972 has been marked as a duplicate of this bug. ***

Comment 21 Barak 2013-11-21 17:15:08 UTC
(In reply to Ayal Baron from comment #19)
> (In reply to Barak from comment #16)

....

> >  
> > Either we:
> > 1. fail the move to maintenance in such a case,
> > or
> > 2. fail the move to another DC.
> > 
> > I personally prefer number 1 as it is clearer to the user.
> 
> Ack, but user should have a way to fence the host so move to maintenance
> would work.

It is possible to fence manually (restart) a host even when it is up.

Arthur ?

Comment 22 Arthur Berezin 2013-11-28 18:11:18 UTC
If vdsm has async tasks, shouldn't it wait for all of them to finish, and only then execute the stopSPM call and move the host to maintenance mode?

If the user doesn't want to wait for all async tasks to finish, they could fence the host.
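
The wait-then-stop behaviour suggested here can be sketched as a polling loop with a timeout. This is a hypothetical illustration, not what the engine or vdsm does: `wait_then_stop_spm`, `count_tasks` and `stop_spm` are invented names, and the caller supplies the callbacks.

```python
import time
from typing import Callable


def wait_then_stop_spm(
    count_tasks: Callable[[], int],
    stop_spm: Callable[[], None],
    timeout_s: float = 5.0,
    poll_s: float = 0.01,
) -> bool:
    """Poll until no async tasks remain, then release SPM.

    Returns False if the timeout expires first, in which case SPM is not
    stopped and the caller may choose to fence the host instead.
    """
    deadline = time.monotonic() + timeout_s
    while count_tasks() > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    stop_spm()
    return True


# Usage with a fake task counter that drains by one on each poll.
remaining = [3]


def fake_count() -> int:
    remaining[0] = max(0, remaining[0] - 1)
    return remaining[0]


ok = wait_then_stop_spm(fake_count, lambda: print("SPM stopped"), timeout_s=1.0)
print("moved on:", ok)
```

The timeout keeps the operation from blocking indefinitely on tasks that never finish, which is the case where fencing would be the fallback.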

Comment 25 Sandro Bonazzola 2014-01-14 08:42:55 UTC
ovirt 3.4.0 alpha has been released

Comment 26 Tareq Alayan 2014-02-17 12:56:54 UTC
If the host has async tasks, the following message appears:
Error while executing action: Cannot switch Host to Maintenance mode. Host has asynchronous running tasks,
wait for operation to complete and retry.


verified on ovirt-engine-3.4.0-0.7.beta2.el6.noarch

Comment 28 Itamar Heim 2014-06-12 14:08:48 UTC
Closing as part of 3.4.0

