9.3.2. Moving Resources Due to Failure

New in 1.0 is the concept of a migration threshold. ^[13]

Simply define migration-threshold=N for a resource and it will migrate to a new node after N failures. There is no threshold defined by default. To determine the resource’s current failure status and limits, use crm_mon --failcounts.

By default, once the threshold has been reached, the node will no longer be allowed to run the failed resource until the administrator manually resets the resource’s failcount using crm_failcount (hopefully after first fixing the cause of the failure). However, it is possible to have failures expire automatically by setting the resource’s failure-timeout option.

So a setting of migration-threshold=2 and failure-timeout=60s would cause the resource to move to a new node after 2 failures, and allow it to move back (depending on the stickiness and constraint scores) after one minute.
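
As a minimal sketch, the two options above could be set as meta-attributes of the resource in the CIB. The resource definition and all id values here (my-resource and the names derived from it) are hypothetical and chosen only for illustration; the ocf:pacemaker:Dummy agent is used because it needs no instance parameters.

```xml
<!-- Hypothetical resource; only the two nvpair names and values come from the text above -->
<primitive id="my-resource" class="ocf" provider="pacemaker" type="Dummy">
  <meta_attributes id="my-resource-meta">
    <!-- Move the resource to another node after 2 failures on the current node -->
    <nvpair id="my-resource-migration-threshold" name="migration-threshold" value="2"/>
    <!-- Expire recorded failures after 60 seconds, so the node becomes eligible again -->
    <nvpair id="my-resource-failure-timeout" name="failure-timeout" value="60s"/>
  </meta_attributes>
</primitive>
```

Whether the resource actually returns to the original node once the failures expire still depends on its stickiness and on the constraint scores in effect.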

There are two exceptions to the migration threshold concept; they occur when a resource either fails to start or fails to stop. Start failures cause the failcount to be set to INFINITY and thus always cause the resource to move immediately.

Stop failures are slightly different and crucial. If a resource fails to stop and STONITH is enabled, the cluster will fence the node so that it can start the resource elsewhere. If STONITH is not enabled, the cluster has no way to continue and will not try to start the resource elsewhere, but it will try to stop the resource again once the failure timeout has passed.

Important

Before enabling this option, please read Section 8.7, “Ensuring Time Based Rules Take Effect”.

^[13] The naming of this option was perhaps unfortunate as it is easily confused with true migration, the process of moving a resource from one node to another without stopping it. Xen virtual guests are the most common example of resources that can be migrated in this manner.