[corosync] [PATCH] flatiron cpg: Enhance downlist selection algorithm

Andrew Beekhof andrew at beekhof.net
Mon Jun 18 23:56:30 GMT 2012


On Fri, Jun 15, 2012 at 4:52 PM, Jan Friesse <jfriesse at redhat.com> wrote:
> Andrew,
>
> Andrew Beekhof wrote:
>
>> On Thu, Jun 14, 2012 at 11:19 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>>>
>>> Let's say we have 2 nodes:
>>> - node 2 is paused
>>> - node 1 creates a membership (one node)
>>> - node 2 is unpaused
>>>
>>> The result is that node 1's downlist is selected, which means that
>>> from node 2's point of view, node 1 was never down.
>>
>>
>> This behaviour makes sense to me.
>>
>> Although ideally node2 wouldn't get a membership event until everyone
>> agreed whether it was a member or not*.
>> Is that feasible?
>>
>
> I'm unsure what you mean. Everyone in this specific case is only node 1
> and node 2. While node 2 was paused, node 1 could only create a membership
> on its own, and that was a one-node membership. When node 2 was unpaused, we
> must "simulate" all events which happened during its pause. That is the
> creation of a one-node membership (only node 2). After that, we can
> process the new membership (node 2 + node 1). So I can say it's true that
> everyone else (node 1) agreed that node 2 was not part of the membership and
> now it is.
>
> Also keep in mind that this patch fixes behavior which does not happen very
> often. Usually we have an odd number of nodes AND more than 1, so 3, 5, ...
> and in such a situation this patch doesn't have any effect (because of
> test #1).
>
> But maybe I didn't understand the requirement. If so, can you please
> elaborate a little more on how you think the membership should look?

I just meant that in an ideal world, Pacemaker would prefer that node2
didn't find out that (from node1's PoV) it was dropped from the
membership.
Perhaps that's already the case?

The history of events is less important to us than getting the new
membership list (ie. "node1 + node2" in your example) ASAP.

>
>
>> * That or clients might need to be made more tolerant of being kicked
>> out of the membership list.
>>
>
> With the 3 patches I've sent, it shouldn't happen that a node itself is
> kicked from the membership by other nodes (with one exception, the localhost
> rebind, which I'm working on fixing). It's always "other nodes left
> membership" on that node and on other nodes it is "that node left membership".
>
> Regards,
>  Honza
>
>
>>>
>>> Patch solves situation by adding additional check for largest
>>> previous membership.
>>>
>>> So current tests are:
>>> 1) largest (previous #nodes - #nodes known to have left)
>>> 2) (then) largest previous membership
>>> 3) (and last as a tie-breaker) node with smallest nodeid
>>>
>>> Signed-off-by: Jan Friesse<jfriesse at redhat.com>
>>> ---
>>>  services/cpg.c |   17 +++++++++--------
>>>  1 files changed, 9 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/services/cpg.c b/services/cpg.c
>>> index 7e62260..533f0c9 100644
>>> --- a/services/cpg.c
>>> +++ b/services/cpg.c
>>> @@ -816,16 +816,17 @@ static struct downlist_msg* downlist_master_choose (void)
>>>                best_members = best->old_members - best->left_nodes;
>>>                cmp_members = cmp->old_members - cmp->left_nodes;
>>>
>>> -               if (cmp_members < best_members) {
>>> -                       continue;
>>> -               }
>>> -               else if (cmp_members > best_members) {
>>> -                       best = cmp;
>>> -               }
>>> -               else if (cmp->sender_nodeid < best->sender_nodeid) {
>>> +               if (cmp_members > best_members) {
>>>                        best = cmp;
>>> +               } else if (cmp_members == best_members) {
>>> +                       if (cmp->old_members > best->old_members) {
>>> +                               best = cmp;
>>> +                       } else if (cmp->old_members == best->old_members) {
>>> +                               if (cmp->sender_nodeid < best->sender_nodeid) {
>>> +                                       best = cmp;
>>> +                               }
>>> +                       }
>>>                }
>>> -
>>>        }
>>>
>>>        assert (best != NULL);
>>> --
>>> 1.7.1
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at corosync.org
>>> http://lists.corosync.org/mailman/listinfo/discuss
>
>
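
For anyone following along, the three tests from the patch description can be
sketched as a standalone comparison. This is only an illustration, not the
code in services/cpg.c: the struct downlist_sketch type and both function
names are made up for this example, and only the field names (sender_nodeid,
old_members, left_nodes) are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the real downlist message.
 * Only the three fields read by the comparison in the diff are kept. */
struct downlist_sketch {
	uint32_t sender_nodeid;
	uint32_t old_members;	/* size of the sender's previous membership */
	uint32_t left_nodes;	/* nodes the sender knows to have left */
};

/* Returns nonzero when cmp should replace best, applying the tests in order:
 * 1) largest (old_members - left_nodes)
 * 2) largest old_members
 * 3) smallest sender_nodeid as the final tie-breaker */
static int downlist_is_better (
	const struct downlist_sketch *cmp,
	const struct downlist_sketch *best)
{
	uint32_t best_members;
	uint32_t cmp_members;

	assert (cmp != NULL && best != NULL);

	best_members = best->old_members - best->left_nodes;
	cmp_members = cmp->old_members - cmp->left_nodes;

	if (cmp_members != best_members) {
		return (cmp_members > best_members);
	}
	if (cmp->old_members != best->old_members) {
		return (cmp->old_members > best->old_members);
	}
	return (cmp->sender_nodeid < best->sender_nodeid);
}

/* Picks the winning downlist from an array; returns NULL for an empty array. */
static const struct downlist_sketch *downlist_choose_sketch (
	const struct downlist_sketch *msgs,
	size_t count)
{
	const struct downlist_sketch *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (best == NULL || downlist_is_better (&msgs[i], best)) {
			best = &msgs[i];
		}
	}
	return (best);
}
```

Assuming the two-node example plays out as node 1 sending old_members = 1,
left_nodes = 0 and node 2 sending old_members = 2, left_nodes = 1 (my reading
of the scenario, not values taken from a real run), test 1 is a tie at 1 and
test 2 then picks node 2's downlist. With only the old nodeid tie-breaker,
node 1's downlist would have won.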

