[corosync] Corosync and ip link set eth1 down

Steven Dake sdake at redhat.com
Tue Jul 2 17:04:30 UTC 2013


On 07/02/2013 07:47 AM, Marek Skubela wrote:
> Hello,
>
> I'm running Corosync 1.4.2-2 with Pacemaker 1.1.6-2ubuntu3
> on Ubuntu 12.04.2 LTS, both installed from the Ubuntu
> packages, and I'm having the following issues:
>
> 1) When I take the bind interface down and then bring it
>    back up with udp, it sometimes takes up to 8 minutes for
>    the node on which the interface was taken down to
>    reconnect with the other nodes; other times it
>    reconnects immediately (see log [1]).
>
> 2) When I use udpu, after bringing the interface back up,
>    corosync sometimes crashes, sometimes I see the message
>    "Totem is unable to form a cluster because of an
>    operating system or network fault." (restarting corosync
>    usually helps), and sometimes it reconnects flawlessly.
>
> 3) When using udpu, I have to add the node on which I run
>    corosync to the members list. When I use its IP
>    (10.1.1.165, for example), the node doesn't notice when
>    the interface is brought down, so it keeps all the
>    services running, while the other nodes see it as
>    offline. I worked around this by using the localhost IP
>    for that node.
>
> The firewall and AppArmor are both disabled.
>
> On the wiki [2] I've read that bringing an interface down
> and up is not how one should test split-brain scenarios,
> but then how does bringing an interface down differ from
> unplugging a cable from corosync's point of view?
>

Corosync actively scans the interfaces every second to check whether
they are still up.  If one appears down, Corosync executes some special
logic to try to recover.  The recovery doesn't work very well, and it is
better avoided entirely.
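The reply does not say which interface flag Corosync inspects. As a minimal sketch of a once-per-second poll, assuming Linux sysfs and assuming the check looks at the administrative IFF_UP flag (both assumptions; this is not Corosync's code, and `is_admin_up`, `read_flags` and `poll_interface` are hypothetical names): `ip link set eth1 down` clears IFF_UP, while pulling the cable only drops the carrier and leaves IFF_UP set, so the two tests can look different to a poller like this.

```python
import time
from pathlib import Path
from typing import Callable, Optional

IFF_UP = 0x1  # administrative "up" bit, as defined in <linux/if.h>


def is_admin_up(flags_text: str) -> bool:
    """Parse the hex text from /sys/class/net/<iface>/flags and test IFF_UP."""
    return bool(int(flags_text.strip(), 16) & IFF_UP)


def read_flags(iface: str) -> str:
    # Linux-only: sysfs exposes the interface flags as text such as "0x1003".
    return Path(f"/sys/class/net/{iface}/flags").read_text()


def poll_interface(
    iface: str,
    on_change: Callable[[bool], None],
    interval: float = 1.0,
    rounds: int = 10,
    reader: Callable[[str], str] = read_flags,
) -> None:
    """Check the interface every `interval` seconds and report state changes."""
    last: Optional[bool] = None
    for _ in range(rounds):
        up = is_admin_up(reader(iface))
        if up != last:  # only report transitions, including the first reading
            on_change(up)
            last = up
        time.sleep(interval)
```

Calling `poll_interface("eth1", print)` on a Linux host and running `ip link set eth1 down` in another shell should print a change; unplugging the cable instead leaves IFF_UP set (the loss shows up in `/sys/class/net/eth1/carrier`).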

If you want to test split brain, use iptables.
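A minimal sketch of that approach, written in Python so the rules can be generated and reverted consistently. It assumes the cluster traffic uses the ports from the configurations quoted below (mcastport 5405) and that, as the corosync.conf man page of the 1.x era describes, Corosync uses both mcastport and mcastport - 1. The helper names `corosync_block_rules` and `run` are hypothetical; by default the script only prints the commands.

```python
import shlex
import subprocess


def corosync_block_rules(iface: str, mcastport: int = 5405, action: str = "-I") -> list[list[str]]:
    """Build iptables commands that drop Corosync UDP traffic on one interface.

    Assumption: Corosync uses mcastport and mcastport - 1, so both are blocked.
    action "-I" inserts the DROP rules; "-D" deletes the same rules again.
    """
    ports = f"{mcastport - 1}:{mcastport}"
    return [
        ["iptables", action, "INPUT", "-i", iface, "-p", "udp", "--dport", ports, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-o", iface, "-p", "udp", "--dport", ports, "-j", "DROP"],
    ]


def run(rules: list[list[str]], dry_run: bool = True) -> None:
    for cmd in rules:
        if dry_run:
            print(shlex.join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # requires root privileges


if __name__ == "__main__":
    run(corosync_block_rules("eth1"))  # simulate the split (dry run)
    # run(corosync_block_rules("eth1", action="-D"))  # heal it again
```

Unlike `ip link set eth1 down`, this leaves the interface and its address untouched, so Corosync's interface recovery logic never runs; the node simply stops hearing its peers, which is closer to a real network partition.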

Regards
-steve

> Why does it take so long for the node to reconnect when
> using udp?
>
> When using udpu, should I be using 127.0.0.1 for the node
> that corosync is running on, or am I missing something there?
>
> Configuration:
>
> UDPU:
> compatibility: none
>
> totem {
>         version: 2
>         secauth: off
>         threads: 0
>         nodeid: 2
>         rrp_mode: none
>
>         interface {
>                 member {
>                         memberaddr: 10.1.1.165
>                 }
>                 member {
>                         memberaddr: 10.1.1.162
>                 }
>                 member {
>                         memberaddr: 10.1.1.164
>                 }
>                 ringnumber: 0
>                 bindnetaddr: 10.1.0.0
>                 mcastport: 5405
>         }
>         transport: udpu
> }
>
> service {
>         ver:       0
>         name:      pacemaker
> }
>
> aisexec {
>         user:   root
>         group:  root
> }
>
> UDP:
> compatibility: none
>
> totem {
>         version: 2
>         secauth: off
>         threads: 0
>         nodeid: 2
>         rrp_mode: none
>
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.1.0.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5405
>         }
> }
>
> service {
>         ver:       0
>         name:      pacemaker
> }
>
> aisexec {
>         user:   root
>         group:  root
> }
>
> [1] https://gist.github.com/anonymous/5909784
> [2] https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
>
> Thank you in advance for any hints,
>
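Both configurations quoted above select the interface through `bindnetaddr: 10.1.0.0`. The corosync.conf documentation describes bindnetaddr as the network address of the interface to bind to, i.e. the interface address masked with its netmask. The sketch below models that matching with Python's `ipaddress` module; it is not Corosync's code, `select_bind_address` is a hypothetical name, and the /16 netmask for eth1 is an assumption because the message does not state it.

```python
import ipaddress
from typing import Iterable, Optional


def select_bind_address(bindnetaddr: str, iface_addrs: Iterable[str]) -> Optional[str]:
    """Return the first interface address whose network matches bindnetaddr.

    iface_addrs holds "address/prefix" strings, as `ip -4 addr` shows them.
    Returns None when no configured address belongs to that network.
    """
    target = ipaddress.IPv4Address(bindnetaddr)
    for entry in iface_addrs:
        iface = ipaddress.IPv4Interface(entry)
        # Mask bindnetaddr with this interface's own netmask and compare networks.
        masked = ipaddress.IPv4Address(int(target) & int(iface.netmask))
        if masked == iface.network.network_address:
            return str(iface.ip)
    return None
```

With `["127.0.0.1/8", "10.1.1.165/16"]` this picks 10.1.1.165; with a /24 netmask, or once only the loopback address is left, it returns None, meaning there is nothing on the 10.1.0.0 network to bind to. What Corosync does in that situation is the subject of the wiki page [2].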


