[corosync] Corosync instances seem to ignore each other despite much UDP chatter without firewall

Jan Friesse jfriesse at redhat.com
Thu Jun 7 08:16:10 GMT 2012


David,
there seems to be one definite issue and at least two other possible
problems:
1.) (this is the definite issue)

Your config looks like:
...
        member {
                memberaddr: 176.31.238.131
        }
        ringnumber: 0
        bindnetaddr: 37.59.18.208
...

The member list must contain every node, including the local node's own
IP, so let's say you have:
node 1 (176.31.238.131)
node 2 (37.59.18.208)

you must have:
        member {
                memberaddr: 176.31.238.131
        }
        member {
                memberaddr: 37.59.18.208
        }
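
A quick way to check this on each node, as a minimal sketch only: the
Python below pulls every memberaddr and the bindnetaddr out of a config
text and reports whether the local address is in the member list. The
function name check_udpu_members is hypothetical (it is not part of
corosync), and the sketch assumes bindnetaddr is written as the host IP,
as in your config; with a network address there the direct comparison
would not apply.

```python
import re

def check_udpu_members(conf_text):
    """Return (bindnetaddr, members, ok); ok is True when bindnetaddr is a listed member."""
    # Collect every "memberaddr: <ip>" line, ignoring indentation.
    members = re.findall(r"^\s*memberaddr:\s*(\S+)", conf_text, re.MULTILINE)
    # Take the first "bindnetaddr: <ip>" line, if any.
    bind = re.search(r"^\s*bindnetaddr:\s*(\S+)", conf_text, re.MULTILINE)
    bindnetaddr = bind.group(1) if bind else None
    return bindnetaddr, members, bindnetaddr in members

# Interface block as posted: only the remote node is listed.
BROKEN = """
interface {
        member {
                memberaddr: 176.31.238.131
        }
        ringnumber: 0
        bindnetaddr: 37.59.18.208
}
"""

# Same block with both nodes listed.
FIXED = """
interface {
        member {
                memberaddr: 176.31.238.131
        }
        member {
                memberaddr: 37.59.18.208
        }
        ringnumber: 0
        bindnetaddr: 37.59.18.208
}
"""

print(check_udpu_members(BROKEN))
print(check_udpu_members(FIXED))
```

For the block as posted, the last element of the result is False; with
both members listed it is True.
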
2.) Firewall. Even if it looks OK, make sure that you have opened
everything corosync needs, which is:
- listening on the configured mcastport (5405 in your config)
- ability to send to that port
- ability to send from any port (there is basically one port per udpu
member, and the port number is allocated by the kernel). I've committed a
patch which binds that socket to a concrete IP; in older versions
(currently anything other than master) the sender address is 0.0.0.0.
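
If you want to test that path independently of corosync, here is a rough
sketch in Python using only the standard socket module. The helper names
(open_listener, recv_once, udp_probe) are hypothetical. The sender binds
to port 0 so the kernel allocates the source port, which mimics the
"send from any port" case above; the destination port is assumed to be
the mcastport from the quoted config (5405).

```python
import socket

def open_listener(bind_ip, port):
    """Bind a UDP socket on the given address, like a receiver on the cluster port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    return sock

def recv_once(sock, timeout=2.0):
    """Wait for one datagram; return (data, (src_ip, src_port)) or None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(2048)
    except socket.timeout:
        return None

def udp_probe(dest_ip, dest_port, bind_ip="0.0.0.0", payload=b"probe"):
    """Send one datagram from a kernel-allocated source port; return the local (ip, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, 0))  # port 0: the kernel picks the source port
        sock.sendto(payload, (dest_ip, dest_port))
        return sock.getsockname()
```

With corosync stopped on the receiving node (so the port is free), you
could run recv_once(open_listener("37.59.18.208", 5405)) there and
udp_probe("37.59.18.208", 5405) on the other node, then swap roles. A
None result on the receiver would suggest that something on the path
drops the datagram.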

3.)
176.31.238.131 and 37.59.18.208 don't seem to be on the same network.
There may be a router between these networks that blocks the traffic.
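
To illustrate why I suspect routing, a small sketch with Python's
standard ipaddress module. The /24 prefix length is an assumption, since
your netmask is not in the post, and same_network is a hypothetical
helper name.

```python
import ipaddress

def same_network(ip_a, ip_b, prefix_len):
    """Return True when ip_b falls inside the network that contains ip_a."""
    # strict=False lets us build the network from a host address.
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net_a

# Assumed /24; the two node addresses from the config.
print(same_network("176.31.238.131", "37.59.18.208", 24))
```

This prints False, and since the first octets already differ, any
realistic prefix length gives the same answer: the nodes can only reach
each other through at least one router.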

But as a first step, add the missing memberaddr.

Regards,
   Honza

David Guyot wrote:
> Hello, everybody.
>
> I'm trying to set up a 2-node Debian Squeeze x64 cluster with
> Corosync and Pacemaker, but I'm stuck on a strange issue: despite a
> lot of UDP chatter between the nodes (so the network is OK), each
> Corosync instance seems to ignore the other: the other node is never
> detected, and crm_mon --one-shot -V only says "Connection to cluster
> failed: connection failed". But the strangest part is that both
> Corosync nodes are filling their logs with error messages saying "Totem
> is unable to form a cluster because of an operating system or network
> fault. The most common cause of this message is that the local firewall
> is configured improperly.". I tcpdumped all traffic between the hosts,
> and I have 2-way traffic between them. I tried using the backports
> versions of all Corosync- and Pacemaker-related packages, without
> improvement.
>
> I must add that, due to my hosting company's network policy, I was
> forced to use UDP unicast instead of multicast, because multicast is
> blocked.
>
> Here is my config:
> corosync.conf:
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
>
> totem {
>          version: 2
>          secauth: on
>          interface {
>                  member {
>                          memberaddr: 176.31.238.131
>                  }
>                  ringnumber: 0
>                  bindnetaddr: 37.59.18.208
>                  mcastport: 5405
>                  ttl: 1
>          }
>          transport: udpu
> }
>
> logging {
>          fileline: off
>          to_logfile: yes
>          to_syslog: yes
>          debug: on
>          logfile: /var/log/corosync.log
>          debug: off
>          timestamp: on
>          logger_subsys {
>                  subsys: AMF
>                  debug: off
>          }
> }
>
> Log messages:
> Jun 06 16:35:14 corosync [MAIN  ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
> Jun 06 16:35:14 corosync [MAIN  ] Corosync built-in features: nss
> Jun 06 16:35:14 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Jun 06 16:35:14 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jun 06 16:35:14 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jun 06 16:35:14 corosync [TOTEM ] The network interface [37.59.18.208] is now up.
> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync extended virtual synchrony service
> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync configuration service
> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync cluster config database access v1.01
> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync profile loading service
> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
> Jun 06 16:35:14 corosync [MAIN  ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Jun 06 16:35:23 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
> Jun 06 16:35:25 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
> Jun 06 16:35:27 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
> Jun 06 16:35:30 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>
> # uname -a
> Linux Vindemiatrix 3.2.13-grsec-xxxx-grs-ipv6-64 #1 SMP Thu Mar 29 09:48:59 UTC 2012 x86_64 GNU/Linux
>
> # iptables -nvL
> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>   pkts bytes target     prot opt in     out     source               destination
>      0     0 ACCEPT     all  --  tun0   *       0.0.0.0/0            0.0.0.0/0
>      0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
>      0     0            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22 state NEW recent: SET name: SSH side: source
>      0     0 LOGDROP    tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22 state NEW recent: UPDATE seconds: 60 hit_count: 6 TTL-Match name: SSH side: source
>      0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22 state NEW
>      0     0 LOGDROP    tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:0x17/0x02 multiport dports 80,443 #conn/32 > 100
>      1    48 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:0x17/0x02 multiport dports 80,443
>      0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:21 flags:0x17/0x02 limit: avg 5/min burst 50 recent: SET name: FTP side: source
>      0     0 LOGDROP    tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:21 flags:0x17/0x02 recent: UPDATE seconds: 60 hit_count: 6 TTL-Match name: FTP side: source
>      0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:21 flags:0x17/0x02
>      0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpts:50000:50500 state RELATED,ESTABLISHED
>      0     0 ACCEPT     tcp  --  eth0   *       176.31.238.131       0.0.0.0/0           tcp dpt:1194
> 11867 3145K ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0           udp dpt:5405 /* Corosync */
>     35  9516 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0           state NEW limit: avg 30/sec burst 200
>      0     0 LOGDROP    tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 STRING match "w00tw00t.at.ISC.SANS." ALGO name bm TO 65535
>      0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 10/sec burst 5
>      0     0 LOGDROP    icmp --  *      *       0.0.0.0/0            0.0.0.0/0
>   1031 70356 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
>      3   132 LOGDROP    all  --  *      *       0.0.0.0/0            0.0.0.0/0
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>   pkts bytes target     prot opt in     out     source               destination
>      0     0 LOGDROP    all  --  *      *       0.0.0.0/0            0.0.0.0/0
>
> Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
>   pkts bytes target     prot opt in     out     source               destination
>      0     0 ACCEPT     all  --  *      tun0    0.0.0.0/0            0.0.0.0/0
>      0     0 ACCEPT     all  --  *      lo      0.0.0.0/0            0.0.0.0/0
>      0     0 LOGDROP    tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:80 owner UID match 33
>      0     0 LOGDROP    udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           udp dpt:80 owner UID match 33
>      0     0 LOGDROP    tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:443 owner UID match 33
>      0     0 LOGDROP    udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           udp dpt:443 owner UID match 33
>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            176.31.238.131      tcp dpt:1194
> 11871 3146K ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0           udp dpt:5405 /* Corosync */
>      0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22
>      0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:25
>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:43
>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:53
>      0     0 ACCEPT     udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           udp dpt:53
>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:80
>      0     0 ACCEPT     udp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           udp dpt:123
>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:443
>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:873
>     11   924 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0
>   1071  712K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
>     67 14013 LOGDROP    all  --  *      *       0.0.0.0/0            0.0.0.0/0
>
> Chain LOGDROP (12 references)
>   pkts bytes target     prot opt in     out     source               destination
>     57 11655 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 1/sec burst 5 LOG flags 0 level 5 prefix `iptables rejected: '
>     70 14145 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0
>
> # corosync -v
> Corosync Cluster Engine, version '1.4.2'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> I've been trying to solve this problem for the last 2 days, without
> any result. Any help is welcome.
>
> Thank you in advance!
>
> Regards.