[corosync] Corosync instances seems to ignore each other despite many UDP chat without firewall

Jan Friesse jfriesse at redhat.com
Thu Jun 7 08:17:18 GMT 2012


This is expected behavior, and it makes me even more sure that the whole
problem lies in the missing local member address (memberaddr) in your config.
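
As a rough sketch of what that could look like (not a tested config): with
transport udpu, the member list should contain every node, including the
local one, and secauth has to be set the same way on all nodes. The fragment
below assumes 37.59.18.208 is this node's own address (the log line "The
network interface [37.59.18.208] is now up" suggests so) and that
176.31.238.131 is the other node; everything else is copied from your totem
section.

```
totem {
         version: 2
         # Must be identical on every node; when it is on, all nodes
         # also need the same /etc/corosync/authkey.
         secauth: on
         interface {
                 # Assumption: this is the local node's own address.
                 member {
                         memberaddr: 37.59.18.208
                 }
                 # The remote node, as in the original config.
                 member {
                         memberaddr: 176.31.238.131
                 }
                 ringnumber: 0
                 bindnetaddr: 37.59.18.208
                 mcastport: 5405
                 ttl: 1
         }
         transport: udpu
}
```

The other node would presumably carry the same two member entries, with
bindnetaddr pointing at its own address instead.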

Honza

David Guyot wrote:
> Hello again, everybody.
>
> I just noticed that, when I tried to set secauth to off, during the
> period in which one node accepted secured connections and the other
> unsecured connections, the network fault messages were replaced by
> these:
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:17 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:17 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
> Jun 06 17:16:18 corosync [TOTEM ] Received message has invalid digest...
> ignoring.
> Jun 06 17:16:18 corosync [TOTEM ] Invalid packet data
>
> If this is relevant...
>
> Thank you in advance.
>
> Regards.
>
> On 06/06/2012 17:05, David Guyot wrote:
>> Hello, everybody.
>>
>> I'm trying to set up a 2-node Debian Squeeze x64 cluster with
>> Corosync and Pacemaker, but I'm stuck on a strange issue: despite a
>> lot of UDP traffic between the nodes (so the network is OK), the
>> Corosync instances seem to ignore each other: the other node is never
>> detected, and crm_mon --one-shot -V only says "Connection to cluster
>> failed: connection failed". The strangest part is that both
>> Corosync nodes are filling their logs with error messages saying "Totem
>> is unable to form a cluster because of an operating system or network
>> fault. The most common cause of this message is that the local firewall
>> is configured improperly.". I ran tcpdump on all traffic between the hosts,
>> and I see 2-way traffic between them. I tried the backports versions
>> of all Corosync- and Pacemaker-related packages, without improvement.
>>
>> I must add that, due to my hosting company's network policy, I was forced
>> to use UDP unicast instead of multicast, because multicast is blocked.
>>
>> Here is my config:
>> corosync.conf:
>> # Please read the corosync.conf.5 manual page
>> compatibility: whitetank
>>
>> totem {
>>          version: 2
>>          secauth: on
>>          interface {
>>                  member {
>>                          memberaddr: 176.31.238.131
>>                  }
>>                  ringnumber: 0
>>                  bindnetaddr: 37.59.18.208
>>                  mcastport: 5405
>>                  ttl: 1
>>          }
>>          transport: udpu
>> }
>>
>> logging {
>>          fileline: off
>>          to_logfile: yes
>>          to_syslog: yes
>>          debug: on
>>          logfile: /var/log/corosync.log
>>          debug: off
>>          timestamp: on
>>          logger_subsys {
>>                  subsys: AMF
>>                  debug: off
>>          }
>> }
>>
>> Log messages :
>> Jun 06 16:35:14 corosync [MAIN  ] Corosync Cluster Engine ('1.4.2'):
>> started and ready to provide service.
>> Jun 06 16:35:14 corosync [MAIN  ] Corosync built-in features: nss
>> Jun 06 16:35:14 corosync [MAIN  ] Successfully read main configuration
>> file '/etc/corosync/corosync.conf'.
>> Jun 06 16:35:14 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Jun 06 16:35:14 corosync [TOTEM ] Initializing transmit/receive
>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Jun 06 16:35:14 corosync [TOTEM ] The network interface [37.59.18.208]
>> is now up.
>> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync
>> extended virtual synchrony service
>> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync
>> configuration service
>> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync
>> cluster closed process group service v1.01
>> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync
>> cluster config database access v1.01
>> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync
>> profile loading service
>> Jun 06 16:35:14 corosync [SERV  ] Service engine loaded: corosync
>> cluster quorum service v0.1
>> Jun 06 16:35:14 corosync [MAIN  ] Compatibility mode set to whitetank.
>> Using V1 and V2 of the synchronization engine.
>> Jun 06 16:35:23 corosync [TOTEM ] Totem is unable to form a cluster
>> because of an operating system or network fault. The most common cause
>> of this message is that the local firewall is configured improperly.
>> Jun 06 16:35:25 corosync [TOTEM ] Totem is unable to form a cluster
>> because of an operating system or network fault. The most common cause
>> of this message is that the local firewall is configured improperly.
>> Jun 06 16:35:27 corosync [TOTEM ] Totem is unable to form a cluster
>> because of an operating system or network fault. The most common cause
>> of this message is that the local firewall is configured improperly.
>> Jun 06 16:35:30 corosync [TOTEM ] Totem is unable to form a cluster
>> because of an operating system or network fault. The most common cause
>> of this message is that the local firewall is configured improperly.
>>
>> # uname -a
>> Linux Vindemiatrix 3.2.13-grsec-xxxx-grs-ipv6-64 #1 SMP Thu Mar 29
>> 09:48:59 UTC 2012 x86_64 GNU/Linux
>>
>> # iptables -nvL
>> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>>   pkts bytes target     prot opt in     out     source
>> destination
>>      0     0 ACCEPT     all  --  tun0   *       0.0.0.0/0
>> 0.0.0.0/0
>>      0     0 ACCEPT     all  --  lo     *       0.0.0.0/0
>> 0.0.0.0/0
>>      0     0            tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:22 state NEW recent: SET name: SSH side: source
>>      0     0 LOGDROP    tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:22 state NEW recent: UPDATE seconds: 60
>> hit_count: 6 TTL-Match name: SSH side: source
>>      0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:22 state NEW
>>      0     0 LOGDROP    tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp flags:0x17/0x02 multiport dports 80,443 #conn/32 > 100
>>      1    48 ACCEPT     tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp flags:0x17/0x02 multiport dports 80,443
>>      0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:21 flags:0x17/0x02 limit: avg 5/min burst 50
>> recent: SET name: FTP side: source
>>      0     0 LOGDROP    tcp  --  eth0   *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:21 flags:0x17/0x02 recent: UPDATE seconds:
>> 60 hit_count: 6 TTL-Match name: FTP side: source
>>      0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:21 flags:0x17/0x02
>>      0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpts:50000:50500 state RELATED,ESTABLISHED
>>      0     0 ACCEPT     tcp  --  eth0   *       176.31.238.131
>> 0.0.0.0/0           tcp dpt:1194
>> 11867 3145K ACCEPT     udp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           udp dpt:5405 /* Corosync */
>>     35  9516 ACCEPT     all  --  eth0   *       0.0.0.0/0
>> 0.0.0.0/0           state NEW limit: avg 30/sec burst 200
>>      0     0 LOGDROP    tcp  --  eth0   *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:80 STRING match "w00tw00t.at.ISC.SANS." ALGO
>> name bm TO 65535
>>      0     0 ACCEPT     icmp --  *      *       0.0.0.0/0
>> 0.0.0.0/0           limit: avg 10/sec burst 5
>>      0     0 LOGDROP    icmp --  *      *       0.0.0.0/0
>> 0.0.0.0/0
>>   1031 70356 ACCEPT     all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           state RELATED,ESTABLISHED
>>      3   132 LOGDROP    all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0
>>
>> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>>   pkts bytes target     prot opt in     out     source
>> destination
>>      0     0 LOGDROP    all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0
>>
>> Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
>>   pkts bytes target     prot opt in     out     source
>> destination
>>      0     0 ACCEPT     all  --  *      tun0    0.0.0.0/0
>> 0.0.0.0/0
>>      0     0 ACCEPT     all  --  *      lo      0.0.0.0/0
>> 0.0.0.0/0
>>      0     0 LOGDROP    tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:80 owner UID match 33
>>      0     0 LOGDROP    udp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           udp dpt:80 owner UID match 33
>>      0     0 LOGDROP    tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:443 owner UID match 33
>>      0     0 LOGDROP    udp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           udp dpt:443 owner UID match 33
>>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0
>> 176.31.238.131      tcp dpt:1194
>> 11871 3146K ACCEPT     udp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           udp dpt:5405 /* Corosync */
>>      0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:22
>>      0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:25
>>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:43
>>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:53
>>      0     0 ACCEPT     udp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           udp dpt:53
>>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:80
>>      0     0 ACCEPT     udp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           udp dpt:123
>>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:443
>>      0     0 ACCEPT     tcp  --  *      eth0    0.0.0.0/0
>> 0.0.0.0/0           tcp dpt:873
>>     11   924 ACCEPT     icmp --  *      *       0.0.0.0/0
>> 0.0.0.0/0
>>   1071  712K ACCEPT     all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           state RELATED,ESTABLISHED
>>     67 14013 LOGDROP    all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0
>>
>> Chain LOGDROP (12 references)
>>   pkts bytes target     prot opt in     out     source
>> destination
>>     57 11655 LOG        all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0           limit: avg 1/sec burst 5 LOG flags 0 level 5 prefix
>> `iptables rejected: '
>>     70 14145 DROP       all  --  *      *       0.0.0.0/0
>> 0.0.0.0/0
>>
>> # corosync -v
>> Corosync Cluster Engine, version '1.4.2'
>> Copyright (c) 2006-2009 Red Hat, Inc.
>>
>> I've been trying to solve this problem for the last 2 days, without any
>> result. Any help is welcome.
>>
>> Thank you in advance!
>>
>> Regards.
>>
>
>
>
>
>


