[corosync] Memory leak on 1.2.3
sdake at redhat.com
Tue Dec 20 15:11:49 GMT 2011
On 12/20/2011 03:12 AM, Chris Alexander wrote:
> Hi all,
> We are using Corosync as part of the Redhat cluster stack. Their
> currently supported version is 1.2.3.
While Red Hat's corosync is "version 1.2.3", the z stream almost entirely
matches the flatiron 1.4 branch. I take patches and apply them to the RPM.
> Every few days our nodes are (non-simultaneously) being fenced due to
> corosync taking up vast amounts of memory (i.e. 100% of the box). Please
> see a sample log message (we have several just like this) which
> occurs when this happens. Note that it is not always corosync being
> killed, but it is clearly corosync eating all the memory (see top
> output from three servers at various times since their last reboot).
> The corosync version is 1.2.3:
> [g at cluster1 ~]$ corosync -v
> Corosync Cluster Engine, version '1.2.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
> We had a bit of a dig around and there are a significant number of
> bugfix updates which address various segfaults, crashes, memory leaks
> etc. in this minor as well as subsequent minor versions. However,
> it seems the Redhat repos haven't been updated past 1.2.3 as yet.
> We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib
> (v1.4.2) to see if it fixes the particular issue we are seeing (i.e.
> whether or not the memory keeps spiralling way out of control).
The latest z stream would be your best solution here.
> Has anyone else seen an issue like this, and is there any known way to
> debug or fix it? If we can assist debugging by providing further
> information, please specify what this is (and, if non-obvious, how to
> get it). Any additional tips also welcome.
I haven't seen this problem in the field. Please report it to
support. They may have seen it and can map it to a BZ; if not, they can
help reproduce it and get it fixed.
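If you want to attach something more systematic than occasional top
snapshots to a support case, one option is to log corosync's resident
memory at a fixed interval so the growth rate is visible. The sketch
below is an illustration only, not a corosync or Red Hat tool: it
assumes a Linux /proc filesystem, and the helper names
(parse_vmrss_kb, find_pids, sample_rss) and the default interval are
hypothetical.

```python
import os
import time


def parse_vmrss_kb(status_text):
    """Return VmRSS (resident set size, in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None


def find_pids(name):
    """Return PIDs whose /proc/<pid>/comm matches name exactly (assumes Linux /proc)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(), or access was denied.
            continue
    return pids


def sample_rss(name="corosync", interval_s=60, samples=5):
    """Print a timestamped RSS reading for each matching process, `samples` times."""
    for _ in range(samples):
        for pid in find_pids(name):
            try:
                with open(f"/proc/{pid}/status") as f:
                    rss = parse_vmrss_kb(f.read())
            except OSError:
                continue
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} pid={pid} rss_kb={rss}")
        time.sleep(interval_s)
```

For example, sample_rss(interval_s=300, samples=288) would record one
reading every five minutes for a day; a steadily climbing rss_kb column
is far easier for support to act on than a single top screenshot.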
> Thanks again for your help
>  http://pastebin.com/CbyERaRT
>  http://pastebin.com/uk9ZGL7H
>  http://pastebin.com/H4w5Zg46
>  http://pastebin.com/KPZxL6UB
>  http://rhn.redhat.com/errata/RHBA-2011-1361.html
>  http://rhn.redhat.com/errata/RHBA-2011-1515.html