[corosync] Troubleshooting methods for failed process

Earl Ruby eruby at webcdr.com
Thu Nov 10 02:43:27 GMT 2011


I've set up a 2-node Corosync cluster with Master/Slave DRBD, ClusterIP,
a Filesystem resource, and Apache.

Everything works fine except Apache. I can start Apache from the command
line just fine, but when I shut it off on both nodes and then run:

crm resource cleanup WebSite

Apache fails to start. The Apache error_log on both nodes shows two new
lines each time I run cleanup:

[Thu Nov 10 02:37:33 2011] [notice] Apache/2.2.17 (Linux/SUSE) mod_ssl/2.2.17 OpenSSL/1.0.0c mod_perl/2.0.5 Perl/v5.12.3 configured -- resuming normal operations
[Thu Nov 10 02:37:34 2011] [notice] caught SIGTERM, shutting down

"grep -i apache /var/log/corosync.log" gives no useful info.
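
Since the resource is named WebSite rather than apache, the cluster log may
refer to it by that ID. The following is only a sketch of searching on the
resource ID instead. The helper names resource_log and resource_ops are
hypothetical, and the assumption that operations are logged with keys of the
form <resource>_<operation>_<interval> (for example WebSite_start_0) should be
checked against the actual log.

```shell
#!/bin/sh
# Sketch: search the cluster log by resource ID rather than by "apache".
# Assumption: the log lives at /var/log/corosync.log, as in the grep above.
LOGFILE="${LOGFILE:-/var/log/corosync.log}"

# Print every log line that mentions the given resource ID.
#   $1 = resource ID (e.g. WebSite), $2 = optional log file
resource_log() {
    grep -- "$1" "${2:-$LOGFILE}"
}

# Narrow those lines to start/monitor/stop operation keys.
# Assumption: keys look like WebSite_start_0 or WebSite_monitor_60000.
resource_ops() {
    resource_log "$1" "$2" | grep -E "${1}_(start|monitor|stop)_[0-9]+"
}
```

For example, "resource_ops WebSite" would list the logged start, monitor, and
stop operations for the WebSite resource, if the log uses that key format.
Running "crm_mon -1 -f" alongside this should show the current fail counts
for each resource.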

Any idea what command Pacemaker uses to start Apache? As I said, *I*
can start it from the command line no problem, but Pacemaker fails.

Any suggestions on how I should go about troubleshooting this? What
should I be looking at?
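
For an ocf:heartbeat:apache primitive, Pacemaker does not call apache2ctl or
an init script. It runs the OCF resource agent script with the action as its
argument and the resource parameters passed as OCF_RESKEY_* environment
variables. The following is a minimal sketch of running that agent by hand.
It assumes the agent is installed at /usr/lib/ocf/resource.d/heartbeat/apache
(the usual location, but not confirmed for this system). The helper name
run_agent is hypothetical.

```shell
#!/bin/sh
# Sketch: invoke the OCF apache agent by hand, roughly as the cluster would.
# Assumption: agents for ocf:heartbeat:* live under
# $OCF_ROOT/resource.d/heartbeat/ with OCF_ROOT=/usr/lib/ocf.
OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
AGENT="${AGENT:-$OCF_ROOT/resource.d/heartbeat/apache}"
# Same value as the configfile parameter of the WebSite primitive.
CONFIGFILE="${CONFIGFILE:-/etc/apache2/httpd.conf}"

# Run one OCF action (start, monitor, stop) and report its exit code.
run_agent() {
    if [ ! -x "$AGENT" ]; then
        echo "agent not found or not executable: $AGENT" >&2
        return 5    # 5 is OCF_ERR_INSTALLED
    fi
    OCF_ROOT="$OCF_ROOT" OCF_RESKEY_configfile="$CONFIGFILE" "$AGENT" "$1"
    rc=$?
    echo "$1 returned $rc"
    return "$rc"
}
```

With Apache stopped, running "run_agent start" followed by "run_agent monitor"
as root on one node would show which action returns a non-zero code. In the
OCF convention 0 means success and 7 means "not running". A start that
succeeds followed by a failing monitor would be consistent with the error_log
above, where Apache comes up and receives SIGTERM a second later, but that is
a guess to be verified.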

My config looks like this:

node install0
node install1
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.24" cidr_netmask="32" \
        op monitor interval="30s"
primitive FileSystemDRBD ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/home/src" fstype="ext3" \
        op monitor interval="60" timeout="40" start-delay="10" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive VolumeDRBD ocf:linbit:drbd \
        params drbd_resource="install" \
        operations $id="VolumeDRBD-operations" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20" start-delay="0" \
        op notify interval="0" timeout="90" \
        meta target-role="started"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/apache2/httpd.conf" \
        op monitor interval="1min"
group Cluster ClusterIP FileSystemDRBD WebSite \
        meta target-role="Started"
ms MasterDRBD VolumeDRBD \
        meta clone-max="2" notify="true" target-role="started"
colocation WebServerWithIP inf: Cluster MasterDRBD:Master
order StartFileSystemFirst inf: MasterDRBD:promote Cluster:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1320891100"

-- 
Earl C. Ruby III
Director of Engineering

