Testing high availability

by Paras pradhan

Hey all:

My question seems to involve Heartbeat, DRBD and Xen, so I am posting to
all of them at once.

I have two nodes set up with Xen 3.0.3, drbd82 and Heartbeat 2 on CentOS
5.2. While testing this cluster for high availability, I noticed some
issues:


1)  domA is running on node1. When I manually shut down node1, sometimes
domA is migrated automatically to node2 and sometimes it is restarted on
node2. Why is this happening?

2) domA is running on node1. When I pull the network cable, domA is
restarted on node2 with no problem. But when node1 comes back, domA is
not migrated back to node1, and 'xm list' on node1 shows
"migrating-domain". This is complicating everything.


My ha.cf file looks like this:


logfacility local0
udpport 694
keepalive 1
deadtime 5
warntime 3
initdead 10
ucast eth0 10.42.40.198
ucast eth0 10.42.40.26
auto_failback on
watchdog /dev/watchdog
debugfile /var/log/ha-debug
node ha1.domain.local
node ha2.domain.local


Help !

Thanks in advance
Paras.
_______________________________________________
Linux-HA mailing list
Linux-HA@...
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: Testing high availability

by Dejan Muhamedagic

Hi,

On Thu, Oct 02, 2008 at 04:55:03PM -0500, Paras pradhan wrote:
> Hey all:
>
> It seems like my question is related to ha, drbd and xen . Hence posting to
> all of them at once.

Not sure if that's a good idea.

> I have two nodes setup with xen 3.0.3, drbd82, heartbeat 2 under centos
> 5.2. As I was testing this cluster for high availability, I noticed some
> issues
>
>
> 1)  domA is running under node1. when I manually shutdown node 1, sometimes
> it is migrated automatically to node2 and sometimes it is restarted in
> node2. Why is this happening?

How can it be restarted on node2 if it was running on node1?

> 2) domA is running under node1. when I pull off the network cable, domA is
> restarted in node 2 with no problem. But when the node1 comes back, domA is
> not migrated to node1 and if i do 'xm list' under node1, I see
> "migrating-domain". This is complicating everything.

Indeed. Did you take a look at the logs?

> My ha.cf file looks:
>
>
> logfacility local0
> udpport 694
> keepalive 1
> deadtime 5
> warntime 3
> initdead 10
> ucast eth0 10.42.40.198
> ucast eth0 10.42.40.26
> auto_failback on
> watchdog /dev/watchdog
> debugfile /var/log/ha-debug
> node ha1.domain.local
> node ha2.domain.local

Do you run v1/haresources or v2/CIB style configuration?


Re: Testing high availability

by voip crazy

This could be the solution to your second question.

Voipcrazy


auto_failback directive - set failback policy

The auto_failback option determines whether a resource will
automatically fail back to its "primary" node, or remain on whatever
node is serving it until that node fails, or an administrator
intervenes.

The possible values for auto_failback are:

    * on - enable automatic failbacks
    * off - disable automatic failbacks
    * legacy - enable automatic failbacks in systems where all nodes
      in the cluster do not yet support the auto_failback option.

Both the auto_failback on and off are backwards compatible with the
old "nice_failback on" setting.

See the FAQ document for information on how to convert from "legacy"
to "on" without a flash cut (i.e., using a RollingUpgrade process)

From URL:
http://www.linux-ha.org/ha.cf/AutoFailbackDirective
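
For the v1 setup in this thread, a minimal sketch of how the directive
would sit in ha.cf is shown below. It assumes the goal is to leave domA
on whichever node is running it after a failover until an administrator
moves it back; only the auto_failback line differs from the ha.cf posted
at the start of the thread.

```
# /etc/ha.d/ha.cf (fragment; keep it identical on both nodes)
# Assumption: resources should stay on the surviving node after a
# failover instead of moving back automatically.
auto_failback off
node ha1.domain.local
node ha2.domain.local
```

This setting only controls the failback policy. It does not by itself
stop both nodes from running the same resource when they lose contact
with each other.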



Re: Testing high availability

by Paras pradhan

On Fri, Oct 3, 2008 at 9:11 AM, Dejan Muhamedagic <dejanmm@...> wrote:

> Hi,
>
> On Thu, Oct 02, 2008 at 04:55:03PM -0500, Paras pradhan wrote:
> > Hey all:
> >
> > It seems like my question is related to ha, drbd and xen . Hence posting
> > to all of them at once.
>
> Not sure if that's a good idea.
>
> > I have two nodes setup with xen 3.0.3, drbd82, heartbeat 2 under centos
> > 5.2. As I was testing this cluster for high availability, I noticed some
> > issues
> >
> >
> > 1)  domA is running under node1. when I manually shutdown node 1,
> > sometimes it is migrated automatically to node2 and sometimes it is
> > restarted in node2. Why is this happening?
>
> How can it be restarted on node2 if it was running on node1?


Yes. I don't know what is going on. When I do a manual shutdown or
restart, the domain should be migrated to the other node rather than
relocated or restarted there.



>
>
> > 2) domA is running under node1. when I pull off the network cable, domA
> > is restarted in node 2 with no problem. But when the node1 comes back,
> > domA is not migrated to node1 and if i do 'xm list' under node1, I see
> > "migrating-domain". This is complicating everything.
>
> Indeed. Did you take a look at the logs?


It seems that when I pull the cable, the domains keep running on node1.
When I plug the cable back in, the guests end up running on both nodes.
Any solution?

>
>
> > My ha.cf file looks:
> >
> >
> > logfacility local0
> > udpport 694
> > keepalive 1
> > deadtime 5
> > warntime 3
> > initdead 10
> > ucast eth0 10.42.40.198
> > ucast eth0 10.42.40.26
> > auto_failback on
> > watchdog /dev/watchdog
> > debugfile /var/log/ha-debug
> > node ha1.domain.local
> > node ha2.domain.local
>
> Do you run v1/haresources or v2/CIB style configuration?


V1 Style.



Thanks
Paras.

Re: Testing high availability

by Dejan Muhamedagic

On Fri, Oct 03, 2008 at 05:32:02PM -0500, Paras pradhan wrote:

> On Fri, Oct 3, 2008 at 9:11 AM, Dejan Muhamedagic <dejanmm@...>wrote:
>
> > Hi,
> >
> > On Thu, Oct 02, 2008 at 04:55:03PM -0500, Paras pradhan wrote:
> > > Hey all:
> > >
> > > It seems like my question is related to ha, drbd and xen . Hence posting
> > > to all of them at once.
> >
> > Not sure if that's a good idea.
> >
> > > I have two nodes setup with xen 3.0.3, drbd82, heartbeat 2 under centos
> > > 5.2. As I was testing this cluster for high availability, I noticed some
> > > issues
> > >
> > >
> > > 1)  domA is running under node1. when I manually shutdown node 1,
> > > sometimes it is migrated automatically to node2 and sometimes it is
> > > restarted in node2. Why is this happening?
> >
> > How can it be restarted on node2 if it was running on node1?
>
>
> Yes. I don't know what is going on. It should be migrated rather
> than relocating or restarting on the other node when I do manual
> shutdown/restart.

Sorry, I still don't understand what you are trying to describe.

> > > 2) domA is running under node1. when I pull off the network cable, domA
> > > is restarted in node 2 with no problem. But when the node1 comes back,
> > > domA is not migrated to node1 and if i do 'xm list' under node1, I see
> > > "migrating-domain". This is complicating everything.
> >
> > Indeed. Did you take a look at the logs?
>
> It seems when I pull off the cable, the domains are still running under
> node1. And when I plug back the cable, domain guests are running to be on
> both nodes. Any solution?

That's called split brain and the only solution is fencing, i.e.
stonith. And try to make sure that that happens as seldom as
possible.

Thanks,

Dejan
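
One way to make split brain less likely is to give Heartbeat more than
one communication path, so that a single pulled cable does not cut the
nodes off from each other. The fragment below is only a sketch under
assumptions that are not from the thread: a second interface eth1 with a
direct link between the nodes, the 192.168.100.x addresses, and a
null-modem serial cable on /dev/ttyS0.

```
# /etc/ha.d/ha.cf (fragment; keep it identical on both nodes)
# Existing heartbeat path over eth0, as posted in the thread
ucast eth0 10.42.40.198
ucast eth0 10.42.40.26
# Hypothetical second path over a dedicated link on eth1
ucast eth1 192.168.100.1
ucast eth1 192.168.100.2
# Hypothetical third path over a null-modem serial cable
baud 19200
serial /dev/ttyS0
```

Redundant paths reduce how often fencing is needed. They do not replace
it.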


Parent Message unknown Re: [Xen-users] Testing high availability

by Paras pradhan

On Sun, Oct 5, 2008 at 3:54 AM, Daniel Asplund <danielsaori@...> wrote:

> >
> > Hey all:
> >
> > It seems like my question is related to ha, drbd and xen . Hence posting
> > to all of them at once.
> > I have two nodes setup with xen 3.0.3, drbd82, heartbeat 2 under centos
> > 5.2. As I was testing this cluster for high availability, I noticed some
> > issues
> >
> > 1)  domA is running under node1. when I manually shutdown node 1,
> > sometimes it is migrated automatically to node2 and sometimes it is
> > restarted in node2. Why is this happening?
> > 2) domA is running under node1. when I pull off the network cable, domA
> > is restarted in node 2 with no problem. But when the node1 comes back,
> > domA is not migrated to node1 and if i do 'xm list' under node1, I see
> > "migrating-domain". This is complicating everything.
> >
>
> 1) Most likely live migration fails for some reason and therefore the
> domA is restarted in node2. Could be a timer issue or a problem with
> release of resources. You should be able to see something from the
> logs during shutdown on node1.
>
> 2) heartbeat on node1 will sense an error and try to migrate domA to
> node2 when node1 is up again. But the node2 has already started domA
> and you basically have domA running on both nodes. To avoid split
> situations like this you should really use a STONITH device that can
> reboot the other node, a hardware device connected via serial cable is
> most secure, but a cheaper alternative is to use soft stonith device
> that can reboot the other node via SSH or telnet. You probably need to
> tweak heartbeat as well to allow it to do further checks, for example
> test connectivity to your gateway.


Yes, it seems I need STONITH. For now, at least, I want to use stonith ssh
for testing purposes. One thing I am confused about: how do I configure
STONITH, and what is the typical practice? In the scenario above, should
node1 be rebooted, or node2?

What I did: on node1 I added "stonith_host * ssh node2" to ha.cf, and on
node2 "stonith_host * ssh node1". But this is not working.

Is that the right way to configure STONITH? I have checked linux-ha.org
and Google, but I am still confused.


What I want is this: if there is a network outage on node1, node1 should
be automatically rebooted or shut down, with all domUs moving to node2.
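
For reference, the v1 ha.cf format is
stonith_host <hostfrom> <stonith_type> <params...>, where the parameters
depend on the plugin. The sketch below assumes that the ssh plugin takes
the list of cluster host names as its parameters, and that those names
must match the node entries in ha.cf (ha1.domain.local and
ha2.domain.local in the file posted earlier) rather than aliases such as
node1. Both points are assumptions to check against the plugin's own
documentation.

```
# /etc/ha.d/ha.cf (fragment; keep it identical on both nodes)
# Format: stonith_host <hostfrom> <stonith_type> <params...>
# "*" means the device can be used from any node.
# Assumption: the ssh plugin's parameters are the host names it may
# reset, written exactly as in the "node" lines.
stonith_host * ssh ha1.domain.local ha2.domain.local
```

Whatever the exact syntax, the ssh plugin needs a working network
connection to the node it is resetting, so it cannot fence a node whose
cable has been pulled. It also typically needs key-based root ssh logins
between the nodes.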




>
>
> Do you have two NICs in both nodes or are you running DRBD, HA and
> data traffic over same NIC?


Daniel, yes, I have two NICs in each node.


>
> Regards, Daniel
> http://www.asplund.nu/xencluster.html
>



Thanks
Paras.

Re: [Xen-users] Testing high availability

by Paras pradhan

One more thing, as I am testing my HA cluster.


I think I have a satisfactory HA cluster setup, which I am planning to put
into production. But I think I am still far away from that.

I need some advice: how do I test this cluster?

A few scenarios I have tested so far:

1) Stop the heartbeat daemons --> working fine

2) Reboot and shut down nodes --> working fine

3) Pull the network cable, or run 'service network stop' --> working, but
split brain needs to be taken care of manually. Which solution is ideal:
stonith meatware, stonith suicide or stonith ssh?

4) Any other tips on how to test the cluster?


Thanks
Paras.

Re: Re: [Xen-users] Testing high availability

by Dejan Muhamedagic

On Tue, Oct 07, 2008 at 04:14:39PM -0500, Paras pradhan wrote:

> One more thing as I am testing my ha cluster.
>
>
> I think I have a satisfactory HA cluster setup which I am planning to put in
> production. But i think I am too far away from it.
>
> I need some advices... how do I test this cluster?
>
> Few scenarios I have done to test:
>
> 1) stop heart beat daemons --> Working fine
>
> 2) Reboot and Shutdown nodes --> Working fine
>
> 3) Pull of the network cable or did 'service network stop' --> Working.. but
> split brain need to be manually taken care of. Which solution is ideal?
> stonith meatware, stonith suicide or stonith ssh.

A real stonith device. Suicide may be of use in some setups, but
I wouldn't recommend it in general. No ssh in production.

> 4) Any other tips on how to test the cluster?

Disk full.

We need a list of things which may fail in interesting ways.

Thanks,

Dejan


Re: Re: [Xen-users] Testing high availability

by Paras pradhan

On Wed, Oct 8, 2008 at 10:19 AM, Dejan Muhamedagic <dejanmm@...> wrote:

> On Tue, Oct 07, 2008 at 04:14:39PM -0500, Paras pradhan wrote:
> > One more thing as I am testing my ha cluster.
> >
> >
> > I think I have a satisfactory HA cluster setup which I am planning to put
> > in production. But i think I am too far away from it.
> >
> > I need some advices... how do I test this cluster?
> >
> > Few scenarios I have done to test:
> >
> > 1) stop heart beat daemons --> Working fine
> >
> > 2) Reboot and Shutdown nodes --> Working fine
> >
> > 3) Pull of the network cable or did 'service network stop' --> Working..
> > but split brain need to be manually taken care of. Which solution is
> > ideal? stonith meatware, stonith suicide or stonith ssh.
>
> A real stonith device. Suicide may be of use in some setups, but
> I wouldn't recommend it in general. No ssh in production.


If I pull the network cable on a node, is it possible for that node to be
powered off using suicide or ssh? It seems it should be, but it is not
working. Any tips on this?



Thanks
Paras.