Question

Best JSS Fail Over Practice

  • May 18, 2010
  • 31 replies
  • 178 views


donmontalvo
  • Hall of Fame
  • June 1, 2010

Hi Cem,

The Master can't connect because the Fail Over box has assumed its IP address. This is expected behavior, and it is why we schedule off-hours maintenance. We can then shut down the Fail Over and bring up the Master; when the Fail Over box comes back up, it resumes as DHCP.

Don


  • Author
  • Contributor
  • June 1, 2010

In this scenario, does the Master Server have a DHCP or a fixed IP address? In my
environment, I cannot get an FQDN unless I have a fixed IP address.

"We can then shut down the Fail Over, bring up the Master, then when the Fail Over box comes back up, it resumes as DHCP."

If the Master has a fixed IP, the Failover Server will force the Master Server off the
network again after being rebooted. That is what I observed yesterday.

Sorry to repeat myself, but that is why I have come up with this plan of action:

Master Server fails

-->Backup Server takes the IP over automatically (because the Casper database
has already been rsynced, it is up to date and service won't be interrupted)

-->Master Server is repaired or recovered, but cannot use its unique IP

-->Remove the IP Failover info from /etc/hostconfig on the Backup Server-->reboot

-->Boot or reboot the Master Server (after repair or recovery); its IP is now usable

-->Add the info back into the /etc/hostconfig file on the Backup Server-->reboot

-->IP Failover is now ready again
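
For anyone who wants to script the two /etc/hostconfig steps above, here is a minimal sketch. It assumes the failover settings are the lines whose keys start with FAILOVER_ (check your own file first), and the function names disable_failover, enable_failover and rewrite_in_place are made up for illustration. It comments the lines out instead of deleting them, so adding them back is a simple undo.

```shell
#!/bin/sh
# Sketch only: toggle the IP failover entries in a hostconfig-style file.
# Assumption: the failover settings are lines starting with FAILOVER_.
# rewrite_in_place / disable_failover / enable_failover are hypothetical names.

rewrite_in_place() {
    # $1: sed expression, $2: file to rewrite.
    # Writing back with cat keeps the file's owner and permissions.
    tmp="$2.tmp.$$"
    sed "$1" "$2" > "$tmp" && cat "$tmp" > "$2"
    status=$?
    rm -f "$tmp"
    return $status
}

disable_failover() {
    # Comment out every failover line (the "remove" step in the plan).
    rewrite_in_place 's/^FAILOVER_/#FAILOVER_/' "$1"
}

enable_failover() {
    # Restore the lines that disable_failover commented out.
    rewrite_in_place 's/^#FAILOVER_/FAILOVER_/' "$1"
}
```

On the Backup Server this would be run as root, for example `disable_failover /etc/hostconfig` followed by a reboot, and `enable_failover /etc/hostconfig` plus another reboot once the Master is back. Try it on a copy of the file first.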

Cem


donmontalvo
  • Hall of Fame
  • June 1, 2010

Hi Cem,

Once the Fail Over box is taken down, the Master would be brought back up with its original static IP address (nothing changes). Perform any maintenance needed on the Master, then bring the Fail Over box back up. The Fail Over would need to come back up as DHCP (I don't remember if we had to do this last step manually).

Don


donmontalvo
  • Hall of Fame
  • June 1, 2010

Cem,

I dug up my notes: the Fail Over will in fact release the IP address once it is brought back up and notices the Master is up. I'll send you my notes in a separate email.

Don


  • Author
  • Contributor
  • July 3, 2010

Hi,

IP Failover on Mac OS X Server 10.6 is broken, but I have managed to fix it. If
anyone is interested, here is how it's done:

I created an Automator app that runs a Unix command and added it to the login items of
the Master Server (launchd, via Lingon, didn't work for me), as heartbeatd wasn't
sending pulses.
Here is the command:

heartbeatd -d fwIPAddress serverIPAddress
(example: heartbeatd -d 10.0.0.2 192.168.0.2 )
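
If you wrap the command quoted above in a script instead of typing it into Automator, a small sanity check on the two addresses can save some head-scratching. This is only a sketch: is_ipv4 and heartbeat_command are made-up helper names, and the only heartbeatd usage assumed is the one shown in the post (-d followed by the two addresses in the same order).

```shell
#!/bin/sh
# Sketch only: build the heartbeatd command line after checking both addresses.
# is_ipv4 and heartbeat_command are hypothetical helpers, not part of heartbeatd.

is_ipv4() {
    # Succeeds when $1 is a dotted quad with every octet in the range 0-255.
    echo "$1" | awk -F. 'NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

heartbeat_command() {
    # $1: first address (fwIPAddress in the post), $2: server IP address.
    # Prints the command line; returns 1 without output if either is malformed.
    is_ipv4 "$1" && is_ipv4 "$2" || return 1
    echo "heartbeatd -d $1 $2"
}
```

Calling `heartbeat_command 10.0.0.2 192.168.0.2` prints the same line as the example above, which the login item can then run; a mistyped address makes the function return 1 instead of starting heartbeatd with bad arguments.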

I have now fully automated IP Failover with mail notifications (it reports both failing
and recovering IPs). The data center guys just need to switch the Master Server back
on :)

This is what it says in Terminal:

-d, --debug   as -x, but print debug output to terminal

There is more on the man page:
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man8/heartbeatd.8.html

The next challenge is syncing the Casper MySQL database to the backup server, so the JSS keeps
running after failover. I will keep you all updated.

Cem


donmontalvo
  • Hall of Fame
  • July 4, 2010

Now that we have our isolated LAB environment built here in Dallas, one of the things we're going to be testing is running MySQL (VM'd on our LAB Xserve for now) to see if we can offload the MySQL database from the Xserve. This would make it easier to get another box up and running. When we imported our database, the compressed database was 2.4 GB and took several hours to expand, and many more hours to import. We're hoping to never have to go through that process again. :)

Don