Posted on 08-06-2010 07:14 AM
Just wanted to share this with the list....
We are deploying laptops right now in our 1:1 program. Everyone in IT is working 65+ hour weeks to get the last-minute stuff done before school starts, pass out laptops, and make sure users can log in. Two nights ago at 10 PM my BlackBerry went off with an email from our server monitor: one of my main Open Directory relay servers had lost power. Of course I could not get into the buildings until 7 AM the next day. Long story short, that power outage corrupted the Password Server service, and the corruption was then relayed to the two tier-two replicas. To make matters worse, we started deploying laptops that morning. As in at 8 AM...
Luckily, I have tons of local user account scripts already pre-baked in Casper and the JSS, and all I had to do was flip a switch so that every computer got a local account users could log into. Any user who could not log in because their home folder was on a downed server could now log in with the local account. By the time they come back for the first day of school, their mobile accounts will be fixed. I will then change the password on the local account, disable it, and force them to log into their mobile accounts.
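For anyone wondering what one of those pre-baked local account scripts might look like, here is a minimal sketch. It is not the actual script from the JSS: the function names, account name, UID, and password are all hypothetical, and it assumes the script runs as root on the client. It only uses standard dscl calls, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of a fallback local account script (hypothetical, not the poster's actual script).
# Assumes it runs as root on the client, e.g. pushed by a management policy.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_local_account SHORTNAME FULLNAME UID PASSWORD
create_local_account() {
    name="$1"; full="$2"; uid="$3"; pass="$4"
    run dscl . -create "/Users/$name"
    run dscl . -create "/Users/$name" UserShell /bin/bash
    run dscl . -create "/Users/$name" RealName "$full"
    run dscl . -create "/Users/$name" UniqueID "$uid"
    run dscl . -create "/Users/$name" PrimaryGroupID 20   # 20 is the staff group
    run dscl . -create "/Users/$name" NFSHomeDirectory "/Users/$name"
    run dscl . -passwd "/Users/$name" "$pass"
}
```

A dry run such as DRY_RUN=1 create_local_account bldg1student "Building 1 Student" 601 changeme shows the exact dscl commands before anything is pushed to real machines. Keep in mind that a password passed on the command line is visible in the process list while the script runs.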
This also made me realize that all late enrollees, and kids who somehow slip through my mass import lists into OD, can use this local account too. We used to try to make OD accounts on the fly if a student didn't have one. Now I just use Casper to deploy the local account to the machines in scope, and each building has a unique account and password. The enrollment people can dump all late enrollees to a CSV file, and I can import it into OD right when school starts instead of fussing with manual account creation for those who enroll late or slip through the cracks.
If I had not had Casper at my disposal when those servers crashed, I wouldn't have been able to come up with plan B. Instead, I would be stressing out.
If any of you work in a 1:1 and do massive laptop rollouts, learn from my experience here. You already know you will have massive numbers of students and parents coming in to pick up a laptop and log into it for the first time. If you get a bottleneck of people coming in, it creates tension and makes the whole process exponentially worse. No matter what Apple tells you, OD has issues in 10.5, and a lot of times a simple power outage can destroy it. Have a plan B, and use your Casper tools to execute that plan B if you need it. Also have a plan B for your OD. I have a backup script that backs up OD every day and keeps only 30 days of backups. I was able to demote all the servers, destroy LDAP, wipe and reload the one server that could not be fixed in a timely manner, bring it back into OD, and restore a backup from a few days ago, so nothing was lost.
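A rough sketch of the retention half of a backup script like that, assuming the daily archives land in a single directory and share a hypothetical od-backup- filename prefix. The archive step itself is not shown in the post, so it is left out here; the directory, prefix, and function name are illustrative only.

```shell
#!/bin/sh
# Sketch: delete backup archives older than a retention window.
# The od-backup- prefix and the directory passed in are hypothetical placeholders.

# prune_old_backups DIR DAYS
prune_old_backups() {
    dir="$1"
    days="$2"
    # -mtime +N matches files whose last modification is older than about N days.
    find "$dir" -type f -name 'od-backup-*' -mtime +"$days" -exec rm -f {} \;
}
```

Calling prune_old_backups /path/to/od-backups 30 right after each daily archive keeps roughly the last 30 days on disk. Only files matching the prefix are touched, so anything else in the directory is left alone.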
Just FYI
Posted on 08-06-2010 07:38 AM
No matter what Apple tells you, OD has issues in 10.5 and a lot of times
a simple power outage can destroy it. Have a plan B,
We have graceful shutdown via UPS. The servers never come back on when the
power is restored (a problem with how OS X handles it, from what I read) but at least
they go down correctly and don't cause corruption. Glad you were able to
- JD
Posted on 08-06-2010 07:42 AM
We have UPS systems that shut down the servers; however, OS X 10.5.8 is
prone to hanging on shutdown. A lot of times when I manually reboot the
servers via ARD commands, after an update or just to reboot them every
once in a while, they will hang. I then have to SSH in and force a reboot
with the shutdown -r now command.
I have seen 10.5 Server hang on shutdown ever since we started
running it.