Something.... broke

gersteina1
New Contributor III

I have about 100 machines that have been imaged with our Fall 2012 image (10.7.4). We have had to use DeployStudio to image some of them (10.6 machines won't NetBoot to a 10.7-based NetBoot set, no matter what I do - another topic for another day), but others were imaged with Casper.

All are being managed by Casper at this point.

Just this afternoon, I ran into a big problem: when anyone tries to log in, they are greeted by a grey screen. It sits there for a very long time - 30 minutes and counting on the machine next to me as I write this.

I thought it was related to a mix-up earlier today with a Config profile that was supposed to propagate our Energy Saver schedule (wake/power on in the morning, shut down at night, etc.). When I added the room I was working in to the group, the machine I was on immediately wanted to shut down for some reason.

I then discovered the login issue described above. I've tried removing the Config profile from the machines, with no change. I've gone through the other changes made in the last few days and found nothing. I can't, for the life of me, figure it out.

Has anyone else seen this behavior??? So far the only fix is to re-image, but I don't relish spending what's left of today and part of next week redoing the last three days' work if I can avoid it.

7 replies

lisacherie
Contributor II

If the management user can log in quickly but new users are slow to complete their first login (and subsequent logins are fast), the user template might be very large.

Run this as root to check the size:

du -h -d 1 "/System/Library/User Template"
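A slightly more reusable version of the same check, as a minimal sketch: `template_sizes` is a hypothetical shell function (not part of Casper or OS X), and its default path assumes the 10.7 location used above. The key detail is quoting the path, since an unquoted "User Template" is split into two arguments at the space.

```shell
# Hypothetical helper: list the size (in KB) of a user template directory and
# each of its top-level items, largest first. Run as root on the client.
template_sizes() {
  # Default assumes the OS X 10.7 location; pass another path to override.
  local dir="${1:-/System/Library/User Template}"
  # Quoting "$dir" keeps the space in "User Template" from splitting the path.
  du -k -d 1 "$dir" | sort -rn
}
```

Usage, in a root shell: `template_sizes`, or `template_sizes "/path/to/another/template"`. Sorting numerically in reverse puts the biggest items at the top of the list.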

gersteina1
New Contributor III

Thanks for the reply - the English template (our only one) is 389 MB. The problem is that no one can log in, not even my local admin account.

jarednichols
Honored Contributor

If you've got management access on the machine, try SSHing into it while someone attempts a graphical login, and see what's being written to the logs.
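A minimal sketch of that approach, assuming OS X 10.7, where loginwindow writes its messages to /var/log/system.log. `filter_login` is a hypothetical helper name, and `admin` and `client.example.edu` below are placeholders for your own account and host.

```shell
# Hypothetical helper: pass through only log lines matching an extended regex
# (default: loginwindow). Reads stdin, so it works on a live `tail -f`.
filter_login() {
  grep -E "${1:-loginwindow}"
}

# Example usage: SSH to the client, define the function in that session,
# then watch the log while someone tries to log in at the console:
#   ssh admin@client.example.edu
#   tail -f /var/log/system.log | filter_login 'loginwindow|launchd'
```

Filtering keeps the live view readable; dropping the filter and watching the raw `tail -f` output is the safer choice if you don't yet know which process is hanging.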

gersteina1
New Contributor III

That is an excellent idea - and if they hadn't just closed the building due to a small flood in the basement, I'd tell you what I found...

Monday it is!

Thanks!

tomt
Valued Contributor

This reminds me a lot of the issue I had a while back while pushing Adobe Acrobat. Let me do a little searching as I don't remember it exactly at the moment.

Ok, found it. The package was installing a copy of /private/var/audit/current that had been captured during packaging. I removed that file from the affected client machines, and they would then log in normally. Try removing that file (I actually removed the whole /private folder from my package).
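The client-side part of that fix, sketched as a script. This assumes the stray item is exactly /private/var/audit/current, as described in the post; `remove_stale_audit_current` is a hypothetical helper, and its optional root argument exists only so it can be pointed at a mounted volume or a test directory instead of the boot volume. Run it as root on an affected client.

```shell
# Hypothetical helper: remove a stale /private/var/audit/current that a package
# laid down. Optional argument: root of the target volume (default "/").
remove_stale_audit_current() {
  local root="${1:-/}"
  # Strip any trailing slash so "/" and "/Volumes/X/" both build a clean path.
  local f="${root%/}/private/var/audit/current"
  # -L catches a symlink even if its target no longer exists.
  if [ -e "$f" ] || [ -L "$f" ]; then
    rm -f "$f"
    echo "removed $f"
  else
    echo "not present: $f"
  fi
}
```

Usage: `remove_stale_audit_current` on the client itself, or `remove_stale_audit_current "/Volumes/Macintosh HD"` against a mounted disk. Fixing the package itself (dropping /private from the payload, as described above) stops the file from coming back.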

Original thread here:
https://jamfnation.jamfsoftware.com/discussion.html?id=46

gersteina1
New Contributor III

Thanks for the suggestion, tomt, but that wasn't it. I just went back and re-imaged a bunch of them. With deadlines looming, the last thing I want right now is to make things harder to deal with.

andyinindy
Contributor II

Same situation here.

We are seeing a very high failure rate with our lab imaging via Casper. It seems that the post-imaging tasks/installations do not occur, so we end up with half-imaged systems that are basically unusable.

The solution has been to go back and reimage them, which usually works. However, this raises the question: why are so many failing the first time through? We had a zero percent failure rate with DeployStudio, which really makes me want to abandon Casper imaging altogether and return to a solution that works reliably.

Incidentally, the Casper and DeployStudio NetBoot images are hosted on the same server, so there shouldn't be any difference in hardware capacity, network bandwidth, etc.

The only conclusion that I can reasonably make at this point is that Casper imaging needs work.