Posted on 08-15-2013 11:32 AM
We're in the process of imaging sets of 31 new 2013 iMacs in school labs all across our district, and we're running into a ton of grief.
Has anyone out there experienced problems imaging too many systems at once? When we NetBoot all 31 iMacs at a school and start each one imaging, inevitably a large number (the majority) of the systems will fail to install all of the packages correctly. We end up having to do 10 or 15 at a time, and then things seem to image more reliably. There is a Mac mini/Promise RAID server at each school providing NetBoot services, which also hosts the Casper images locally.
The labs are made up of 31 iMacs, all with gigabit network connections back to Cisco switches. Prior labs were made up of 2006 iMacs on 100 Mbit network connections on older Cisco gear, and we occasionally ran into the same problems. Any input or advice would be greatly appreciated!
Posted on 08-15-2013 11:39 AM
Have not seen this... typically it's a network bandwidth or throughput issue. Do the network admins have SNMP enabled, and/or are they using any kind of monitoring software (nmap/zenmap) to see exactly what is going on?
Don't know if it's possible or feasible, but could you use a dumb gigabit switch and patch the Casper server directly into that switch, along with the lab machines? That would confirm whether it is a network configuration and/or bandwidth issue.
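As a rough back-of-the-envelope check on the bandwidth theory, here is a minimal sketch that estimates each client's share of a single server uplink and how long one imaging run would take. The 20 GB image size and the 80% link efficiency are assumptions for illustration only, not figures from this thread, and the function names are made up for the example.

```python
def per_client_mbps(uplink_mbps: float, clients: int, efficiency: float = 0.8) -> float:
    """Share of the server uplink each client gets if bandwidth splits evenly."""
    if clients < 1:
        raise ValueError("clients must be at least 1")
    return uplink_mbps * efficiency / clients


def minutes_to_image(image_gb: float, uplink_mbps: float, clients: int,
                     efficiency: float = 0.8) -> float:
    """Minutes for each client to pull image_gb over its share of the uplink."""
    megabits = image_gb * 8 * 1000  # GB -> megabits, decimal units
    seconds = megabits / per_client_mbps(uplink_mbps, clients, efficiency)
    return seconds / 60


if __name__ == "__main__":
    # Assumed values: 1000 Mbit/s server uplink, 20 GB image (hypothetical).
    for n in (10, 15, 31):
        print(n, "clients:",
              round(per_client_mbps(1000, n), 1), "Mbit/s each,",
              round(minutes_to_image(20, 1000, n), 1), "minutes per image")
```

An even split like this only predicts slower runs as the client count grows, not failed ones, so if machines drop out entirely it is probably worth looking beyond raw bandwidth.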
Posted on 08-15-2013 12:09 PM
When we used NetBoot we limited it to 10 machines at a time because, even though our network is huge and fast, something was seeing the NetBoot traffic and throttling it. Red tape aside, we just moved to USB sticks.
Posted on 08-16-2013 04:45 AM
Separate the NetBoot service and the distribution point onto 2 different servers and test.
Another thing you can do is use the RAM disk method for your NetBoot images to lower the server utilisation, as described here:
http://www.macos.utah.edu/documentation/administration/setup_netboot_service_on_mac_os_x_10.6.x_clie...
Posted on 08-16-2013 06:24 AM
Have you checked the I/O load on your NetBoot server (specifically the Mac mini, not the RAID)? We run between 30 and 50 machines at a time in labs, but we made sure that we purpose-built our NetBoot servers to fly at that load. A Promise RAID should be fast enough to handle the shadow volumes... Just to make sure: your .nbi is hosted from the RAID, not just the DP, correct?
Posted on 06-09-2014 12:51 PM
I have the same problems ... but it comes down to only 10 machines... with super fast server and network...
Posted on 06-09-2014 12:55 PM
Posted on 06-09-2014 12:57 PM
hi
That's a procedure we already applied on our NetBoot image...
I will double check again...
Posted on 06-10-2014 07:29 AM
I have to ask. Are you running this in multicast? I've seen these problems with other folks if so.
Posted on 06-10-2014 07:31 AM
hi
Actually, I'm not even sure whether I'm using unicast or multicast... I'm just imaging a basic, standard image from a simple NetBoot... so I assume it's unicast.
Posted on 06-10-2014 07:37 AM
In that case I would agree. In general, multicast has to be turned on explicitly. Now, with that said, I'm also not sure that NetBoot is your issue, as it seems that the machines are booting properly but failing or dropping off during transfer. Diskless NetBoot is critical; however, it will then create shadow volumes on your NetBoot server. If you don't have the space or the RAM to deal with that along with all the other requests, this is the behavior you will see.
Also a 100Mbit network will slow down the process but it shouldn't kill any of the streams. I've had network segments in the past where there was a 100 Mbit link and it just slowed everything down to a crawl and that's it. Everything kept functioning normally.
Posted on 06-10-2014 07:43 AM
Hi
What's weird is that back in January, with our old Casper 8.73 installation, everything was fine with the lab... imaging worked, etc. Now with Casper 9.31 it's failing, but only with more than 4-5 machines...
1-2 machines work fine every time, all the time... My NetBoot is diskless for sure, and the user has access everywhere on our repository for NetBoot.
I also checked our server's disk, network and RAM utilisation while imaging... the server is mostly sitting idle during imaging, nothing much on that side. On the client side it's 100 Mbit, and I know it was slow back in January, but all the computers were imaging, not dropping...
Posted on 06-10-2014 07:50 AM
Interesting. Those numbers are WAY too low. Beyond that, I only just realized (after my last post) that you've got everything coming out of a Promise RAID. In which case we can discount my mention of storage space (assuming it's not full).
Looks like I'm going to have to fall back on the network infrastructure path. There IS something getting in your way. There should be no reason you're limited to such a small number of successful imaging connections. It also may be worth rebuilding the NetBoot server just in case. I don't know what anyone's upgrade policies are on servers, but upgrading in place is something I NEVER do, for these reasons. Regardless, I'm grasping at straws on that one. Only you know how it was set up.
Do you have someone that can evaluate the connections on the network?
Posted on 06-10-2014 07:53 AM
I'm not running off a Promise RAID... I'm running off an ATTO Thunderbolt-to-fiber adapter, which is connected to an IBM SAN...
I have plenty of space... 3 TB and 2 TB free.
We checked the network logs and everything on that side, and nothing shows up as a problem... The only thing that changed between the last deployment in January and now is the new Mac Pro server and Casper 9.31...
I know it's a lot, but it should not be a problem... which is weird, considering 1 to 3 machines at the same time work fine...
Posted on 06-10-2014 07:55 AM
Fair enough... quick question. Which version of Casper imaging are you using on your .nbi?
Posted on 06-10-2014 08:08 AM
9.3
Posted on 06-10-2014 08:18 AM
Hrm... there goes another one. I am also running 9.3. I was just wondering because there were known issues with Casper Imaging 9.23 through the next few versions. 9.3 was the first safe one I was willing to use. You said your new Mac Pro was built from scratch to host the JSS in January, correct?
Posted on 06-10-2014 08:19 AM
yes... its brand new...
BUT !!! hey... multicast is running now... it was probably a wrong configuration of the Casper utility for multicasting...
It's now installing my image... I'll let you guys know in a couple of minutes... testing with 1 computer now, and later today with the whole lab... 25 machines...
Posted on 06-10-2014 08:22 AM
I would be really, really surprised if that helps. Then again, I've seen stranger stuff. It will be interesting to hear regardless. Generally multicast is a horrible idea in this circumstance and causes far more issues than it was intended to solve. Yet any info is good info while we're troubleshooting.
Posted on 06-10-2014 09:55 AM
Well, bad news...
The client and the server initiate the multicast session... the server sends tons of GB of data... but no data is received on the client...
Posted on 06-10-2014 10:22 AM
... wait a min. Are you saying that the client cannot connect to the DP?
Posted on 06-10-2014 12:04 PM
That's not what I'm saying... all my clients connect to the DP...
What I'm saying is that when I start the multicast server, I see my server throwing tons of data onto the network...
Then I start my client on NetBoot with the multicast restore.sh script... and it stays there forever, and no data is copied to the client.
Posted on 06-11-2014 08:20 AM
Ahhh... how interesting. So no success at all via multicast. I'm sure that tells us something, but I am certainly NOT a network expert. There must be a way to narrow this down. Is there any way for you to test this on its own discrete network? Grabbing a 16-port gig switch and turning on DHCP (assuming that you've got the Mac's DNS service on and pointing to itself) ought to get you there. If it works on 15ish machines on a discrete switch, you'll know it's a network-related issue.
Posted on 06-12-2014 08:04 AM
Hi
I tested 2 machines on a simple switch... 1 acting as the multicast server and the other one as the client...
The multicast process starts... the server sends out a lot of data... the client receives a lot of data over the network...
But it never ends !!! It copies and copies and copies data without ending... !!! There is no read/write activity on the HDD... only on the network...
And on the client machine... I have this line in Terminal and nothing shows up after it...
PSTT 0 100 START RESTORE
Nothing after that....
Posted on 06-12-2014 08:13 AM
Well, the good news is that you've at least narrowed the issue down to server configuration. Unfortunately, my knowledge of multicast troubleshooting is rather limited. Anyone else have any ideas? That said, are you planning on moving to multicast for production? I thought that you were using it as a test. If you're not going to use multicast I would go back to testing this issue via unicast so as to avoid troubleshooting issues that aren't relevant to you.
Posted on 06-12-2014 08:19 AM
hi...
The thing is... I'm having the same problem on my real server... AND on my little test machine acting as a multicast server... So 2 multicast servers giving the same error... there must be something I'm missing...
Unicast... as soon as I unicast to 4-5 machines or more... the unicast crashes... this was working fine in January with Casper 8.73... and now with 9.3 it crashes...
Posted on 06-12-2014 08:30 AM
Bummer, of course. Yet, so far as I know, 9.3 doesn't really cause this to happen. I know that it's the last thing that was done before the issue appeared, but many of us are running 9.3, 9.31 and 9.32. I mean, you can try the upgrade to 9.32 (which is where I'm at right now).
With that said, which version of MySQL is your JSS running?
Posted on 06-12-2014 08:37 AM
Ver 14.14 Distrib 5.6.15
Posted on 06-12-2014 08:42 AM
Ahhh... I've personally experienced significant issues with 5.6 as it enforces strict mode as well as a few other funky things. Check out this thread
https://jamfnation.jamfsoftware.com/discussion.html?id=6493
I stick with 5.5
Posted on 06-12-2014 08:47 AM
I'll do the downgrade and let this thread know.
Posted on 06-16-2014 11:27 AM
Hi everyone... just to let you know...
Downgrading MySQL did not fix our problem...
It's impossible to unicast the base image to more than 3-4 machines... while back in January with the old server on 8.73 everything was working fine !!!
I'm getting a bit tired of working on this...
Posted on 06-17-2014 05:26 AM
How interesting and unfortunate. I know there's the "everything worked in 8.73" thing going on. Can you set up a quick test environment for 8.73 for verification purposes? I happen to know many folks (myself included) who simply aren't seeing the issues that you're having. Something deeper is going on here. Also (don't flame me for asking), do you have a support ticket open with JAMF? Not that it's the ultimate solution, but another set of eyes needs to take a look at your install/network.
At this point I'd even suggest setting up that test JSS if you don't already have one. Even make it its own NetBoot server and master DP. It doesn't matter how fast or slow it is, as it's the number of maintained connections you want to test.
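One way to get a first feel for "number of maintained connections" without tying up a whole lab is to read the same file from the mounted share with several readers at once and count how many finish. The sketch below is only an approximation under stated assumptions: it runs threads on one machine against one mount, so it does not reproduce 30 separate clients, and the function names and the example path are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read the file in chunks and return the number of bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def stress_read(path: str, connections: int) -> dict:
    """Read `path` from `connections` threads at once; count complete and failed reads."""
    expected = os.path.getsize(path)
    results = {"ok": 0, "failed": 0}

    def worker(_index: int) -> bool:
        try:
            # A read counts as successful only if every byte arrived.
            return read_whole_file(path) == expected
        except OSError:
            return False

    with ThreadPoolExecutor(max_workers=connections) as pool:
        for succeeded in pool.map(worker, range(connections)):
            results["ok" if succeeded else "failed"] += 1
    return results


# Hypothetical usage against a mounted distribution point:
#   print(stress_read("/Volumes/CasperShare/base.dmg", 8))
```

Running it from a few machines at the same time, and raising the reader count step by step, gets closer to what the lab actually does and shows roughly where reads start to fail.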
Posted on 06-17-2014 06:15 AM
WOW... finally got the solution !!!!
Back in January, we decided to publish our share over SMB instead of AFP because Apple might drop AFP in the future...
SMB was causing all the trouble... I put all the shares back on AFP, and hop, my unicast imaging works fine again and everything goes well.
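For anyone wanting to confirm which protocol a client actually mounted the share with, here is a small sketch that parses the text printed by the `mount` command. It assumes the usual macOS line format of `source on mount_point (fstype, options...)`; the server, share and user names in the comment are hypothetical.

```python
import re

# Matches lines such as (names are hypothetical):
#   //admin@server/CasperShare on /Volumes/CasperShare (smbfs, nodev, nosuid, mounted by admin)
MOUNT_LINE = re.compile(r"^(?P<source>\S+) on (?P<mount_point>.+?) \((?P<fs_type>[^,)]+)")

# Filesystem type names reported by mount, mapped to the protocol they indicate.
NETWORK_FS = {"smbfs": "SMB", "afpfs": "AFP", "nfs": "NFS"}


def network_mounts(mount_output: str) -> dict:
    """Return {mount_point: protocol} for every network share found in mount output."""
    found = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group("fs_type") in NETWORK_FS:
            found[match.group("mount_point")] = NETWORK_FS[match.group("fs_type")]
    return found
```

On a client, the input could come from `subprocess.run(["mount"], capture_output=True, text=True).stdout`; comparing the result before and after switching the share makes it easy to confirm the change really took effect.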
Posted on 06-17-2014 06:27 AM
Nice to see you have the solution! Make sure that you figure that one out, though, given how aggressively Apple seems to want to move away from AFP in favor of their own version of SMB2!
It's always something isn't it!