Imaging overload?

design_ev
New Contributor

We're in the process of attempting to image sets of 31 new 2013 iMacs in school labs all across our district, and we're running into a ton of grief.

Has anyone out there experienced problems imaging too many systems at once? When we NetBoot all 31 iMacs at a school and start each one imaging, inevitably a large number (the majority) of the systems will fail to install all of the packages correctly. We end up having to do 10 or 15 at a time, and then things seem to image more reliably. There is a Mac mini/Promise RAID server at each school providing NetBoot services, which also hosts the Casper images locally.

The labs are made up of 31 iMacs, all with gigabit network connections back to Cisco switches. Prior labs were made up of 2006 iMacs on 100 Mbit network connections on older Cisco gear, and we occasionally ran into the same problems. Any input or advice would be greatly appreciated!


RobertHammen
Valued Contributor II

Have not seen this... typically it's a network bandwidth or throughput issue. Do the network admins have SNMP enabled, and/or are they using any kind of monitoring software (nmap/zenmap) to see exactly what is going on?

Don't know if it's possible or feasible, but could you utilize a dumb Gigabit switch and patch the Casper server directly into that switch, along with the lab machines? That would confirm that it was a network configuration and/or bandwidth issue.
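To put rough numbers on the bandwidth theory, here is a minimal Python sketch of how a single server uplink divides among clients that are imaging at once. The 20 GB image size, the 1000 Mbit/s uplink, the 0.8 efficiency factor, and the even split between clients are all assumptions for illustration, not figures from this thread. Real transfers also depend on disk I/O and protocol overhead.

```python
def per_client_mbps(link_mbps, clients, efficiency=0.8):
    """Even share of the server uplink per client, in Mbit/s.

    efficiency is an assumed fudge factor for protocol overhead.
    """
    if clients < 1:
        raise ValueError("clients must be at least 1")
    return link_mbps * efficiency / clients


def minutes_to_image(image_gb, link_mbps, clients, efficiency=0.8):
    """Estimated minutes for every client to pull one image of image_gb."""
    share = per_client_mbps(link_mbps, clients, efficiency)
    megabits = image_gb * 8 * 1000  # decimal gigabytes to megabits
    return megabits / share / 60


if __name__ == "__main__":
    # Hypothetical 20 GB image over a single gigabit server uplink.
    for clients in (10, 15, 31):
        share = per_client_mbps(1000, clients)
        minutes = minutes_to_image(20, 1000, clients)
        print(f"{clients} clients: {share:.1f} Mbit/s each, about {minutes:.0f} min")
```

If the estimate for 31 clients is merely slow rather than impossible, raw bandwidth alone is unlikely to explain outright failures, and the isolated switch test above becomes the more useful check.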

Matt
Valued Contributor

When we used NB we limited it to 10 because, even though our network is huge and fast, something was seeing the NB traffic and throttling it. Red tape aside, we just moved to USB sticks.

Kumarasinghe
Valued Contributor

Separate the NetBoot service and Distribution points into 2 different servers and test.

The other thing you can do is use the RAM disk method for your NetBoot images to lower the server utilisation, as described here:
http://www.macos.utah.edu/documentation/administration/setup_netboot_service_on_mac_os_x_10.6.x_clie...

Chris_Hafner
Valued Contributor II

Have you checked the I/O load on your NetBoot server (specifically the Mac mini, not the RAID)? We run between 30 and 50 machines at a time in labs, but we made sure that we purpose-built our NetBoot servers to fly at that load. A Promise RAID should be fast enough to handle the shadow volumes... Just to make sure, your .nbi is hosted from the RAID, not just the DP, correct?

jmercier
Contributor II

I have the same problems... but it comes down to only 10 machines... with a super fast server and network...

jmercier
Contributor II

Hi

That's a procedure we already applied on our NetBoot image...

I will double check again...

Chris_Hafner
Valued Contributor II

I have to ask. Are you running this in multicast? I've seen these problems with other folks if so.

jmercier
Contributor II

Hi

Actually, I'm not even sure if I'm using unicast or multicast... I'm just imaging a basic, standard image from a simple NetBoot... so I assume it's unicast.

Chris_Hafner
Valued Contributor II

In that case I would agree. In general, multicast has to be turned on explicitly, so you are almost certainly on unicast. Now, with that said, I'm also not sure that NetBoot is your issue, as it seems that the machines are booting properly but failing or dropping off during transfer. Diskless NetBoot is critical; however, it will then create shadow volumes on your NetBoot server. If you don't have the space or the RAM to deal with that along with all the other requests, this is the behavior you will see.

Also, a 100 Mbit network will slow down the process, but it shouldn't kill any of the streams. I've had network segments in the past where there was a 100 Mbit link, and it just slowed everything down to a crawl and that's it. Everything kept functioning normally.

jmercier
Contributor II

Hi

What's weird... is that back in January with our old installation, Casper 8.73, everything was fine with the lab... imaging fine etc... now with Casper 9.31 it's failing... but only on more than 4-5 machines...

1-2 machines work fine every time... all the time... my NetBoot is diskless for sure, and the user has access everywhere on our repository for NetBoot.

I also checked our server's disk, network, and RAM utilisation while imaging... the server is mostly asking itself what to do during imaging... nothing much on that side... but on the client side... it's 100 Mbit... I know it was slow back in January, but all the computers were imaging... not dropping...

Chris_Hafner
Valued Contributor II

Interesting. Those numbers are WAY too low. Beyond that, I only just realized (after my last post) that you've got everything coming out of a Promise RAID. In which case we can discount my mention of storage space (assuming it's not full).

Looks like I'm going to have to fall back on the network infrastructure path. There IS something getting in your way. There should be no reason you're limited to such a small number of successful imaging connections. It also may be worth a rebuild of the NetBoot server just in case. I don't know what anyone's upgrade policies are on servers, but it's something I NEVER do for these reasons. Regardless, I'm grasping at straws on that one. Only you know how it was set up.

Do you have someone that can evaluate the connections on the network?

jmercier
Contributor II

I'm not using a Promise RAID... I'm getting the storage through an ATTO Thunderbolt to fiber adapter... which is connected to an IBM SAN...

I have plenty of space... 3TB and 2TB free

We checked the network logs and everything on that side, and nothing shows up as a problem... the only thing that changed between the last deployment in January and now... is the new Mac Pro server and Casper 9.31...

I know it's a lot, but it should not be a problem... that's weird considering 1 to 3 machines at the same time work fine...

Chris_Hafner
Valued Contributor II

Fair enough... quick question. Which version of Casper Imaging are you using on your .nbi?

jmercier
Contributor II

9.3

Chris_Hafner
Valued Contributor II

Hrm... there goes another one. I am also running 9.3. I was just wondering because there were known issues with Casper Imaging 9.23 through the next few versions. 9.3 was the first safe one I was willing to use. You said your new Mac Pro was built from scratch to host the JSS in January, correct?

jmercier
Contributor II

Yes... it's brand new...

BUT !!! hey... multicast is running now... it was probably a wrong configuration of the Casper utility for multicasting...

It's now installing my image... I'll let you guys know in a couple of minutes... with 1 computer testing... and later today with the whole lab... 25 machines...

Chris_Hafner
Valued Contributor II

I would be really, really surprised if that helps. Then again, I've seen stranger stuff. It will be interesting to hear regardless. Generally multicast is a horrible idea in this circumstance and causes far more issues than it was intended to solve. Yet any info is good info while we're troubleshooting.

jmercier
Contributor II

Well, bad news...

The client and the server initiate the multicast session... the server sends tons of GB of information... but no data is received on the client...

Chris_Hafner
Valued Contributor II

... wait a min. Are you saying that the client cannot connect to the DP?

jmercier
Contributor II

That's not what I'm saying... all my clients connect to the DP...

What I'm saying is that when I initiate the multicast server, I see my server throwing tons of data on the network...

Then I start my client on NetBoot with the multicast restore.sh script... and it stays there forever and no data is copied to the client.

Chris_Hafner
Valued Contributor II

Ahhh... how interesting. So no success at all via multicast. I'm sure that tells us something, but I am certainly NOT a network expert. There must be a way to narrow this down. Is there any way for you to test this on its own discrete network? Grabbing a 16 port gig switch and turning on DHCP (assuming that you've got the Mac's DNS service on and pointing to itself) ought to get you there. If it works on 15ish machines on a discrete switch, you'll know it's a network-related issue.

jmercier
Contributor II

Hi

I tested 2 machines on a simple switch... 1 acting as the multicast server and the other one as the client...

The multicast process starts... the server outputs a lot of data... the client takes in a lot of data through the network...

But it never ends !!! It copies and copies and copies data without ending... !!! There is no read/write activity on the HDD... only on the network...

And on the client machine... I have this line in Terminal and nothing shows up after...

PSTT 0 100 START RESTORE

Nothing after that....

Chris_Hafner
Valued Contributor II

Well, the good news is that you've at least narrowed the issue down to server configuration. Unfortunately, my knowledge of multicast troubleshooting is rather limited. Anyone else have any ideas? That said, are you planning on moving to multicast for production? I thought that you were using it as a test. If you're not going to use multicast I would go back to testing this issue via unicast so as to avoid troubleshooting issues that aren't relevant to you.

jmercier
Contributor II

Hi...

The thing is... I'm having the same problem on my real server... AND on my little test machine acting as a multicast server... So 2 multicast servers giving the same error... there must be something I'm missing...

Unicast... as soon as I unicast to 4-5 machines or more... the unicast crashes... which was working fine in January with Casper 8.73... and now with 9.3 it crashes...

Chris_Hafner
Valued Contributor II

Bummer of course. Yet, so far as I know 9.3 doesn't really cause this to happen. I know that it's the last thing that was done before the issue but many of us are running 9.3, 9.31 and 9.32. I mean you can try the upgrade to 9.32 (where I'm at at the moment).

With that said, which version of MySQL is your JSS running?

jmercier
Contributor II

Ver 14.14 Distrib 5.6.15

Chris_Hafner
Valued Contributor II

Ahhh... I've personally experienced significant issues with 5.6, as it enforces strict mode as well as a few other funky things. Check out this thread:

https://jamfnation.jamfsoftware.com/discussion.html?id=6493

I stick with 5.5

jmercier
Contributor II

I'll do the downgrade and let this thread know.

jmercier
Contributor II

Hi everyone... just to let you know...

Downgrading MySQL did not fix our problem...

It's impossible to unicast the base image to more than 3-4 machines... while back in January with the old 8.73 server everything was working fine !!!

I'm getting a bit tired of working on that...

Chris_Hafner
Valued Contributor II

How interesting and unfortunate. I know there's the "everything worked in 8.73" thing going on. Can you set up a quick test environment for 8.73 for verification purposes? I happen to know many folks (myself included) that simply aren't seeing the issues that you're having. Something deeper is going on here. Also (don't flame me for asking), do you have a support ticket open with JAMF? Not that it's the ultimate solution, but another set of eyes needs to take a look at your install/network.

At this point I'd even suggest setting up that test JSS if you don't already have one. Even make it its own NetBoot and master DP. It doesn't matter how fast or slow it is, as it's the number of maintained connections you want to test.
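One rough way to probe the "number of maintained connections" idea without tying up a whole lab is to open several simultaneous reads of one file on the mounted share and count how many finish cleanly. The Python sketch below assumes the distribution point is already mounted on a test client; the mount path and file name are hypothetical placeholders. Threads on one client share a single mount, so this is a weaker test than 15 real machines, but a failure here would still be telling.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def read_and_hash(path, chunk_size=1024 * 1024):
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def concurrent_read_test(path, readers):
    """Start `readers` simultaneous reads of `path` and summarise the outcome."""
    ok, failed, digests = 0, 0, set()
    with ThreadPoolExecutor(max_workers=readers) as pool:
        futures = [pool.submit(read_and_hash, path) for _ in range(readers)]
        for future in futures:
            try:
                digests.add(future.result())
                ok += 1
            except OSError:
                # Covers a missing file, a dropped mount, or an I/O error mid-read.
                failed += 1
    return {"ok": ok, "failed": failed, "distinct_digests": len(digests)}


if __name__ == "__main__":
    # Hypothetical mount point and image name; substitute your own.
    print(concurrent_read_test("/Volumes/CasperShare/Packages/base.dmg", 15))
```

More than one distinct digest, or any failed reads, would suggest the share is dropping or corrupting streams under load. Running the same test against the same file over a different share protocol would give a direct comparison.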

jmercier
Contributor II

WOW... finally got the solution !!!!

Back in January, we decided to publish our share with SMB instead of AFP because of Apple maybe letting AFP go in the future...

SMB was causing all the trouble... I put all the shares back on AFP, and hop, my unicast imaging works fine again and everything goes well.

Chris_Hafner
Valued Contributor II

Nice to see you have the solution! Make sure that you figure that one out, given how aggressively Apple seems to want to move away from AFP in favor of their own version of SMB2!

It's always something, isn't it!