NetBoot not working, what's going on here?

skeb1ns
Contributor

Hi,

I've spent the last two days trying to get NetBoot working and I'm pulling my hair out right now.

Situation:

Mac mini, OS X El Capitan with Server configured. I created a boot image based on El Capitan with AutoCasperNBI; everything is set up using info from various sources.

Trying to NetBoot a MacBook Pro (mid 2015) whose network connection is in the same VLAN as the Mac mini results in a black screen, followed by a reboot to its internal SSD after a timeout. I used csrutil in the Recovery partition to whitelist the server (which apparently has to be done as of El Capitan).
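For reference, the whitelisting mentioned above is done with csrutil netboot add, from Terminal in the Recovery environment. The function below is only a hypothetical wrapper (its name and the 192.0.2.10 address in the usage note are illustrative placeholders, not from this thread); it checks for an argument and for csrutil before adding the server and listing the whitelist.

```shell
# Hypothetical helper: whitelist a NetBoot server so SIP allows booting from it.
# csrutil netboot only works from Terminal in the Recovery environment.
whitelist_netboot_server() {
  server="$1"
  if [ -z "$server" ]; then
    echo "usage: whitelist_netboot_server <server-ip>" >&2
    return 2
  fi
  if ! command -v csrutil >/dev/null 2>&1; then
    echo "csrutil not found; run this from the Recovery Terminal" >&2
    return 1
  fi
  # Add the server to the NetBoot whitelist, then show the current list.
  csrutil netboot add "$server" && csrutil netboot list
}
```

Usage from Recovery, with your own server address: whitelist_netboot_server 192.0.2.10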

In system.log on the server I see lots of entries like "NetBoot: [1,68:5b:35:ca:29:9f] BSDP ACK[LIST] sent 10.90.78.4 pktsize 343". So my MacBook gets an IP address and asks the server for a boot image, but never moves on to the BSDP ACK[SELECT] phase, where it selects my default image and proceeds with the boot.
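One way to confirm that pattern across many clients is to count LIST and SELECT replies per MAC address. This is a sketch that assumes the log lines look like the one quoted above (the function name bsdp_summary is made up for illustration); a client with many ACK[LIST] replies and zero ACK[SELECT] replies is one that never got past the list phase.

```shell
# Hypothetical helper: read NetBoot log text on stdin and print
# "<mac> <ACK[LIST] count> <ACK[SELECT] count>" for each client seen.
bsdp_summary() {
  awk '
    /BSDP ACK\[(LIST|SELECT)\]/ {
      # The client appears as [1,<mac>]; 1 is presumably the hardware type.
      if (match($0, /\[1,[0-9A-Fa-f:]+\]/)) {
        mac = substr($0, RSTART + 3, RLENGTH - 4)
        seen[mac] = 1
        if ($0 ~ /ACK\[LIST\]/)   list[mac]++
        if ($0 ~ /ACK\[SELECT\]/) sel[mac]++
      }
    }
    END {
      for (m in seen) printf "%s %d %d\n", m, list[m], sel[m]
    }'
}
```

On the server, something like bsdp_summary < /var/log/system.log lists each client with its counts.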

I've checked the permissions on the NetBootSP0 folder and those seem fine (Everyone has read access). I also created a default image with System Image Utility to see if the image was the issue: it doesn't make a difference. I stopped and started the NetBoot service and rebuilt the folder structure, nothing...

I've already checked a lot of suggestions floating around here and on Google, but no success so far. Does anyone have an idea what the issue could be?

Thanks in advance!

27 REPLIES

TJ_Edgerly
New Contributor III

Are you just not able to see the server? ...or are you trying to boot and the NBI does not load?

skeb1ns
Contributor

I'm trying to boot, but the NBI doesn't load. What's also weird is that the network boot option doesn't show up when holding Option (Alt) during startup; it's only selectable in Startup Disk in the Recovery environment.

bentoms
Release Candidate Programs Tester

@rschenk What's the NetBoot server?

skeb1ns
Contributor

@bentoms

You mean the version? It's 5.0.15.

Also, after some basic troubleshooting (just trying to ping my client) I now see that I actually don't have a network connection at the time the Mac reboots to proceed with NetBoot. Rebooting in Recovery Mode -> DHCP gives me an address, I'm able to select the NBI -> Reboot, network connection lost.

So I guess that I'm not getting a network connection in the Boot Menu, even though it triggers a bootpd search.

gskibum
Contributor III

I went looking for a different pub but found this one instead. Perhaps it will get you somewhere.

https://support.apple.com/en-us/HT203437

I've often had off & on issues with NetBooting certain devices, while other devices will work just fine all of the time. One day a device might refuse to NetBoot and the next day it works just fine.

stevewood
Honored Contributor II

@rschenk verify that your NetBoot server has the NBI hosted in the correct format. I ran into a similar issue where I had created the NBI (using @bentoms tool) as an NFS hosted NBI, but it was set to HTTP on the NetBoot server (or vice versa).

Double-click the NBI in the NetInstall pane of Server.app and verify that its protocol setting (NFS or HTTP) matches the way the NBI was built.

Try both NFS and HTTP to see if that helps.

skeb1ns
Contributor

@stevewood

Checked both options, and also created a new image specifically for HTTP Booting, no result. I'm now planning to take the Mac Mini and Client home and test it there to see if it's network related somehow.

lunddal
Contributor

Have you found a solution?

I have a similar problem. Suddenly my El Capitan image doesn't work (and neither do new ones).

The clients can boot from the 10.10 or 10.9 image (if they support that), but the 10.11.2 and .3 images don't work.

They did two weeks ago.

Sandy
Valued Contributor II

I have NetBoot sorta working on 2 minis running 10.10.5 and Server.app 5.0.15.
The NetBoot OS is 10.11.3, as we have new devices arriving shortly.
The booting is incredibly slow, whether I use AutoDMG with AutoCasperNBI or old-school SIU; neither is any better...
SO slow and uneven that with a prestage I almost have to wait for the first to get all the way to the desktop before starting the next to assure they go in order! The diskless piece is horrible. I can netboot 5 computers, and of those 5, 2 or 3 will not get a shadow file on the server, which then causes it to be created on the local HD, so Casper Imaging cannot erase. Once that happens, the drive is a total bitch to get wiped as it will not unmount. This is happening intermittently on BOTH servers.
(Yes, and for staff I do not nuke and pave, but for student devices, I do)
I have dumped everything and set back up from scratch, with no improvement.
Also, the "load balancing" which used to nicely alternate automatically between two netboot servers with same image ID is pitiful.
I have a call today with Apple.
My plan B is to boot to a USB drive and then image from the network. When I do that, I get past the OS block copy, installs one package, the rest are set to install at reboot. It copies 2 or three packages, and then the screen goes black, the IP address vanishes and everything fails. Screen lock is OFF, screensaver is OFF, Sleep is set to 3 hours.
Delete the record, reuse the prestage, reboot the target with the same USB, works perfectly. WTH

bentoms
Release Candidate Programs Tester

@Sandy Please report back with what Apple say.

I was expecting SIU & ACNBI to be the same, as ACNBI is based on what SIU does.

I do wonder if 10.11.4 changes this, but that's under NDA and I haven't tested it.

@rschenk Any update?

@lunddal What's the netboot server? How was the netboot created? Any logs?

Sandy
Valued Contributor II

Apple gave me some commands to run... to gather more info....
Also suggested I upgrade my minis (my netboot servers) from 10.10.5 to 10.11.3 only because if we take it to engineering they would require it anyway....
Today I netbooted 5 in a row and all worked, so who knows?
My Network Admin does not admit to making any changes.

skeb1ns
Contributor

Alright, I checked the setup at home. It worked flawlessly via NFS, so the culprit is somehow the network at work.

Checked with a colleague yesterday to see if he knew a solution but he didn't unfortunately. I guess that I have to ask our network admins nicely for help.

Bottom line: according to our Windows DC DHCP servers, the client gets an IP address during the "hold Option to select boot device" stage (are you supposed to be able to ping the client during that stage, by the way?) -> there is traffic between the client and the Mac mini -> the Mac mini presents a list of bootable images to the client, but the client does nothing with that information, which produces an endless run of BSDP ACK[LIST] entries in system.log on the server.

lunddal
Contributor

@bentoms It's an old 10.5.8 server that has been working so far and still works with older images.

It never finishes booting the 10.11.x images.

Edit: I've now tested on a different server on a different location. Same result. The El Capitan images don't work anymore.

bentoms
Release Candidate Programs Tester

@rschenk Did HTTP work in the office?

@lunddal How was the Image created? How long have you waited?

lunddal
Contributor

@bentoms I've let it run for at least an hour.

At first I thought that the certificate problem with installers downloaded before February 4 was the problem, but an image made from a newly downloaded El Capitan installer doesn't work either.

bentoms
Release Candidate Programs Tester

@lunddal Can you enable verbose boot & see if anything sticks out?

lunddal
Contributor

It just stops and nothing sticks out.

I tried to clone the image to a hard drive, but that doesn't boot either.

I'm not using AutoCasperNBI btw. I'm using AutoDMG and Casper NetInstall Image Creator.

I will try to make an image using AutoCasperNBI.

lunddal
Contributor

AutoCasperNBI works (yeah!!).

It's using Netboot where CNIC is using NetInstall.

But it seems to be a bug in CNIC.

PS: The Danish keyboard layout seems to be missing from AutoCasperNBI in Advanced.

kturnbeaugh
New Contributor III

We now use System Image Utility to create our NetBoot set and bless it, use Composer to create our base images, and have an Xserve racked in our data center as our NetBoot server. We had to add the IP of the NetBoot server to the IP helper tables of the VLANs so that the NBI shows up when holding Option/Alt, and to prevent timeout issues when booting. We also have to have the clients connected via Ethernet, as Networking has a fit if we try to do anything over wireless. Before we added the IP to the helper tables, booting would stall and then give us a black screen. Sometimes restarting the NetBoot server would fix it, but what fixed it permanently (so far) was adding it to the IP helper tables and racking it in the data center. When it was on the same VLAN, or when it had to traverse VLANs before we racked it, it didn't work consistently.

spowell01
Contributor

I just wanted to chime in here, since we are seeing similar behavior with our 10.11.4 NetRestore images. For some time we have had a Mac mini on 10.10.3 that serves all of our NetBoot/NetRestore images. We recently built a new imaging lab for a new location, so I set up a fresh Mac mini with 10.11.4 and the latest Server, and created a new 10.11.4 NetRestore NBI via SIU. The results when booting this image are extremely sporadic: one machine might quickly complete the NetRestore process, while another gets only 1/2 to 3/4 of the way through the boot progress bar and then hangs. I took the 10.10.3 NetBoot image that we run in our old imaging lab and confirmed that machines can NetBoot it from the new Mac mini server in a timely manner, but not the 10.11.4 NetRestore images. Any thoughts on why 10.11.4 restores would be so inconsistent?

Now I'm going to try moving the 10.10.3 netboot server up to our new imaging lab and see if it can handle serving out the 10.11.4 net restore image any better than the fresh 10.11.4 server.

-Update
Our 10.11.4 NetRestore image seems to complete in a decent amount of time when served at our new location from the mini that's running Yosemite. From what we are seeing, something is up with 10.11.4 Server and NetRestore.

My next plan of action is to take the new mac mini that has an SSD and 10.11.4, bring it back down to 10.10.X, and just not look at 10.11.4 for our netboot server yet.

gskibum
Contributor III

@spowell01

Today I'm having trouble NetBooting from a 10.11.4 server, whereas before it worked.

Clients simply don't complete the boot. I'm thinking there's something up with 10.11.4 as well.

Go Apple.

spowell01
Contributor

Thanks for the response @gskibum at least we aren't alone. Are you using an actual netboot image, netrestore or netinstall? Our netboot image is 10.10.

Does anyone have any idea why we would also be seeing inconsistent hostnames in the NetInstall connection list? We are seeing sporadic results, with some names fully qualified and others not:

MacBook-Pro-2.local
MacBook-Air.local
MacBook-Pro-3.kibsd.org

The machines are obviously not bound to the domain as they are just in the boot process, but if one machine shows a fully qualified name, I kinda expect them all to reflect that...inconsistencies

kerouak
Valued Contributor

As a temp fix, you can do this

NetBoot via the command line:
sudo bless --netboot --nextonly --server bsdp://<ip address of the netboot server>
sudo shutdown -r now
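A slightly safer variant of the two commands above, as a sketch: it only reboots if bless succeeded, so a failed bless (for example when SIP blocks it) does not just send you back into the internal disk. The function name, the DRY_RUN switch and the crude address check are assumptions for illustration; by default it only prints the commands it would run.

```shell
# Hypothetical helper: set the next boot to a BSDP (NetBoot) server, then reboot.
# With DRY_RUN unset or 1 it only prints the commands.
netboot_next() {
  server="$1"
  # Crude sanity check: digits and dots only. Not a real IPv4 validator.
  case "$server" in
    *[!0-9.]*|'') echo "usage: netboot_next <server-ip>" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "sudo bless --netboot --nextonly --server bsdp://$server"
    echo "sudo shutdown -r now"
  else
    # Reboot only if bless succeeded.
    sudo bless --netboot --nextonly --server "bsdp://$server" && sudo shutdown -r now
  fi
}
```

To actually run it, with your own server's address: DRY_RUN=0 netboot_next 192.0.2.10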

EricD
New Contributor

So I've been having the same issues as well. It only pertains to the 10.11.4 image. As a workaround I've created a boot image using the old 10.11.2 (didn't have the 10.11.3) and included the combo update 10.11.4 in the packages field. That seems to work however I've noticed this only occurs on certain models and believe it may be an issue with Apple querying the model identifier (Only a guess) in the 10.11.4 builds. I believe this is a bug and I hope they fix it. Absolutely annoying!!!!!

jhbush
Valued Contributor II

...from our friends at the fruit company.

Engineering is aware of this issue and at present has an open case on it which remains under investigation. Testing thus far seems to reveal different behavior by different models of portables, though beyond that we are still in the early stages of the investigation. As a workaround, we recommend using 10.11.3 for this workflow until we can identify and address the issue with 10.11.4.

EricD
New Contributor

@jhbush1973 Nice, thanks for the info. Good to hear that they're at least aware of the issue.

dferrara
Contributor II

@kerouak Thanks for the command. If anyone tries to run it, remember you have to disable SIP first with csrutil disable (run from the Recovery environment).