Posted on 01-10-2017 09:35 AM
Happy New Year JN!
My environment is experiencing a strange NetBoot issue. Before I get to that, let me answer the question "What has changed?"
The University of NM closes completely over the holidays, which makes it a perfect time to perform upgrades, migrations, and other changes. Over this past break, our networking team migrated from TippingPoint to Palo Altos in our DC. ALL existing rules were (supposedly) migrated to the Palos.
Since the migration, certain vintages of the Macs I manage no longer complete the NetBoot process: they time out and then boot to whatever OS is currently on the Mac's internal drive.
Tested models that do NOT work: Late 2015 iMac, Late 2012 Mac Mini, Late 2012 iMac.
Tested models that DO work: MacBook Pros from 2010. There may be others, but we have not found/tested them.
We are currently packet sniffing and trying to chase it down from that end. I have a Test Mac Pro with a Dev/Test JSS that is outside of the data center and all models work and Netboot as expected.
Obviously, some traffic is not being routed or allowed because its rule did not transfer during the migration. The baffling thing is that it should be binary: either they all NetBoot or none do. But that is not the case.
My question is simple: Are there different protocols, behavior, firmware or ??? between models with respect to NetBoot processes?
If I need to clarify or answer any questions, please ask. In my role, I do not have much insight into the network infrastructure, but I should be able to find the answer.
Things I have Googled: https://support.apple.com/en-us/HT203437 , https://static.afp548.com/mactips/netboot.html
Posted on 01-10-2017 11:45 AM
@Randydid, do you use IP helper addresses for NetBooting across subnets? I ask because some time ago our college made network updates, and the IP helpers had to be replaced/added back into the scope. The only reason I don't suspect this is that you mentioned specific models. It would be quite a coincidence for specific models to sit in specific subnets that cannot reach the NetBoot server.
This is the only thing that comes to mind off the top. I am sure others will have thoughts on this as well.
Posted on 01-10-2017 11:52 AM
Yes, there are at least two major versions of NetBoot. I recall on our Xserve there was a plist that we had to modify for some machines. The plist was /etc/bootpd.plist.
<key>old_netboot_enabled</key>
<false/>
If you changed "false" to "true", the server would support NetBoot for older machines. However, I don't think I've had to use that for a few years. Our client machines are iMac12,1 (21.5-inch Mid-2010), MacPro5,1 (Mid-2010), MacPro6,1 (Late 2013), and iMac17,1 (iMac 5K, 27-inch, Late 2015). We sometimes NetBoot our MacBook Pro units, which can be anything from mid-2013 to the generation before the Touch Bar. We also have a couple of Mac Mini units in the all-aluminum form factor from various years. They all boot successfully from our Xserve running OS X 10.6.8. (Yes, I tried updating to later versions of the OS, but none of them seemed to work with our Windows clients. Our Xserve is now end-of-life and will be shut off for the last time in a few weeks. I'm still annoyed with Apple for killing their server hardware and hobbling their server software for SOHO use instead of Enterprise/Education.)
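If you want to check that setting without opening the file by hand, here is a minimal sketch using Python's standard `plistlib`. It assumes `old_netboot_enabled` is a top-level boolean, as in the snippet above. A missing key is treated as disabled, which is an assumption rather than documented bootpd behavior.

```python
import plistlib


def old_netboot_enabled(plist_bytes: bytes) -> bool:
    """Report whether a bootpd configuration turns on old-style NetBoot.

    Assumes old_netboot_enabled is a top-level boolean, as in the snippet
    above; a missing key is treated as disabled (an assumption).
    """
    config = plistlib.loads(plist_bytes)
    return config.get("old_netboot_enabled") is True
```

On the server, you could read the real file with `open("/etc/bootpd.plist", "rb")` and pass its contents to `old_netboot_enabled`.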
Mr. Bombich wrote a lovely article on troubleshooting NetBoot which is very hard to find. I may have a copy.
The iMac (Mid-2010) has an issue where the kernelcache cannot exceed a certain size. There are lots of threads here on how to solve that one.
Another potential issue is a feature our network support staff call "DHCP Snooping". If this is on, NetBoot across subnets doesn't work. I'm not sure about within a subnet. However, some of NetBoot's protocols are very DHCP-like, so I would guess the feature also tries to quash any server offering this service to clients in the same subnet.
Ah, I found the document by Mr. Bombich. I will paste it below.
Bombich.com: Troubleshooting the NetBoot Process 11-11-01 3:15 PM
http://afp548.com/netboot/mactips/netboot.html Page 1 of 11
Troubleshooting the NetBoot process
Introduction
Network booting a computer is a fairly straightforward, yet complex task involving many
different pieces of technology. As such, troubleshooting it can be challenging. In this article I
lay out the steps of the Netboot process on Mac OS X clients and indicate what technologies
are involved at each step, how they could fail, and how to solve the issue.
1-19-06 Update: On January 10th, Apple announced new Intel-based Macs. Instead of Open
Firmware, the Intel Macs use Intel's Extensible Firmware Interface (EFI). While most of the
NetBoot process is exactly the same for EFI-based Macs, I will point out any differences
between the two platforms throughout the article. These changes will be marked with "†(EFI)".
In cases where EFI and Open Firmware behave the same, I have replaced platform-specific
language with simply "machine firmware". Also take a look at this video comparing the
NetBoot experience on the EFI and OF-based Macs.
Netboot, from the viewer's perspective
Here is a brief overview of what happens when you Netboot a client, and what you'll see on the
screen as this occurs.
1. Computer chimes when you turn it on
The computer runs a self test and loads the machine firmware.
2. A blinking globe appears.
The computer is requesting an IP address and Netboot information, and begins
downloading a boot file.
3. The gray Apple logo and a small spinning globe appear
The computer is loading the boot file, which downloads and loads the kernel and
kernel extension cache.
4. The spinning globe turns into a circular progress indicator
The computer has loaded the kernel and the boot process has begun. The kernel
mounts the Netboot disk image via NFS and loads the kernel extension cache. The
remainder of the boot process is mostly the same as a standard local-disk boot.
1) Machine chimes
This is the standard "POST", or power on self test, that occurs regardless of how you intend to
boot the client. If you don't hear a chime and you're sure that the audio of the machine is
working and not muted, you probably have a hardware problem.
2) Blinking globe
After the chime, the machine firmware loads, reads the boot settings, and in the case of
Netboot, starts a DHCP and BSDP (Boot Service Discovery Protocol) discovery process. It's
important to draw a distinction between the two. The two protocols are very similar in
behavior and can both be administered by the bootpd process on Mac OS X Server. It is not
necessary, however, for a client to get both DHCP and BSDP information from one server, nor
is it necessary that they even come from a Mac OS X server (although configuring another OS
to hand out Mac-specific BSDP information is not an easy task -- that is the value of Mac OS X
Server).
†(EFI): EFI provides much richer graphics support than Open Firmware -- the blinking globe
has more detail and is no longer on a square button background. Additionally, EFI loads much
faster than OF, shaving 10 - 15 seconds off the boot process.
Requirements for this step to proceed:
A DHCP server must respond with an IP address within the subnet of the Netboot
server
A Netboot server must respond with a "BSDP ACK[SELECT]" -- an acknowledgment that
it will be the server for this client
What you'll see in the server log:
netboot_server:~ root# tail -f /var/log/system.log
bootpd[456]: BSDP DISCOVER [en0] 1,0:a:95:c4:21:9c arch=ppc sysid=PowerMac7,2
bootpd[456]: DHCP DISCOVER [en0]: 1,0:a:95:c4:21:9c
bootpd[456]: OFFER sent <no hostname> 10.0.1.7 pktsize 300
bootpd[456]: DHCP REQUEST [en0]: 1,0:a:95:c4:21:9c
bootpd[456]: ACK sent <no hostname> 10.0.1.7 pktsize 300
Above, the client simultaneously made separate DHCP and BSDP requests. The server (in this
case running both Netboot and DHCP) responds first with a DHCP response. You see the
typical DISCOVER-OFFER-REQUEST-ACK.
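When the log is long, you can pull the DHCP half of the exchange out mechanically. The following is a minimal sketch that assumes the bootpd log format shown above, which comes from an older Mac OS X Server release; other versions may word these lines differently. It lists the DHCP message types in order and checks that DISCOVER, OFFER, REQUEST, and ACK all appear in sequence. The function names are hypothetical.

```python
import re
from typing import Iterable, List

# Client-to-server DHCP messages, e.g. "bootpd[456]: DHCP DISCOVER [en0]: ..."
DHCP_RECEIVED = re.compile(r"bootpd\[\d+\]: DHCP (DISCOVER|REQUEST|INFORM) \[")
# Server-to-client replies, e.g. "bootpd[456]: OFFER sent <no hostname> ..."
# (BSDP replies are logged as "NetBoot: ... BSDP ACK[...] sent", so they don't match)
DHCP_SENT = re.compile(r"bootpd\[\d+\]: (OFFER|ACK) sent ")


def dhcp_messages(lines: Iterable[str]) -> List[str]:
    """Return DHCP message types in log order, skipping BSDP lines."""
    messages = []
    for line in lines:
        match = DHCP_RECEIVED.search(line) or DHCP_SENT.search(line)
        if match:
            messages.append(match.group(1))
    return messages


def completed_handshake(lines: Iterable[str]) -> bool:
    """True if DISCOVER, OFFER, REQUEST and ACK appear in that order."""
    remaining = iter(dhcp_messages(lines))
    # "in" on an iterator consumes it, so this is an in-order subsequence test.
    return all(step in remaining for step in ("DISCOVER", "OFFER", "REQUEST", "ACK"))
```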
†(EFI): Now that there is more than one architecture (ppc and i386), it is important to point out
that a NetBooting client includes its architecture in the BSDP DISCOVER. For example,
VC:"AAPLBSDPC/i386/iMac4,1". What the NetBoot server does with this information will be
discussed in more detail in the "Architectures" section.
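The client's vendor class string is where model-specific behavior can start. The client announces both its architecture and its model identifier, and a server can act on either one. Here is a minimal sketch that splits a client vendor class of the form shown above (`AAPLBSDPC/<arch>/<model>`). It assumes exactly three slash-separated parts and returns `None` for anything else, including the bare `AAPLBSDPC` that servers send back. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class BsdpClientClass(NamedTuple):
    arch: str   # e.g. "i386" or "ppc"
    model: str  # model identifier, e.g. "iMac4,1"


def parse_vendor_class(vc: str) -> Optional[BsdpClientClass]:
    """Split a BSDP client vendor class like AAPLBSDPC/i386/iMac4,1."""
    parts = vc.split("/")
    if len(parts) != 3 or parts[0] != "AAPLBSDPC":
        return None  # server reply, non-Apple client, or unexpected format
    return BsdpClientClass(arch=parts[1], model=parts[2])
```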
bootpd[456]: BSDP INFORM [en0] 1,0:a:95:c4:21:9c arch=ppc sysid=PowerMac7,2
bootpd[456]: NetBoot: [1,0:a:95:c4:21:9c] BSDP ACK[LIST] sent 10.0.1.7 pktsize 300
bootpd[456]: DHCP INFORM [en0]: 1,0:a:95:c4:21:9c
bootpd[456]: ACK sent <no hostname> 10.0.1.7 pktsize 300
bootpd[456]: BSDP INFORM [en0] 1,0:a:95:c4:21:9c arch=ppc sysid=PowerMac7,2
bootpd[456]: NetBoot: [1,0:a:95:c4:21:9c] BSDP ACK[SELECT] sent 10.0.1.7 pktsize 364
bootpd[456]: DHCP INFORM [en0]: 1,0:a:95:c4:21:9c
bootpd[456]: ACK sent <no hostname> 10.0.1.7 pktsize 300
And now the client has handled a BSDP response. The key parts here are BSDP INFORM-BSDP
ACK[LIST]-BSDP INFORM-BSDP ACK[SELECT]. If you see only parts of this "conversation", check
to see that there is not another Netboot server on the network responding to your client. A
packet trace can help rule that out (described below).
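To see which part of the BSDP "conversation" is missing, you can scan the log for the four expected steps. This is a minimal sketch that assumes the log wording shown above. It reports the first step that never appears in order, or `None` when the exchange is complete. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# The four BSDP steps of a healthy exchange, in order.
EXPECTED_STEPS = ["INFORM", "ACK[LIST]", "INFORM", "ACK[SELECT]"]
BSDP_EVENT = re.compile(r"BSDP (INFORM|ACK\[LIST\]|ACK\[SELECT\])")


def bsdp_events(lines: Iterable[str]) -> List[str]:
    """Extract BSDP message types, in log order, from bootpd log lines."""
    events = []
    for line in lines:
        match = BSDP_EVENT.search(line)
        if match:
            events.append(match.group(1))
    return events


def first_missing_step(lines: Iterable[str]) -> Optional[str]:
    """Return the first expected BSDP step absent from the log, or None."""
    remaining = iter(bsdp_events(lines))
    for step in EXPECTED_STEPS:
        # "in" consumes the iterator, so the steps must appear in order.
        if step not in remaining:
            return step
    return None
```

If the result is `ACK[SELECT]` even though the server did answer, look for a second NetBoot server on the network, as described above.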
The last thing that occurs while you still see the blinking globe icon is that the client
downloads the "booter" file that you can see in the NetBoot image.nbi set
(/Library/NetBoot/NetBootSP0/image_name.nbi). The booter file is simply a copy of the
"BootX" file that you can find in /System/Library/CoreServices on any Mac OS X installation.
This file is responsible for the very first stage of booting the machine: it loads the Mac OS X
kernel file.
†(EFI): EFI uses a different booter file. The source is located at /usr/standalone/i386/boot.efi.
On a blessed volume, you will find this file at /System/Library/CoreServices/boot.efi.
Additionally, the "booter" file for EFI must be stored in an architecture-specific directory within
the NetBoot set. This will be described in more detail in the "Architectures" section.
In the case of Netboot, the location of the file is advertised in the BSDP response. If you do a
packet trace you will see a packet similar to this:
16:23:19.979291 IP (tos 0x0, ttl 255, id 58694, offset 0, flags [none], length:
382) 10.0.1.1.bootps > 0.0.0.0.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length: 354,
xid:0x4149, flags: [none] (0x0000)
Server IP: 10.0.1.1
Client Ethernet Address: 00:0a:95:c4:21:9c
sname "xserve.apple.edu"
file "/private/tftpboot/NetBoot/NetBootSP0/Panther Server.nbi/booter"
Vendor-rfc1048:
DHCP:OFFER
SID:10.0.1.1
VC:"AAPLBSDPC"
RP:"nfs:10.0.1.1:/Library/NetBoot/NetBootSP0:Panther Server.nbi/Install.dmg"
VO:8.4.129.0.1.145.130.10.78.101.116.66.111.111.116.48.48.50
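The VO field above is the BSDP vendor-specific payload that tcpdump prints as dotted decimal bytes. It is encoded as code/length/value triples. Below is a minimal sketch that assumes plain TLV encoding with no pad or end bytes. The meanings attached to the codes come from Apple's open-source bootpd, where option 8 is the selected boot image ID and option 130 is the client's machine name. Treat that mapping as an assumption to verify against your server version.

```python
from typing import List, Tuple


def decode_vendor_options(vo: str) -> List[Tuple[int, bytes]]:
    """Split tcpdump's dotted-decimal VO field into (code, value) pairs.

    Assumes plain code/length/value encoding with no pad or end bytes.
    """
    data = bytes(int(octet) for octet in vo.split("."))
    options = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError(f"truncated option header at byte {i}")
        code, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError(f"option {code} claims {length} bytes, found {len(value)}")
        options.append((code, value))
        i += 2 + length
    return options
```

For the sample above, this yields option 8 with the 4-byte value 129.0.1.145 (image ID 0x81000191) and option 130 with the text `NetBoot002`.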
The firmware has a very lightweight TFTP (Trivial File Transfer Protocol) client that it uses to download this file.
Once the file is downloaded, it is executed and the boot process is handed off from firmware
to the boot file.
Potential problems
If your client does not get past the blinking globe icon, look for the following problems. As the
Netboot process is fairly difficult to troubleshoot at this stage, examine the Netboot and DHCP
server logs, and perform a packet trace to see what information is coming from and going to
the client. These methods are described at the end of this article.
Problem: Client does not get an IP address
Characteristics: You may see DHCP DISCOVERs in your server's log, but not a DHCP OFFER or
ACK. You may also see BSDP ACK[SELECT]s in your logs, but the client does not proceed. A
packet trace will reveal that no OFFER broadcast is sent to the client.
Cause: A DHCP server is not available, or does not have any available IP addresses
Solution: Resolve the DHCP problem. Always verify that your client can get a DHCP address
while booted from a typical system prior to Netbooting.
Other suggestions: Make sure that there are no startup network connectivity delays. "Initial
Connectivity Delay" is the general term used to describe a short router-imposed delay to
network connectivity. On a managed switch, there are several features that prevent things
such as network looping, which can take down a network (for example, plug both ends of an
ethernet cable into a switch -- what happens? Hint: Nothing good). These protocols probe the
attached device when a connection is first detected on the port, and often take 15-30 seconds
before allowing traffic across the port. Some of the terms that you may see in relation to Initial
Connectivity Delay are "PortFast", "Spanning Tree Protocol", "Etherchanneling", and "Trunking".
There are others, but these are the ones you'll see most frequently. These are not "bad"
protocols, in fact they are quite important for a managed network environment. However, they
are not typically necessary on ports with hosts (computers) attached.
Initial Connectivity Delay can kill Netboot functionality -- a Netbooting client really needs to
have immediate network connectivity. If you notice that it takes a particularly long time for the
blinking globe to disappear, or it never does and you're sure DHCP and Netboot are
configured properly, try isolating your server and client to a private network on a dumb
switch. If performance is fine on the dumb switch, have a discussion with your network
administrator about "configuring the ports that computers are connected to for host
configuration". Most routers today have macros for easily making this change. Finally, refer to
this Cisco article for background on Initial Connectivity Delays and how to mitigate
them (applicable to non-Cisco network gear as well).
Problem: Client DISCOVERs and DHCP server OFFERs, but client doesn't REQUEST the offered IP
address.
Characteristics: The DHCP server log shows a DHCP DISCOVER and subsequent OFFER, but no
DHCP REQUESTs. The ethernet switch is a fairly new Cisco device.
Cause: Back when they were classified by IANA as "site specific" options, Apple originally used
DHCP options 220 and 221 for NetBoot purposes. Recently those options were reclassified for
"general use", and Cisco applied for them. Now Cisco uses them in their DHCP server:
cisco-subnet-allocation 220 Cisco Subnet Allocation
cisco-vpn-id 221 Cisco VPN Identifier
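To confirm this collision, you can check whether DHCP replies on the client's segment carry options 220 or 221. Here is a minimal sketch that parses the options area of a BOOTP/DHCP UDP payload: the 236-byte BOOTP header, then the magic cookie 99.130.83.99, then the options. Pulling that payload out of a saved capture is not shown and is assumed to be done separately. The function names are hypothetical.

```python
from typing import Dict, List

MAGIC_COOKIE = bytes([99, 130, 83, 99])
OPTIONS_OFFSET = 240  # 236-byte BOOTP header + 4-byte magic cookie
CISCO_OPTIONS = {220: "cisco-subnet-allocation", 221: "cisco-vpn-id"}


def dhcp_options(packet: bytes) -> Dict[int, bytes]:
    """Parse the options field of a BOOTP/DHCP packet (the UDP payload)."""
    if packet[236:240] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    options: Dict[int, bytes] = {}
    i = OPTIONS_OFFSET
    while i < len(packet):
        code = packet[i]
        if code == 0:  # pad byte, no length field
            i += 1
            continue
        if code == 255:  # end of options
            break
        if i + 1 >= len(packet):
            raise ValueError(f"truncated option {code}")
        length = packet[i + 1]
        options[code] = packet[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def cisco_conflicts(packet: bytes) -> List[str]:
    """Name any Cisco options present that collide with Apple's use of 220/221."""
    present = dhcp_options(packet)
    return [name for code, name in CISCO_OPTIONS.items() if code in present]
```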
Solution: As the use of these options is built into Open Firmware, it's not necessarily a trivial
problem to fix from an Apple perspective. There are two simple workarounds to this problem,
however:
At the Cisco Network Registrar:
1. Disable vpn-communication at the DHCP server level, or use the ignore-cisco-options
DHCP server attribute to cause the CNR DHCP server to ignore "cisco-vpn-id"
and/or "vpn-id".
Or, at every single Mac client:
2. Run the following command in the Terminal to disable the use of these options in
Open Firmware:
sudo nvram default-bootp-vexts="%00"
Then reboot the client. This change will be effective until you zap the PRAM. Also,
instead of running the command on each client, you could use Apple Remote Desktop
to "Send UNIX command" to multiple machines simultaneously.
Problem: Client performs the DHCP handshake, but fails to get a BSDP ACK[SELECT]
Characteristics: The server log shows a BSDP DISCOVER, but no BSDP ACK[LIST]s. A packet
trace will reveal that no BSDP ACK[SELECT] broadcast is sent to the client.
Cause: This could be a misconfigured Netboot server. Do you have a Netboot image enabled?
This could also be an issue with not getting an IP address within the same subnet range as the
server. DHCP and BSDP requests and initial responses occur via broadcast, and thus require that
either the server and client are in the same subnet or that your routers are configured to
handle this traffic specially to facilitate DHCP and Netbooting. Finally, this could simply be a
timing issue. Sometimes the bootpd process needs to be restarted before it recognizes
configuration changes.
†(EFI): This could also occur if your NetBoot image does not support the architecture of the
machine you are trying to boot. See the "Architectures" section for more details.
Solution: Verify that you have a Netboot image enabled at your server. Try restarting the
Netboot service in Server Admin. Verify that you can see the Netboot image in the Startup Disk
preference pane while booted from the client's typical OS (also verify the client is configured
for DHCP while doing this!).
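Checking whether the client's address and the NetBoot server share a subnet is simple arithmetic on the address and netmask. This minimal sketch uses Python's standard `ipaddress` module. It only answers the same-subnet question. When the two are on different subnets, NetBoot can still work, but only if relay (IP helper) entries are configured as described in "Netbooting across subnets".

```python
import ipaddress


def same_subnet(client_ip: str, server_ip: str, netmask: str) -> bool:
    """True if server_ip falls inside the client's network (client_ip/netmask)."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(server_ip) in network
```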
Problem: Client gets DHCP and BSDP information, but fails to download the booter file
Characteristics: You see in your server logs that your client is getting an IP address in the
same subnet as the Netboot server, and it is negotiating a Netboot set with the Netboot
server, but the client is failing to get to the gray Apple logo. You may also see a Mac OS 9-ish
blinking question mark.
Cause: First, confirm that your DHCP server is providing your client with a pingable router
address. Often, people will omit the router address for a single-subnet, isolated test network,
but this will definitely cause the NetBoot process to fail at this point. Even if a router does not
exist, you must specify an IP address that the client will be able to ARP. Specifying the IP
address of the DHCP server in cases such as this is the best approach. You can determine if
your client is getting a default router address by examining a packet trace (more info on
packet traces below):
Your IP: 10.0.1.7
Server IP: 10.0.1.1
Client Ethernet Address: 00:0a:95:c4:21:9c
sname "roscoe.bombich.com"
Vendor-rfc1048:
DHCP:OFFER
SID:10.0.1.1
LT:1197504
SM:255.255.0.0
DG:10.0.1.1
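The fields in that trace (Your IP, SM, DG) are enough for a quick sanity check of the default router. Below is a minimal sketch that flags a missing router and a router outside the client's subnet. It cannot tell you whether the router actually answers ARP or ping; you still have to test that from a client. The function name is hypothetical.

```python
import ipaddress
from typing import List, Optional


def router_problems(your_ip: str, subnet_mask: str, router: Optional[str] = None) -> List[str]:
    """List obvious problems with the default router handed out in a DHCP offer."""
    if not router:
        return ["no default router in the DHCP offer"]
    network = ipaddress.ip_network(f"{your_ip}/{subnet_mask}", strict=False)
    if ipaddress.ip_address(router) not in network:
        return [f"router {router} is outside {network}"]
    return []
```

With the values from the trace above, `router_problems("10.0.1.7", "255.255.0.0", "10.0.1.1")` returns an empty list.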
If you have confirmed that your client is getting a pingable IP address for the default router,
then this is probably a problem with tftp. After verifying that your Netboot set actually has a
booter file, test that your tftp service is working. At another client, run this command in the
Terminal, substituting your server's hostname and your Netboot set's name:
[admin:~/Desktop] tftp 10.0.1.21
tftp> get NetBoot/NetBootSP0/NetRestore.nbi/booter
Received 174997 bytes in 0.2 seconds
tftp>
Note: this test will fail if your Netboot set has spaces in its name. In general, however, it's OK
to have spaces in your Netboot set's name.
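To script the same test across several servers, here is a minimal sketch of a plain RFC 1350 TFTP read in Python. It does no option negotiation and no retransmission. It sends a read request, acknowledges each 512-byte DATA block, and stops at the first short block. The server address and path in the usage line are placeholders taken from the example session above.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # RFC 1350 default; no blksize option is negotiated here


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"


def tftp_get(server: str, filename: str, port: int = 69, timeout: float = 5.0) -> bytes:
    """Fetch a file with a bare-bones RFC 1350 TFTP read (no retransmission)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_rrq(filename), (server, port))
        received = bytearray()
        expected = 1
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            opcode, number = struct.unpack("!HH", packet[:4])
            if opcode == OP_ERROR:
                message = packet[4:].rstrip(b"\0").decode("ascii", "replace")
                raise RuntimeError(f"TFTP error {number}: {message}")
            if opcode != OP_DATA:
                raise RuntimeError(f"unexpected TFTP opcode {opcode}")
            # ACK to whichever port the server answered from (its transfer ID).
            sock.sendto(struct.pack("!HH", OP_ACK, number), peer)
            if number == expected:
                data = packet[4:]
                received.extend(data)
                if len(data) < BLOCK_SIZE:
                    return bytes(received)  # a short block ends the transfer
                expected = (expected + 1) % 65536
    finally:
        sock.close()
```

Usage: `len(tftp_get("10.0.1.21", "NetBoot/NetBootSP0/NetRestore.nbi/booter"))`. A timeout or a TFTP error here points to the same kinds of TFTP problems as a failing `tftp` session.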
If you get an error, you probably have a tftp configuration problem. Check out this article
regarding tftp issues on an upgraded Tiger Netboot server.
Other suggestions:
Check that your server's firewall settings allow traffic on port 69
Verify that tftp is enabled in /etc/xinetd.d/tftp (Panther) or
/System/Library/LaunchDaemons/tftp.plist (Tiger)
Verify that the "booter" file exists in your NetBoot set and is readable (has read
privileges for "everyone")
Verify that your client can at least ping the router address returned by your DHCP
server
3) Gray Apple logo, spinning globe icon
When you see the gray Apple logo, it means that the booter file has been downloaded and
executed. In the case of Netboot, the booter file then downloads two additional files via tftp:
the mach.macosx and mach.macosx.mkext files. The mach.macosx file is simply a copy of the
/mach_kernel file located at the root of any Mac OS X filesystem. The mach.macosx.mkext file
is a kernel extensions cache -- a file containing all the important kernel extensions for basic
network booting. While these files are downloaded, the small globe icon rotates. When the file
downloads are complete, the booter file loads the kernel and the kernel carries forth with the
boot process.
†(EFI): The kernel and kext cache files are very architecture-dependent. As of 10.4.4, these
files are "fat-but-extracted" files. That is, they contain header information that describes the
binaries available for each architecture within the file, but the architecture-specific binaries
have been extracted to reduce the overall size of the files. This will be explained in more detail
in the "Architectures" section.
It's fairly uncommon to run into problems in this stage of the Netboot process; however, there
are a couple of specific issues that can cause kernel panics at this point. Possible problems
include:
Not having a mach.macosx and mach.macosx.mkext file in your Netboot set
Either of those files being corrupt or otherwise inaccessible
The mach.macosx (kernel) file does not contain the binary for the client architecture or
is otherwise incompatible
The mach.macosx.mkext (kernel extension cache) file does not contain kernel
extensions required for the machine
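To check whether a kernel file lists the client's architecture, you can read its universal ("fat") header. The header starts with the big-endian magic 0xCAFEBABE and an architecture count, followed by 20-byte entries of (cputype, cpusubtype, offset, size, align). Here is a minimal sketch that names the common CPU types from Apple's Mach-O headers. It only reads the header, which is all a "fat-but-extracted" file guarantees, and does not verify that the slices are present. Java class files share the same magic, so point it only at Mach-O files.

```python
import struct
from typing import List

FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_NAMES = {7: "i386", 18: "ppc", 7 | 0x01000000: "x86_64"}


def fat_architectures(header: bytes) -> List[str]:
    """List the architectures named in a universal (fat) Mach-O header."""
    if len(header) < 8:
        raise ValueError("header too short")
    magic, nfat_arch = struct.unpack(">II", header[:8])
    if magic != FAT_MAGIC:
        return []  # thin single-architecture file, or not Mach-O at all
    archs = []
    for i in range(nfat_arch):
        start = 8 + i * 20
        cputype, _subtype, _offset, _size, _align = struct.unpack(">iiIII", header[start:start + 20])
        archs.append(CPU_TYPE_NAMES.get(cputype, f"cputype {cputype}"))
    return archs
```

Usage: read the first 4 KB of `mach.macosx` in binary mode and pass those bytes to `fat_architectures`.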
These files take up about 12-15MB of space, so this step should take a few seconds (longer when
many machines are booting at once) to complete. If you experience problems at this
stage of the process, fixing the problem is fairly trivial:
1. Reboot the affected client machine from a local drive containing the most current OS
available. The OS version should also match the version of OS on your NetBoot disk
image. If the OS on the NetBoot disk image is older than that on your affected client
machine, you should recreate your NetBoot disk image. It is most important that the
OS on the NetBoot disk image be newer than (or the same as) the OS that the machine
shipped with.
2. Mount via AFP the NetBoot sharepoint of the NetBoot server that contains the affected
NetBoot set.
3. Recreate the mach.macosx and/or the mach.macosx.mkext files. See the
"Architectures" section for more details.
If all else fails, simply recreate the entire NetBoot set on the affected hardware. Be sure to
delete (or move out of the NetBoot sharepoint) any non-functional NetBoot sets.
4) Spinning globe turns into indeterminate progress indicator
Once the kernel loads, it changes the spinning globe icon into an indeterminate, circular
progress indicator, and the boot process functions mostly the same as a standard boot
process. If you were holding down Command+V during start up, you'd get the verbose boot at
this point. Two interesting things happen here that are relevant to troubleshooting Netboot.
First, the kernel loads the kernel extension cache to give the young OS the functionality it
needs to perform advanced network communication, mount disks, etc., before the rest of the
OS loads.
Second, the kernel executes the /etc/rc.netboot startup script. This script attempts to mount
the disk image inside your Netboot set via NFS. The path to this disk image is obtained from
the BSDP response and maintained in memory (much like your DHCP packet is maintained and
accessible via the ipconfig command). If you do a packet trace you will see a packet similar to
this:
Server IP: 10.0.1.1
Client Ethernet Address: 00:0a:95:c4:21:9c
sname "xserve.apple.edu"
file "/private/tftpboot/NetBoot/NetBootSP0/Panther Server.nbi/booter"
Vendor-rfc1048:
DHCP:OFFER
SID:10.0.1.1
VC:"AAPLBSDPC"
RP:"nfs:10.0.1.1:/Library/NetBoot/NetBootSP0:Panther Server.nbi/Install.dmg"
VO:8.4.129.0.1.145.130.10.78.101.116.66.111.111.116.48.48.50
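The RP (root path) value above is what the client uses to find its NFS disk image. Splitting it tells you exactly which server and export to test with `showmount` and `mount_nfs`. Here is a minimal sketch that assumes the `nfs:<server>:<export>:<image>` layout shown in this trace and returns `None` for anything else. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class NfsRootPath(NamedTuple):
    server: str
    export: str  # directory to pass to showmount/mount_nfs
    image: str   # disk image path relative to that export


def parse_root_path(rp: str) -> Optional[NfsRootPath]:
    """Split an RP value of the form nfs:<server>:<export>:<image>."""
    parts = rp.split(":", 3)
    if len(parts) != 4 or parts[0] != "nfs":
        return None  # not an NFS root path in the format shown above
    _, server, export, image = parts
    return NfsRootPath(server, export, image)
```

For the trace above, `f"{rp.server}:{rp.export}"` gives `10.0.1.1:/Library/NetBoot/NetBootSP0`, the same target used in the `mount_nfs` test later in this article.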
After these occur, the kernel initiates the /etc/rc.boot and/or /etc/rc.cdrom scripts which
complete the boot process. Eventually the screen turns blue as the WindowServer loads and
you begin to see the more familiar parts of the boot process.
Potential problems
Problem: Soon after the circular progress indicator appears under the gray Apple logo, white
horizontal lines appear on the screen and the progress indicator stops spinning.
Cause: This is probably a kernel panic, and it is likely a result of the machine trying to mount
the NFS-hosted disk image and failing.
Suggestions:
Verify that you have a kernel panic by holding down Command+V while you reboot the
client. There should be some indication of a panic.
Verify that NFS is running on the server
Verify that the NetBootSPx sharepoint is valid and accessible. Remember that the
NetBoot sharepoint should look like this:
cd /Library/NetBoot
ls -la
.sharepoint --> NetBootSP0
.clients --> NetBootClients0
NetBootSP0
NetBootClients0
If it doesn't, you can manually repair it, or run this command:
/System/Library/ServerSetup/NetBoot
Or you could reset the NetBoot sharepoints in Server Admin:
1. Navigate to NetBoot > Settings > General in Server Admin
2. Deselect all checkboxes in the bottom pane ("Select where to put images and client
data")
3. Save changes
4. Reselect the desired volumes for storing images and client data
5. Save changes
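To check the sharepoint layout without eyeballing `ls -la`, here is a minimal sketch that verifies the two symbolic links and their target directories from the listing above. It assumes a single sharepoint on volume 0 (`NetBootSP0`/`NetBootClients0`). Servers that store images on several volumes use higher numbers and would need the expected names adjusted. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import List

# Link name -> expected target, per the directory listing above (volume 0 only).
EXPECTED_LINKS = {".sharepoint": "NetBootSP0", ".clients": "NetBootClients0"}


def sharepoint_problems(root: str = "/Library/NetBoot") -> List[str]:
    """Describe any deviation from the expected NetBoot sharepoint layout."""
    base = Path(root)
    problems = []
    for link_name, target in EXPECTED_LINKS.items():
        link = base / link_name
        if not link.is_symlink():
            problems.append(f"{link} is missing or not a symbolic link")
        elif os.readlink(link) != target:
            problems.append(f"{link} points to {os.readlink(link)!r}, expected {target!r}")
        if not (base / target).is_dir():
            problems.append(f"{base / target} is missing")
    return problems
```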
Problem: The system reboots about ten seconds or so after the circular progress indicator
appears
Cause: To really determine the cause, you should do a verbose boot and try to catch the error
message indicated on the screen. More often than not, the problem is with an incompatible
kernel extension cache. The machine tried to load the cache, but some important piece was
missing and the computer could not continue booting.
Solution: Rebuild your Netboot image set on a machine that you would like to boot from that
set. Typically this means that you want to use your latest and greatest machine for creating
Netboot sets. Newly released Apple hardware always fails to boot from last year's Netboot
set. Keep your Netboot images fresh and you shouldn't run into this.
Problem: The system never progresses beyond the circular progress indicator
Cause: Again, to really determine the cause, you should do a verbose boot and look for specific
error messages on the screen. Often this is a misconfiguration of NFS at the server,
characterized by messages like "RPC timeout for server <NetBoot server IP>". Occasionally it is
due to bugs in the (third party) startup scripts.
Solution: Basic NFS troubleshooting -- start with resetting the NetBoot sharepoints in Server
Admin as indicated above. Verify that your firewall is not blocking ports required by NFS: 111
(UDP), 989 (UDP), 2049 (UDP and TCP). Also, use the commands "showmount" and
"mount_nfs" to verify that NFS is functioning. From a client booted from its own hard drive,
run these commands:
showmount -e <NetBoot Server IP>
mkdir /tmp/mnt
mount_nfs <NetBoot Server IP>:/Library/NetBoot/NetBootSP0 /tmp/mnt
The "showmount" command will indicate what NFS sharepoints are available on your NetBoot
server. If you do not see your NetBoot sharepoint, reset the NetBoot sharepoint in Server
Admin. The mount_nfs command actually attempts to mount the NFS sharepoint.
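Before running `showmount`, it can help to confirm that the firewall lets TCP connections through to the portmapper (111) and nfsd (2049). Here is a minimal sketch using a plain TCP connect. It cannot test the UDP ports listed above, because UDP has no handshake to observe, and a successful connect says nothing about export permissions. The function names are hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def nfs_tcp_report(host: str) -> dict:
    """Check the TCP side of the portmapper (111) and nfsd (2049)."""
    return {port: tcp_port_open(host, port) for port in (111, 2049)}
```

Usage: `nfs_tcp_report("10.0.1.1")` returns a dict such as `{111: True, 2049: False}`.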
NetBoot Troubleshooting Topics
General Troubleshooting suggestions
Start simple using Apple's System Image Utility
Isolate your server and client to a private network on a dumb switch
Recreate the Netboot set
Try booting verbosely to see if any error messages point you in the right direction
Verify that you're getting an IP address within the subnet range of your Netboot server
Packet traces
This packet trace can be really useful (performed at the Netboot server):
sudo tcpdump -i en0 -s 0 -nvX port bootps or port bootpc or port tftp
or if you're planning to send the results to someone else:
sudo tcpdump -i en0 -s 0 -w ~/Desktop/packets.trace port bootps or port bootpc or port tftp
What the arguments mean:
-i en0: Listen to traffic on en0
-s 0: Do not truncate packets
-n: Do not convert IP addresses to names
-v: Verbose output (give me a pretty summary of what the packet means)
-X: Print the contents of the packet in ASCII and hex
-x: Print the contents of the packet in hex
-A: Print the contents of the packet in ASCII
-w: write the packets to a file instead of displaying them
There is a lot of information in packet traces, and it can be daunting to figure out what it all
means. You can also download my package of annotated packet traces for reference. The
most important thing to know about packet traces is how to do them. Even if you don't know
what to glean out of the trace, having it to hand to someone else can make troubleshooting
much easier.
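If you want to summarize captured DHCP/BSDP packets yourself, the fixed 236-byte BOOTP header holds the fields tcpdump prints as Your IP, Server IP, Client Ethernet Address, sname, and file. Here is a minimal sketch that decodes that header from a UDP payload. Reading the capture file and stripping the Ethernet/IP/UDP headers are assumed to be done elsewhere. The names are hypothetical.

```python
import socket
import struct
from typing import NamedTuple

# Fixed BOOTP header (RFC 951): op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file -> 236 bytes.
BOOTP_HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")


class BootpSummary(NamedTuple):
    op: int          # 1 = request from client, 2 = reply from server
    xid: int
    your_ip: str
    server_ip: str
    relay_ip: str
    client_mac: str
    sname: str
    boot_file: str


def _cstring(raw: bytes) -> str:
    """Decode a NUL-padded fixed-width field."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def summarize_bootp(payload: bytes) -> BootpSummary:
    """Summarize the BOOTP header of a DHCP/BSDP UDP payload."""
    (op, _htype, hlen, _hops, xid, _secs, _flags, _ciaddr,
     yiaddr, siaddr, giaddr, chaddr, sname, boot_file) = BOOTP_HEADER.unpack_from(payload)
    return BootpSummary(
        op=op,
        xid=xid,
        your_ip=socket.inet_ntoa(yiaddr),
        server_ip=socket.inet_ntoa(siaddr),
        relay_ip=socket.inet_ntoa(giaddr),
        client_mac=":".join(f"{b:02x}" for b in chaddr[:hlen]),
        sname=_cstring(sname),
        boot_file=_cstring(boot_file),
    )
```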
Getting BSDP information at the command line
If you edit your Netboot set to provide a shell early in the boot process, you can see what BSDP
information your client is getting from the server with the following commands:
ipconfig netbootoption shadow_mount_path
ipconfig netbootoption shadow_file_path
ipconfig netbootoption machine_name
Diskless Netboot
A diskless NetBoot image is exactly the same as a non-diskless image (you don't make that
choice during SIU image creation, right? Right.) When you choose to make an image set
diskless in Server Admin, the only change that is made is to the "SupportsDiskless" key in the
NBInfo.plist file in the .nbi directory.
The magic occurs when you boot the client. Part of the BSDP response to the client includes
information about the location of any network mountpoints for shadow files. For example,
using the previous tip, you can get the following data from the BSDP packet:
% ipconfig netbootoption shadow_mount_path
afp://netboot001:10d7c947@10.0.1.4/NetBootClients3
% ipconfig netbootoption shadow_file_path
NetBoot001/Shadow
% ipconfig netbootoption machine_name
NetBoot001
Examining the /etc/rc.netboot startup script you can see how diskless Netbooting works. By
default, a Netboot client will try to mount a shadow file at the shadow_mount_path. If that fails
though (for example, if shadow_mount_path is not defined by the Netboot server), it will use
the local drive instead. Therefore, diskless Netboot depends entirely on the client's ability to
mount a shadow file at the AFP mount path returned by the Netboot server in the BSDP
response.
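The decision described above reduces to a few lines of logic. This minimal sketch mirrors that logic in Python. It is not the actual `/etc/rc.netboot` code, and it never attempts a mount. It splits `shadow_mount_path` into host, user, and share, and returns `None` when the client would fall back to its local drive. The shadow mount URL embeds a password, so avoid logging the raw value. The names are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ShadowTarget(NamedTuple):
    host: str
    user: str
    share: str
    shadow_file: str  # path of the shadow file relative to the AFP volume root


def shadow_target(mount_path: Optional[str], file_path: Optional[str]) -> Optional[ShadowTarget]:
    """Return the network shadow location, or None if the client would use its local drive."""
    if not mount_path or not file_path:
        return None  # no network shadow offered in the BSDP response
    parts = urlsplit(mount_path)
    if parts.scheme != "afp" or not parts.hostname:
        return None
    share = parts.path.lstrip("/")
    return ShadowTarget(
        host=parts.hostname,
        user=parts.username or "",
        share=share,
        shadow_file=f"{share}/{file_path}",
    )
```

With the values above, this yields host `10.0.1.4`, user `netboot001`, and shadow file `NetBootClients3/NetBoot001/Shadow`.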
Note that while NetInstall does not require an internal drive, it is not "diskless netboot".
NetInstall does not use a shadow file at all, therefore a network shadow file is not required or
returned in the BSDP response. This is also why the "Diskless" checkbox is disabled in Server
Admin for NetInstall image sets. NetInstall sets employ RAM disks as necessary for writable
space.
Resetting NetBoot server caches
When you hold down the "N" key during startup, your machine will boot from the image set
that you have identified as the "default" set in Server Admin. When you choose a Network
startup disk in the Startup disk preferences pane, the server keeps track of your selection, and
you're forever bound to that server and Netboot set until you make another choice. What this
means is that if you change the default set at the server, then hold down the N key on startup
at that client that had chosen another Netboot set, the client will not boot from your default
set, it will always boot from the set that you had previously chosen (even if you have, since
then, reset the startup disk to a local disk).
†(EFI): Hold down Option+N to boot from the actual default NetBoot image.
While this technically works as designed, it doesn't necessarily work as expected. The Netboot
server keeps these choice settings in /var/db/bsdpd_clients. It's safe to delete that file to allow
your clients to boot to the default image set again. Also, the following series of commands
tends to resolve problems caused by setting a specific network startup disk choice on a client,
then deleting that Netboot set.
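sudo rm /var/db/bsdpd_clients     # forget every client's saved image choice
sudo killall bootpd               # stop bootpd; on Tiger+ launchd relaunches it on the next request
sudo killall -HUP xinetd          # only relevant on releases where xinetd still manages bootpd
sudo lookupd -flushcache          # 10.4 and earlier; on 10.5+ use: sudo dscacheutil -flushcache
sudo serveradmin stop netboot
sudo serveradmin start netboot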
Netbooting across subnets
Netboot requires that the client can get DHCP and BSDP information via broadcast. This
typically requires that the Netboot server and clients reside on the same subnet, because
routers typically do not pass broadcast information between subnets. DHCP information,
however, is handled specially by routers so you don't need a DHCP server on every segment of
your network. This is handled by what are typically called "DHCP Helper tables" (or more
generally, DHCP Relay) in your router's configuration. Basically this is just a list of IP addresses
that DHCP broadcast packets should be relayed to.
Because BSDP is so similar to DHCP, the router configuration for a BSDP server is
the same as for DHCP. Therefore, if you want to Netboot across subnets (or, more technically,
if you want BSDP broadcast information relayed past your routers), you need to add
the IP address of your Netboot server to your router's DHCP helper table.
A common fear among network administrators is that this will interfere with the handling of
DHCP by other servers. However, although the bootpd process is running on your Netboot
server, if the DHCP service is not turned on, it will not hand out IP addresses. In fact, it will
completely ignore any DHCP requests altogether. Likewise, your other DHCP server will
completely ignore BSDP broadcasts that are relayed to it by the router.
In summary, if you want to Netboot across subnets, work with your network administrator to
configure your routers to send BSDP broadcasts to your Netboot server. This is not an
unreasonable request or difficult task, and greatly reduces your infrastructure and
management costs.
NetBooting Multiple Architectures
When a Macintosh client begins the NetBoot process, it sends out a broadcast request for a
NetBoot server. Within this request are three important pieces of information: the client
identifier (MAC address), the architecture, and the system identifier (machine model). When a
NetBoot server running Tiger (10.4) or later sees a broadcast BSDP request, launchd launches bootpd to handle the
request. The NetBoot server checks its /var/db/bsdpd_clients file to determine if the client
already has selected a NetBoot image on the server. If a record for the client exists on the
server, the server will return the associated NetBoot image information and the NetBoot client
will prefer this server over any other NetBoot servers on the network. If an association does
not yet exist, the server returns a list of NetBoot images that are available to the particular
client. When the client finally chooses an image, the server creates a client-association record
in /var/db/bsdpd_clients.
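The association logic above (and the "forever bound" behavior from the cache-reset section) can be sketched in shell. This is an assumption-laden model: the function names are hypothetical, and the association database is assumed to be a plain "MAC image" text file; the real /var/db/bsdpd_clients uses its own format, not this one.

```shell
#!/bin/sh
# Hypothetical model of how a NetBoot server answers a BSDP request.
# $1: client MAC address
# $2: association database (assumed format: one "MAC image" pair per line;
#     the real /var/db/bsdpd_clients is NOT in this format)
# $3: space-separated list of images available to this client
bsdp_answer() {
    mac="$1"; db="$2"; available="$3"
    bound=$(awk -v m="$mac" '$1 == m { print $2; exit }' "$db" 2>/dev/null)
    if [ -n "$bound" ]; then
        # Existing association: the server returns only the bound image
        echo "bound $bound"
    else
        # No association yet: offer the (already filtered) image list
        echo "list $available"
    fi
}

# When the client chooses an image, record the association
bsdp_record_choice() {
    printf '%s %s\n' "$1" "$2" >> "$3"
}

db=$(mktemp)
bsdp_answer 00:11:22:33:44:55 "$db" "NetInstall.nbi NetBoot.nbi"   # list NetInstall.nbi NetBoot.nbi
bsdp_record_choice 00:11:22:33:44:55 NetBoot.nbi "$db"
bsdp_answer 00:11:22:33:44:55 "$db" "NetInstall.nbi NetBoot.nbi"   # bound NetBoot.nbi
rm -f "$db"
```

In this model, deleting the database file is what returns every client to the "list" branch, which mirrors why removing bsdpd_clients lets clients fall back to the default set.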
The NetBoot server will filter a NetBoot image from the list returned to the client if:
1. The client's MAC address is specifically forbidden from accessing images on the server
(NetBoot filters)
2. The NetBoot image does not support the architecture of the client machine
3. The NetBoot image is not enabled for the machine model
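The three filtering rules can be expressed as a single predicate. This is a hypothetical sketch: the function names, the space-separated list inputs, and the example MAC and model identifiers are illustrative assumptions, not how Mac OS X Server stores its filters.

```shell
#!/bin/sh
# Is word $1 present in the space-separated list $2?
contains() {
    case " $2 " in *" $1 "*) return 0 ;; esac
    return 1
}

# Hypothetical model of the three filtering rules.
# Returns success (0) if the image should be offered to the client.
# $1 client MAC   $2 client architecture   $3 client model identifier
# $4 MACs forbidden by NetBoot filters (space-separated)
# $5 architectures the image supports (space-separated)
# $6 models the image is enabled for (space-separated)
image_offered() {
    contains "$1" "$4" && return 1     # rule 1: MAC forbidden by filters
    contains "$2" "$5" || return 1     # rule 2: architecture not supported
    contains "$3" "$6" || return 1     # rule 3: model not enabled
    return 0
}

# Illustrative values: the model is not in the enabled list, so the image is filtered
if image_offered 00:11:22:33:44:55 i386 iMac16,1 "" "ppc i386" "MacBookPro6,2"; then
    echo offered
else
    echo filtered
fi
```

An image filtered by any one rule never appears in the list the client sees.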
Refer to the Mac OS X Server documentation for more details on NetBoot filtering.
Architecture support is defined in two ways. As of 10.4.4, there is an additional key in the
NBImageInfo.plist file named "Architectures". This attribute contains an array of the
architectures supported, for example {ppc} or {ppc, i386}. Additionally, the NetBoot set must
contain a booter, mach.macosx, and mach.macosx.mkext file for each architecture supported.
For backward compatibility, the ppc booter files may reside at the root level of the NetBoot set
or within a folder named "ppc" at the root level of the NetBoot set. Intel-specific booter files
must reside within a folder named "i386" at the root level of the NetBoot set. Therefore, you
could have a Universal NetBoot set (capable of booting ppc or i386) with the following
structure:
NetBoot.nbi/
    booter
    i386/
        booter
        mach.macosx
        mach.macosx.mkext
    mach.macosx
    mach.macosx.mkext
    NBImageInfo.plist
    System.dmg
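As a sketch, the Universal layout above can be created as an empty skeleton for experimentation. The files are zero-length placeholders, and the NBImageInfo.plist written here is an assumption that shows only the Architectures key discussed above; a real NBImageInfo.plist carries other keys that are not shown.

```shell
#!/bin/sh
# Build an empty Universal NetBoot set skeleton (placeholders only).
# Real contents come from the steps under "Generating platform-specific boot files".
nbi="${1:-$(mktemp -d)/NetBoot.nbi}"
mkdir -p "$nbi/i386"
# ppc boot files at the root level (backward-compatible location)
touch "$nbi/booter" "$nbi/mach.macosx" "$nbi/mach.macosx.mkext"
# Intel boot files must live in the i386 folder
touch "$nbi/i386/booter" "$nbi/i386/mach.macosx" "$nbi/i386/mach.macosx.mkext"
touch "$nbi/System.dmg"
# Minimal plist with ONLY the Architectures key (other real keys omitted)
cat > "$nbi/NBImageInfo.plist" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Architectures</key>
    <array>
        <string>ppc</string>
        <string>i386</string>
    </array>
</dict>
</plist>
EOF
echo "$nbi"
```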
When the NetBoot server receives a BSDP request from a client of a particular architecture, it checks
whether ${arch}/booter exists. If it does, it returns the path to that file in the BSDP response. If it
does not, and arch = ppc, it returns the path to booter (at the root level of the nbi) if it exists.
If the booter does not exist for the architecture, not only will the client not boot from that
NetBoot set, but the NetBoot image will not even appear as an available boot disk to the client.
Generating platform-specific boot files:
1. Create the mach.macosx file with a command similar to the following, replacing the
path to the NetBoot set with your own information:
ditto /mach_kernel /Volumes/NetBootSP0/NetRestore.nbi/mach.macosx
or, if the kernel is fat, you can extract the architecture-specific binary directly to the nbi
folder:
lipo -extract ppc -output /Volumes/NetBootSP0/NetRestore.nbi/mach.macosx /mach_kernel
2. Create the kernel extension cache with a command similar to the following, replacing
the path to the NetBoot set with your own information:
sudo kextcache -a ppc -s -l -n -z -m /tmp/mkext /System/Library/Extensions
ditto /tmp/mkext /Volumes/NetBootSP0/NetRestore.nbi/mach.macosx.mkext
or, for an Intel-based Mac:
sudo kextcache -a i386 -s -l -n -z -m /tmp/mkext /System/Library/Extensions
ditto /tmp/mkext /Volumes/NetBootSP0/NetRestore.nbi/i386/mach.macosx.mkext
3. Add the booter files. PowerPC:
ditto /usr/standalone/ppc/bootx.bootinfo /Volumes/NetBootSP0/NetRestore.nbi/booter
Intel-based Mac:
ditto /usr/standalone/i386/boot.efi /Volumes/NetBootSP0/NetRestore.nbi/i386/booter
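The three steps above can be wrapped in a dry-run helper that only prints the commands for one architecture. This is a sketch: the function name is hypothetical, and unlike step 1 as written, it places the Intel kernel under i386/ to match the Universal layout shown earlier. Review the printed commands before running any of them.

```shell
#!/bin/sh
# Dry-run helper: prints (does not run) the boot-file steps for one architecture.
# $1: ppc or i386   $2: optional nbi path (default is the example path used above)
print_boot_file_steps() {
    arch="$1"
    nbi="${2:-/Volumes/NetBootSP0/NetRestore.nbi}"
    case "$arch" in
        ppc)  dest="$nbi";      booter_src=/usr/standalone/ppc/bootx.bootinfo ;;
        i386) dest="$nbi/i386"; booter_src=/usr/standalone/i386/boot.efi ;;
        *)    echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    # Step 1: kernel
    echo "ditto /mach_kernel $dest/mach.macosx"
    # Step 2: kernel extension cache
    echo "sudo kextcache -a $arch -s -l -n -z -m /tmp/mkext /System/Library/Extensions"
    echo "ditto /tmp/mkext $dest/mach.macosx.mkext"
    # Step 3: booter
    echo "ditto $booter_src $dest/booter"
}

print_boot_file_steps i386
```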
History:
7/8/2005: Initial publication
1/19/2006: Updated with information about EFI/Intel-based Macs
4/3/2006: Updated with additional NFS troubleshooting information
Bombich.com: Troubleshooting the NetBoot Process 11-11-01 3:15 PM
http://afp548.com/netboot/mactips/netboot.html Page 11 of 11
Posted on 01-11-2017 06:12 PM
OK,
I have some more info. Rather than paraphrasing what my network guy is telling me, I am going to copy his response below. Here are some questions to think about as you read it:
1. Does his ultimate solution sound plausible?
2. This feels like a client-side issue, BUT why would Apple set different MTU sizes on different models?
3. Can the MTU be set on a client as he suggests?
4. Or does this need to be configured on my OS X server(s) that are offering NetBoot?
*Not sure if Randy has been able to update you yet and let you know we've found a workaround that restored the ability to image Macs.
I wanted to take a minute and explain what we found and what we will need from you guys going forward for a permanent fix.
What we were seeing in the packet captures was a ton of fragmented packets coming into the firewall. Working with our SE, we determined that, due to network load balancing, these fragments are arriving at different blades on the firewall and cannot be reassembled.
For the short term, due to the moratorium, we have disabled the second link so that all packets come into one blade. That restored service but impairs redundancy. We will run a change after moratorium to update the load balancing algorithm to do a better job of pinning these to one blade.
That being said, both of these are just workarounds. We will need you to engage your vendor support and find out why the imaging process is using packets larger than the standard MTU of 1500 bytes and see if that can be cranked down. I don't know if the imaging uses a GRE tunnel or something like that which is causing excess overhead. Resolving the fragmentation is the correct way to fix this and what we will need your help with.
If you have any questions or need packet captures or anything please let us know.
*-------
So, does anyone out in JN have any ideas I can try from my end?
TIA,
/randy
Posted on 01-11-2017 08:06 PM
What type of server are you using for NetBoot? Apple Server, NetSUS, BSDPy, etc?
Posted on 01-12-2017 12:44 PM
@mscottblake OS X 10.11.6 Mac Mini
Posted on 01-13-2017 03:28 AM
It's really an Apple issue. You can use @Bruienne's BSDPy to set the MTU to 1500.
Apple seems to follow the behavior below, which can make some network folks unhappy:
The original protocol has a transfer file size limit of 512 bytes/block x 65535 blocks = 32 MB. In 1998 the TFTP Blocksize Option (RFC 2348) allowed larger blocks; at 1468 bytes/block x 65535 blocks the limit rises to about 92 MB. If IP fragmentation is not an option, the maximum block size is the size of an Ethernet MTU (1500) minus the headers of TFTP (4 bytes), UDP (8 bytes) and IP (20 bytes) = 1468 bytes/block. Today most servers and clients support block number roll-over (the block counter going back to 0 after 65535), which gives an essentially unlimited transfer file size.
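The arithmetic in the quote can be checked with a short shell sketch. The function names are hypothetical; the header sizes (IP 20, UDP 8, TFTP 4 bytes) and the 65535-block counter come from the quote. `would_fragment` illustrates the network team's point: any TFTP block larger than the computed maximum cannot fit in one 1500-byte packet and must be fragmented.

```shell
#!/bin/sh
# Header sizes and block counter limit from the TFTP quote above
IP_HDR=20; UDP_HDR=8; TFTP_HDR=4
MAX_BLOCKS=65535            # 16-bit block counter, without roll-over

# Largest TFTP block that fits in one IP packet for a given MTU ($1)
max_blksize() {
    echo $(( $1 - IP_HDR - UDP_HDR - TFTP_HDR ))
}

# Largest file transferable at block size $1 without block-number roll-over
max_file_bytes() {
    echo $(( $1 * MAX_BLOCKS ))
}

# Succeeds if TFTP block size $1 would need IP fragmentation at MTU $2
would_fragment() {
    [ "$1" -gt "$(max_blksize "$2")" ]
}

echo "blksize for MTU 1500: $(max_blksize 1500)"          # 1468
echo "limit at 512-byte blocks: $(max_file_bytes 512)"     # 33553920 bytes (~32 MB)
echo "limit at 1468-byte blocks: $(max_file_bytes 1468)"   # 96205380 bytes (~92 MB)
```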