<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<title>Boot Manager Technical Documentation</title>
<firstname>Aaron</firstname>
<surname>Klingaman</surname>
<email>alk@cs.princeton.edu</email>
<orgname>Princeton University</orgname>
<revnumber>1.0</revnumber>
<date>March 15, 2005</date>
<authorinitials>AK</authorinitials>
<para>Initial draft.</para>
<title>Components</title>
<para>The entire Boot Manager system consists of several components that
are designed to work together to provide the functionality outlined in the
Boot Manager PDN <citation>1</citation>. These consist of:</para>
<para>A set of API calls available at PlanetLab Central</para>
<para>A package to be run in the boot cd environment on nodes</para>
<para>An appropriate user interface allowing administrators to create
node configuration files</para>
<para>The previous implementation of the software responsible for
installing and booting nodes consisted of a set of boot scripts that the
boot cd would run, depending on the node's current boot state. The logic
behind which script was sent to the node existed on the boot
server in the form of PHP scripts. However, the intention with the new
Boot Manager system is to send the same boot manager back for all nodes,
in all boot states, each time the node starts. Then, the boot manager will
run and determine which operations to perform on the node, based on the
current boot state. There is no longer any boot state specific logic at
PLC; the boot manager itself contains the logic for all states.</para>
<title>API Calls</title>
<para>Most of the API calls available as part of the PlanetLab Central API
are intended to be run by users, and thus authentication for these calls
is done with the user's email address and password. However, the API calls
described below will be run by the nodes themselves, so a new
authentication mechanism is required.</para>
<title>Authentication</title>
<para>As is done with other PLC API calls, the first parameter to all
Boot Manager related calls will be an authentication structure,
consisting of these named fields:</para>
<para>AuthMethod</para>
<para>The authentication method; only 'hmac' is currently
supported.</para>
<para>The node id, contained in the configuration file.</para>
<para>The node's primary IP address. This will be checked with the
node_id against PLC records.</para>
<para>The authentication string, depending on the method. For the 'hmac'
method, this is a hash for the call, made from the parameters of the call
and the key contained in the configuration file.</para>
<para>Authentication is successful if PLC is able to create the same hash
from the values using its own copy of the node key. If the hash values
do not match, then either the keys do not match or the values of the
call were modified in transmission, and the node cannot be
authenticated.</para>
<para>TODO: add specifics on how the hash value is produced from the
parameters in the API call.</para>
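<para>As a minimal sketch of how a node might build this structure,
assuming (since the exact scheme is still marked TODO above) that the hash
is an HMAC-SHA1 computed over the string-concatenated call parameters, and
that the remaining fields are named node_id, node_ip, and value:</para>
<programlisting># Sketch only: the exact serialization of call parameters into the
# HMAC input is not yet specified (see the TODO above); this assumes a
# simple concatenation of string-converted parameters in call order.
import hmac
import hashlib

def make_auth_struct(node_id, node_ip, node_key, call_params):
    # Concatenate the call parameters; the real scheme may differ.
    message = "".join(str(p) for p in call_params)
    digest = hmac.new(node_key.encode(), message.encode(),
                      hashlib.sha1).hexdigest()
    return {
        "AuthMethod": "hmac",  # only 'hmac' is currently supported
        "node_id": node_id,    # from the configuration file
        "node_ip": node_ip,    # checked against PLC records
        "value": digest,       # the authentication string
    }</programlisting>
<para>PLC would recompute the same digest with its own copy of the node
key and compare; any mismatch in key or parameters rejects the call.</para>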
<title>PLC API Calls</title>
<para>Full technical documentation of these functions can be found in
the PlanetLab API documentation. A short usage sketch follows the list
of calls.</para>
<para>BootUpdateNode( authentication, update_values )</para>
<para>Update a node record, currently only allowing the boot state to
be changed.</para>
<para>BootCheckAuthentication( authentication )</para>
<para>Simply check to see if the node is recognized by the system
and is authorized.</para>
<para>BootGetNodeDetails( authentication )</para>
<para>Return details about a node, including its boot state and the
networks the PLC database has configured for the node.</para>
<para>BootNotifyOwners( authentication, message, include_pi,
include_tech, include_support )</para>
<para>Notify someone about an event that happened on the machine,
and optionally include the site PIs, technical contacts, and
PlanetLab Support.</para>
<para>BootUpdateNodeHardware( authentication, pci_entries )</para>
<para>Send the set of hardware this node has, and update the record
at PLC.</para>
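<para>These calls are ordinary XML-RPC methods, so a node can invoke them
with a standard client library. The sketch below assumes a hypothetical
API endpoint URL and reuses make_auth_struct from the authentication
sketch earlier in this document:</para>
<programlisting># Sketch: invoking the Boot Manager calls over XML-RPC.
# The endpoint URL here is an assumption for illustration.
import xmlrpc.client

PLC_API_URL = "https://www.planet-lab.org/PLCAPI/"  # assumed endpoint
node_key = "79efbe871722771675de604a227db8386bc6ef482a4b74"

server = xmlrpc.client.ServerProxy(PLC_API_URL)
auth = make_auth_struct(121, "128.112.139.71", node_key, [])

# Confirm the node is recognized and authorized before doing anything.
if server.BootCheckAuthentication(auth):
    details = server.BootGetNodeDetails(auth)
    print(details)  # boot state and configured networks for this node</programlisting>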
<title>Core Package</title>
<para>The Boot Manager core package, which is run on the nodes and
contacts the Boot API as necessary, is responsible for the following major
functional units:</para>
<para>Installing nodes with alpina, the PlanetLab installer</para>
<para>Putting a node into a debug state so administrators can track
down problems</para>
<para>Reconfiguring an already installed node to reflect new hardware,
or changed network settings</para>
<para>Booting an already installed node</para>
<title>Boot States</title>
<para>Each node always has one of four possible boot states; a sketch of
how the boot manager might dispatch on them follows the state
descriptions.</para>
<para>This boot state corresponds to a new node that has not yet been
installed, but for which a record does exist. When the boot manager
starts, and the node is in this state, the user is prompted to
continue with the installation. The intention here is to prevent a
non-PlanetLab machine (like a user's desktop machine) from being
inadvertently wiped and installed with the PlanetLab node
software.</para>
<para>'reinstall'</para>
<para>In this state, a node will reinstall the node software,
erasing anything that might have been on the disk before.</para>
<para>This state corresponds with nodes that have successfully
installed, and can be chain booted to the runtime node
kernel.</para>
<para>Regardless of whether or not a machine has been installed,
this state sets up a node to be debugged by administrators.</para>
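<para>A sketch of dispatching on these states follows. The handler
bodies are stubs, and the names 'boot' and 'debug' for the last two
states are assumptions inferred from the descriptions above:</para>
<programlisting># Illustrative dispatch on the node's boot state. Handler names and
# bodies are hypothetical, not the boot manager's actual functions.
def confirm_and_install():
    print("prompting user on the console before installing")

def install_node():
    print("wiping the disk and installing the node software")

def chain_boot():
    print("chain booting the runtime node kernel")

def enter_debug_mode():
    print("holding the node in debug state for administrators")

def handle_boot_state(state):
    handlers = {"new": confirm_and_install,
                "reinstall": install_node,
                "boot": chain_boot,
                "debug": enter_debug_mode}
    # Unknown or corrupt states fall back to debug for safety.
    handlers.get(state, enter_debug_mode)()</programlisting>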
<title>Flow Chart</title>
<para>Below is a high level flow chart of the boot manager, from the
time it is executed to when it exits.</para>
<title>Boot Manager Flow Chart</title>
<imagedata align="left" fileref="boot-manager-flowchart.png"
<title>Boot CD Environment</title>
<para>The boot manager needs to be able to operate under all currently
supported boot cds. The new 3.0 cd contains software the current 2.x cds
do not contain, including the Logical Volume Manager (LVM) client tools,
RPM, and YUM, among other packages. Given this requirement, the boot cd
will need to download as necessary the extra support files it needs to
run. Depending on the size of these files, they may only be downloaded
by specific steps in the flow chart in figure 1, and thus are not
fetched until they are actually required.</para>
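<para>A sketch of fetching such a support archive on demand follows; the
boot server URL, archive name, and local paths are all assumptions for
illustration:</para>
<programlisting># Sketch: download a support archive only when a step requires it.
# The boot server URL and archive name are assumed for illustration.
import os
import tarfile
import urllib.request

BOOT_SERVER = "https://boot.planet-lab.org"  # assumed server
ARCHIVE = "bootmanager-support.tar.gz"       # hypothetical name

def fetch_support_files(dest="/tmp/support"):
    url = BOOT_SERVER + "/" + ARCHIVE
    local = "/tmp/" + ARCHIVE
    urllib.request.urlretrieve(url, local)
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(local) as tar:
        tar.extractall(dest)  # unpack the extra tools (LVM, RPM, YUM)</programlisting>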
<title>Node Configuration Files</title>
<para>To remain compatible with 2.x boot cds, the format and existing
contents of the configuration files for the nodes will not change. There
will be, however, the addition of three fields:</para>
<para>NET_DEVICE</para>
<para>If present, use the device with the specified mac address to
contact PLC. The network on this device will be set up. If not
present, the device represented by 'eth0' will be used.</para>
<para>NODE_KEY</para>
<para>The unique, per-node key to be used during authentication and
identity verification. This is a fixed length, random value that is
only known to the node and PLC.</para>
<para>The PLC assigned node identifier.</para>
<para>An example of a configuration file for a dhcp networked
machine:</para>
<programlisting>IP_METHOD="dhcp"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"</programlisting>
<para>An example of a configuration file for the same machine, only with
a statically assigned network address:</para>
<programlisting>IP_METHOD="static"
IP_ADDRESS="128.112.139.71"
IP_GATEWAY="128.112.139.65"
IP_NETMASK="255.255.255.192"
IP_NETADDR="128.112.139.64"
IP_BROADCASTADDR="128.112.139.127"
IP_DNS1="128.112.136.10"
IP_DNS2="128.112.136.12"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"</programlisting>
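<para>Since the files use simple KEY="value" shell-style syntax, a small
parser suffices. A sketch (the function name is illustrative):</para>
<programlisting># Sketch: parse a node configuration file of KEY="value" lines.
def parse_node_config(path):
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            config[key] = value.strip().strip('"')
    return config

# Example: parse_node_config("/mnt/floppy/plnode.txt")["NODE_ID"]
# would return "121" for the files shown above.</programlisting>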
<para>Existing 2.x boot cds will look for the configuration files only
on a floppy disk, and the file must be named 'planet.cnf'. The new 3.x
boot cds, however, will initially look for a file named 'plnode.txt' on
either a floppy disk, or burned onto the cd itself. Alternatively, it
will fall back to looking for the original file name, 'planet.cnf'. This
initial file reading is performed by the boot cd itself to bring the
node's network online, so it can download and execute the Boot
Manager.</para>
<para>However, the Boot Manager will also need to identify the location
of and read in the file, so it can get the extra fields not initially
used to bring the network online (node_key and node_id). Below is the
search order that the boot manager will use to locate a file. If a file
is found in the order below, but does not match the network that has
already been set up, then there may be two configuration files located on
the same machine. In this situation, the file is skipped and searching
continues.</para>
<para>Configuration file location search order (both file names,
plnode.txt and planet.cnf, are searched in each of these locations,
plnode.txt first; a sketch of this search loop follows the
list):</para>
<para>Standard floppy disk (/dev/fd0 and /dev/fd1)</para>
<para>USB floppy disk</para>
<para>USB flash based disk</para>
<para>USB cdrom</para>
<para>Standard cdrom</para>
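<para>A sketch of the search loop follows. Only /dev/fd0 and /dev/fd1
come from the list above; the USB and cdrom device paths and the mount
point are assumptions:</para>
<programlisting># Sketch: locate a configuration file, trying plnode.txt before
# planet.cnf at each location. Device paths beyond /dev/fd0 and
# /dev/fd1 are illustrative, as is the mount point.
import os

LOCATIONS = ["/dev/fd0", "/dev/fd1",  # standard floppy disks
             "/dev/sda", "/dev/sdb",  # USB floppy/flash/cdrom (assumed)
             "/dev/cdrom"]            # standard cdrom (assumed path)
FILE_NAMES = ["plnode.txt", "planet.cnf"]

def find_node_config(mount_point="/mnt/conf"):
    for device in LOCATIONS:
        if os.system("mount %s %s" % (device, mount_point)) != 0:
            continue  # device absent or unreadable; try the next one
        for name in FILE_NAMES:
            path = os.path.join(mount_point, name)
            if os.path.exists(path):
                # Caller must verify the file matches the network
                # already set up, and keep searching if it does not.
                return path
        os.system("umount %s" % mount_point)
    return None</programlisting>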
<title>User Interface for Node Management</title>
<title>Adding Nodes</title>
<para>New nodes are added to the system explicitly by either a PI or a
tech contact, either directly through the API calls, or by using the
appropriate interfaces on the website. As nodes are added, only their
hostname and ip address are required to be entered. When the node is
brought online, the records at PLC will be updated with the remaining
node details.</para>
<para>After a node is added, the user has the option of creating a
configuration file for that node. This is done automatically, and the
user is prompted to download and save the file. This file contains only
the primary network interface information (necessary to contact PLC),
and the per-node key.</para>
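<para>A sketch of generating such a file follows. The key length of 23
random bytes (46 hex characters) is chosen only to match the example key
shown earlier, and the function name is illustrative:</para>
<programlisting># Sketch: generate a per-node configuration file. The key length
# matches the 46-character example above; the text only requires a
# fixed-length random value known to the node and PLC.
import secrets

def generate_node_config(node_id, hostname, domain, mac):
    node_key = secrets.token_hex(23)
    lines = ['IP_METHOD="dhcp"',
             'HOST_NAME="%s"' % hostname,
             'DOMAIN_NAME="%s"' % domain,
             'NET_DEVICE="%s"' % mac,
             'NODE_KEY="%s"' % node_key,
             'NODE_ID="%d"' % node_id]
    # The key must also be stored at PLC for later authentication.
    return "\n".join(lines) + "\n", node_key</programlisting>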
<para>The default boot state of a new node is 'new', which requires the
user to confirm the installation at the node, by typing yes on the
console. If this is not desired, as is the case with nodes at a
co-location site, or for a large number of nodes being set up at the same
time, the administrator can change the node state, after the entry is in
the PLC records, from 'new' to 'reinstall'. This will bypass the
confirmation screen, and proceed directly to reinstall the machine (even
if it already had a node installation on it).</para>
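<para>For a large batch of nodes, an administrator could script this
state change against the PLC API. The administrative call name
(AdmUpdateNode), the password authentication structure, and the account
shown below are all assumptions:</para>
<programlisting># Sketch: flip freshly added nodes from 'new' to 'reinstall' so they
# install without console confirmation. The call name and the admin
# authentication structure are assumptions for illustration.
import xmlrpc.client

plc = xmlrpc.client.ServerProxy("https://www.planet-lab.org/PLCAPI/")
admin_auth = {"AuthMethod": "password",
              "Username": "pi@cs.princeton.edu",  # hypothetical user
              "AuthString": "secret"}

for node_id in [121, 122, 123]:                   # hypothetical batch
    plc.AdmUpdateNode(admin_auth, node_id, {"boot_state": "reinstall"})</programlisting>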
<title>Updating Node Network Settings</title>
<para>If the primary network address of a node must be updated, for
example when the node is moved to a new network, then two steps must be
performed to successfully complete the move:</para>
<para>The node network will need to be updated at PLC, either
through the API directly or via the website.</para>
<para>The configuration file on the floppy will need to be either
regenerated and placed in the machine, or updated in place to match
the new settings.</para>
<para>If the node ip address on the floppy does not match the record at
PLC, then the node will not boot until they do match. The intention here
is to prevent a malicious user from taking the floppy disk, altering the
network settings, and trying to bring up a new machine with the new
settings.</para>
<para>On the other hand, if a non-primary network address needs to be
updated, then simply updating the records at PLC will suffice. The boot
manager, at next restart, will reconfigure the machine to match the PLC
records.</para>
<title>Removing Nodes</title>
<para>Nodes are removed from the system by:</para>
<para>Deleting the record of the node at PLC.</para>
<para>Shutting down the machine.</para>
<para>Once this is done, even if the machine attempts to come back
online, it cannot be authorized with PLC and will not boot.</para>
<title>Common Scenarios</title>
<para>Below are common scenarios that the boot manager might encounter
that would exist outside of the documented procedures for handling nodes.
A full description of how they will be handled follows each.</para>
<para>A configuration file from a previously installed and functioning
node is copied or moved to another machine, and the network settings
are updated on it (but the key is left the same).</para>
<para>Since the authentication for a node consists of matching not
only the node id, but also the primary node ip, this step will fail, and
the boot manager will not be allowed to run on the node. Instead, the new
node must be created at PLC first, and a network configuration file
for it must be generated, with its own node key.</para>
<para>After a node is installed and running, the administrators
mistakenly remove the cd and disk.</para>
<para>The node installer clears all boot records from the disk, so the
node will not boot. Typically, the bios will report no operating
system found.</para>
<title>The PlanetLab Boot Manager</title>
<date>January 14, 2005</date>
<firstname>Aaron</firstname>
<surname>Klingaman</surname>