1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
3 "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
6 <title>Boot Manager Technical Documentation</title>
9 <firstname>Aaron</firstname>
11 <surname>Klingaman</surname>
13 <email>alk@cs.princeton.edu</email>
17 <orgname>Princeton University</orgname>
22 <revnumber>1.0</revnumber>
24 <date>March 15, 2005</date>
26 <authorinitials>AK</authorinitials>
29 <para>Initial draft.</para>
34 <revnumber>1.1</revnumber>
36 <date>May 31, 2005</date>
38 <authorinitials>AK</authorinitials>
41 <para>Updated post implementation and deployment.</para>
46 <revnumber>1.2</revnumber>
48 <date>November 16, 2005</date>
50 <authorinitials>AK</authorinitials>
53 <para>Add section on where source code is, and other updates to make
54 it consistent with implementation.</para>
61 <title>Components</title>
63 <para>The entire Boot Manager system consists of several components that
64 are designed to work together to provide the functionality outlined in the
65 Boot Manager PDN <citation>1</citation>. These consist of:</para>
69 <para>A set of API calls available at PlanetLab Central</para>
73 <para>An API authentication mechanism used exclusively by the
74 BootManager, for the above API calls</para>
78 <para>A package to be run in the boot cd environment on nodes
79 containing core logic</para>
83 <para>A user interface allowing authorized users to add and manage
84 nodes and create node/BootManager configuration files</para>
88 <para>The previous implementation of the software responsible for
89 installing and booting nodes consisted of a set of boot scripts that the
90 boot cd would download and run, depending on the node's current boot
91 state. Only the necessary script for the current state would be
92 downloaded, and the logic determining which script was sent to the
93 node existed on the boot server in the form of PHP scripts. However, the
94 intention with the new BootManager system is to send the same script back
95 for all nodes (consisting of the core BootManager code), in all boot
96 states, each time the node starts. Then, the boot manager will run and
97 determine which operations to perform on the node, based on the current
98 boot state. All state based logic for the node boot, install, debug, and
99 reconfigure operations is contained in one place; there is no longer any
100 boot state specific logic at PLC.</para>
104 <title>Source Code</title>
106 <para>All BootManager source code is located in the repository
107 'bootmanager' on the PlanetLab CVS system. For information on how to
108 access CVS, consult the PlanetLab website. Unless otherwise noted, all
109 file references refer to this repository.</para>
113 <title>API Calls</title>
115 <para>Most of the API calls available as part of the PlanetLab Central API
116 are intended to be run by users, and thus authentication for these calls
117 is done with the user's email address and password. However, the API calls
118 described below will be run by the nodes themselves, so a new
119 authentication mechanism is required.</para>
122 <title>Authentication</title>
124 <para>As is done with other PLC API calls, the first parameter to all
125 BootManager related calls will be an authentication structure,
126 consisting of these named fields:</para>
130 <para>AuthMethod</para>
132 <para>The authentication method, only 'hmac' is currently
139 <para>The node id, contained in the configuration file.</para>
145 <para>The node's primary IP address. This will be checked with the
146 node_id against PLC records.</para>
152 <para>The authentication string, depending on method. For the 'hmac'
153 method, a hash for the call using the HMAC algorithm, made from the
154 parameters of the call and the key contained in the configuration file.
155 For specifics on how this is created, see below.</para>
159 <para>Authentication is successful if PLC is able to create the same hash
160 from the values using its own copy of the node key. If the hash values
161 do not match, then either the keys do not match or the values of the
162 call were modified in transmission and the node cannot be
163 authenticated.</para>
165 <para>Both the BootManager and the authentication software at PLC must
166 agree on a method for creating the hash values for each call. This hash
167 is essentially a fingerprint of the method call, and is created by this
172 <para>Take the value of every part of each parameter, except the
173 authentication structure, and convert them to strings. For arrays,
174 each element is used. For dictionaries, both the keys and the values of
175 all items are used. Embedded types (arrays or
176 dictionaries inside arrays or dictionaries, etc), also have all
177 values extracted.</para>
181 <para>Alphabetically sort all the parameters.</para>
185 <para>Concatenate them into a single string.</para>
189 <para>Prepend the string with the method name and [, and append
194 <para>The implementation of this algorithm is in the function
195 serialize_params in the file source/BootAPI.py. The same algorithm is
196 located in the 'plc_api' repository, in the function serialize_params in
197 the file PLC/Auth.py.</para>
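<para>The steps above, together with the HMAC step described next, can be
sketched roughly as follows. This is an illustrative sketch, not the code in
source/BootAPI.py or PLC/Auth.py: the function names are hypothetical, the
closing delimiter is assumed to be a matching ], and SHA-1 is assumed as the
HMAC digest since no digest is named here.</para>

```python
import hashlib
import hmac


def collect_values(param):
    """Flatten one call parameter into a list of strings."""
    if isinstance(param, dict):
        values = []
        for key, value in param.items():
            values.append(str(key))  # dictionary keys are included
            values.extend(collect_values(value))
        return values
    if isinstance(param, (list, tuple)):
        values = []
        for item in param:
            values.extend(collect_values(item))
        return values
    return [str(param)]


def serialize_call(method, params):
    """Build the fingerprint string for a call.

    params must not include the authentication structure.
    """
    values = []
    for param in params:
        values.extend(collect_values(param))
    values.sort()
    # Assumption: the appended delimiter is a closing "]".
    return method + "[" + "".join(values) + "]"


def sign_call(node_key, method, params):
    """HMAC the serialized call with the node key (SHA-1 is an assumption)."""
    message = serialize_call(method, params).encode("utf-8")
    return hmac.new(node_key.encode("utf-8"), message, hashlib.sha1).hexdigest()


def verify_call(node_key, method, params, claimed_hash):
    """Server-side check: recompute the hash and compare in constant time."""
    return hmac.compare_digest(sign_call(node_key, method, params), claimed_hash)
```

<para>Because the values are sorted before concatenation, the node and PLC
produce the same string regardless of dictionary ordering, as long as both
sides convert values to strings in the same way.</para>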
199 <para>The resultant string is fed into the HMAC algorithm with the node
200 key, and the resultant hash value is used in the authentication
203 <para>This authentication method makes a number of assumptions, detailed
208 <para>All calls made to PLC are done over SSL, so the details of the
209 authentication structure cannot be viewed by 3rd parties. If, in the
210 future, non-SSL based calls are desired, a sequence number or some
211 other value making each call unique would be required to
212 prevent replay attacks. In fact, the current use of SSL negates the
213 need to create and send hashes across - technically, the key itself
214 could be sent directly to PLC, assuming the connection is made to an
215 HTTPS server with a third party signed SSL certificate.</para>
219 <para>Although calls are done over SSL, they use the Python class
220 library xmlrpclib, which does not do SSL certificate
227 <title>PLC API Calls</title>
229 <para>Full, up to date technical documentation of these functions can be
230 found in the PlanetLab Central API documentation. They are listed here
231 for completeness.</para>
235 <para>BootUpdateNode( authentication, update_values )</para>
237 <para>Update a node record, including its boot state, primary
238 network, or ssh host key.</para>
242 <para>BootCheckAuthentication( authentication )</para>
244 <para>Simply check to see if the node is recognized by the system
245 and is authorized.</para>
249 <para>BootGetNodeDetails( authentication )</para>
251 <para>Return details about a node, including its state, what
252 networks the PLC database has configured for the node, and what the
253 model of the node is.</para>
257 <para>BootNotifyOwners( authentication, message, include_pi,
258 include_tech, include_support )</para>
260 <para>Notify someone about an event that happened on the machine,
261 and optionally include the site PIs, technical contacts, and
262 PlanetLab Support.</para>
269 <title>Core Package</title>
271 <para>The Boot Manager core package, which is run on the nodes and
272 contacts the Boot API as necessary, is responsible for the following major
273 functional units:</para>
277 <para>Configuring node hardware and installing the PlanetLab operating
282 <para>Putting a node into a debug state so administrators can track
287 <para>Reconfiguring an already installed node to reflect new hardware,
288 or changed network settings</para>
292 <para>Booting an already installed node into the PlanetLab operating
298 <title>Boot States</title>
300 <para>Each node always has one of four possible boot states.</para>
306 <para>Install. This boot state corresponds to a new node that has not
307 yet been installed, but for which a record does exist. When the boot
308 manager starts, and the node is in this state, the user is prompted
309 to continue with the installation. The intention here is to prevent
310 a non-PlanetLab machine (like a user's desktop machine) from
311 becoming inadvertently wiped and installed with the PlanetLab node
318 <para>Reinstall. In this state, a node will reinstall the node
319 software, erasing anything that might have been on the disk
326 <para>Boot. This state corresponds to nodes that have successfully
327 installed, and can be chain booted to the runtime node
334 <para>Debug. Regardless of whether or not a machine has been
335 installed, this state sets up a node to be debugged by
336 administrators.</para>
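<para>As a rough illustration of how the four states could drive a single
entry point, consider the sketch below. It is not the state machine in
source/BootManager.py. The state strings 'boot' and 'dbg', the action names,
and the "abort" result for a declined installation are all assumptions made
for the example.</para>

```python
def choose_action(boot_state, confirm_install=lambda: False):
    """Map a node boot state to the operation to perform.

    confirm_install is a callable standing in for the console prompt.
    """
    if boot_state == "inst":
        # New node: only install after explicit confirmation.
        return "install" if confirm_install() else "abort"
    if boot_state == "reinstall":
        # No prompt: the disk is wiped and the node software reinstalled.
        return "install"
    if boot_state == "boot":
        return "chain_boot"
    if boot_state == "dbg":
        return "debug"
    raise ValueError("unknown boot state: %r" % (boot_state,))
```

<para>Keeping the mapping in one function mirrors the design goal stated
earlier: the state specific logic lives in the Boot Manager rather than at
PLC.</para>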
342 <title>Flow Chart</title>
344 <para>Below is a high level flow chart of the boot manager, from the
345 time it is executed to when it exits. This core state machine is located
346 in source/BootManager.py.</para>
349 <title>Boot Manager Flow Chart</title>
353 <imagedata align="left" fileref="boot-manager-flowchart.png"
363 <title>Boot CD Environment</title>
365 <para>The boot manager needs to be able to operate under all currently
366 supported boot cds. The new 3.0 cd contains software the current 2.x cds
367 do not contain, including the Logical Volume Manager (LVM) client tools,
368 RPM, and YUM, among other packages. Given this requirement, the Boot
369 Manager will need to download, as necessary, the extra support files it needs to
370 run. Depending on the size of these files, they may only be downloaded
371 by specific steps in the flow chart in figure 1, and thus are not
374 <para>See the PlanetLab BootCD Documentation for more information about
375 the current, 3.x boot cds, how they are built, and what they provide to
376 the BootManager.</para>
380 <title>Node Configuration Files</title>
382 <para>To remain compatible with 2.x boot cds, the format and existing
383 contents of the configuration files for the nodes will not change. There
384 will be, however, the addition of three fields:</para>
388 <para>NET_DEVICE</para>
390 <para>If present, use the device with the specified MAC address to
391 contact PLC. The network on this device will be set up. If not
392 present, the device represented by 'eth0' will be used.</para>
396 <para>NODE_KEY</para>
398 <para>The unique, per-node key to be used during authentication and
399 identity verification. This is a fixed length, random value that is
400 only known to the node and PLC.</para>
406 <para>The PLC assigned node identifier.</para>
410 <para>An example of a configuration file for a dhcp networked
413 <programlisting>IP_METHOD="dhcp"
414 HOST_NAME="planetlab-1"
415 DOMAIN_NAME="cs.princeton.edu"
416 NET_DEVICE="00:06:5B:EC:33:BB"
417 NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
418 NODE_ID="121"</programlisting>
420 <para>An example of a configuration file for the same machine, only with
421 a statically assigned network address:</para>
423 <programlisting>IP_METHOD="static"
424 IP_ADDRESS="128.112.139.71"
425 IP_GATEWAY="128.112.139.65"
426 IP_NETMASK="255.255.255.192"
427 IP_NETADDR="128.112.139.64"
428 IP_BROADCASTADDR="128.112.139.127"
429 IP_DNS1="128.112.136.10"
430 IP_DNS2="128.112.136.12"
431 HOST_NAME="planetlab-1"
432 DOMAIN_NAME="cs.princeton.edu"
433 NET_DEVICE="00:06:5B:EC:33:BB"
434 NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
435 NODE_ID="121"</programlisting>
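<para>A minimal sketch of reading such a file is shown below, assuming one
NAME="value" pair per line as in the examples above. The function names are
hypothetical, and skipping blank lines and lines starting with # is an
assumption rather than documented behavior.</para>

```python
def parse_node_config(text):
    """Parse NAME="value" lines into a dictionary of strings."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        config[name.strip()] = value.strip().strip('"')
    return config


def primary_device(config):
    """Return the MAC address in NET_DEVICE, or 'eth0' when it is absent."""
    return config.get("NET_DEVICE", "eth0")


example = '''IP_METHOD="dhcp"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"'''

config = parse_node_config(example)
```

<para>Fields that older configuration files lack, such as NODE_KEY and
NODE_ID, are simply missing from the resulting dictionary, so callers can
test for them before falling back to other methods.</para>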
437 <para>Existing 2.x boot cds will look for the configuration files only
438 on a floppy disk, and the file must be named 'planet.cnf'. The new 3.x
439 boot cds, however, will initially look for a file named 'plnode.txt' on
440 either a floppy disk, or burned onto the cd itself. Alternatively, it
441 will fall back to looking for the original file name, 'planet.cnf'. This
442 initial file reading is performed by the boot cd itself to bring the
443 node's network online, so it can download and execute the Boot
446 <para>However, the Boot Manager will also need to identify the location
447 of and read in the file, so it can get the extra fields not initially
448 used to bring the network online (primarily node_key and node_id). Below
449 is the search order that the BootManager will use to locate a
452 <para>Configuration file location search order:<informaltable>
456 <entry>File name</entry>
458 <entry>Floppy drive</entry>
460 <entry>Flash devices</entry>
462 <entry>Root file system, in /</entry>
464 <entry>CDRom, in /usr/boot</entry>
466 <entry>CDRom, in /usr</entry>
470 <entry>plnode.txt</entry>
484 <entry>planet.cnf</entry>
498 </informaltable></para>
503 <title>User Interface for Node Management</title>
506 <title>Adding Nodes</title>
508 <para>New nodes are added to the system explicitly by either a PI or a
509 tech contact, either directly through the API calls, or by using the
510 appropriate interfaces on the website. As nodes are added, their
511 hostname, network configuration method (dhcp or static), and any static
512 settings are required to be entered. Regardless of network configuration
513 method, IP address is required. When the node is brought online, the
514 records at PLC will be updated with any remaining information.</para>
516 <para>After a node is added, the user has the option of creating a
517 configuration file for that node. Once the node is added, the contents
518 of the file are created automatically, and the user is prompted to
519 download and save the file. This file contains only the primary network
520 interface information (necessary to contact PLC), the node id, and the
523 <para>The default boot state of a new node is 'inst', which requires the
524 user to confirm the installation at the node, by typing yes on the
525 console. If this is not desired, as is the case with nodes in a
526 co-location site, or for a large number of nodes being set up at the same
527 time, the administrator can change the node state, after the entry is in
528 the PLC records, from 'inst' to 'reinstall'. This will bypass the
529 confirmation screen, and proceed directly to reinstall the machine (even
530 if it already had a node installation on it).</para>
534 <title>Updating Node Network Settings</title>
536 <para>If the primary node network address must be updated, for example
537 because the node is moved to a new network, then two steps must be performed
538 to successfully complete the move:</para>
542 <para>The node network will need to be updated at PLC, either
543 through the API directly or via the website.</para>
547 <para>Either the floppy file must be regenerated and put into the machine,
548 or the existing floppy must be updated to match the new settings.</para>
552 <para>If the node IP address on the floppy does not match the record at
553 PLC, then the node will not boot until they do match, as authentication
554 will fail. The intention here is to prevent a malicious user from taking
555 the floppy disk, altering the network settings, and trying to bring up a
556 new machine with the new settings.</para>
558 <para>On the other hand, if a non-primary network address needs to be
559 updated, then simply updating the record in the configuration file will
560 suffice. The boot manager, at next restart, will reconfigure the
561 machine, and update the PLC records to match the configuration
566 <title>Removing Nodes</title>
568 <para>Nodes are removed from the system by:</para>
572 <para>Deleting the record of the node at PLC</para>
576 <para>Shutting down the machine.</para>
580 <para>Once this is done, even if the machine attempts to come back
581 online, it cannot be authorized with PLC and will not boot.</para>
586 <title>BootManager Configuration</title>
588 <para>All run time configuration options for the BootManager exist in a
589 single file located at source/configuration. These values are described
594 <para><literal>VERSION</literal></para>
596 <para>The current BootManager version. During install, written out to
597 /etc/planetlab/install_version</para>
601 <para><literal>BOOT_API_SERVER</literal></para>
603 <para>The full URL of the API server to contact for authenticated
608 <para><literal>TEMP_PATH</literal></para>
610 <para>A writable path on the boot cd we can use for temporary storage
615 <para><literal>SYSIMG_PATH</literal></para>
617 <para>The path where we will mount the node logical volumes during any
618 step that requires access to the disks.</para>
622 <para><literal>CACERT_PATH</literal></para>
624 <para>Variable not used anymore.</para>
628 <para><literal>NONCE_FILE</literal></para>
630 <para>Variable not used anymore.</para>
634 <para><literal>PLCONF_DIR</literal></para>
636 <para>The path that PlanetLab node configuration files will be created
637 in during install. This should not be changed from /etc/planetlab, as
638 this path is assumed in other PlanetLab components.</para>
642 <para><literal>SUPPORT_FILE_DIR</literal></para>
644 <para>A path on the boot server where per-step additional files may be
645 located. For example, the packages that include the tools to allow
646 older 2.x version boot cds to partition disks with LVM.</para>
650 <para><literal>ROOT_SIZE</literal></para>
652 <para>During install, this sets the size of the node root partition.
653 It must be large enough to house all the node operational software. It
654 does not store any user/slice files. Include 'G' suffix in this value,
655 indicating gigabytes.</para>
659 <para><literal>SWAP_SIZE</literal></para>
661 <para>How much swap to configure the node with during install. Include
662 'G' suffix in this value, indicating gigabytes.</para>
666 <para><literal>SKIP_HARDWARE_REQUIREMENT_CHECK</literal></para>
668 <para>Whether or not to skip any of the hardware requirement checks,
669 including total disk and memory size constraints.</para>
673 <para><literal>MINIMUM_MEMORY</literal></para>
675 <para>How much memory is required by a running PlanetLab node. If a
676 machine contains less physical memory than this value, the install
677 will not proceed.</para>
681 <para><literal>MINIMUM_DISK_SIZE</literal></para>
683 <para>The size of the smallest disk we are willing to attempt to use
684 during the install, in gigabytes. Do not include any suffixes.</para>
688 <para><literal>TOTAL_MINIMUM_DISK_SIZE</literal></para>
690 <para>The size of all usable disks must be at least this size, in
691 gigabytes. Do not include any suffixes.</para>
695 <para><literal>INSTALL_LANGS</literal></para>
697 <para>Which language support to install. This value is used by RPM,
698 and is used in writing /etc/rpm/macros before any RPMs are
703 <para><literal>NUM_AUTH_FAILURES_BEFORE_DEBUG</literal></para>
705 <para>How many authentication failures the BootManager is willing to
706 accept for any set of calls, before stopping and putting the node into
713 <title>Installer Hardware Detection</title>
715 <para>When a node is being installed, the Boot Manager must identify which
716 hardware the machine has that is applicable to a running node, and
717 configure the node so it can boot properly post-install. The
718 general procedure for doing so is outlined in this section. It is
719 implemented in the <filename>source/systeminfo.py</filename> file.</para>
721 <para>The process for identifying which kernel module needs to be loaded
726 <para>Create a lookup table of all modules, and which PCI ids
727 correspond to each module.</para>
731 <para>For each PCI device on the system, lookup its module in the
736 <para>If a module is found, put it into one of two categories of
737 modules, either network module or scsi module, based on the PCI device
742 <para>For each network module, write out an 'eth&lt;index&gt;' entry
743 in the modprobe.conf configuration file.</para>
747 <para>For each scsi module, write out a
748 'scsi_hostadapter&lt;index&gt;' entry in the modprobe.conf
749 configuration file.</para>
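<para>The procedure above might be sketched as follows. This is not the code
in source/systeminfo.py. The function name, the input shapes, the use of
'alias' lines, and numbering both kinds of entry from 0 are assumptions made
for illustration. The PCI base class values (0x02 for network controllers,
0x01 for mass storage controllers) are the standard ones.</para>

```python
NETWORK_CLASS = 0x02  # PCI base class: network controller
STORAGE_CLASS = 0x01  # PCI base class: mass storage controller


def modprobe_lines(devices, module_table):
    """Return modprobe.conf lines for the detected devices.

    devices: iterable of (vendor_id, device_id, base_class) tuples.
    module_table: dict mapping (vendor_id, device_id) to a module name.
    """
    network, scsi = [], []
    for vendor, device, base_class in devices:
        module = module_table.get((vendor, device))
        if module is None:
            continue  # no known module for this device
        if base_class == NETWORK_CLASS:
            network.append(module)  # one entry per interface
        elif base_class == STORAGE_CLASS and module not in scsi:
            scsi.append(module)  # one entry per distinct module
    lines = ["alias eth%d %s" % (i, m) for i, m in enumerate(network)]
    lines += ["alias scsi_hostadapter%d %s" % (i, m) for i, m in enumerate(scsi)]
    return lines
```

<para>Devices of any other class, or with no matching module, produce no
entry at all.</para>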
753 <para>This process is fairly straightforward, and is simplified by the
754 fact that we currently do not need support for USB, sound, or video
755 devices when the node is fully running. The boot cd itself uses a similar
756 process, but includes USB devices. Consult the boot cd technical
757 documentation for more information.</para>
759 <para>The creation of the PCI id to kernel module table lookup uses three
760 different sources of information, and merges them together into a single
761 table for easier lookups. With these three sources of information, a
762 fairly comprehensive lookup table can be generated for the devices that
763 PlanetLab nodes need to have configured. They include:</para>
767 <para>The installed <filename>/usr/share/hwdata/pcitable
768 </filename>file</para>
770 <para>Created at the time the hwdata rpm was built, this file contains
771 mappings of PCI ids to devices for a large number of devices. It is
772 not necessarily complete, and doesn't take into account the modules
773 that are actually available in the built PlanetLab kernel, which is a
774 subset of the full set available (again, PlanetLab nodes do not have a
775 use for sound or video drivers, and thus are not typically
780 <para>From the built kernel, the <filename>modules.pcimap</filename>
781 from the <filename>/lib/modules/&lt;kernelversion&gt;/</filename>
784 <para>This file is generated at the time the kernel is installed, and
785 pulls the PCI ids out of each module, for the modules that list the
786 devices they support. Not all modules list all devices they support, and
787 some contain wild cards (that match any device of a single
788 manufacturer).</para>
792 <para>From the built kernel, the <filename>modules.dep</filename> from
793 the <filename>/lib/modules/&lt;kernelversion&gt;/</filename>
796 <para>This file is also generated at the time the kernel is installed,
797 but lists the dependencies between various modules. It is used to
798 generate a list of modules that are actually available.</para>
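<para>One way to picture the merge is sketched below, assuming the pcitable
and modules.pcimap contents have already been parsed into dictionaries keyed
by PCI id. The function names are hypothetical, and letting modules.pcimap
entries override pcitable entries is an assumption, not a documented rule.
Each modules.dep line is taken to have the form "path/to/module.ko: deps".</para>

```python
import os


def available_modules(modules_dep_text):
    """Return the names of modules that appear as targets in modules.dep."""
    names = set()
    for line in modules_dep_text.splitlines():
        target, sep, _deps = line.partition(":")
        if not sep:
            continue  # not a "module: dependencies" line
        filename = os.path.basename(target.strip())
        names.add(filename.split(".")[0])  # strip ".ko" and similar
    return names


def merge_tables(pcitable, pcimap, available):
    """Merge PCI id to module maps, keeping only available modules."""
    merged = {}
    for table in (pcitable, pcimap):  # assumption: later tables win
        for pci_id, module in table.items():
            if module in available:
                merged[pci_id] = module
    return merged
```

<para>Filtering against the modules.dep list drops pcitable entries that
name modules not built for the PlanetLab kernel.</para>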
802 <para>It should be noted here that SATA (Serial ATA) devices have been
803 known to exist with both a PCI SCSI device class, and with a PCI IDE
804 device class. Under Linux 2.6 kernels, all SATA modules need to be listed
805 in modprobe.conf under 'scsi_hostadapter' lines. This case is handled in
806 the hardware loading scripts by making the assumption that if an IDE
807 device matches a loadable module, it should be put in the modprobe.conf
808 file, as 'real' IDE drivers are all currently built into the kernel, and
809 do not need to be loaded. SATA devices that have a PCI SCSI device class
810 are easily identified.</para>
812 <para>It is essential that the modprobe.conf configuration file contain
813 the correct drivers for the disks on the system, if they are present, as
814 during kernel installation the creation of the initrd (initial ramdisk)
815 which is responsible for booting the system uses this file to identify
816 which drivers to include in it. A failure to do this typically results in
817 a kernel panic at boot with a 'no init found' message.</para>
821 <title>Backward Compatibility</title>
823 <para>Given the large number of nodes in PlanetLab, and the lack of direct
824 physical access to them, the process of updating all configuration files
825 to include the new node id and node key will take a fairly significant
826 amount of time. Rather than delay deployment of the Boot Manager until all
827 machines are updated, alternative methods for acquiring these values are
828 used for existing nodes.</para>
830 <para>First, the node id. For any machine already part of PlanetLab, there
831 exists a record of its IP address and MAC address in PlanetLab Central. To
832 get the node_id value, if it is not located in the configuration file, the
833 BootManager uses a standard HTTP POST request to a known PHP page on the
834 boot server, sending the IP and MAC address of the node. This PHP page
835 queries the PLC database, and returns a node_id if the node is part of
836 PlanetLab, -1 otherwise.</para>
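<para>The lookup could be sketched as below. The URL and the form field
names 'ip' and 'mac' are hypothetical, since the page and its parameters are
not named here. The sketch only builds the request and interprets the reply;
sending it (for example with urllib.request.urlopen) is left to the
caller.</para>

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_node_id_request(url, ip_address, mac_address):
    """Build an HTTP POST request carrying the node IP and MAC address."""
    # Assumption: the form field names are 'ip' and 'mac'.
    body = urlencode({"ip": ip_address, "mac": mac_address}).encode("ascii")
    return Request(url, data=body)  # supplying data makes this a POST


def parse_node_id(response_text):
    """Return the node id, or None when the reply is -1 (unknown node)."""
    node_id = int(response_text.strip())
    return None if node_id == -1 else node_id
```

<para>Returning None for the -1 reply lets the caller distinguish an unknown
node from a valid identifier without comparing against a magic number.</para>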
838 <para>Second, the node key. All Boot CDs currently in use, at the time
839 they request a script from PLC to run, send in the request a randomly
840 generated value called a boot_nonce, usually 32 bytes or larger. During
841 normal BootManager operation, this value is ignored. However, in the
842 absence of a node key, we can use this value. Although it is not as secure
843 as a typical node key (because it is not distributed through external
844 mechanisms, but is generated by the node itself), it can be used if we
845 validate that the IP address of the node making the request matches the
846 PLC record. This means that nodes behind firewalls can no longer be
847 allowed in this situation.</para>
851 <title>Common Scenarios</title>
853 <para>Below are common scenarios that the BootManager might encounter that
854 would exist outside of the documented procedures for handling nodes. A
855 full description of how they will be handled by the BootManager follows
860 <para>A configuration file from a previously installed and functioning
861 node is copied or moved to another machine, and the network settings
862 are updated on it (but the key and node_id are left the same).</para>
864 <para>Since the authentication for a node consists of matching not
865 only the node id, but the primary node ip, this step will fail, and
866 the node will not allow the boot manager to be run. Instead, the new
867 node must be created at PLC first, and a network configuration file
868 for it must be generated, with its own node key.</para>
872 <para>After a node is installed and running, the administrators
873 mistakenly remove the cd and media containing the configuration
876 <para>The node installer clears all boot records from the disk, so the
877 node will not boot. Typically, the bios will report no operating
882 <para>A new network configuration file is generated on the website,
883 but is not put on the node.</para>
885 <para>Creating a new network configuration file through the PLC
886 interfaces will generate a new node key, effectively invalidating the
887 old configuration file (still in use by the machine). The next time
888 the node reboots and attempts to authenticate with PLC, it will
889 fail. After two consecutive authentication failures, the node will
890 automatically put itself into debug mode. In this case, regardless of
891 the API function being called that was unable to authenticate, the
892 software at PLC will automatically notify the PlanetLab
893 administrators, and the contacts at the site if the node was able to
894 be identified (usually through its IP address or node_id by searching
895 PLC records).</para>
904 <title>The PlanetLab Boot Manager</title>
906 <date>January 14, 2005</date>
909 <firstname>Aaron</firstname>
911 <surname>Klingaman</surname>