<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<title>BootManager Technical Documentation</title>
<firstname>Aaron</firstname>
<surname>Klingaman</surname>
<email>alk@absarokasoft.com</email>
<orgname>Princeton University</orgname>
<revnumber>1.0</revnumber>
<date>March 15, 2005</date>
<authorinitials>AK</authorinitials>
<para>Initial draft.</para>
<revnumber>1.1</revnumber>
<date>May 31, 2005</date>
<authorinitials>AK</authorinitials>
<para>Updated post implementation and deployment.</para>
<revnumber>1.2</revnumber>
<date>November 16, 2005</date>
<authorinitials>AK</authorinitials>
<para>Added a section on where the source code is located, and other
updates to make the document consistent with the implementation.</para>
<revnumber>1.3</revnumber>
<date>March 17, 2006</date>
<authorinitials>AK</authorinitials>
<para>Reworked various wording to fit in correctly with new
architecture terminology.</para>
<para>Updated to match PlanetLab Core Specification.</para>
<title>Overview</title>
<para>This document describes the implementation of the package called the
BootManager at a technical level. The BootManager is used in conjunction
with the PlanetLab BootCD to securely boot nodes, including remote
installation, debugging, and validation. It is the primary method used by
the PlanetLab Central Management Authority (MA) to manage nodes.</para>
<title>Components</title>
<para>The entire BootManager system consists of several primary
components. These are:</para>
<para>The existing, standard MA-provided calls that allow principals to
add and manage node records, plus a new call to generate node-specific
configuration files</para>
<para>New MA API calls with a new authentication mechanism for
node-based MA calls</para>
<para>A code package to be run in the boot cd environment on nodes
containing core install/validate/boot logic</para>
<para>The intention with the BootManager system is to send the same script
to all nodes (consisting of the core BootManager code), each time the node
starts. The BootManager then runs and determines which operations to
perform on the node, based on its installation state. All state-based
logic for the node boot, install, debug, and reconfigure operations is
contained in one place; there is no boot state specific logic located on
the MA servers.</para>
<title>Source Code</title>
<para>All BootManager source code is located in the repository
'bootmanager' on the PlanetLab CVS system. For information on how to
access CVS, consult the PlanetLab website. Unless otherwise noted, all
file references refer to this repository.</para>
<title>Management Authority Node Fields</title>
<para>The following MA database fields are directly applicable to the
BootManager operation, and to the node-related API calls (detailed
<title>node_id</title>
<para>An integer unique identifier for a specific node.</para>
<title>node_key</title>
<para>This is a per-node, unique value that forms the basis of the node
authentication mechanism detailed below. When a new node record is added
to the MA by a principal, it is automatically assigned a new, random
key, which is distributed out of band to the node. This shared secret is
then used for node authentication. The contents of node_key are
generated using this command:</para>
<para><programlisting>openssl rand -base64 32</programlisting></para>
<para>Any = (equals) characters are removed from the string.</para>
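<para>The command above can be mirrored in Python. This is a sketch of
equivalent key generation; the generate_node_key name is illustrative and
not taken from the BootManager source:</para>

```python
import base64
import os

def generate_node_key():
    """Generate a key the way 'openssl rand -base64 32' does:
    32 random bytes, base64 encoded, with '=' padding removed."""
    raw = os.urandom(32)
    key = base64.b64encode(raw).decode("ascii")
    # As described above, any '=' (equals) characters are removed.
    return key.replace("=", "")

key = generate_node_key()
print(key)
```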
<title>boot_state</title>
<para>Each node always has one of four possible boot states, stored as a
string, referred to as boot_state. These are:</para>
<para>Install. This boot state corresponds to a new node that has not
yet been installed, but for which a record does exist. When the
BootManager starts, and the node is in this state, the user is
prompted to continue with the installation. The intention here is to
prevent a non-PlanetLab machine (like a user's desktop machine) from
being inadvertently wiped and installed with the PlanetLab node
software. This is the default state for new nodes.</para>
<para>Reinstall. In this state, a node will reinstall the node
software, erasing anything that might have been on the disk
<para>Boot to bring a node online. This state corresponds with nodes
that have successfully installed, and can be chain booted to the
runtime node kernel.</para>
<para>Debug. Regardless of whether or not a machine has been
installed, this state sets up a node to be debugged by
administrators. In debug mode, no node software is running, and the
node can be accessed remotely by administrators.</para>
<title>Existing Management Authority API Calls</title>
<para>These calls, taken from the PlanetLab Core Specification and extended
with additional parameters, are used by principals to maintain the set of
nodes managed by an MA. See the Core Specification for more information.
The MA may provide an easy-to-use interface, such as a web interface, that
calls these directly.</para>
<para>AddNode( authentication, node_values )</para>
<para>Add a new node record. node_values contains the hostname, ip
address and other network settings, and the new field boot_state.
The resultant node_id is returned.</para>
<para>UpdateNode( authentication, node_id, update_values )</para>
<para>Update an existing node record. update_values can include the
hostname, ip address, and the new field boot_state.</para>
<para>DeleteNode( authentication, node_id )</para>
<para>Delete a node record.</para>
</itemizedlist></para>
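<para>These calls are plain XML-RPC. As a sketch, the following shows how
an AddNode call would be marshalled on the wire; the principal credential
field names and the boot_state value shown are assumptions for
illustration, not taken from the specification:</para>

```python
try:
    import xmlrpc.client as xmlrpclib  # Python 3
except ImportError:
    import xmlrpclib                   # Python 2, as used on the boot cds

# Hypothetical principal credentials and node settings; only boot_state
# is named by the surrounding text as a new field.
auth = {"AuthMethod": "password",
        "Username": "pi@example.edu",
        "AuthString": "secret"}
node_values = {"hostname": "planetlab-1.cs.princeton.edu",
               "ip": "128.112.139.71",
               "boot_state": "install"}

# Marshal the call as it would appear on the wire. A live client would
# instead use:  server = xmlrpclib.ServerProxy(ma_url)
#               node_id = server.AddNode(auth, node_values)
request = xmlrpclib.dumps((auth, node_values), methodname="AddNode")
print(request[:38])
```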
<title>New Management Authority API Calls</title>
<para>The API calls available as part of the MA API that are intended to
be run by principals leverage existing authentication mechanisms. However,
the API calls described below that will be run by the nodes themselves
need a new authentication mechanism.</para>
<title>Node Authentication</title>
<para>As is done with other MA API calls, the first parameter to all
BootManager related calls will be an authentication structure,
consisting of these named fields:</para>
<para>AuthMethod</para>
<para>The authentication method, only 'hmac' is currently
<para>The node id, contained in the configuration file on the
<para>The node's primary IP address. This will be checked with the
node_id against MA records.</para>
<para>The authentication string, depending on the method. For the 'hmac'
method, this is a hash of the call, using the HMAC algorithm, made from
the parameters of the call and the key contained in the configuration
file. For specifics on how this is created, see below.</para>
<para>Authentication is successful if the MA is able to create the same
hash from the values using its own copy of the NODE_KEY. If the hash
values do not match, then either the keys do not match or the values of
the call were modified in transmission and the node cannot be
authenticated.</para>
<para>Both the BootManager and the authentication functions at the MA
must agree on a method for creating the hash values for each call. This
hash is essentially a fingerprint of the method call, and is created by
this algorithm:</para>
<para>Take the value of every part of each parameter, except the
authentication structure, and convert them to strings. For arrays,
each element is used. For dictionaries, not only are the values of all
the items used, but also the keys themselves. Embedded types (arrays or
dictionaries inside arrays or dictionaries, etc.) also have all their
values extracted.</para>
<para>Alphabetically sort all the parameters.</para>
<para>Concatenate them into a single string.</para>
<para>Prepend the string with the method name and [, and append
<para>The implementation of this algorithm is in the function
serialize_params in the file source/BootAPI.py. The same algorithm is
located in the 'plc_api' repository, in the function serialize_params in
the file PLC/Auth.py.</para>
<para>The resultant string is fed into the HMAC algorithm with the node
key, and the resultant hash value is used in the authentication
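<para>The algorithm can be sketched as follows. The flattening and
sorting follow the steps described in this section; the exact digest
function and bracket handling are assumptions here, and the
authoritative version is serialize_params in source/BootAPI.py:</para>

```python
import hashlib
import hmac

def collect_values(arg, out):
    """Recursively flatten one parameter into strings: list elements,
    dictionary keys and values, and any embedded types."""
    if isinstance(arg, dict):
        for k, v in arg.items():
            collect_values(k, out)
            collect_values(v, out)
    elif isinstance(arg, (list, tuple)):
        for item in arg:
            collect_values(item, out)
    else:
        out.append(str(arg))

def hash_call(method_name, params, node_key):
    """Fingerprint a call: flatten all parameters except the
    authentication structure, sort alphabetically, concatenate, wrap
    with the method name, then HMAC the result with the node key."""
    values = []
    for param in params:
        collect_values(param, values)
    values.sort()
    message = method_name + "[" + "".join(values) + "]"
    # SHA-1 is an assumption; check source/BootAPI.py for the real digest.
    return hmac.new(node_key.encode(), message.encode(),
                    hashlib.sha1).hexdigest()

digest = hash_call("BootUpdateNode", [{"boot_state": "boot"}], "examplekey")
print(digest)
```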
<para>This authentication method makes a number of assumptions, detailed
<para>All calls made to the MA are done over SSL, so the details of
the authentication structure cannot be viewed by 3rd parties. If, in
the future, non-SSL based calls are desired, a sequence number or
some other value making each call unique would be required to
prevent replay attacks. In fact, the current use of SSL negates the
need to create and send hashes across - technically, the key itself
could be sent directly to the MA, assuming the connection is made to
an HTTPS server with a third party signed SSL certificate being
<para>Although calls are done over SSL, they use the Python class
library xmlrpclib, which does not do SSL certificate
<title>New API Calls</title>
<para>The calls available to the BootManager, that accept the above
authentication, are:</para>
<para>BootUpdateNode( authentication, update_values )</para>
<para>Update a node record, including its boot state, primary
network, or ssh host key.</para>
<para>BootCheckAuthentication( authentication )</para>
<para>Simply check to see if the node is recognized by the system
and is authorized.</para>
<para>BootGetNodeDetails( authentication )</para>
<para>Return details about a node, including its state, what
networks the MA database has configured for the node, and what the
model of the node is.</para>
<para>BootNotifyOwners( authentication, message, include_pi,
include_tech, include_support )</para>
<para>Notify someone about an event that happened on the machine,
and optionally include the site Principal Investigators, technical
contacts, and PlanetLab Support.</para>
<para>The new calls used by principals, using existing authentication
<para>GenerateNodeConfigurationFile( authentication, node_id
<para>Generate a configuration file to be used by the BootManager
and the BootCD to configure the network for the node during boot.
This resultant file also contains the node_id and node_key values.
A new node_key is generated each time, invalidating old files. The
full contents and format of this file are detailed below.</para>
</itemizedlist></para>
<title>Core Software Package</title>
<para>The BootManager core package, which is run on the nodes and contacts
the MA API as necessary, is responsible for the following major functional
<para>Configuring node hardware and installing the PlanetLab operating
<para>Putting a node into a debug state so administrators can track
<para>Reconfiguring an already installed node to reflect new hardware,
or changed network settings</para>
<para>Booting an already installed node into the PlanetLab operating
<title>BootManager Flow Chart</title>
<para>Below is a high level flow chart of the BootManager, from the time
it is executed to when it exits. This core state machine is located in
source/BootManager.py.</para>
<title>BootManager Flow Chart</title>
<imagedata align="left" fileref="boot-manager-flowchart.png"
<title>Example Execution Session</title>
<para>Below is one example session of the BootManager, for a new node
being installed then booted.</para>
<title>Example Execution Session</title>
<imagedata align="left" fileref="bootmanager-sequence.png"
<title>Boot CD Environment</title>
<para>The BootManager needs to be able to operate under all currently
supported boot cds. The new 3.0 cd contains software the current 2.x cds
do not contain, including the Logical Volume Manager (LVM) client tools,
RPM, and YUM, among other packages. Given this requirement, the boot cd
will need to download as necessary the extra support files it needs to
run. Depending on the size of these files, they may only be downloaded
by specific steps in the flow chart in figure 1, and thus are not
<para>See the PlanetLab BootCD Documentation for more information about
the current, 3.x boot cds, how they are built, and what they provide to
the BootManager.</para>
<title>Node Configuration Files</title>
<para>To remain compatible with 2.x boot cds, the format and existing
contents of the configuration files for the nodes will not change. There
will be, however, the addition of three fields:</para>
<para>NET_DEVICE</para>
<para>If present, use the device with the specified mac address to
contact the MA. The network on this device will be set up. If not
present, the device represented by 'eth0' will be used.</para>
<para>NODE_KEY</para>
<para>The unique, per-node key to be used during authentication and
identity verification. This is a fixed length, random value that is
only known to the node and the MA database.</para>
<para>The MA assigned node identifier.</para>
<para>An example of a configuration file for a dhcp networked
<programlisting>IP_METHOD="dhcp"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"</programlisting>
<para>An example of a configuration file for the same machine, only with
a statically assigned network address:</para>
<programlisting>IP_METHOD="static"
IP_ADDRESS="128.112.139.71"
IP_GATEWAY="128.112.139.65"
IP_NETMASK="255.255.255.192"
IP_NETADDR="128.112.139.127"
IP_BROADCASTADDR="128.112.139.127"
IP_DNS1="128.112.136.10"
IP_DNS2="128.112.136.12"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"</programlisting>
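<para>The files use simple shell-style KEY="value" lines, so they can be
read with a small parser. This is an illustrative sketch, not the
BootManager's own reader:</para>

```python
def parse_node_config(text):
    """Parse KEY="value" lines from a plnode.txt / planet.cnf style file."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"')
    return config

sample = """IP_METHOD="dhcp"
HOST_NAME="planetlab-1"
NODE_ID="121"
"""
config = parse_node_config(sample)
print(config["NODE_ID"])
```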
<para>Existing 2.x boot cds will look for the configuration files only
on a floppy disk, and the file must be named 'planet.cnf'. The new 3.x
boot cds, however, will initially look for a file named 'plnode.txt' on
either a floppy disk, or burned onto the cd itself. Alternatively, they
will fall back to looking for the original file name, 'planet.cnf'. This
initial file reading is performed by the boot cd itself to bring the
node's network online, so it can download and execute the
<para>However, the BootManager will also need to identify the location
of and read in the file, so it can get the extra fields not initially
used to bring the network online (primarily node_key and node_id). Below
is the search order that the BootManager will use to locate a
<para>Configuration file location search order:<informaltable>
<entry>File name</entry>
<entry>Floppy drive</entry>
<entry>Flash devices</entry>
<entry>Root file system, in /</entry>
<entry>CDRom, in /usr/boot</entry>
<entry>CDRom, in /usr</entry>
<entry>plnode.txt</entry>
<entry>planet.cnf</entry>
</informaltable></para>
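<para>The search described by the table above can be sketched as an
ordered scan. The mount point paths here are assumptions (the table
names locations, not paths), and whether each file name is tried across
all locations before falling back to the next name is a detail of the
real implementation:</para>

```python
import os

# Locations from the table above; the mount point paths are assumptions
# for this sketch.
SEARCH_DIRS = [
    "/mnt/floppy",           # floppy drive
    "/mnt/flash",            # flash devices
    "/",                     # root file system
    "/mnt/cdrom/usr/boot",   # cdrom
    "/mnt/cdrom/usr",        # cdrom
]
FILE_NAMES = ["plnode.txt", "planet.cnf"]

def find_node_config(dirs, names=FILE_NAMES):
    """Return the path of the first configuration file found, or None."""
    for name in names:
        for directory in dirs:
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                return path
    return None

print(find_node_config(["/no/such/mount"]))
```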
<title>BootManager Configuration</title>
<para>All run time configuration options for the BootManager exist in a
single file located at source/configuration. These values are described
below. These values cannot be changed on the fly - they must be changed
and a new BootManager package built and signed.</para>
<para><literal>VERSION</literal></para>
<para>The current BootManager version. During install, written out
to /etc/planetlab/install_version</para>
<para><literal>BOOT_API_SERVER</literal></para>
<para>The full URL of the API server to contact for authenticated
<para><literal>TEMP_PATH</literal></para>
<para>A writable path on the boot cd we can use for temporary
storage of files.</para>
<para><literal>SYSIMG_PATH</literal></para>
<para>The path where we will mount the node logical volumes during
any step that requires access to the disks.</para>
<para><literal>CACERT_PATH</literal></para>
<para>This variable is no longer used.</para>
<para><literal>NONCE_FILE</literal></para>
<para>This variable is no longer used.</para>
<para><literal>PLCONF_DIR</literal></para>
<para>The path that PlanetLab node configuration files will be
created in during install. This should not be changed from
/etc/planetlab, as this path is assumed in other PlanetLab
<para><literal>SUPPORT_FILE_DIR</literal></para>
<para>A path on the boot server where per-step additional files may
be located. For example, the packages that include the tools to
allow older 2.x version boot cds to partition disks with LVM.</para>
<para><literal>ROOT_SIZE</literal></para>
<para>During install, this sets the size of the node root partition.
It must be large enough to house all the node operational software.
It does not store any user/slice files. Include a 'G' suffix in this
value, indicating gigabytes.</para>
<para><literal>SWAP_SIZE</literal></para>
<para>How much swap to configure the node with during install.
Include a 'G' suffix in this value, indicating gigabytes.</para>
<para><literal>SKIP_HARDWARE_REQUIREMENT_CHECK</literal></para>
<para>Whether or not to skip any of the hardware requirement checks,
including total disk and memory size constraints.</para>
<para><literal>MINIMUM_MEMORY</literal></para>
<para>How much memory is required by a running PlanetLab node. If a
machine contains less physical memory than this value, the install
will not proceed.</para>
<para><literal>MINIMUM_DISK_SIZE</literal></para>
<para>The size of the smallest disk we are willing to attempt to use
during the install, in gigabytes. Do not include any
<para><literal>TOTAL_MINIMUM_DISK_SIZE</literal></para>
<para>The size of all usable disks must be at least this size, in
gigabytes. Do not include any suffixes.</para>
<para><literal>INSTALL_LANGS</literal></para>
<para>Which language support to install. This value is used by RPM,
and is used in writing /etc/rpm/macros before any RPMs are
<para><literal>NUM_AUTH_FAILURES_BEFORE_DEBUG</literal></para>
<para>How many authentication failures the BootManager is willing to
accept for any set of calls, before stopping and putting the node
into a debug mode.</para>
<title>Installer Hardware Detection</title>
<para>When a node is being installed, the BootManager must identify
which hardware the machine has that is applicable to a running node, and
configure the node so it can boot properly post-install. The general
procedure for doing so is outlined in this section. It is
implemented in the <filename>source/systeminfo.py</filename>
<para>The process for identifying which kernel modules need to be loaded
<para>Create a lookup table of all modules, and which PCI ids
correspond to each module.</para>
<para>For each PCI device on the system, look up its module in the
<para>If a module is found, put it into one of two categories of
modules, either network module or scsi module, based on the PCI
<para>For each network module, write out an 'eth&lt;index&gt;' entry
in the modprobe.conf configuration file.</para>
<para>For each scsi module, write out a
'scsi_hostadapter&lt;index&gt;' entry in the modprobe.conf
configuration file.</para>
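<para>The steps above can be sketched as follows. The PCI id table here
is a toy stand-in (the real one is merged from several hwdata and kernel
sources), and the alias line format follows conventional modprobe.conf
syntax:</para>

```python
# Toy PCI id -> (module, category) lookup table; illustrative entries only.
PCI_TABLE = {
    ("0x8086", "0x100e"): ("e1000", "network"),
    ("0x1000", "0x0030"): ("mptspi", "scsi"),
}

def build_modprobe_conf(detected_devices):
    """Emit eth and scsi_hostadapter alias entries for every detected
    PCI device whose module appears in the lookup table."""
    network_modules = []
    scsi_modules = []
    for device in detected_devices:
        entry = PCI_TABLE.get(device)
        if entry is None:
            continue  # no loadable module known for this device
        module, category = entry
        if category == "network":
            network_modules.append(module)
        else:
            scsi_modules.append(module)
    lines = []
    for index, module in enumerate(network_modules):
        lines.append("alias eth%d %s" % (index, module))
    for index, module in enumerate(scsi_modules):
        lines.append("alias scsi_hostadapter%d %s" % (index, module))
    return "\n".join(lines)

conf = build_modprobe_conf([("0x8086", "0x100e"), ("0x1000", "0x0030")])
print(conf)
```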
<para>This process is fairly straightforward, and is simplified by the
fact that we currently do not need support for USB, sound, or video
devices when the node is fully running. The boot cd itself uses a
similar process, but includes USB devices. Consult the boot cd technical
documentation for more information.</para>
<para>The creation of the PCI id to kernel module lookup table uses
three different sources of information, and merges them together into a
single table for easier lookups. With these three sources of
information, a fairly comprehensive lookup table can be generated for
the devices that PlanetLab nodes need to have configured. They
<para>The installed <filename>/usr/share/hwdata/pcitable</filename>
file</para>
<para>Created at the time the hwdata rpm was built, this file
contains mappings of PCI ids to devices for a large number of
devices. It is not necessarily complete, and doesn't take into
account the modules that are actually available in the built
PlanetLab kernel, which is a subset of the full set available
(again, PlanetLab nodes do not have a use for sound or video
drivers, and thus those are not typically built).</para>
<para>From the built kernel, the <filename>modules.pcimap</filename>
from the <filename>/lib/modules/&lt;kernelversion&gt;/</filename>
<para>This file is generated at the time the kernel is installed,
and pulls the PCI ids out of each module, listing the devices each
module supports. Not all modules list all the devices they support,
and some contain wild cards (that match any device of a single
manufacturer).</para>
<para>From the built kernel, the <filename>modules.dep</filename>
from the <filename>/lib/modules/&lt;kernelversion&gt;/</filename>
<para>This file is also generated at the time the kernel is
installed, but lists the dependencies between various modules. It is
used to generate a list of modules that are actually
<para>It should be noted here that SATA (Serial ATA) devices have been
known to exist with both a PCI SCSI device class, and with a PCI IDE
device class. Under Linux 2.6 kernels, all SATA modules need to be
listed in modprobe.conf under 'scsi_hostadapter' lines. This case is
handled in the hardware loading scripts by making the assumption that if
an IDE device matches a loadable module, it should be put in the
modprobe.conf file, as 'real' IDE drivers are all currently built into
the kernel, and do not need to be loaded. SATA devices that have a PCI
SCSI device class are easily identified.</para>
<para>It is essential that the modprobe.conf configuration file contain
the correct drivers for the disks on the system, if they are present,
because during kernel installation the creation of the initrd (initial
ramdisk), which is responsible for booting the system, uses this file to
identify which drivers to include in it. A failure to do this typically
results in a kernel panic at boot with a 'no init found' message.</para>
<title>Backward Compatibility</title>
<para>This section only applies to those interested in sections of the
BootManager that exist for backward compatibility with nodes not
containing the NODE_KEY. This does not affect any nodes added to the
system after deployment of the BootManager.</para>
<para>Given the large number of nodes in PlanetLab, and the lack of direct
physical access to them, the process of updating all configuration files
to include the new NODE_ID and NODE_KEY will take a fairly significant
amount of time. Rather than delay deployment of the BootManager until all
machines are updated, alternative methods for acquiring these values are
used for these nodes.</para>
<para>First, the NODE_ID value. For any machine already part of PlanetLab,
there exists a record of its IP address and MAC address in PlanetLab
Central. To get the NODE_ID value, if it is not located in the
configuration file, the BootManager uses a standard HTTP POST request to a
known php page on the boot server, sending the IP and MAC address of the
node. This php page queries the MA database directly (not through the MA
API), and returns a NODE_ID if the node is part of PlanetLab, -1
otherwise.</para>
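<para>A sketch of that legacy lookup follows. The page URL and the form
field names are assumptions (the actual php page name is not given
here); the opener parameter exists only so the sketch can be exercised
without a network connection:</para>

```python
try:
    from urllib.parse import urlencode       # Python 3
    from urllib.request import urlopen
except ImportError:
    from urllib import urlencode, urlopen    # Python 2, as on old boot cds

def lookup_node_id(boot_server_url, ip_address, mac_address, opener=None):
    """POST the node's IP and MAC address to the boot server page,
    which returns the NODE_ID, or -1 if the node is unknown."""
    if opener is None:
        opener = urlopen
    data = urlencode({"ip": ip_address, "mac": mac_address})
    response = opener(boot_server_url, data.encode("ascii"))
    return int(response.read().strip())
```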
<para>Second, the NODE_KEY value. All Boot CDs currently in use, at the
time they request a script from the MA to run, send in the request a
randomly generated value called a boot_nonce, usually 32 bytes or larger.
During normal BootManager operation, this value is ignored. However, in
the absence of a node key, we can use this value. Although it is not as
secure as a typical node key (because it is not distributed through
external mechanisms, but is generated by the node itself), it can be used
if we validate that the IP address of the node making the request matches
the MA record. This means that nodes behind firewalls can no longer be
allowed in this situation.</para>