<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<title>Booting PlanetLab Nodes</title>
<firstname>Aaron</firstname>
<surname>Klingaman</surname>
<email>alk@absarokasoft.com</email>
<orgname>Princeton University</orgname>
<revnumber>1.0</revnumber>
<date>March 16, 2006</date>
<authorinitials>AK</authorinitials>
<para>Initial draft of new PDN, based on existing BootManager and
BootCD technical documentation</para>
<title>Overview</title>
<para>This document describes a reference implementation for securely
booting PlanetLab nodes, which has been collectively named the
<title>Components</title>
<para>The entire Boot Manager system consists of several components that
are designed to work together to provide the ability to install, validate,
and boot a PlanetLab node. These components are:</para>
<para>The existing, standard MA provided calls that allow principals to
add and manage node records</para>
<para>New API calls, used by principals, to create and download
node-specific configuration files</para>
<para>A new set of API calls and a new authentication mechanism to be
used by the nodes</para>
<para>A code package to be run in the boot cd environment on nodes,
containing the core install/validate/boot logic</para>
<title>Source Code</title>
<para>All BootManager source code is located in the repository
'bootmanager' on the PlanetLab CVS system. For information on how to
access CVS, consult the PlanetLab website. Unless otherwise noted, all
file references refer to this repository.</para>
<title>Standard MA Interfaces</title>
<para>The API calls provided by the Management Authority are called out
here for their relevance, and to document any extensions to them. See the
PlanetLab Core Specification for more details.</para>
<para>AddNode( authentication, node_values )</para>
<para>Add a new node record</para>
<para>UpdateNode( authentication, update_values )</para>
<para>Update an existing node record</para>
<para>DeleteNode( authentication, node_id )</para>
<para>Remove a node from the MA list of nodes</para>
</itemizedlist></para>
<para>Additional node-specific values have been added to the AddNode and
UpdateNode calls; a sketch of a typical invocation follows this
list:</para>
<para>boot_state</para>
<para>Stores the state the node is currently in.</para>
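<para>The following Python sketch shows how a principal might invoke these
calls over XML-RPC. The server URL, the principal authentication fields,
the node_values keys, and the assumption that AddNode returns the new
node_id are illustrative only; the authoritative parameter lists are in
the PlanetLab Core Specification.</para>

<programlisting>import xmlrpclib

# Hypothetical API endpoint; the real URL is deployment specific.
server = xmlrpclib.ServerProxy("https://www.planet-lab.org/PLCAPI/")

# Principal authentication structure (field names illustrative only).
auth = {"AuthMethod": "password",
        "Username": "pi@cs.princeton.edu",
        "AuthString": "secret"}

# Add a new node record, including the new boot_state value.
node_id = server.AddNode(auth, {"hostname": "planetlab-1.cs.princeton.edu",
                                "boot_state": "install"})

# Later, request a reinstall by updating the same record.
server.UpdateNode(auth, {"node_id": node_id, "boot_state": "reinstall"})</programlisting>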
<title>Boot States</title>
<para>Each node always has one of four possible boot states; a dispatch
sketch follows the list.</para>
<para>Install. This state corresponds to a new node that has not
yet been installed, but for which a record does exist. When the boot
manager starts, and the node is in this state, the user is prompted
to continue with the installation. The intention here is to prevent
a non-PlanetLab machine (like a user's desktop machine) from
being inadvertently wiped and installed with the PlanetLab node
<para>Reinstall. In this state, a node will reinstall the node
software, erasing anything that might have been on the disk
<para>Boot. This state corresponds with nodes that have successfully
installed, and can be chain booted to the runtime node
<para>Debug. Regardless of whether or not a machine has been
installed, this state sets up a node to be debugged by
administrators.</para>
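<para>A minimal Python sketch of how the boot manager might branch on
these states is shown below. The handler function names are hypothetical;
the actual state machine is implemented in
<filename>source/BootManager.py</filename>.</para>

<programlisting># Dispatch on the node's boot state (handler names are hypothetical).
def run(boot_state):
    if boot_state == "install":
        # Prompt first, to avoid wiping a non-PlanetLab machine.
        if confirm_install_with_user():
            install_node()
    elif boot_state == "reinstall":
        # No prompt: erase the disk and reinstall unconditionally.
        install_node()
    elif boot_state == "boot":
        # Chain boot into the installed runtime node software.
        chain_boot_node()
    elif boot_state == "debug":
        enter_debug_mode()
    else:
        raise ValueError("unknown boot state: %r" % boot_state)</programlisting>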
<title>Additional Principal Based MA Interfaces</title>
<para>The following API calls have been added to the MA:</para>
<para>GenerateNodeConfigurationFile( authentication, node_id
<para>Return a configuration file containing node details, including
network settings, the node_id, and a key to be used for
authenticated node calls.</para>
</itemizedlist></para>
<title>Additional Node Based Interfaces and Authentication</title>
<title>Authentication</title>
<para>The API calls described below will be run by the nodes themselves,
so a new authentication mechanism is required. As is done with other PLC
API calls, the first parameter to all BootManager related calls will be
an authentication structure, consisting of these named fields:</para>
<para>AuthMethod</para>
<para>The authentication method, only 'hmac' is currently
<para>The node id, contained in the configuration file.</para>
<para>The node's primary IP address. This will be checked with the
node_id against PLC records.</para>
<para>The authentication string, depending on method. For the 'hmac'
method, a hash for the call using the HMAC algorithm, made from the
parameters of the call and the key contained in the configuration file.
For specifics on how this is created, see below.</para>
<para>Authentication is successful if PLC is able to create the same hash
from the values using its own copy of the node key. If the hash values
do not match, then either the keys do not match or the values of the
call were modified in transmission and the node cannot be
authenticated.</para>
<para>Both the BootManager and the authentication software at PLC must
agree on a method for creating the hash values for each call. This hash
is essentially a fingerprint of the method call, and is created by this
<para>Take the value of every part of each parameter, except the
authentication structure, and convert them to strings. For arrays,
each element is used. For dictionaries, not only are the values of all
the items used, but also the keys themselves. Embedded types (arrays or
dictionaries inside arrays or dictionaries, etc.) also have all their
values extracted.</para>
<para>Alphabetically sort all the parameters.</para>
<para>Concatenate them into a single string.</para>
<para>Prepend the string with the method name and [, and append
<para>The implementation of this algorithm is in the function
serialize_params in the file source/BootAPI.py. The same algorithm is
located in the 'plc_api' repository, in the function serialize_params in
the file PLC/Auth.py.</para>
<para>The resultant string is fed into the HMAC algorithm with the node
key, and the resultant hash value is used in the authentication
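<para>A Python sketch of this hashing procedure follows. It is a
simplified rendition for illustration, not a copy of serialize_params;
in particular, the choice of SHA-1 as the underlying digest and the
hexadecimal encoding of the result are assumptions made for the
example.</para>

<programlisting>import hmac
import hashlib

def serialize_values(arg, values):
    # Recursively flatten one parameter into strings; for dictionaries,
    # both the keys and the values are included, as described above.
    if isinstance(arg, dict):
        for key, value in arg.items():
            serialize_values(key, values)
            serialize_values(value, values)
    elif isinstance(arg, (list, tuple)):
        for item in arg:
            serialize_values(item, values)
    else:
        values.append(str(arg))

def create_auth_hash(method_name, params, node_key):
    # params excludes the authentication structure itself.
    values = []
    for param in params:
        serialize_values(param, values)
    values.sort()
    message = method_name + "[" + "".join(values) + "]"
    return hmac.new(node_key, message, hashlib.sha1).hexdigest()</programlisting>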
<para>This authentication method makes a number of assumptions, detailed
<para>All calls made to PLC are done over SSL, so the details of the
authentication structure cannot be viewed by 3rd parties. If, in the
future, non-SSL based calls are desired, a sequence number or some
other value making each call unique would be required to
prevent replay attacks. In fact, the current use of SSL negates the
need to create and send hashes across - technically, the key itself
could be sent directly to PLC, assuming the connection is made to an
HTTPS server with a third party signed SSL certificate.</para>
<para>Although calls are done over SSL, they use the Python class
library xmlrpclib, which does not do SSL certificate
<title>Additional API Calls</title>
<para>The following calls have been added; a usage sketch follows the
list:</para>
<para>BootUpdateNode( authentication, update_values )</para>
<para>Update a node record, including its boot state, primary
network, or ssh host key.</para>
<para>BootCheckAuthentication( authentication )</para>
<para>Simply check to see if the node is recognized by the system
and is authorized.</para>
<para>BootGetNodeDetails( authentication )</para>
<para>Return details about a node, including its state, what
networks the PLC database has configured for the node, and what the
model of the node is.</para>
<para>BootNotifyOwners( authentication, message, include_pi,
include_tech, include_support )</para>
<para>Notify someone about an event that happened on the machine,
and optionally include the site PIs, technical contacts, and
PlanetLab Support.</para>
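<para>The sketch below shows how a node might exercise these calls with
the authentication mechanism described earlier. The endpoint URL and the
build_auth helper are hypothetical; note that because the hash covers the
method name and parameters, the authentication structure must be rebuilt
for every call.</para>

<programlisting>import xmlrpclib

# Hypothetical endpoint; in practice this comes from the BOOT_API_SERVER
# configuration variable described later in this document.
api = xmlrpclib.ServerProxy("https://boot.planet-lab.org/PLCAPI/")

# build_auth is a hypothetical helper that assembles the authentication
# structure, including the per-call HMAC hash.
if api.BootCheckAuthentication(build_auth("BootCheckAuthentication", [])):
    details = api.BootGetNodeDetails(build_auth("BootGetNodeDetails", []))

    # Report an event to the site technical contacts only.
    message = "bootmanager started"
    api.BootNotifyOwners(build_auth("BootNotifyOwners", [message, 0, 1, 0]),
                         message, 0, 1, 0)</programlisting>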
<title>Core Package</title>
<para>The Boot Manager core package, which is run on the nodes and
contacts the Boot API as necessary, is responsible for the following major
functional units:</para>
<para>Configuring node hardware and installing the PlanetLab operating
<para>Putting a node into a debug state so administrators can track
<para>Reconfiguring an already installed node to reflect new hardware,
or changed network settings</para>
<para>Booting an already installed node into the PlanetLab operating
<title>Flow Chart</title>
<para>Below is a high level flow chart of the boot manager, from the
time it is executed to when it exits. This core state machine is located
in source/BootManager.py.</para>
<title>Boot Manager Flow Chart</title>
<imagedata align="left" fileref="boot-manager-flowchart.png"
<title>Example Session Sequence</title>
<title>Boot Manager Session Sequence Diagram</title>
<imagedata align="left" fileref="bootmanager-sequence.png"
<title>Boot CD Environment</title>
<para>The boot manager needs to be able to operate under all currently
supported boot cds. The new 3.0 cd contains software the current 2.x cds
do not contain, including the Logical Volume Manager (LVM) client tools,
RPM, and YUM, among other packages. Given this requirement, the boot cd
will need to download as necessary the extra support files it needs to
run. Depending on the size of these files, they may only be downloaded
by specific steps in the flow chart in figure 1, and thus are not
<para>See the PlanetLab BootCD Documentation for more information about
the current, 3.x boot cds, how they are built, and what they provide to
the BootManager.</para>
<title>Node Configuration Files</title>
<para>To remain compatible with 2.x boot cds, the format and existing
contents of the configuration files for the nodes will not change. There
will be, however, the addition of three fields:</para>
<para>NET_DEVICE</para>
<para>If present, use the device with the specified mac address to
contact PLC. The network on this device will be set up. If not
present, the device represented by 'eth0' will be used.</para>
<para>NODE_KEY</para>
<para>The unique, per-node key to be used during authentication and
identity verification. This is a fixed length, random value that is
only known to the node and PLC.</para>
<para>The PLC assigned node identifier.</para>
<para>An example of a configuration file for a dhcp networked
machine:</para>
<programlisting>IP_METHOD="dhcp"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"</programlisting>
<para>An example of a configuration file for the same machine, only with
a statically assigned network address:</para>
<programlisting>IP_METHOD="static"
IP_ADDRESS="128.112.139.71"
IP_GATEWAY="128.112.139.65"
IP_NETMASK="255.255.255.192"
IP_NETADDR="128.112.139.64"
IP_BROADCASTADDR="128.112.139.127"
IP_DNS1="128.112.136.10"
IP_DNS2="128.112.136.12"
HOST_NAME="planetlab-1"
DOMAIN_NAME="cs.princeton.edu"
NET_DEVICE="00:06:5B:EC:33:BB"
NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
NODE_ID="121"</programlisting>
<title>BootManager Configuration</title>
<para>All run time configuration options for the BootManager exist in a
single file located at source/configuration. These values are described
below; an illustrative example of the complete file follows the list.
<para><literal>VERSION</literal></para>
<para>The current BootManager version. During install, written out to
/etc/planetlab/install_version</para>
<para><literal>BOOT_API_SERVER</literal></para>
<para>The full URL of the API server to contact for authenticated
<para><literal>TEMP_PATH</literal></para>
<para>A writable path on the boot cd we can use for temporary storage
<para><literal>SYSIMG_PATH</literal></para>
<para>The path where we will mount the node logical volumes during any
step that requires access to the disks.</para>
<para><literal>CACERT_PATH</literal></para>
<para>This variable is no longer used.</para>
<para><literal>NONCE_FILE</literal></para>
<para>This variable is no longer used.</para>
<para><literal>PLCONF_DIR</literal></para>
<para>The path that PlanetLab node configuration files will be created
in during install. This should not be changed from /etc/planetlab, as
this path is assumed in other PlanetLab components.</para>
<para><literal>SUPPORT_FILE_DIR</literal></para>
<para>A path on the boot server where per-step additional files may be
located. For example, the packages that include the tools to allow
older 2.x version boot cds to partition disks with LVM.</para>
<para><literal>ROOT_SIZE</literal></para>
<para>During install, this sets the size of the node root partition.
It must be large enough to house all the node operational software. It
does not store any user/slice files. Include a 'G' suffix in this value,
indicating gigabytes.</para>
<para><literal>SWAP_SIZE</literal></para>
<para>How much swap to configure the node with during install. Include
a 'G' suffix in this value, indicating gigabytes.</para>
<para><literal>SKIP_HARDWARE_REQUIREMENT_CHECK</literal></para>
<para>Whether or not to skip any of the hardware requirement checks,
including total disk and memory size constraints.</para>
<para><literal>MINIMUM_MEMORY</literal></para>
<para>How much memory is required by a running PlanetLab node. If a
machine contains less physical memory than this value, the install
will not proceed.</para>
<para><literal>MINIMUM_DISK_SIZE</literal></para>
<para>The size of the smallest disk we are willing to attempt to use
during the install, in gigabytes. Do not include any suffixes.</para>
<para><literal>TOTAL_MINIMUM_DISK_SIZE</literal></para>
<para>The size of all usable disks must be at least this size, in
gigabytes. Do not include any suffixes.</para>
<para><literal>INSTALL_LANGS</literal></para>
<para>Which language support to install. This value is used by RPM,
and is used in writing /etc/rpm/macros before any RPMs are
<para><literal>NUM_AUTH_FAILURES_BEFORE_DEBUG</literal></para>
<para>How many authentication failures the BootManager is willing to
accept for any set of calls, before stopping and putting the node into
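<para>For illustration, a complete source/configuration file might look
like the following. Every value shown is an example only, not a
recommended or default setting.</para>

<programlisting>VERSION="3.1"
BOOT_API_SERVER="https://boot.planet-lab.org/PLCAPI/"
TEMP_PATH="/tmp"
SYSIMG_PATH="/tmp/mnt/sysimg"
PLCONF_DIR="/etc/planetlab"
SUPPORT_FILE_DIR="/boot/support-files"
ROOT_SIZE="7G"
SWAP_SIZE="1G"
SKIP_HARDWARE_REQUIREMENT_CHECK=0
MINIMUM_MEMORY=512
MINIMUM_DISK_SIZE=17
TOTAL_MINIMUM_DISK_SIZE=50
INSTALL_LANGS="en_US"
NUM_AUTH_FAILURES_BEFORE_DEBUG=2</programlisting>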
<title>Installer Hardware Detection</title>
<para>When a node is being installed, the Boot Manager must identify which
hardware the machine has that is applicable to a running node, and
configure the node properly so it can boot post-install. The
general procedure for doing so is outlined in this section. It is
implemented in the <filename>source/systeminfo.py</filename> file.</para>
<para>The process for identifying which kernel module needs to be loaded
<para>Create a lookup table of all modules, and which PCI ids
correspond to each module.</para>
<para>For each PCI device on the system, look up its module in the
<para>If a module is found, put it into one of two categories of
modules, either network module or scsi module, based on the PCI device
<para>For each network module, write out an 'eth<index>' entry
in the modprobe.conf configuration file.</para>
<para>For each scsi module, write out a
'scsi_hostadapter<index>' entry in the modprobe.conf
configuration file.</para>
<para>This process is fairly straightforward, and is simplified by the
fact that we currently do not need support for USB, sound, or video
devices when the node is fully running. The boot cd itself uses a similar
process, but includes USB devices. Consult the boot cd technical
documentation for more information.</para>
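<para>A condensed Python sketch of these steps appears below. It is an
illustration only: the pcitable parsing is simplified, wildcard PCI ids
and module dependencies are ignored, and the function names are
hypothetical. The real implementation is in
<filename>source/systeminfo.py</filename>.</para>

<programlisting>def build_pci_table(pcitable_path="/usr/share/hwdata/pcitable"):
    # Step 1: map (vendor, device) PCI id pairs to kernel module names.
    table = {}
    for line in open(pcitable_path):
        fields = line.split()
        if line.startswith("#") or len(fields) < 3:
            continue
        vendor, device, module = fields[0], fields[1], fields[2].strip('"')
        table[(vendor, device)] = module
    return table

def write_modprobe_conf(network_mods, scsi_mods, path="/etc/modprobe.conf"):
    # Steps 4 and 5: one alias line per detected module.
    conf = open(path, "w")
    for index, module in enumerate(network_mods):
        conf.write("alias eth%d %s\n" % (index, module))
    for index, module in enumerate(scsi_mods):
        conf.write("alias scsi_hostadapter%d %s\n" % (index, module))
    conf.close()</programlisting>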
<para>The creation of the PCI id to kernel module lookup table uses three
different sources of information, and merges them together into a single
table for easier lookups. With these three sources of information, a
fairly comprehensive lookup table can be generated for the devices that
PlanetLab nodes need to have configured. They include:</para>
<para>The installed <filename>/usr/share/hwdata/pcitable
</filename> file</para>
<para>Created at the time the hwdata rpm was built, this file contains
mappings of PCI ids to devices for a large number of devices. It is
not necessarily complete, and doesn't take into account the modules
that are actually available in the built PlanetLab kernel, which is a
subset of the full set available (again, PlanetLab nodes do not have a
use for sound or video drivers, and thus these are not typically
<para>From the built kernel, the <filename>modules.pcimap</filename>
from the <filename>/lib/modules/<kernelversion>/</filename>
<para>This file is generated at the time the kernel is installed, and
pulls the PCI ids out of each module, listing the devices each module
supports. Not all modules list all the devices they support, and
some contain wildcards (that match any device of a single
manufacturer).</para>
<para>From the built kernel, the <filename>modules.dep</filename> from
the <filename>/lib/modules/<kernelversion>/</filename>
<para>This file is also generated at the time the kernel is installed,
but lists the dependencies between various modules. It is used to
generate a list of modules that are actually available.</para>
<para>It should be noted here that SATA (Serial ATA) devices have been
known to exist with both a PCI SCSI device class, and with a PCI IDE
device class. Under Linux 2.6 kernels, all SATA modules need to be listed
in modprobe.conf under 'scsi_hostadapter' lines. This case is handled in
the hardware loading scripts by making the assumption that if an IDE
device matches a loadable module, it should be put in the modprobe.conf
file, as 'real' IDE drivers are all currently built into the kernel, and
do not need to be loaded. SATA devices that have a PCI SCSI device class
are easily identified.</para>
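<para>For example, a node with a single Intel gigabit network adapter and
a SATA controller handled by this IDE-class rule might end up with a
modprobe.conf like the following (the module names are
illustrative):</para>

<programlisting>alias eth0 e1000
alias scsi_hostadapter0 ata_piix</programlisting>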
<para>It is essential that the modprobe.conf configuration file contain
the correct drivers for the disks on the system, if they are present, as
during kernel installation the creation of the initrd (initial ramdisk),
which is responsible for booting the system, uses this file to identify
which drivers to include in it. A failure to do this typically results in
a kernel panic at boot with a 'no init found' message.</para>
<title>Common Scenarios</title>
<para>Below are common scenarios that the BootManager might encounter that
would exist outside of the documented procedures for handling nodes. A
full description of how they will be handled by the BootManager follows
<para>A configuration file from a previously installed and functioning
node is copied or moved to another machine, and the network settings
are updated on it (but the key and node_id are left the same).</para>
<para>Since the authentication for a node consists of matching not
only the node id, but also the primary node ip, this step will fail, and
the node will not allow the boot manager to be run. Instead, the new
node must be created at PLC first, and a network configuration file
for it must be generated, with its own node key.</para>
<para>After a node is installed and running, the administrators
mistakenly remove the cd and media containing the configuration
<para>The node installer clears all boot records from the disk, so the
node will not boot. Typically, the bios will report no operating
<para>A new network configuration file is generated on the website,
but is not put on the node.</para>
<para>Creating a new network configuration file through the PLC
interfaces will generate a new node key, effectively invalidating the
old configuration file (still in use by the machine). The next time
the node reboots and attempts to authenticate with PLC, it will
fail. After two consecutive authentication failures, the node will
automatically put itself into debug mode. In this case, regardless of
the API function being called that was unable to authenticate, the
software at PLC will automatically notify the PlanetLab
administrators, and the contacts at the site of the node, if it was
able to be identified (usually through its IP address or node_id by
searching PLC records).</para>