--- /dev/null
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
+<article>
+ <articleinfo>
+ <title>Booting PlanetLab Nodes</title>
+
+ <author>
+ <firstname>Aaron</firstname>
+
+ <surname>Klingaman</surname>
+
+ <email>alk@absarokasoft.com</email>
+ </author>
+
+ <affiliation>
+ <orgname>Princeton University</orgname>
+ </affiliation>
+
+ <revhistory>
+ <revision>
+ <revnumber>1.0</revnumber>
+
+ <date>March 16, 2006</date>
+
+ <authorinitials>AK</authorinitials>
+
+ <revdescription>
+ <para>Initial draft of new PDN, based on existing BootManager and
+ BootCD Technical documentation</para>
+ </revdescription>
+ </revision>
+ </revhistory>
+ </articleinfo>
+
+ <section>
+ <title>Overview</title>
+
+  <para>This document describes a reference implementation for securely
+  booting PlanetLab nodes, which has been collectively named the
+  BootManager.</para>
+ </section>
+
+ <section>
+ <title>Components</title>
+
+ <para>The entire Boot Manager system consists of several components that
+ are designed to work together to provide the ability to install, validate,
+ and boot a PlanetLab node. These components are:</para>
+
+ <itemizedlist>
+ <listitem>
+        <para>The existing, standard MA-provided calls that allow principals
+        to add and manage node records</para>
+ </listitem>
+
+ <listitem>
+        <para>New API calls, used by principals, to create and download
+        node-specific configuration files</para>
+ </listitem>
+
+ <listitem>
+ <para>A new set of API calls and a new authentication mechanism to be
+ used by the nodes</para>
+ </listitem>
+
+ <listitem>
+ <para>A code package to be run in the boot cd environment on nodes
+ containing core install/validate/boot logic</para>
+ </listitem>
+ </itemizedlist>
+ </section>
+
+ <section>
+    <title>Source Code</title>
+
+ <para>All BootManager source code is located in the repository
+ 'bootmanager' on the PlanetLab CVS system. For information on how to
+ access CVS, consult the PlanetLab website. Unless otherwise noted, all
+ file references refer to this repository.</para>
+ </section>
+
+ <section>
+    <title>Standard MA Interfaces</title>
+
+    <para>The API calls provided by the Management Authority are called out
+    here for their relevance, and to document any extensions to them. See the
+    PlanetLab Core Specification for more details.</para>
+
+ <para><itemizedlist>
+ <listitem>
+ <para>AddNode( authentication, node_values )</para>
+
+ <para>Add a new node record</para>
+ </listitem>
+
+ <listitem>
+ <para>UpdateNode( authentication, update_values )</para>
+
+ <para>Update an existing node record</para>
+ </listitem>
+
+ <listitem>
+ <para>DeleteNode( authentication, node_id )</para>
+
+          <para>Removes a node from the MA's list of nodes</para>
+ </listitem>
+ </itemizedlist></para>
+
+ <para>Additional node-specific values have been added to the AddNode and
+ UpdateNode calls:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>boot_state</para>
+
+        <para>Stores the state the node is currently in.</para>
+ </listitem>
+ </itemizedlist>
+
+ <section>
+ <title>Boot States</title>
+
+ <para>Each node always has one of four possible boot states.</para>
+
+ <orderedlist>
+ <listitem>
+ <para>'inst'</para>
+
+          <para>Install. This boot state corresponds to a new node that has
+          not yet been installed, but for which a record does exist. When the
+          boot manager starts, and the node is in this state, the user is
+          prompted to continue with the installation. The intention here is
+          to prevent a non-PlanetLab machine (like a user's desktop machine)
+          from being inadvertently wiped and installed with the PlanetLab
+          node software.</para>
+ </listitem>
+
+ <listitem>
+ <para>'rins'</para>
+
+ <para>Reinstall. In this state, a node will reinstall the node
+ software, erasing anything that might have been on the disk
+ before.</para>
+ </listitem>
+
+ <listitem>
+ <para>'boot'</para>
+
+          <para>Boot. This state corresponds to nodes that have successfully
+          installed, and can be chain booted to the runtime node
+          kernel.</para>
+ </listitem>
+
+ <listitem>
+ <para>'dbg'</para>
+
+ <para>Debug. Regardless of whether or not a machine has been
+ installed, this state sets up a node to be debugged by
+ administrators.</para>
+ </listitem>
+ </orderedlist>
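      <para>As an illustrative sketch only (the real state machine lives in
      <filename>source/BootManager.py</filename>, and the action strings below
      merely summarize the behavior described above), the dispatch on these
      four states could look like:</para>

```python
# Illustrative sketch of dispatching on the four boot states; the real
# state machine lives in source/BootManager.py and the action strings
# here are only summaries of the behavior described above.
def dispatch(boot_state):
    """Map a node's boot_state to the action the boot manager takes."""
    actions = {
        'inst': 'prompt user, then install',   # confirm before wiping a disk
        'rins': 'reinstall, erasing disks',    # unconditional reinstall
        'boot': 'chain-boot runtime kernel',   # node already installed
        'dbg': 'enter debug mode',             # administrator debugging
    }
    if boot_state not in actions:
        raise ValueError('unknown boot state: %r' % boot_state)
    return actions[boot_state]
```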
+ </section>
+ </section>
+
+ <section>
+ <title>Additional Principal Based MA Interfaces</title>
+
+ <para>The following API calls have been added to the MA:</para>
+
+ <para><itemizedlist>
+ <listitem>
+ <para>GenerateNodeConfigurationFile( authentication, node_id
+ )</para>
+
+ <para>Return a configuration file containing node details, including
+ network settings, the node_id, and a key to be used for
+ authenticated node calls.</para>
+ </listitem>
+ </itemizedlist></para>
+ </section>
+
+ <section>
+ <title>Additional Node Based Interfaces and Authentication</title>
+
+ <section>
+ <title>Authentication</title>
+
+ <para>The API calls described below will be run by the nodes themselves,
+ so a new authentication mechanism is required. As is done with other PLC
+ API calls, the first parameter to all BootManager related calls will be
+ an authentication structure, consisting of these named fields:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>AuthMethod</para>
+
+          <para>The authentication method; only 'hmac' is currently
+          supported.</para>
+ </listitem>
+
+ <listitem>
+ <para>node_id</para>
+
+          <para>The node id, contained in the configuration file.</para>
+ </listitem>
+
+ <listitem>
+ <para>node_ip</para>
+
+ <para>The node's primary IP address. This will be checked with the
+ node_id against PLC records.</para>
+ </listitem>
+
+ <listitem>
+ <para>value</para>
+
+          <para>The authentication string, which depends on the method. For
+          the 'hmac' method, this is a hash of the call, computed with the
+          HMAC algorithm from the parameters of the call and the key
+          contained in the configuration file. For specifics on how this is
+          created, see below.</para>
+ </listitem>
+ </itemizedlist>
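      <para>Concretely, an authentication structure might look like the
      following Python dictionary. The field names follow the list above;
      every value here is an illustrative placeholder, not real data.</para>

```python
# Illustrative authentication structure; field names follow the list
# above, and every value here is a placeholder, not real data.
auth = {
    'AuthMethod': 'hmac',               # only 'hmac' is currently supported
    'node_id': 121,                     # from the node configuration file
    'node_ip': '128.112.139.71',        # checked against PLC records
    'value': '<hmac-of-call-parameters>',  # computed as described below
}
```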
+
+      <para>Authentication is successful if PLC is able to create the same
+      hash from the values using its own copy of the node key. If the hash
+      values do not match, then either the keys do not match or the values
+      of the call were modified in transmission, and the node cannot be
+      authenticated.</para>
+
+      <para>Both the BootManager and the authentication software at PLC must
+      agree on a method for creating the hash values for each call. This hash
+      is essentially a fingerprint of the method call, and is created by this
+      algorithm:</para>
+
+ <orderedlist>
+ <listitem>
+          <para>Take the value of every part of each parameter, except the
+          authentication structure, and convert them to strings. For arrays,
+          each element is used. For dictionaries, not only are the values of
+          all the items used, but the keys themselves as well. Embedded types
+          (arrays or dictionaries inside arrays or dictionaries, etc.) also
+          have all their values extracted.</para>
+ </listitem>
+
+ <listitem>
+ <para>Alphabetically sort all the parameters.</para>
+ </listitem>
+
+ <listitem>
+ <para>Concatenate them into a single string.</para>
+ </listitem>
+
+ <listitem>
+          <para>Prepend the string with the method name and '[', and append
+          ']'.</para>
+ </listitem>
+ </orderedlist>
+
+ <para>The implementation of this algorithm is in the function
+ serialize_params in the file source/BootAPI.py. The same algorithm is
+ located in the 'plc_api' repository, in the function serialize_params in
+ the file PLC/Auth.py.</para>
+
+ <para>The resultant string is fed into the HMAC algorithm with the node
+ key, and the resultant hash value is used in the authentication
+ structure.</para>
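      <para>The serialization and hashing steps can be sketched in Python as
      follows. The recursive extraction, alphabetical sort, concatenation,
      and 'method[...]' wrapping follow the four steps listed above; the
      choice of SHA-1 as the HMAC digest is an assumption of this sketch, so
      consult serialize_params in <filename>source/BootAPI.py</filename> for
      the authoritative implementation.</para>

```python
# Sketch of the fingerprint algorithm described above, plus the HMAC step.
# The digest algorithm (SHA-1 here) is an assumption - see serialize_params
# in source/BootAPI.py for the authoritative implementation.
import hashlib
import hmac

def serialize_params(params):
    """Extract every value (and dictionary key) as a string, recursing
    into embedded arrays and dictionaries, then sort alphabetically."""
    out = []
    def extract(value):
        if isinstance(value, (list, tuple)):
            for item in value:
                extract(item)
        elif isinstance(value, dict):
            for key, val in value.items():
                extract(key)
                extract(val)
        else:
            out.append(str(value))
    for param in params:
        extract(param)
    return sorted(out)

def call_hash(method_name, params, node_key):
    """Concatenate the sorted strings, wrap as 'method[...]', and HMAC
    the result with the node key."""
    message = method_name + '[' + ''.join(serialize_params(params)) + ']'
    return hmac.new(node_key.encode(), message.encode(),
                    hashlib.sha1).hexdigest()
```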
+
+ <para>This authentication method makes a number of assumptions, detailed
+ below.</para>
+
+ <orderedlist>
+ <listitem>
+          <para>All calls made to PLC are done over SSL, so the details of
+          the authentication structure cannot be viewed by third parties.
+          If, in the future, non-SSL based calls are desired, a sequence
+          number or some other value making each call unique would be
+          required to prevent replay attacks. In fact, the current use of
+          SSL negates the need to create and send hashes across -
+          technically, the key itself could be sent directly to PLC,
+          assuming the connection is made to an HTTPS server with a third
+          party signed SSL certificate.</para>
+ </listitem>
+
+ <listitem>
+          <para>Although calls are done over SSL, they use the Python class
+          library xmlrpclib, which does not do SSL certificate
+          verification.</para>
+ </listitem>
+ </orderedlist>
+ </section>
+
+ <section>
+ <title>Additional API Calls</title>
+
+ <para>The following calls have been added:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>BootUpdateNode( authentication, update_values )</para>
+
+ <para>Update a node record, including its boot state, primary
+ network, or ssh host key.</para>
+ </listitem>
+
+ <listitem>
+ <para>BootCheckAuthentication( authentication )</para>
+
+ <para>Simply check to see if the node is recognized by the system
+ and is authorized.</para>
+ </listitem>
+
+ <listitem>
+ <para>BootGetNodeDetails( authentication )</para>
+
+ <para>Return details about a node, including its state, what
+ networks the PLC database has configured for the node, and what the
+ model of the node is.</para>
+ </listitem>
+
+ <listitem>
+ <para>BootNotifyOwners( authentication, message, include_pi,
+ include_tech, include_support )</para>
+
+ <para>Notify someone about an event that happened on the machine,
+ and optionally include the site PIs, technical contacts, and
+ PlanetLab Support.</para>
+ </listitem>
+ </itemizedlist>
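      <para>Since these are ordinary XML-RPC calls, a node-side invocation
      can be sketched with Python's standard XML-RPC client (xmlrpclib at the
      time this document was written; modern Python exposes the same
      interface as xmlrpc.client). The URL and wrapper function below are
      illustrative assumptions, not part of the documented interface.</para>

```python
# Hypothetical node-side sketch of an authenticated BootManager API call.
# xmlrpclib was the Python 2-era module; modern Python provides the same
# interface as xmlrpc.client. The URL and wrapper name are assumptions.
import xmlrpc.client

def check_authentication(api_url, auth):
    """Ask PLC whether this node's authentication structure is valid."""
    server = xmlrpc.client.ServerProxy(api_url)
    # The auth structure is always the first parameter to BootManager calls.
    return server.BootCheckAuthentication(auth)
```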
+ </section>
+ </section>
+
+ <section>
+ <title>Core Package</title>
+
+ <para>The Boot Manager core package, which is run on the nodes and
+ contacts the Boot API as necessary, is responsible for the following major
+ functional units:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>Configuring node hardware and installing the PlanetLab operating
+ system</para>
+ </listitem>
+
+ <listitem>
+ <para>Putting a node into a debug state so administrators can track
+ down problems</para>
+ </listitem>
+
+ <listitem>
+ <para>Reconfiguring an already installed node to reflect new hardware,
+ or changed network settings</para>
+ </listitem>
+
+ <listitem>
+ <para>Booting an already installed node into the PlanetLab operating
+ system</para>
+ </listitem>
+ </itemizedlist>
+
+ <section>
+ <title>Flow Chart</title>
+
+ <para>Below is a high level flow chart of the boot manager, from the
+ time it is executed to when it exits. This core state machine is located
+ in source/BootManager.py.</para>
+
+ <para><figure>
+ <title>Boot Manager Flow Chart</title>
+
+ <mediaobject>
+ <imageobject>
+ <imagedata align="left" fileref="boot-manager-flowchart.png"
+ scalefit="1" />
+ </imageobject>
+ </mediaobject>
+ </figure></para>
+
+ </section>
+
+ <section>
+ <title>Example Session Sequence</title>
+
+ <para><figure>
+ <title>Boot Manager Session Sequence Diagram</title>
+
+ <mediaobject>
+ <imageobject>
+ <imagedata align="left" fileref="bootmanager-sequence.png"
+ scalefit="1" />
+ </imageobject>
+ </mediaobject>
+ </figure></para>
+ </section>
+
+ <section>
+ <title>Boot CD Environment</title>
+
+      <para>The boot manager needs to be able to operate under all currently
+      supported boot cds. The new 3.0 cd contains software the current 2.x
+      cds do not, including the Logical Volume Manager (LVM) client tools,
+      RPM, and YUM, among other packages. Given this requirement, the boot
+      manager will need to download, as necessary, the extra support files
+      it needs to run. Depending on the size of these files, they may be
+      downloaded only by the specific steps in the flow chart in figure 1
+      that need them, and thus are not explicitly mentioned there.</para>
+
+      <para>See the PlanetLab BootCD Documentation for more information
+      about the current, 3.x boot cds, how they are built, and what they
+      provide to the BootManager.</para>
+ </section>
+
+ <section>
+ <title>Node Configuration Files</title>
+
+ <para>To remain compatible with 2.x boot cds, the format and existing
+ contents of the configuration files for the nodes will not change. There
+ will be, however, the addition of three fields:</para>
+
+ <orderedlist>
+ <listitem>
+ <para>NET_DEVICE</para>
+
+          <para>If present, use the device with the specified MAC address to
+          contact PLC. The network on this device will be set up. If not
+          present, the device represented by 'eth0' will be used.</para>
+ </listitem>
+
+ <listitem>
+ <para>NODE_KEY</para>
+
+ <para>The unique, per-node key to be used during authentication and
+ identity verification. This is a fixed length, random value that is
+ only known to the node and PLC.</para>
+ </listitem>
+
+ <listitem>
+ <para>NODE_ID</para>
+
+ <para>The PLC assigned node identifier.</para>
+ </listitem>
+ </orderedlist>
+
+      <para>An example of a configuration file for a DHCP-networked
+      machine:</para>
+
+ <programlisting>IP_METHOD="dhcp"
+HOST_NAME="planetlab-1"
+DOMAIN_NAME="cs.princeton.edu"
+NET_DEVICE="00:06:5B:EC:33:BB"
+NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
+NODE_ID="121"</programlisting>
+
+ <para>An example of a configuration file for the same machine, only with
+ a statically assigned network address:</para>
+
+ <programlisting>IP_METHOD="static"
+IP_ADDRESS="128.112.139.71"
+IP_GATEWAY="128.112.139.65"
+IP_NETMASK="255.255.255.192"
+IP_NETADDR="128.112.139.64"
+IP_BROADCASTADDR="128.112.139.127"
+IP_DNS1="128.112.136.10"
+IP_DNS2="128.112.136.12"
+HOST_NAME="planetlab-1"
+DOMAIN_NAME="cs.princeton.edu"
+NET_DEVICE="00:06:5B:EC:33:BB"
+NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
+NODE_ID="121"</programlisting>
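      <para>The KEY="value" format shown above can be read with a minimal
      parser such as the following sketch. The actual boot manager may source
      these files shell-style; this reader is an illustrative assumption, not
      the actual code.</para>

```python
# Minimal illustrative parser for the KEY="value" node configuration
# format shown above. The real boot manager may source these files
# shell-style; this reader is a sketch, not the actual code.
def parse_node_config(text):
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        config[key.strip()] = value.strip().strip('"')
    return config
```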
+ </section>
+ </section>
+
+ <section>
+ <title>BootManager Configuration</title>
+
+ <para>All run time configuration options for the BootManager exist in a
+ single file located at source/configuration. These values are described
+ below.</para>
+
+ <itemizedlist>
+ <listitem>
+ <para><literal>VERSION</literal></para>
+
+ <para>The current BootManager version. During install, written out to
+ /etc/planetlab/install_version</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>BOOT_API_SERVER</literal></para>
+
+ <para>The full URL of the API server to contact for authenticated
+ operations.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>TEMP_PATH</literal></para>
+
+ <para>A writable path on the boot cd we can use for temporary storage
+ of files.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SYSIMG_PATH</literal></para>
+
+        <para>The path where we will mount the node logical volumes during
+        any step that requires access to the disks.</para>
+ </listitem>
+
+ <listitem>
+        <para><literal>CACERT_PATH</literal></para>
+
+ <para>Variable not used anymore.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>NONCE_FILE</literal></para>
+
+ <para>Variable not used anymore.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>PLCONF_DIR</literal></para>
+
+ <para>The path that PlanetLab node configuration files will be created
+ in during install. This should not be changed from /etc/planetlab, as
+ this path is assumed in other PlanetLab components.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SUPPORT_FILE_DIR</literal></para>
+
+ <para>A path on the boot server where per-step additional files may be
+ located. For example, the packages that include the tools to allow
+ older 2.x version boot cds to partition disks with LVM.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>ROOT_SIZE</literal></para>
+
+        <para>During install, this sets the size of the node root partition.
+        It must be large enough to house all the node operational software;
+        it does not store any user/slice files. Include a 'G' suffix in this
+        value, indicating gigabytes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SWAP_SIZE</literal></para>
+
+        <para>How much swap to configure the node with during install.
+        Include a 'G' suffix in this value, indicating gigabytes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SKIP_HARDWARE_REQUIREMENT_CHECK</literal></para>
+
+ <para>Whether or not to skip any of the hardware requirement checks,
+ including total disk and memory size constraints.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>MINIMUM_MEMORY</literal></para>
+
+ <para>How much memory is required by a running PlanetLab node. If a
+ machine contains less physical memory than this value, the install
+ will not proceed.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>MINIMUM_DISK_SIZE</literal></para>
+
+        <para>The size of the smallest disk we are willing to attempt to use
+        during the install, in gigabytes. Do not include any
+        suffixes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>TOTAL_MINIMUM_DISK_SIZE</literal></para>
+
+        <para>The size of all usable disks must be at least this size, in
+        gigabytes. Do not include any suffixes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>INSTALL_LANGS</literal></para>
+
+        <para>Which language support to install. This value is used by RPM,
+        and is used in writing /etc/rpm/macros before any RPMs are
+        installed.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>NUM_AUTH_FAILURES_BEFORE_DEBUG</literal></para>
+
+        <para>How many authentication failures the BootManager is willing to
+        accept for any set of calls before stopping and putting the node
+        into a debug mode.</para>
+ </listitem>
+ </itemizedlist>
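  <para>Several of these options are sizes. As a worked example, the
  'G'-suffixed values (ROOT_SIZE, SWAP_SIZE) and the bare gigabyte counts
  (MINIMUM_DISK_SIZE, TOTAL_MINIMUM_DISK_SIZE) could be normalized with a
  helper like the following sketch; the function name is a hypothetical
  illustration, not part of the BootManager.</para>

```python
# Illustrative helper for the size-valued options above: ROOT_SIZE and
# SWAP_SIZE carry a 'G' suffix, while MINIMUM_DISK_SIZE and
# TOTAL_MINIMUM_DISK_SIZE are bare gigabyte counts. The function name is
# a hypothetical example, not part of the BootManager.
def size_to_gb(value):
    """Return the integer gigabyte count of a configuration size value."""
    value = value.strip()
    if value.upper().endswith('G'):
        value = value[:-1]
    return int(value)
```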
+ </section>
+
+ <section>
+ <title>Installer Hardware Detection</title>
+
+    <para>When a node is being installed, the Boot Manager must identify
+    which hardware the machine has that is applicable to a running node, and
+    configure the node so it can boot properly post-install. The general
+    procedure for doing so is outlined in this section. It is implemented in
+    the <filename>source/systeminfo.py</filename> file.</para>
+
+    <para>The process for identifying which kernel modules need to be
+    loaded is:</para>
+
+ <orderedlist>
+ <listitem>
+        <para>Create a lookup table of all modules, and which PCI ids
+        correspond to each module.</para>
+ </listitem>
+
+ <listitem>
+        <para>For each PCI device on the system, look up its module in the
+        first table.</para>
+ </listitem>
+
+ <listitem>
+        <para>If a module is found, put it into one of two categories of
+        modules, either a network module or a SCSI module, based on the PCI
+        device class.</para>
+ </listitem>
+
+ <listitem>
+        <para>For each network module, write out an 'eth&lt;index&gt;' entry
+        in the modprobe.conf configuration file.</para>
+ </listitem>
+
+ <listitem>
+        <para>For each SCSI module, write out a
+        'scsi_hostadapter&lt;index&gt;' entry in the modprobe.conf
+        configuration file.</para>
+ </listitem>
+ </orderedlist>
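    <para>Steps 4 and 5 above can be sketched as follows. The alias format
    and zero-based indexing for every entry are simplifications (real
    modprobe.conf files conventionally leave the first scsi_hostadapter
    entry unnumbered); the actual logic is in
    <filename>source/systeminfo.py</filename>.</para>

```python
# Sketch of steps 4 and 5 above: turning the two categorized module lists
# into modprobe.conf entries. Zero-based indexing for every entry is a
# simplification; the actual logic is in source/systeminfo.py.
def modprobe_lines(network_modules, scsi_modules):
    lines = []
    for index, module in enumerate(network_modules):
        lines.append('alias eth%d %s' % (index, module))
    for index, module in enumerate(scsi_modules):
        lines.append('alias scsi_hostadapter%d %s' % (index, module))
    return lines
```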
+
+    <para>This process is fairly straightforward, and is simplified by the
+    fact that we currently do not need support for USB, sound, or video
+    devices when the node is fully running. The boot cd itself uses a
+    similar process, but includes USB devices. Consult the boot cd technical
+    documentation for more information.</para>
+
+    <para>The creation of the PCI id to kernel module lookup table uses
+    three different sources of information, and merges them together into a
+    single table for easier lookups. With these three sources of
+    information, a fairly comprehensive lookup table can be generated for
+    the devices that PlanetLab nodes need to have configured. They
+    include:</para>
+
+ <orderedlist>
+ <listitem>
+        <para>The installed
+        <filename>/usr/share/hwdata/pcitable</filename> file</para>
+
+        <para>Created at the time the hwdata rpm was built, this file
+        contains mappings of PCI ids to devices for a large number of
+        devices. It is not necessarily complete, and doesn't take into
+        account the modules that are actually available in the built
+        PlanetLab kernel, which is a subset of the full set available
+        (again, PlanetLab nodes do not have a use for USB, sound, or video
+        drivers, and thus those are not typically built).</para>
+ </listitem>
+
+ <listitem>
+        <para>From the built kernel, the <filename>modules.pcimap</filename>
+        file from the
+        <filename>/lib/modules/&lt;kernelversion&gt;/</filename>
+        directory.</para>
+
+        <para>This file is generated at the time the kernel is installed,
+        and pulls the PCI ids out of each module, for the devices each
+        module supports. Not all modules list all the devices they support,
+        and some contain wildcards (that match any device of a single
+        manufacturer).</para>
+ </listitem>
+
+ <listitem>
+        <para>From the built kernel, the <filename>modules.dep</filename>
+        file from the
+        <filename>/lib/modules/&lt;kernelversion&gt;/</filename>
+        directory.</para>
+
+ <para>This file is also generated at the time the kernel is installed,
+ but lists the dependencies between various modules. It is used to
+ generate a list of modules that are actually available.</para>
+ </listitem>
+ </orderedlist>
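    <para>The merge of these three sources can be sketched as follows,
    assuming each has already been parsed: pcitable and modules.pcimap as
    dictionaries mapping PCI ids to module names, and modules.dep reduced to
    the set of available module names. The parsing itself, done in
    <filename>source/systeminfo.py</filename>, is not shown here.</para>

```python
# Sketch of merging the three sources above into one PCI id to module
# lookup table. Inputs are assumed to be already parsed: pcitable and
# pcimap as {pci_id: module} dictionaries, available_modules as the set
# of module names derived from modules.dep.
def build_pci_table(pcitable, pcimap, available_modules):
    merged = dict(pcitable)   # broad hwdata mappings first
    merged.update(pcimap)     # the kernel's own pcimap takes precedence
    # Keep only entries whose module the built kernel actually provides.
    return {pci_id: module for pci_id, module in merged.items()
            if module in available_modules}
```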
+
+    <para>It should be noted here that SATA (Serial ATA) devices have been
+    known to exist with both a PCI SCSI device class, and with a PCI IDE
+    device class. Under Linux 2.6 kernels, all SATA modules need to be
+    listed in modprobe.conf under 'scsi_hostadapter' lines. This case is
+    handled in the hardware loading scripts by assuming that if an IDE
+    device matches a loadable module, it should be put in the modprobe.conf
+    file, as 'real' IDE drivers are all currently built into the kernel and
+    do not need to be loaded. SATA devices that have a PCI SCSI device class
+    are easily identified.</para>
+
+    <para>It is essential that the modprobe.conf configuration file contain
+    the correct drivers for the disks on the system, if they are present, as
+    during kernel installation the creation of the initrd (initial ramdisk),
+    which is responsible for booting the system, uses this file to identify
+    which drivers to include in it. A failure to do this typically results
+    in a kernel panic at boot with a 'no init found' message.</para>
+ </section>
+
+ <section>
+ <title>Common Scenarios</title>
+
+ <para>Below are common scenarios that the BootManager might encounter that
+ would exist outside of the documented procedures for handling nodes. A
+ full description of how they will be handled by the BootManager follows
+ each.</para>
+
+ <itemizedlist>
+ <listitem>
+        <para>A configuration file from a previously installed and
+        functioning node is copied or moved to another machine, and the
+        network settings are updated on it (but the key and node_id are left
+        the same).</para>
+
+ <para>Since the authentication for a node consists of matching not
+ only the node id, but the primary node ip, this step will fail, and
+ the node will not allow the boot manager to be run. Instead, the new
+ node must be created at PLC first, and a network configuration file
+ for it must be generated, with its own node key.</para>
+ </listitem>
+
+ <listitem>
+ <para>After a node is installed and running, the administrators
+ mistakenly remove the cd and media containing the configuration
+ file.</para>
+
+        <para>The node installer clears all boot records from the disk, so
+        the node will not boot. Typically, the BIOS will report no operating
+        system.</para>
+ </listitem>
+
+ <listitem>
+ <para>A new network configuration file is generated on the website,
+ but is not put on the node.</para>
+
+        <para>Creating a new network configuration file through the PLC
+        interfaces will generate a new node key, effectively invalidating
+        the old configuration file (still in use by the machine). The next
+        time the node reboots and attempts to authenticate with PLC, it will
+        fail. After two consecutive authentication failures, the node will
+        automatically put itself into debug mode. In this case, regardless
+        of which API function was unable to authenticate, the software at
+        PLC will automatically notify the PlanetLab administrators, and the
+        contacts at the site of the node, if the node was able to be
+        identified (usually through its IP address or node_id, by searching
+        PLC records).</para>
+ </listitem>
+ </itemizedlist>
+ </section>
+</article>
\ No newline at end of file