"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<article>
<articleinfo>
- <title>Boot Manager Technical Documentation</title>
+ <title>BootManager Technical Documentation</title>
<author>
<firstname>Aaron</firstname>
<surname>Klingaman</surname>
- <email>alk@cs.princeton.edu</email>
+ <email>alk@absarokasoft.com</email>
</author>
<affiliation>
<revision>
<revnumber>1.0</revnumber>
- <date>March 10, 2005</date>
+ <date>March 15, 2005</date>
<authorinitials>AK</authorinitials>
<para>Initial draft.</para>
</revdescription>
</revision>
+
+ <revision>
+ <revnumber>1.1</revnumber>
+
+ <date>May 31, 2005</date>
+
+ <authorinitials>AK</authorinitials>
+
+ <revdescription>
+ <para>Updated post implementation and deployment.</para>
+ </revdescription>
+ </revision>
+
+ <revision>
+ <revnumber>1.2</revnumber>
+
+ <date>November 16, 2005</date>
+
+ <authorinitials>AK</authorinitials>
+
+ <revdescription>
+ <para>Add section on where source code is, and other updates to make
+ it consistent with implementation.</para>
+ </revdescription>
+ </revision>
+
+ <revision>
+ <revnumber>1.3</revnumber>
+
+ <date>March 17, 2006</date>
+
+ <authorinitials>AK</authorinitials>
+
+ <revdescription>
+ <para>Reworked various wording to fit in correctly with new
+ architecture terminology.</para>
+
+ <para>Updated to match PlanetLab Core Specification.</para>
+ </revdescription>
+ </revision>
</revhistory>
</articleinfo>
+ <section>
+ <title>Overview</title>
+
+ <para>This document describes the implementation of the package called the
+ BootManager at a technical level. The BootManager is used in conjunction
+ with the PlanetLab BootCD to securely boot nodes, including remote
+ installation, debugging, and validation. It is the primary method used by
+ the PlanetLab Central Management Authority (MA) to manage nodes.</para>
+ </section>
+
<section>
<title>Components</title>
- <para>The entire Boot Manager system consists of several components that
- are designed to work together to provide the functionality outline in the
- Boot Manager PlanetLab Design Note. These consist of:</para>
+ <para>The entire BootManager system consists of several primary
+ components. These are:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>The existing, standard MA-provided calls that allow principals to
+ add and manage node records, and a new call to generate node-specific
+ configuration files</para>
+ </listitem>
+
+ <listitem>
+ <para>New MA API calls with a new authentication mechanism for
+ node-based MA calls</para>
+ </listitem>
+
+ <listitem>
+ <para>A code package, run in the boot cd environment on nodes, that
+ contains the core install/validate/boot logic</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>The intention with the BootManager system is to send the same script
+ to all nodes (consisting of the core BootManager code), each time the node
+ starts. Then, the BootManager will run and determine which operations to
+ perform on the node, based on its state of installation. All state-based
+ logic for the node boot, install, debug, and reconfigure operations is
+ contained in one place; there is no boot-state-specific logic located on
+ the MA servers.</para>
+ </section>
+
+ <section>
+ <title>Source Code</title>
+
+ <para>All BootManager source code is located in the repository
+ 'bootmanager' on the PlanetLab CVS system. For information on how to
+ access CVS, consult the PlanetLab website. Unless otherwise noted, all
+ file references refer to this repository.</para>
+ </section>
+
+ <section>
+ <title>Management Authority Node Fields</title>
+
+ <para>The following MA database fields are directly applicable to the
+ BootManager operation, and to the node-related API calls (detailed
+ below).</para>
+
+ <section>
+ <title>node_id</title>
+
+ <para>An integer unique identifier for a specific node.</para>
+ </section>
+
+ <section>
+ <title>node_key</title>
+
+ <para>This is a per-node, unique value that forms the basis of the node
+ authentication mechanism detailed below. When a new node record is added
+ to the MA by a principal, it is automatically assigned a new, random
+ key, which is distributed out of band to the node. This shared secret is
+ then used for node authentication. The contents of node_key are
+ generated using this command:</para>
+
+ <para><programlisting>openssl rand -base64 32</programlisting></para>
+
+ <para>Any = (equals) characters are removed from the string.</para>
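+
+ <para>For illustration only, a roughly equivalent key generation written
+ in Python (a sketch, not the MA implementation; the openssl command above
+ is authoritative):</para>
+
+ <para><programlisting>import base64, os
+
+# Generate 32 random bytes, base64 encode them, and strip any '=' padding
+# characters, mirroring the openssl command shown above.
+node_key = base64.b64encode(os.urandom(32)).decode('ascii').replace('=', '')
+print(node_key)</programlisting></para>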
+ </section>
+
+ <section>
+ <title>boot_state</title>
+
+ <para>Each node always has one of four possible boot states, stored as a
+ string, referred to as boot_state. These are:</para>
+
+ <orderedlist>
+ <listitem>
+ <para>'inst'</para>
+
+ <para>Install. This boot state corresponds to a new node that has not
+ yet been installed, but for which a record does exist. When the
+ BootManager starts, and the node is in this state, the user is
+ prompted to continue with the installation. The intention here is to
+ prevent a non-PlanetLab machine (like a user's desktop machine) from
+ becoming inadvertently wiped and installed with the PlanetLab node
+ software. This is the default state for new nodes.</para>
+ </listitem>
+
+ <listitem>
+ <para>'rins'</para>
+
+ <para>Reinstall. In this state, a node will reinstall the node
+ software, erasing anything that might have been on the disk
+ before.</para>
+ </listitem>
+
+ <listitem>
+ <para>'boot'</para>
+
+ <para>Boot to bring a node online. This state corresponds to nodes
+ that have been successfully installed, and can be chain booted to the
+ runtime node kernel.</para>
+ </listitem>
+
+ <listitem>
+ <para>'dbg'</para>
+
+ <para>Debug. Regardless of whether or not a machine has been
+ installed, this state sets up a node to be debugged by
+ administrators. In debug mode, no node software is running, and the
+ node can be accessed remotely by administrators.</para>
+ </listitem>
+ </orderedlist>
+ </section>
+ </section>
+
+ <section>
+ <title>Existing Management Authority API Calls</title>
+
+ <para>These calls, taken from the PlanetLab Core Specification and extended
+ with additional parameters, are used by principals to maintain the set of
+ nodes managed by an MA. See the Core Specification for more information.
+ The MA may provide an easy-to-use interface, such as a web interface, that
+ calls these directly. An illustrative invocation follows the list.</para>
<para><itemizedlist>
<listitem>
- <para>A set of API calls available at PlanetLab Central</para>
+ <para>AddNode( authentication, node_values )</para>
+
+ <para>Add a new node record. node_values contains the hostname, IP
+ address and other network settings, and the new field boot_state.
+ The resultant node_id is returned.</para>
</listitem>
<listitem>
- <para>A package to be run in the boot cd environment on nodes</para>
+ <para>UpdateNode( authentication, node_id, update_values )</para>
+
+ <para>Update an existing node record. update_values can include the
+ hostname, IP address, and the new field boot_state.</para>
</listitem>
<listitem>
- <para>A set of API calls and an appropriate user interface allowing
- administrators to create node configuration files</para>
+ <para>DeleteNode( authentication, node_id )</para>
+
+ <para>Delete a node record.</para>
</listitem>
</itemizedlist></para>
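+
+ <para>As a rough illustration of how a principal might invoke these
+ calls with the standard library XML-RPC client (the server URL,
+ credentials, and the exact shape of the principal authentication
+ structure below are assumptions for this sketch, not part of this
+ specification):</para>
+
+ <programlisting>import xmlrpc.client   # 'xmlrpclib' in the Python 2 environment described here
+
+# Hypothetical MA API endpoint and principal credentials.
+server = xmlrpc.client.ServerProxy("https://ma.example.org/PLCAPI/")
+auth = {"AuthMethod": "password",
+        "Username": "principal@example.org",
+        "AuthString": "secret"}
+
+# Add a node record, then flag it for reinstallation.
+node_id = server.AddNode(auth, {"hostname": "planetlab-1.cs.princeton.edu",
+                                "ip": "128.112.139.71",
+                                "boot_state": "inst"})
+server.UpdateNode(auth, node_id, {"boot_state": "rins"})</programlisting>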
</section>
<section>
- <title>API Calls</title>
+ <title>New Management Authority API Calls</title>
+
+ <para>The MA API calls intended to be run by principals leverage existing
+ authentication mechanisms. However, the calls described below, which are
+ run by the nodes themselves, need a new authentication mechanism.</para>
<section>
- <title>Authentication</title>
+ <title>Node Authentication</title>
- <para>The Boot PLC API calls handle authentication in a different manner
- that other API calls, which typically require a username and password,
- and are called by users of the system, not nodes.</para>
+ <para>As is done with other MA API calls, the first parameter to all
+ BootManager-related calls will be an authentication structure,
+ consisting of these named fields:</para>
- <para>The authentication structure consists of two named fields:</para>
+ <itemizedlist>
+ <listitem>
+ <para>AuthMethod</para>
- <para><itemizedlist>
- <listitem>
- <para>method</para>
+ <para>The authentication method; only 'hmac' is currently
+ supported.</para>
+ </listitem>
- <para>The authentication method, only hmac is currently
- supported</para>
- </listitem>
+ <listitem>
+ <para>node_id</para>
- <listitem>
- <para>node_id</para>
+ <para>The node id, contained in the configuration file on the
+ node.</para>
+ </listitem>
- <para>The node id, contained on the configuration file.</para>
- </listitem>
+ <listitem>
+ <para>node_ip</para>
- <listitem>
- <para>value</para>
+ <para>The node's primary IP address. This will be checked with the
+ node_id against MA records.</para>
+ </listitem>
- <para>An hmac hash for the call, made from the parameters of the
- call the key contained on the configuration file.</para>
- </listitem>
- </itemizedlist></para>
+ <listitem>
+ <para>value</para>
+
+ <para>The authentication string, depending on the method. For the 'hmac'
+ method, this is a hash of the call made with the HMAC algorithm from the
+ parameters of the call and the key contained in the configuration
+ file. For specifics on how this is created, see below.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Authentication is successful if the MA is able to create the same
+ hash from the values using its own copy of the NODE_KEY. If the hash
+ values do not match, then either the keys do not match or the values of
+ the call were modified in transmission, and the node cannot be
+ authenticated.</para>
+
+ <para>Both the BootManager and the authentication functions at the MA
+ must agree on a method for creating the hash values for each call. This
+ hash is essentially a fingerprint of the method call, and is created by
+ the following algorithm (an illustrative sketch appears below):</para>
+
+ <orderedlist>
+ <listitem>
+ <para>Take the value of every part of each parameter, except the
+ authentication structure, and convert them to strings. For arrays,
+ each element is used. For dictionaries, not only are the values of all
+ the items used, but the keys themselves. Embedded types (arrays or
+ dictionaries inside arrays or dictionaries, etc.) also have all
+ their values extracted.</para>
+ </listitem>
+
+ <listitem>
+ <para>Alphabetically sort all of the resulting strings.</para>
+ </listitem>
+
+ <listitem>
+ <para>Concatenate them into a single string.</para>
+ </listitem>
+
+ <listitem>
+ <para>Prepend the string with the method name and '[', and append
+ ']'.</para>
+ </listitem>
+ </orderedlist>
+
+ <para>The implementation of this algorithm is in the function
+ serialize_params in the file source/BootAPI.py. The same algorithm is
+ located in the 'plc_api' repository, in the function serialize_params in
+ the file PLC/Auth.py.</para>
+
+ <para>The resultant string is fed into the HMAC algorithm with the node
+ key, and the resultant hash value is used in the authentication
+ structure.</para>
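+
+ <para>A minimal sketch of this fingerprinting, shown here for
+ illustration only (the authoritative implementations are the
+ serialize_params functions noted above; the choice of SHA-1 as the HMAC
+ digest and the exact concatenation details are assumptions in this
+ sketch):</para>
+
+ <programlisting>import hashlib, hmac
+
+def flatten(arg, out):
+    # Collect every value as a string; for dictionaries, include the keys
+    # as well as the values, and recurse into embedded types.
+    if isinstance(arg, dict):
+        for key, value in arg.items():
+            flatten(key, out)
+            flatten(value, out)
+    elif isinstance(arg, (list, tuple)):
+        for item in arg:
+            flatten(item, out)
+    else:
+        out.append(str(arg))
+
+def auth_hash(node_key, method, params):
+    # params is the list of call parameters, excluding the authentication
+    # structure itself.
+    values = []
+    for param in params:
+        flatten(param, values)
+    values.sort()
+    message = method + "[" + "".join(values) + "]"
+    return hmac.new(node_key.encode(), message.encode(),
+                    hashlib.sha1).hexdigest()</programlisting>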
+
+ <para>This authentication method makes a number of assumptions, detailed
+ below.</para>
+
+ <orderedlist>
+ <listitem>
+ <para>All calls made to the MA are done over SSL, so the details of
+ the authentication structure cannot be viewed by 3rd parties. If, in
+ the future, non-SSL based calls are desired, a sequence number or
+ some other value making each call unique would be required to
+ prevent replay attacks. In fact, the current use of SSL negates the
+ need to create and send hashes across - technically, the key itself
+ could be sent directly to the MA, assuming the connection is made to
+ an HTTPS server with a third party signed SSL certificate being
+ verified.</para>
+ </listitem>
+
+ <listitem>
+ <para>Although calls are done over SSL, they use the Python class
+ library xmlrpclib, which does not do SSL certificate
+ verification.</para>
+ </listitem>
+ </orderedlist>
</section>
<section>
- <title>PLC API Calls</title>
+ <title>New API Calls</title>
- <para>For full documentation of these functions can be found in the
- PlanetLab API documentation.</para>
+ <para>The calls available to the BootManager, which accept the above
+ authentication structure, are listed below (an illustrative call follows
+ the list):</para>
- <para><itemizedlist>
- <listitem>
- <para>BootUpdateNode( authentication, update_values )</para>
+ <itemizedlist>
+ <listitem>
+ <para>BootUpdateNode( authentication, update_values )</para>
- <para>Update a node record, currenly only allowing the boot state
- to change.</para>
- </listitem>
+ <para>Update a node record, including its boot state, primary
+ network, or ssh host key.</para>
+ </listitem>
- <listitem>
- <para>BootCheckAuthentication( authentication )</para>
+ <listitem>
+ <para>BootCheckAuthentication( authentication )</para>
- <para>Simply check to see if the node is recognized by the system
- and is authorized</para>
- </listitem>
+ <para>Simply check to see if the node is recognized by the system
+ and is authorized.</para>
+ </listitem>
- <listitem>
- <para>BootGetNodeDetails( authentication )</para>
+ <listitem>
+ <para>BootGetNodeDetails( authentication )</para>
- <para>Return details about a node, including its state, what
- networks the PLC database has configured for the node.</para>
- </listitem>
+ <para>Return details about a node, including its state, what
+ networks the MA database has configured for the node, and what the
+ model of the node is.</para>
+ </listitem>
- <listitem>
- <para>BootNotifyOwners( authentication, message, include_pi,
- include_tech, include_support )</para>
+ <listitem>
+ <para>BootNotifyOwners( authentication, message, include_pi,
+ include_tech, include_support )</para>
+
+ <para>Notify someone about an event that happened on the machine,
+ and optionally include the site Principal Investigators, technical
+ contacts, and PlanetLab Support.</para>
+ </listitem>
+ </itemizedlist>
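+
+ <para>Putting the authentication structure and hash together, a
+ node-side call might look like the following sketch (the server URL is
+ hypothetical, and auth_hash refers to the illustrative function in the
+ previous section):</para>
+
+ <programlisting>import xmlrpc.client   # 'xmlrpclib' on the boot cds described in this document
+
+server = xmlrpc.client.ServerProxy("https://ma.example.org/PLCAPI/")
+
+# Values read from the node configuration file (see below).
+node_id, node_ip = 121, "128.112.139.71"
+node_key = "79efbe871722771675de604a227db8386bc6ef482a4b74"
+
+update_values = {"boot_state": "boot"}
+
+auth = {"AuthMethod": "hmac",
+        "node_id": node_id,
+        "node_ip": node_ip,
+        # auth_hash is the fingerprint sketch from the previous section;
+        # it covers every parameter except the auth structure itself.
+        "value": auth_hash(node_key, "BootUpdateNode", [update_values])}
+
+server.BootUpdateNode(auth, update_values)</programlisting>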
- <para>Notify someone about an event that happened on the machine,
- and optionally include the site PIs, technical contacts, and
- PlanetLab Support</para>
+ <para>The new calls available to principals, which use the existing
+ authentication methods, are:</para>
+
+ <para><itemizedlist>
+ <listitem>
+ <para>GenerateNodeConfigurationFile( authentication, node_id
+ )</para>
+
+ <para>Generate a configuration file to be used by the BootManager
+ and the BootCD to configure the network for the node during boot.
+ The resultant file also contains the node_id and node_key values.
+ A new node_key is generated each time, invalidating old files. The
+ full contents and format of this file are detailed below.</para>
</listitem>
</itemizedlist></para>
</section>
</section>
<section>
- <title>Core Package</title>
+ <title>Core Software Package</title>
+
+ <para>The BootManager core package, which is run on the nodes and contacts
+ the MA API as necessary, is responsible for the following major functional
+ units:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>Configuring node hardware and installing the PlanetLab operating
+ system</para>
+ </listitem>
+
+ <listitem>
+ <para>Putting a node into a debug state so administrators can track
+ down problems</para>
+ </listitem>
+
+ <listitem>
+ <para>Reconfiguring an already installed node to reflect new hardware,
+ or changed network settings</para>
+ </listitem>
+
+ <listitem>
+ <para>Booting an already installed node into the PlanetLab operating
+ system</para>
+ </listitem>
+ </itemizedlist>
- <para>The Boot Manager core package, which is run on the nodes and
- contacts the Boot API as necessary, is responsible for the follow major
- functional units:<itemizedlist>
- <listitem>
- <para>Installing nodes with alpina, the PlanetLab installer</para>
- </listitem>
+ <section>
+ <title>BootManager Flow Chart</title>
+
+ <para>Below is a high-level flow chart of the BootManager, from the time
+ it is executed to when it exits. This core state machine is located in
+ source/BootManager.py.</para>
+
+ <para><figure>
+ <title>BootManager Flow Chart</title>
+
+ <mediaobject>
+ <imageobject>
+ <imagedata align="center" fileref="bm_flowchart.svg"
+ scalefit="1" width="6in"/>
+ </imageobject>
+ </mediaobject>
+ </figure></para>
+
+ </section>
+
+ <section>
+ <title>Example Execution Session</title>
+
+ <para>Below is one example session of the BootManager, for a new node
+ being installed then booted.</para>
+
+ <para><figure>
+ <title>Example Execution Session</title>
+
+ <mediaobject>
+ <imageobject>
+ <imagedata align="center" fileref="bm_sequence.svg"
+ scalefit="1" width="6in"/>
+ </imageobject>
+ </mediaobject>
+ </figure></para>
+ </section>
+
+ <section>
+ <title>Boot CD Environment</title>
+
+ <para>The BootManager needs to be able to operate under all currently
+ supported boot cds. The new 3.0 cd contains software the current 2.x cds
+ do not contain, including the Logical Volume Manager (LVM) client tools,
+ RPM, and YUM, among other packages. Given this requirement, the
+ BootManager will need to download as necessary the extra support files it
+ needs to run. Depending on the size of these files, they may only be
+ downloaded by specific steps in the flow chart in figure 1, and thus are
+ not shown there.</para>
+
+ <para>See the PlanetLab BootCD Documentation for more information about
+ the current, 3.x boot cds, how they are built, and what they provide to
+ the BootManager.</para>
+ </section>
+
+ <section>
+ <title>Node Configuration Files</title>
+ <para>To remain compatible with 2.x boot cds, the format and existing
+ contents of the configuration files for the nodes will not change. There
+ will be, however, the addition of three fields:</para>
+
+ <orderedlist>
<listitem>
- <para>Putting a node into a debug state</para>
+ <para>NET_DEVICE</para>
+
+ <para>If present, use the device with the specified MAC address to
+ contact the MA. The network on this device will be set up. If not
+ present, the device represented by 'eth0' will be used.</para>
</listitem>
<listitem>
- <para>Reconfiguring an already installed node to reflect new
- hardware, or changed network settings</para>
+ <para>NODE_KEY</para>
+
+ <para>The unique, per-node key to be used during authentication and
+ identity verification. This is a fixed length, random value that is
+ only known to the node and the MA database.</para>
</listitem>
<listitem>
- <para>Booting an already installed node</para>
+ <para>NODE_ID</para>
+
+ <para>The MA assigned node identifier.</para>
</listitem>
- </itemizedlist></para>
+ </orderedlist>
- <para>Below is a high level flow chart of the boot manager, from the time
- it is executed to when it exits.</para>
+ <para>An example of a configuration file for a DHCP-networked
+ machine:</para>
- <para><figure>
- <title>Boot Manager Flow Chart</title>
+ <programlisting>IP_METHOD="dhcp"
+HOST_NAME="planetlab-1"
+DOMAIN_NAME="cs.princeton.edu"
+NET_DEVICE="00:06:5B:EC:33:BB"
+NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
+NODE_ID="121"</programlisting>
- <mediaobject>
- <imageobject>
- <imagedata align="center" fileref="boot-manager-flow.png"
- scalefit="1" />
- </imageobject>
- </mediaobject>
- </figure></para>
- </section>
+ <para>An example of a configuration file for the same machine, only with
+ a statically assigned network address:</para>
- <section>
- <title>User Interface Items</title>
+ <programlisting>IP_METHOD="static"
+IP_ADDRESS="128.112.139.71"
+IP_GATEWAY="128.112.139.65"
+IP_NETMASK="255.255.255.192"
+IP_NETADDR="128.112.139.127"
+IP_BROADCASTADDR="128.112.139.127"
+IP_DNS1="128.112.136.10"
+IP_DNS2="128.112.136.12"
+HOST_NAME="planetlab-1"
+DOMAIN_NAME="cs.princeton.edu"
+NET_DEVICE="00:06:5B:EC:33:BB"
+NODE_KEY="79efbe871722771675de604a227db8386bc6ef482a4b74"
+NODE_ID="121"</programlisting>
- <para>Nodes are now added to the system by administrators of the site, and
- technical contacts.</para>
- </section>
+ <para>Existing 2.x boot cds will look for the configuration files only
+ on a floppy disk, and the file must be named 'planet.cnf'. The new 3.x
+ boot cds, however, will initially look for a file named 'plnode.txt' on
+ either a floppy disk, or burned onto the cd itself. Alternatively, it
+ will fall back to looking for the original file name, 'planet.cnf'. This
+ initial file reading is performed by the boot cd itself to bring the
+ node's network online, so it can download and execute the
+ BootManager.</para>
- <section>
- <title>Node Management</title>
+ <para>However, the BootManager will also need to identify the location
+ of and read in the file, so it can get the extra fields not initially
+ used to bring the network online (primarily node_key and node_id). Below
+ is the search order that the BootManager will use to locate the file; a
+ short parsing sketch follows the table.</para>
- <section>
- <title>Adding Nodes</title>
-
- <para>New nodes are added to the system explicitly by either a PI or a
- tech contact, either directly through the API calls, or by using the
- appropriate interfaces on the website. As nodes are added, only their
- hostname and ip address are required to be entered. When the node is
- brought online, the records at PLC will be updated with the remaining
- information.</para>
-
- <para>After a node is added, the user has the option of creating a
- configuration file for that node. This is done automatically, and the
- user is prompted to download and save the file. This file contains only
- the primary network interface information (necessary to contact PLC),
- and the per-node key.</para>
-
- <para>The default boot state of a new node is 'new', which requires the
- user to confirm the installation at the node, by typing yes on the
- console. If this is not desired, as is the case with nodes in a
- colocation site, or for a large number of nodes being setup at the same
- time, the administrator can change the node state, after the entry is in
- the PLC records, from 'new' to 'reinstall'. This will bypass the
- confirmation screen, and proceed directly to reinstall the machine (even
- if it already had a node installation on it).</para>
+ <para>Configuration file location search order:<informaltable>
+ <tgroup cols="6">
+ <tbody>
+ <row>
+ <entry>File name</entry>
+
+ <entry>Floppy drive</entry>
+
+ <entry>Flash devices</entry>
+
+ <entry>Root file system, in /</entry>
+
+ <entry>CDRom, in /usr/boot</entry>
+
+ <entry>CDRom, in /usr</entry>
+ </row>
+
+ <row>
+ <entry>plnode.txt</entry>
+
+ <entry>1</entry>
+
+ <entry>2</entry>
+
+ <entry>4</entry>
+
+ <entry>5</entry>
+
+ <entry>6</entry>
+ </row>
+
+ <row>
+ <entry>planet.cnf</entry>
+
+ <entry>3</entry>
+
+ <entry></entry>
+
+ <entry></entry>
+
+ <entry></entry>
+
+ <entry></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable></para>
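+
+ <para>A compressed sketch of that search and of parsing the resulting
+ file (the mount points used for the floppy and flash devices, and the
+ parsing details, are assumptions here; the real logic lives in the
+ BootManager source):</para>
+
+ <programlisting>import os
+
+# Candidate locations, in the order given in the table above.
+SEARCH_ORDER = [
+    "/mnt/floppy/plnode.txt",      # 1: floppy drive
+    "/mnt/flash/plnode.txt",       # 2: flash devices
+    "/mnt/floppy/planet.cnf",      # 3: legacy name on floppy
+    "/plnode.txt",                 # 4: root file system
+    "/usr/boot/plnode.txt",        # 5: cd, /usr/boot
+    "/usr/plnode.txt",             # 6: cd, /usr
+]
+
+def read_node_config():
+    for path in SEARCH_ORDER:
+        if not os.path.exists(path):
+            continue
+        config = {}
+        for line in open(path):
+            line = line.strip()
+            if not line or line.startswith("#") or "=" not in line:
+                continue
+            key, _, value = line.partition("=")
+            config[key] = value.strip('"')
+        return config
+    return None</programlisting>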
</section>
<section>
- <title>Updating Node Network Settings</title>
+ <title>BootManager Configuration</title>
- <para>If the primary node network address must be updated, if the node
- is moved to a new network for example, then two steps must be performed
- to successfully complete the move:</para>
+ <para>All run-time configuration options for the BootManager exist in a
+ single file located at source/configuration. These values are described
+ below, and an illustrative excerpt follows the list. These values cannot
+ be changed on the fly - they must be changed and a new BootManager
+ package built and signed.</para>
- <para><orderedlist>
- <listitem>
- <para>The node network will need to be updated at PLC, either
- through the API directly or via the website.</para>
- </listitem>
+ <itemizedlist>
+ <listitem>
+ <para><literal>VERSION</literal></para>
- <listitem>
- <para>Either the floppy file regenerated and put into the machine,
- or, update the existing floppy to match the new settings.</para>
- </listitem>
- </orderedlist>If the node ip address on the floppy does not mach the
- record at PLC, then the node will not boot until they do match. The
- intention here is to prevent a malicious user from taking the floppy
- disk, altering the network settings, and trying to bring up a new
- machine with the new settings.</para>
-
- <para>On the other hand, if a non-primary network address needs to be
- updated, then simply updating the records at PLC will suffice. The boot
- manager, at next restart, will reconfigure the machine to match the PLC
- records.</para>
+ <para>The current BootManager version. During install, it is written
+ out to /etc/planetlab/install_version.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>BOOT_API_SERVER</literal></para>
+
+ <para>The full URL of the API server to contact for authenticated
+ operations.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>TEMP_PATH</literal></para>
+
+ <para>A writable path on the boot cd we can use for temporary
+ storage of files.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SYSIMG_PATH</literal></para>
+
+ <para>The path where we will mount the node logical volumes during
+ any step that requires access to the disks.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>CACERT_PATH</literal></para>
+
+ <para>This variable is no longer used.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>NONCE_FILE</literal></para>
+
+ <para>This variable is no longer used.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>PLCONF_DIR</literal></para>
+
+ <para>The path that PlanetLab node configuration files will be
+ created in during install. This should not be changed from
+ /etc/planetlab, as this path is assumed in other PlanetLab
+ components.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SUPPORT_FILE_DIR</literal></para>
+
+ <para>A path on the boot server where per-step additional files may
+ be located. For example, the packages that include the tools to
+ allow older 2.x version boot cds to partition disks with LVM.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>ROOT_SIZE</literal></para>
+
+ <para>During install, this sets the size of the node root partition.
+ It must be large enough to house all the node operational software.
+ It does not store any user/slice files. Include a 'G' suffix in this
+ value to indicate gigabytes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SWAP_SIZE</literal></para>
+
+ <para>How much swap to configure the node with during install.
+ Include a 'G' suffix in this value to indicate gigabytes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>SKIP_HARDWARE_REQUIREMENT_CHECK</literal></para>
+
+ <para>Whether or not to skip any of the hardware requirement checks,
+ including total disk and memory size constraints.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>MINIMUM_MEMORY</literal></para>
+
+ <para>How much memory is required by a running PlanetLab node. If a
+ machine contains less physical memory than this value, the install
+ will not proceed.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>MINIMUM_DISK_SIZE</literal></para>
+
+ <para>The size of the smallest disk we are willing to attempt to use
+ during the install, in gigabytes. Do not include any
+ suffixes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>TOTAL_MINIMUM_DISK_SIZE</literal></para>
+
+ <para>The size of all usable disks must be at least this size, in
+ gigabytes. Do not include any suffixes.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>INSTALL_LANGS</literal></para>
+
+ <para>Which language support to install. This value is used by RPM,
+ and is used in writing /etc/rpm/macros before any RPMs are
+ installed.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>NUM_AUTH_FAILURES_BEFORE_DEBUG</literal></para>
+
+ <para>How many authentication failures the BootManager is willing to
+ accept for any set of calls, before stopping and putting the node
+ into a debug mode.</para>
+ </listitem>
+ </itemizedlist>
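+
+ <para>For orientation only, a hypothetical excerpt of such a
+ configuration file, using the variables described above with made-up
+ values (the actual file at source/configuration is authoritative):</para>
+
+ <programlisting># Hypothetical values, for illustration only
+VERSION="3.1"
+BOOT_API_SERVER="https://boot.example.org/PLCAPI/"
+TEMP_PATH="/tmp"
+SYSIMG_PATH="/tmp/mnt/sysimg"
+PLCONF_DIR="/etc/planetlab"
+ROOT_SIZE="7G"
+SWAP_SIZE="1G"
+SKIP_HARDWARE_REQUIREMENT_CHECK=0
+MINIMUM_MEMORY=512
+MINIMUM_DISK_SIZE=17
+TOTAL_MINIMUM_DISK_SIZE=50
+INSTALL_LANGS="en_US"
+NUM_AUTH_FAILURES_BEFORE_DEBUG=3</programlisting>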
</section>
<section>
- <title>Removing Nodes</title>
+ <title>Installer Hardware Detection</title>
- <para>Nodes are removed from the system by:</para>
+ <para>When a node is being installed, the BootManager must identify
+ which hardware the machine has that is applicable to a running node, and
+ configure the node so it can boot properly post-install. The
+ general procedure for doing so is outlined in this section. It is
+ implemented in the <filename>source/systeminfo.py</filename>
+ file.</para>
- <para><orderedlist>
- <listitem>
- <para>Deleting the record of the node at PLC</para>
- </listitem>
+ <para>The process for identifying which kernel modules need to be loaded
+ is as follows (a short sketch follows the list):</para>
- <listitem>
- <para>Shutting down the machine.</para>
- </listitem>
- </orderedlist>Once this is done, even if the machine attempts to come
- back online, it cannot be authorized with PLC and will not boot.</para>
+ <orderedlist>
+ <listitem>
+ <para>Create a lookup table of all modules, and which PCI ids
+ correspond to each module.</para>
+ </listitem>
+
+ <listitem>
+ <para>For each PCI device on the system, look up its module in the
+ table created in the first step.</para>
+ </listitem>
+
+ <listitem>
+ <para>If a module is found, put it into one of two categories of
+ modules, either network module or scsi module, based on the PCI
+ device class.</para>
+ </listitem>
+
+ <listitem>
+ <para>For each network module, write out an 'eth<index>' entry
+ in the modprobe.conf configuration file.</para>
+ </listitem>
+
+ <listitem>
+ <para>For each scsi module, write out a
+ 'scsi_hostadapter<index>' entry in the modprobe.conf
+ configuration file.</para>
+ </listitem>
+ </orderedlist>
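+
+ <para>A minimal sketch of the last two steps, assuming the lookup has
+ already produced the two ordered module lists (the use of modprobe.conf
+ 'alias' lines and the output path are assumptions; the real logic is in
+ source/systeminfo.py):</para>
+
+ <programlisting>def write_modprobe_conf(network_modules, scsi_modules, path="/tmp/modprobe.conf"):
+    # network_modules and scsi_modules are ordered lists of kernel module
+    # names produced by the PCI id lookup described above.
+    lines = []
+    for index, module in enumerate(network_modules):
+        lines.append("alias eth%d %s" % (index, module))
+    for index, module in enumerate(scsi_modules):
+        lines.append("alias scsi_hostadapter%d %s" % (index, module))
+    with open(path, "w") as conf:
+        conf.write("\n".join(lines) + "\n")
+
+# Example: one network driver and two disk controller drivers.
+write_modprobe_conf(["e1000"], ["ata_piix", "sata_nv"])</programlisting>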
+
+ <para>This process is fairly straightforward, and is simplified by the
+ fact that we currently do not need support for USB, sound, or video
+ devices when the node is fully running. The boot cd itself uses a
+ similar process, but includes USB devices. Consult the boot cd technical
+ documentation for more information.</para>
+
+ <para>The creation of the PCI id to kernel module lookup table uses
+ three different sources of information, and merges them together into a
+ single table for easier lookups. With these three sources of
+ information, a fairly comprehensive lookup table can be generated for
+ the devices that PlanetLab nodes need to have configured. The sources
+ are listed below (a short parsing sketch follows the list):</para>
+
+ <orderedlist>
+ <listitem>
+ <para>The installed <filename>/usr/share/hwdata/pcitable</filename>
+ file</para>
+
+ <para>Created at the time the hwdata rpm was built, this file
+ contains mappings of PCI ids to devices for a large number of
+ devices. It is not necessarily complete, and doesn't take into
+ account the modules that are actually available in the built
+ PlanetLab kernel, which is a subset of the full set available
+ (again, PlanetLab nodes do not have a use for sound or video
+ drivers, and thus those are not typically built).</para>
+ </listitem>
+
+ <listitem>
+ <para>From the built kernel, the <filename>modules.pcimap</filename>
+ from the <filename>/lib/modules/<kernelversion>/</filename>
+ directory.</para>
+
+ <para>This file is generated at the time the kernel is installed,
+ and pulls out of each module the PCI ids of the devices that the module
+ lists as supported. Not all modules list all the devices they support,
+ and some contain wild cards (that match any device of a single
+ manufacturer).</para>
+ </listitem>
+
+ <listitem>
+ <para>From the built kernel, the <filename>modules.dep</filename>
+ from the <filename>/lib/modules/<kernelversion>/</filename>
+ directory.</para>
+
+ <para>This file is also generated at the time the kernel is
+ installed, but lists the dependencies between various modules. It is
+ used to generate a list of modules that are actually
+ available.</para>
+ </listitem>
+ </orderedlist>
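+
+ <para>A compressed sketch of merging the first two sources into a single
+ lookup table (the column layouts shown are assumptions based on common
+ hwdata and module-utils formats, and wild card ids, sub-ids, and the
+ modules.dep availability filter are omitted; see source/systeminfo.py
+ for the real merge):</para>
+
+ <programlisting>def parse_pcitable(path):
+    # hwdata pcitable: vendor, device, [subvendor, subdevice,] "module", "description"
+    table = {}
+    for line in open(path):
+        fields = line.rstrip("\n").split("\t")
+        if line.startswith("#") or len(fields) not in (4, 6):
+            continue
+        vendor, device, module = fields[0], fields[1], fields[-2].strip('"')
+        table[(int(vendor, 16), int(device, 16))] = module
+    return table
+
+def parse_pcimap(path):
+    # modules.pcimap: module vendor device subvendor subdevice class class_mask driver_data
+    table = {}
+    for line in open(path):
+        fields = line.split()
+        if line.startswith("#") or len(fields) != 8:
+            continue
+        module, vendor, device = fields[0], fields[1], fields[2]
+        table[(int(vendor, 16), int(device, 16))] = module
+    return table
+
+def build_lookup(pcitable_path, pcimap_path):
+    # The kernel's own pcimap takes precedence over the hwdata table.
+    table = parse_pcitable(pcitable_path)
+    table.update(parse_pcimap(pcimap_path))
+    return table</programlisting>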
+
+ <para>It should be noted here that SATA (Serial ATA) devices have been
+ known to exist with both a PCI SCSI device class, and with a PCI IDE
+ device class. Under Linux 2.6 kernels, all SATA modules need to be
+ listed in modprobe.conf under 'scsi_hostadapter' lines. This case is
+ handled in the hardware loading scripts by making the assumption that if
+ an IDE device matches a loadable module, it should be put in the
+ modprobe.conf file, as 'real' IDE drivers are all currently built into
+ the kernel, and do not need to be loaded. SATA devices that have a PCI
+ SCSI device class are easily identified.</para>
+
+ <para>It is essential that the modprobe.conf configuration file contain
+ the correct drivers for the disks on the system, if they are present, as
+ during kernel installation the creation of the initrd (initial ramdisk),
+ which is responsible for booting the system, uses this file to identify
+ which drivers to include in it. A failure to do this typically results
+ in a kernel panic at boot with a 'no init found' message.</para>
</section>
</section>
<section>
- <title></title>
-
- <para></para>
+ <title>Backward Compatibility</title>
+
+ <para>This section only applies to those interested in sections of the
+ BootManager that exist for backward compatibility with nodes not
+ containing the NODE_KEY. This does not affect any nodes added to the
+ system after deployment of the BootManager.</para>
+
+ <para>Given the large number of nodes in PlanetLab, and the lack of direct
+ physical access to them, the process of updating all configuration files
+ to include the new NODE_ID and NODE_KEY will take a fairly significant
+ amount of time. Rather than delay deployment of the BootManager until all
+ machines are updated, alternative methods for acquiring these values are
+ used for these nodes.</para>
+
+ <para>First, the NODE_ID value. For any machine already part of PlanetLab,
+ there exists a record of its IP address and MAC address in PlanetLab
+ Central. To get the NODE_ID value, if it is not located in the
+ configuration file, the BootManager uses a standard HTTP POST request to a
+ known php page on the boot server, sending the IP and MAC address of the
+ node. This php page queries the MA database directly (not through the MA
+ API), and returns a NODE_ID if the node is part of PlanetLab, or -1
+ otherwise.</para>
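+
+ <para>An illustrative version of that request (the page name and boot
+ server host below are hypothetical; only the IP/MAC POST and the integer
+ reply are taken from the description above):</para>
+
+ <programlisting>import urllib.parse, urllib.request
+
+def lookup_node_id(ip_address, mac_address):
+    # POST the node's primary IP and MAC address to a known page on the
+    # boot server; it replies with the node_id, or -1 if the node is not
+    # part of PlanetLab.
+    data = urllib.parse.urlencode({"ip": ip_address, "mac": mac_address})
+    reply = urllib.request.urlopen(
+        "https://boot.example.org/getnodeid.php",   # hypothetical URL
+        data.encode("ascii"))
+    return int(reply.read().decode().strip())</programlisting>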
+
+ <para>Second, the NODE_KEY value. All Boot CDs currently in use, at the
+ time they request a script from the MA to run, send in the request a
+ randomly generated value called a boot_nonce, usually 32 bytes or larger.
+ During normal BootManager operation, this value is ignored. However, in
+ the absence of a node key, we can use this value. Although it is not as
+ secure as a typical node key (because it is not distributed through
+ external mechanisms, but is generated by the node itself), it can be used
+ if we validate that the IP address of the node making the request matches
+ the MA record. This means that nodes behind firewalls can no longer be
+ allowed in this situation.</para>
</section>
</article>
\ No newline at end of file