boot-manager.xml

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
   3 "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
   4 <article>
   5   <articleinfo>
   6     <title>PlanetLab Boot Manager</title>
   7
   8     <author>
   9       <firstname>Aaron</firstname>
  10
  11       <surname>Klingaman</surname>
  12
  13       <email>alk@cs.princeton.edu</email>
  14     </author>
  15
  16     <affiliation>
  17       <orgname>Princeton University</orgname>
  18     </affiliation>
  19
  20     <abstract>
  21       <para>This document outlines the design and policy decisions of a new
  22       PlanetLab component called the Boot Manager. The Boot Manager
  23       encompasses several systems and all policy regarding how new nodes are
  24       brought into the system, how they are authenticated with PlanetLab
  25       Central (PLC), what authenticated operations they can perform, and what
  26       constitutes a node's identity.</para>
  27     </abstract>
  28
  29     <revhistory>
  30       <revision>
  31         <revnumber>1.0</revnumber>
  32
  33         <date>January 14, 2005</date>
  34
  35         <authorinitials>AK</authorinitials>
  36
  37         <revdescription>
  38           <para>Initial draft.</para>
  39         </revdescription>
  40       </revision>
  41     </revhistory>
  42   </articleinfo>
  43
  44   <section>
  45     <title>Overview</title>
  46
  47     <para>This document describes the history of several previously separate,
  48     undocumented components and policy decisions of the PlanetLab
  49     infrastructure, and groups them into one logical unit, which will be called
  50     the <firstterm>Boot Manager</firstterm>. In addition, specific
  51     recommendations, outlined in detail later, are made for changes and additions
  52     to these parts to support new features and better security. Topics include:</para>
  53
  54     <orderedlist>
  55       <listitem>
  56         <para>How new nodes are added to the PlanetLab system, and the chain
  57         of trust that accompanies that addition</para>
  58       </listitem>
  59
  60       <listitem>
  61         <para>How to prevent unauthorized nodes from becoming part of the
  62         system, and the consequences of that happening</para>
  63       </listitem>
  64
  65       <listitem>
  66         <para>How any existing node authenticates itself with PlanetLab
  67         Central (PLC), and what operations it can perform</para>
  68       </listitem>
  69
  70       <listitem>
  71         <para>What constitutes node identity, and when this identity should
  72         and should not change</para>
  73       </listitem>
  74     </orderedlist>
  75
  76     <para>This document does not cover node-to-node authentication, or any
  77     service or system running after a node is fully booted, at which point
  78     the Boot Manager is no longer applicable.</para>
  79   </section>
  80
  81   <section>
  82     <title>Terminology</title>
  83
  84     <para>Before continuing, the terms used throughout this document are
  85     defined, including what a site is, what nodes are, and what PlanetLab
  86     consists of. The current organizational structure consists of groups of
  87     <firstterm>sites</firstterm>, each usually a geographical location
  88     corresponding one to one with a company or university. These sites have
  89     any number of <firstterm>users</firstterm> or
  90     <firstterm>researchers</firstterm>, including a <firstterm>principal
  91     investigator</firstterm>, or <firstterm>PI</firstterm>, responsible for
  92     the users, and one or more <firstterm>technical contacts</firstterm>.
  93     Sites are usually composed of at least two machines running the PlanetLab
  94     software, usually referred to as <firstterm>nodes</firstterm>. All user
  95     and node management operations are done through a set of servers located
  96     in one physical location, which is known as <firstterm>PlanetLab
  97     Central</firstterm>, or <firstterm>PLC</firstterm>. There is also a set of
  98     PlanetLab <firstterm>administrators</firstterm>, not necessarily
  99     affiliated with a particular site. <firstterm>PlanetLab</firstterm> then
 100     collectively refers to all sites and their nodes and users, and PlanetLab
 101     Central.</para>
 102   </section>
 103
 104   <section>
 105     <title>Background</title>
 106
 107     <section>
 108       <title>How Sites Become Part of PlanetLab</title>
 109
 110       <para>A full discussion and evaluation of the process and security
 111       implications of sites becoming part of PlanetLab is outside the scope of
 112       this document. It will be assumed that the process is relatively secure,
 113       and that user and PI accounts at that site are legitimate. However, it
 114       is necessary to provide some basic information about the process.</para>
 115
 116       <para>What does it mean for a site to be part of PlanetLab?
 117       Primarily:</para>
 118
 119       <orderedlist>
 120         <listitem>
 121           <para>The site's record (e.g. name, URL, geographical location,
 122           contact information) is in the PLC database</para>
 123         </listitem>
 124
 125         <listitem>
 126           <para>A set of users (their email addresses, passwords, and
 127           personal information) is associated with the site in the PLC
 128           database</para>
 129         </listitem>
 130
 131         <listitem>
 132           <para>Those users and PIs are able to perform some operations
 133           at PLC, and to gain direct access to the nodes</para>
 134         </listitem>
 135       </orderedlist>
 136
 137       <para>The process for allowing new sites to become part of PlanetLab has
 138       been continually evolving since the beginning of PlanetLab. Initially,
 139       the first sites were selected and invited, and a record of their existence
 140       in PLC was entered by hand by an administrator. With a site now part
 141       of PlanetLab, users and PIs at that site could then register for
 142       accounts to perform operations at PLC. Privileged accounts, such as PI
 143       accounts, were enabled by administrators. At the time, this
 144       administrative overhead was not a problem given the relatively limited
 145       number of total sites.</para>
 146
 147       <para>Over time, parts of these operations have been streamlined. Now, a
 148       site can submit all of its relevant information on the PLC website for
 149       review and approval by administrators. Sites also no longer require an
 150       explicit invitation. With the creation of the PlanetLab Consortium, there
 151       is now an additional paperwork step before a site becomes a member of
 152       PlanetLab.</para>
 153
 154       <para>With the introduction of the additional consortium step, the
 155       process now exists as:</para>
 156
 157       <orderedlist>
 158         <listitem>
 159           <para>A site requests to join PlanetLab, either by contacting
 160           administrators over email or through other external
 161           communication</para>
 162         </listitem>
 163
 164         <listitem>
 165           <para>Necessary consortium paperwork is signed by all
 166           parties</para>
 167         </listitem>
 168
 169         <listitem>
 170           <para>PI(s) submit connect (join) requests with remaining site and
 171           personal information</para>
 172         </listitem>
 173
 174         <listitem>
 175           <para>Administrators verify that the PI is who they say they are,
 176           and enable their site and accounts at PLC</para>
 177         </listitem>
 178       </orderedlist>
 179     </section>
 180
 181     <section>
 182       <title>How Nodes Become Part of PlanetLab</title>
 183
 184       <para>After a site has been approved and added to PLC, it is required
 185       to install and make available to other users at least two nodes (as per
 186       current policy).</para>
 187
 188       <para>In the first revisions of the PLC software, nodes were only added
 189       to the system by hand. Usually a PI or technical contact would
 190       communicate the network settings of the node, and it was then added to
 191       PLC by an administrator. This prevented any nodes that weren't part of
 192       PlanetLab from being recognized by PLC. However, no mechanisms existed to
 193       ensure that the node's network settings (effectively its identity) were
 194       not hijacked by another machine.</para>
 195
 196       <para>Since the beginning of PlanetLab, there have been few to no
 197       restrictions on what machines the PlanetLab software can run on. This is
 198       primarily due to the fact that all source code is now available, and it
 199       is technically feasible for anyone to bring up a machine that is running
 200       the PlanetLab software, or closely resembles it. What is important,
 201       however, is when these nodes become recognized by PLC, and then
 202       available to the users via PLC. Otherwise, a user would have to go
 203       through non-PLC channels in order to find these nodes. Even then, they
 204       could not use PLC to run their experiments on the nodes, because PLC
 205       does not know about those nodes.</para>
 206
 207       <para>When a node becomes part of PlanetLab:</para>
 208
 209       <orderedlist>
 210         <listitem>
 211           <para>It is recognized by PLC as being at the site by its existence in
 212           our database</para>
 213         </listitem>
 214
 215         <listitem>
 216           <para>The existing node boot mechanisms allow the machine to come
 217           online after communicating its identity to PLC</para>
 218         </listitem>
 219
 220         <listitem>
 221           <para>Researchers can use the node for their experiments by using
 222           administrative interfaces at PLC</para>
 223         </listitem>
 224       </orderedlist>
 225
 226       <para>Rather than adding each node by hand, the current system instead
 227       allows for an entire network subnet to be authorized to contain nodes.
 228       When a site joins, a PLC administrator authorizes the subnet the nodes
 229       will be on, and any machines on that network are allowed to become
 230       recognized by PLC automatically. This had an immediate advantage: PLC
 231       administrators no longer had the overhead of adding each node by hand,
 232       as was done in the beginning. Given that a common
 233       interest was to see PlanetLab grow in terms of number of nodes (as one
 234       metric), the assumption was made that allowing any node to come online
 235       on an authorized subnet without explicit approval from an administrator
 236       or PI would benefit everyone.</para>
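
      <para>The subnet check described above can be sketched as follows. This
      is a minimal illustration, not the PLC implementation: the table
      <literal>AUTHORIZED_SUBNETS</literal>, the site name, and the function
      <literal>site_for_address</literal> are hypothetical, and the addresses
      are documentation-only ranges.</para>

      <programlisting language="python"><![CDATA[
```python
import ipaddress

# Hypothetical table: site name -> subnet authorized by a PLC administrator.
AUTHORIZED_SUBNETS = {
    "example-site": ipaddress.ip_network("192.0.2.0/24"),
}


def site_for_address(addr):
    """Return the site whose authorized subnet contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for site, subnet in AUTHORIZED_SUBNETS.items():
        if ip in subnet:
            return site
    return None
```
]]></programlisting>

      <para>Note that the check depends only on the address a machine claims,
      so any machine on an authorized subnet is accepted without a record of
      who brought it online.</para>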
 237     </section>
 238
 239     <section>
 240       <title>Node Installation</title>
 241
 242       <para>To date, there have been three major revisions of the software
 243       that installs a PlanetLab node. Not only have the mechanisms by which
 244       nodes get installed changed, but so has the context in which the
 245       installation runs.</para>
 246
 247       <para>The first revision of the installer was little more than a
 248       customized RedHat (version 7.3) boot disk, with a PlanetLab-specific
 249       post-install script to perform final initialization steps. The network
 250       settings and the list of packages to install were all stored on the disk,
 251       so a custom disk was generated on demand for each node. Anyone with one of
 252       these disks could install a PlanetLab node.</para>
 253
 254       <para>The second revision of the installer was released in conjunction
 255       with the release of the new PlanetLab boot cd. The intention was not
 256       necessarily to have the node packages on the cd (as they would quickly
 257       go out of date), but to provide a mechanism to allow administrators to
 258       regain control of a machine, in the event that the node was compromised,
 259       or the installed software was corrupted. The nodes were configured to
 260       always start off the cd, and, rather than have a custom cd per node, the
 261       network settings were stored on a floppy disk. Both the floppy disk and
 262       the boot cd were to remain in the machine at all times. The RedHat
 263       installer, Anaconda <citation>1</citation>, that was used prior to the
 264       boot cd was modified to run in the context of this boot cd. This allowed
 265       us a great deal of flexibility, as the cd was built so that all it would
 266       do was:</para>
 267
 268       <orderedlist>
 269         <listitem>
 270           <para>Bring a full Linux system online, running only off the
 271           cd</para>
 272         </listitem>
 273
 274         <listitem>
 275           <para>Load any network and other drivers necessary, based on the
 276           hardware of the node</para>
 277         </listitem>
 278
 279         <listitem>
 280           <para>Configure the network interface with the settings from the
 281           floppy disk</para>
 282         </listitem>
 283
 284         <listitem>
 285           <para>Contact a special PLC boot server, and download and execute a
 286           script.</para>
 287         </listitem>
 288       </orderedlist>
 289
 290       <para>The boot cd uses HTTPS to contact the boot server, and uses a
 291       certification authority (CA) certificate to verify the identity of the
 292       machine at PLC. This way, it can be assured that the installation of a
 293       particular node is correct, at least in that all packages originated
 294       from PLC. The script downloaded by the boot cd for a node depends on the
 295       current state of that node in the PLC database. The PLC database must
 296       identify the node in order to accomplish that. That is covered below, in
 297       Node Identity.</para>
 298
 299       <para>The third and current version of the installer still runs in the
 300       context of the boot cd, but was a complete rewrite to better handle
 301       packages, and to remove much of the unneeded complexity of the previous
 302       installer.</para>
 303     </section>
 304
 305     <section>
 306       <title>Node Identity</title>
 307
 308       <para>In the first revisions of the PlanetLab software, nodes were
 309       solely identified by their network settings, primarily, the hostname and
 310       the physical address of the network adapter (MAC address). This worked
 311       well then, as this set of information was unique, and allowed for the
 312       direct mapping of node identity to a physical machine. It was stored
 313       this way in the PLC database as well.</para>
 314
 315       <para>As the design of the database progressed, the PlanetLab software
 316       needed to identify nodes not by any one aspect of the physical machine,
 317       but by a more generic identifier, which has been called a node id (this
 318       identifier needed to be used internally to refer to other aspects of a
 319       node, like which site it is at). Although better in some respects,
 320       there are still drawbacks. For example, deleting a node entry from the
 321       database and recreating a similar one could result in a new node id,
 322       when nothing on the node itself has really changed. These problems arise
 323       primarily because no policy was documented; instead, the implementation
 324       details defined the policy.</para>
 325
 326       <para>Currently, when a node requests a script from the boot server as
 327       the last step of the boot cd operation, it sends to PLC the output of
 328       the program 'ifconfig' (among other data), which contains the network
 329       settings the machine was configured with. From the network settings, the
 330       primary MAC address is extracted by PLC and used to check whether the
 331       node exists in the database. Here, the MAC address is used to look up a
 332       corresponding numeric node id, which is used internally. The MAC address
 333       and the node id are tied: if a new MAC address is used, a new node id
 334       will be generated. If the node does exist, an appropriate script is sent
 335       in response, based on the current node state. Again, this was fine, as
 336       long as a node was identified correctly.</para>
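
      <para>The lookup described above can be sketched as follows. This is an
      illustration under stated assumptions rather than the PLC code: the
      regular expression assumes the classic Linux ifconfig layout with an
      <literal>HWaddr</literal> field, and <literal>primary_mac</literal> and
      <literal>NodeTable</literal> are hypothetical names, with an in-memory
      dictionary standing in for the database.</para>

      <programlisting language="python"><![CDATA[
```python
import re

# Assumed format: classic Linux ifconfig output, for example
# "eth0  Link encap:Ethernet  HWaddr 00:0C:29:AB:CD:EF".
MAC_RE = re.compile(r"HWaddr\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})")


def primary_mac(ifconfig_output):
    """Return the first MAC address in the output, lowercased, or None."""
    match = MAC_RE.search(ifconfig_output)
    return match.group(1).lower() if match else None


class NodeTable:
    """Hypothetical stand-in for the PLC node table, keyed by MAC address."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def node_id_for(self, mac):
        # A MAC address that has not been seen before yields a new node id,
        # which is why swapping a network adapter changes the node identity.
        if mac not in self._ids:
            self._ids[mac] = self._next_id
            self._next_id += 1
        return self._ids[mac]
```
]]></programlisting>

      <para>Because the node id is derived entirely from data the node
      reports, whoever reports a given MAC address is treated as that
      node.</para>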
 337     </section>
 338
 339     <section>
 340       <title>Node Authentication</title>
 341
 342       <para>What does a node (or PI, for that matter) have to do to prove that
 343       it is one of the real, or legitimate, PlanetLab nodes? At first, this
 344       was not an issue because the nodes were added to the system by
 345       administrators, and all communication paths led only from PLC to the
 346       nodes. Everything was downloaded from PLC, including information about
 347       which experimenters can use the system, and which packages to install for
 348       updates. For this, a node only needed to send enough information in the
 349       request to identify itself with PLC. From the PLC point of view, it did
 350       not matter which node downloaded the packages for a node, so long as the
 351       node was identified correctly and received the packages it was supposed
 352       to. This was acceptable since the node was added to PLC by hand, thus it
 353       was already 'authenticated'. During this period, a number of assumptions
 354       were made:</para>
 355
 356       <orderedlist>
 357         <listitem>
 358           <para>That a rogue node with the same network settings would not be
 359           a problem, as the site technical contacts could prevent or detect
 360           that</para>
 361         </listitem>
 362
 363         <listitem>
 364           <para>That it was not necessary to check whether a particular node
 365           was already authenticated (aside from assuring that the host's
 366           public ssh key fingerprint did not change from one login to the
 367           next)</para>
 368         </listitem>
 369       </orderedlist>
 370
 371       <para>As more previously manual steps became automated, a number of
 372       situations came up in which a node would need to initiate and perform
 373       some operation at PLC. There is only a small set of these operations,
 374       limited to items such as adding a node to the system (under a
 375       previously authorized subnet), changing the 'boot state' (a record of
 376       whether the machine is being installed or is in a debug mode) of a node,
 377       or uploading the logs of an installation.</para>
 378
 379       <para>To handle this new node authentication, a 32-byte random nonce
 380       value was generated and sent to PLC during node boot time (at the same
 381       time the network settings were sent). The nonce value in the PLC database
 382       for that particular node was updated if the node was identified
 383       correctly, and was used for authenticating subsequent node-initiated
 384       operations. Then, for example, when a node install finished, a node could
 385       request that its state be updated, and all it would need to do would be to
 386       resend its network settings and the original nonce for authentication. If
 387       the nonce in the database matched what was sent, then the requested
 388       operation was performed.</para>
 389
 390       <para>The problem here is obvious: now, any node that can be identified
 391       is essentially automatically authenticated. For a node to be identified,
 392       it has to be in the database, and, new nodes can be automatically added
 393       on any authorized subnets without intervention of an administrator or
 394       tech contact. With this system, it is trivial to add a rogue node to the
 395       system, even at a different site that was not originally authorized,
 396       because the whole system is based on what a node sends PLC, which is
 397       trivial to spoof.</para>
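
      <para>The weakness can be made concrete with a small model of the nonce
      scheme. This is a hedged sketch, not the boot server's code: the class
      <literal>BootServer</literal>, its method names, and the state names
      are hypothetical, and nodes are keyed by the MAC address they
      report.</para>

      <programlisting language="python"><![CDATA[
```python
import hmac
import secrets


def new_nonce():
    """Return a 32-byte random nonce, as generated at node boot time."""
    return secrets.token_bytes(32)


class BootServer:
    """Hypothetical model of the nonce scheme; nodes are keyed by MAC."""

    def __init__(self, known_macs):
        self.nonces = {mac: None for mac in known_macs}
        self.boot_state = {mac: "install" for mac in known_macs}

    def boot(self, mac, nonce):
        # Being identified is all it takes to register a fresh nonce.
        if mac not in self.nonces:
            return False
        self.nonces[mac] = nonce
        return True

    def update_state(self, mac, nonce, new_state):
        # A later node-initiated operation succeeds if the nonce matches.
        stored = self.nonces.get(mac)
        if stored is None or not hmac.compare_digest(stored, nonce):
            return False
        self.boot_state[mac] = new_state
        return True
```
]]></programlisting>

      <para>In this model, a machine that merely reports a known MAC address
      can call <literal>boot</literal> with its own nonce, after which its
      requests are accepted and those of the original node are rejected.</para>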
 398     </section>
 399   </section>
 400
 401   <section>
 402     <title>Recommendations</title>
 403
 404     <section>
 405       <title>How PLC Will Identify Nodes</title>
 406
 407       <para>Before any suggestions on what to change regarding the node
 408       identity policy can be made, the question of what makes a node a node
 409       should be answered. This primarily depends on who is asking. From an
 410       administrator's point of view, a node could be tied to a particular
 411       installation of the software. Reinstall the node, and it becomes a new
 412       node with a new identity. However, from an end user's perspective, the
 413       machine still has the same network address and hostname, and their
 414       software was simply removed. For them, changing the node identity in
 415       this situation does not make any sense, and usually causes them
 416       unnecessary work, as they have to re-add that machine to their
 417       experiment (because, as far as the PLC database is concerned, the node
 418       never existed before then). This question is particularly important for
 419       several reasons:</para>
 420
 421       <orderedlist>
 422         <listitem>
 423           <para>Node identity gives users a way to identify a node, in order to
 424           use it for their research</para>
 425         </listitem>
 426
 427         <listitem>
 428           <para>The node identity could be used by other external systems, as
 429           a universal identifier</para>
 430         </listitem>
 431       </orderedlist>
 432
 433       <para>The following recommendation is made for a new node identity
 434       policy. Rather than tying node identity to some attribute of the physical
 435       machine, such as its hardware configuration, as is currently done,
 436       PLC will assign an arbitrary, unused identity to the node upon its
 437       creation, and that identity will be stored locally at the node (most
 438       likely on an external medium like a floppy disk). Then, as long as that
 439       identity is still on the node, any hardware or software changes will not
 440       necessarily require a change of the node identity. This will then allow
 441       PLC, if necessary in the future, to change the node identity policy as
 442       needed.</para>
 443
 444       <para>The following policy will apply to this new node identity:</para>
 445
 446       <orderedlist>
 447         <listitem>
 448           <para>In the past, a tech contact was able to change the network
 449           settings on a node automatically by updating the network
 450           configuration floppy. Now, these changes will have to be done at PLC
 451           (with the option of assigning a new node identity). Thus, the node's
 452           network settings (excluding the MAC address) are tied to the
 453           identity.</para>
 454         </listitem>
 455
 456         <listitem>
 457           <para>Attempting to move the node identity to another machine will
 458           halt that machine from being used by researchers until the change is
 459           dealt with by either a PLC administrator or a site technical
 460           contact. If approved, the node would reconfigure itself
 461           appropriately.</para>
 462         </listitem>
 463
 464         <listitem>
 465           <para>A node identity cannot be reused after the node has been
 466           deleted from the PLC database.</para>
 467         </listitem>
 468
 469         <listitem>
 470           <para>The node identity will not change across software reinstalls,
 471           changes of the hard disks or network adapters (as long as the network
 472           settings remain), or any other hardware changes.</para>
 473         </listitem>
 474       </orderedlist>
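
      <para>One way to read the policy above is sketched below. It is an
      illustration only: <literal>IdentityRegistry</literal> and its methods
      are hypothetical, a monotonic counter is just one possible way to pick
      an unused identity, and the administrator approval step for a moved
      identity is reduced to a "halt" result.</para>

      <programlisting language="python"><![CDATA[
```python
import itertools


class IdentityRegistry:
    """Hypothetical sketch of the identity policy; not the PLC schema."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._active = {}      # identity -> network settings tied to it
        self._retired = set()  # identities of deleted nodes, never reissued

    def create_node(self, hostname, ip):
        identity = next(self._counter)  # monotonic, so always unused
        self._active[identity] = (hostname, ip)
        return identity

    def delete_node(self, identity):
        del self._active[identity]
        self._retired.add(identity)

    def check_boot(self, identity, hostname, ip):
        """Return 'ok' or 'halt' for a node presenting this identity."""
        if identity in self._retired or identity not in self._active:
            return "halt"
        if self._active[identity] != (hostname, ip):
            # Settings changed outside PLC, or the identity was moved.
            return "halt"
        return "ok"
```
]]></programlisting>

      <para>The MAC address is deliberately absent from the stored settings,
      so replacing a network adapter or reinstalling the software leaves the
      identity unchanged.</para>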
 475
 476       <para>Given the current design of the PLC database, there is still a
 477       need to use, at least internally, a numeric node identifier. Other
 478       software and APIs available to researchers also use this identifier, so
 479       the question becomes whether or not the above policy can be applied to
 480       it without significantly changing either the PLC software or the
 481       researcher's experiments. Answering this question is beyond the scope of
 482       this document, and is left as an implementation decision.</para>
 483     </section>
 484
 485     <section>
 486       <title>Authenticating Node Identity</title>
 487
 488       <para>It is clear that the previous model for authentication, which
 489       assumes that with identity comes authorization, will need to change to one
 490       where a node can present its identity, then authenticate it as a separate
 491       step in order to become authorized. During the boot process, a node can
 492       still send sufficient information to identify itself, but a new system is
 493       required to prove that what it sends in fact does come from the node,
 494       and not someone attempting to impersonate the node. This is especially
 495       important as node identities are made public knowledge.</para>
 496
 497       <para>Authentication in distributed systems is a fairly widely
 498       researched problem, and the goal here is not to build a new mechanism
 499       from scratch, but rather to identify an existing method that can be used
 500       to fulfill our requirements. Our requirements are fairly simple, and
 501       include:</para>
 502
 503       <orderedlist>
 504         <listitem>
 505           <para>The ability to trace the origin of a node added to PlanetLab,
 506           including the party responsible for the addition.</para>
 507         </listitem>
 508
 509         <listitem>
 510           <para>Authenticating requests initiated by nodes to change
 511           information at PLC. These requests involve little actual
 512           communication between the nodes and PLC, and the overhead for
 513           authenticating each request is small given the number and frequency
 514           of them. This also means the need to open an authenticated channel
 515           for multiple requests will not be necessary.</para>
 516         </listitem>
 517       </orderedlist>
 518
 519       <para>Given the public nature of PlanetLab, there is no need to encrypt
 520       data during these system processes to prevent other parties from seeing
 521       it (also, simply hiding the details of the authentication
 522       process is not a valid security model). Assuring the requests are not
 523       modified during transmission is necessary, however. A public/private key
 524       pair system could be used, where each site would be responsible for
 525       generating a private key, and signing its nodes' identities. PLC could
 526       then have a list of all public keys, and could validate the identities.
 527       However, this is not recommended for several reasons:</para>
 528
 529       <orderedlist>
 530         <listitem>
 531           <para>It places an additional burden on the site to generate and
 532           keep secure these private keys. Having a private key for each node
 533           would be unreasonable, so one key would be used for all nodes at a
 534           particular site.</para>
 535         </listitem>
 536
 537         <listitem>
 538           <para>Using one key for all nodes not only increases the cost
 539           of a compromised key (all identities would have to be re-signed),
 540           but also makes use of the key to add unauthorized nodes harder to
 541           detect.</para>
 542         </listitem>
 543
 544         <listitem>
 545           <para>Differences in versions of the software used to generate keys
 546           would have to be handled, increasing the complexity of supporting
 547           such a system at PLC</para>
 548         </listitem>
 549       </orderedlist>
 550
 551       <para>To fulfill the above requirements for node identity, the
 552       recommendation is made to use a message authentication system using hash
 553       functions and shared secrets such as in <citation>2</citation>. In such
 554       a system, the shared secret (also referred to as a key, but not in the
 555       public/private key pair sense) is as simple as a fixed-size, randomly
 556       generated number. Of primary importance in such a system is the control
 557       and distribution of the key.</para>
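
      <para>A message authentication scheme of this kind can be sketched with
      HMAC, one widely used hash-based construction; whether it matches the
      cited scheme exactly is an assumption. The function names, the 32-byte
      key size, the choice of SHA-256, and the message layout below are
      illustrative and not part of any PLC interface.</para>

      <programlisting language="python"><![CDATA[
```python
import hashlib
import hmac
import secrets


def generate_key(size=32):
    """Return a fixed-size random shared secret (the size is an assumption)."""
    return secrets.token_bytes(size)


def sign_request(key, node_id, body):
    """Compute an authentication tag over the node identity and request body."""
    message = str(node_id).encode() + b"\n" + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(key, node_id, body, tag):
    """Recompute the tag with the stored key and compare in constant time."""
    expected = sign_request(key, node_id, body)
    return hmac.compare_digest(expected, tag)
```
]]></programlisting>

      <para>The request body travels in the clear; only its integrity and
      origin are checked. Each request carries its own tag, so no
      authenticated channel needs to be kept open between requests.</para>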
 558
 559       <para>Securing a key at PLC is relatively straightforward. Only a
 560       limited number of administrators have direct access to the PLC database,
 561       so keys can be stored there with relative confidence, provided access to
 562       the PLC machines is secure. Should any of these keys be compromised, all
 563       keys would need to be regenerated and redistributed, so security here is
 564       highly important.</para>
 565
 566       <para>However, securing the secret on the client side, at the node, is
 567       more difficult. The key could be placed on some removable media that
 568       will not be erased, such as a floppy disk or a small USB-based disk, but
 569       mechanisms must be in place to prevent the key from being read by anyone
 570       except the boot manager and the boot cd processes, and not by any users
 571       of the machine. In a situation like this, physical security is a
 572       problem. Anyone who could get access to the machine can easily copy that
 573       key and use it elsewhere. One possible solution to such a problem is to
 574       instead make the key a combination of two different values, one stored
 575       on the floppy disk, the other being a value that is only known to the
 576       PI, and must be entered by hand for each message authentication. Then,
 577       in order to compromise the entire key, not only must the attacker have
 578       physical access to the machine, but they would also have to know the other
 579       half of the key, which would not be recorded anywhere except in the PLC
 580       database. This ultimately cannot work because of the need for human
 581       intervention each time a node needs to be authenticated.</para>
 582
 583       <para>Ultimately, the best solution for the circumstances here is to
 584       leave the entire key on the disk; leave physical security to the
 585       individual sites; and put checks in place to attempt to identify if the
 586       key is being reused elsewhere. As before, the post-boot manager system
 587       (running the real PlanetLab kernel), can be configured to prevent the
 588       floppy disk from being read by any logged in user (local or not).</para>
 589
 590       <para>If the key were identified as being reused elsewhere, appropriate
 591       actions would include deleting the key from the PLC database
 592       (effectively halting any use of it), and notifying the technical
 593       contacts and PIs at the site. If necessary, they could generate a new
 594       key after corrective actions had been taken.</para>
 595     </section>
 596
 597     <section>
 598       <title>Adding New Nodes</title>
 599
 600       <para>It is important to have control over the process by which nodes
 601       are added to the PlanetLab system, and to be able to derive which party
 602       is responsible for that machine at any point in the future. This is
 603       because several different parties come to PLC for the list of nodes, and
 604       PLC needs to provide a list that only includes nodes that have been
 605       authorized. First, the researchers who are looking to run experiments
 606       need to identify a set of PlanetLab machines. Second, people unrelated to
 607       PlanetLab may have traffic-related concerns or complaints, and may be
 608       trying to track down who is responsible for a node and/or the
 609       researcher's experiment.</para>
 610
 611       <para>It is possible to envision several scenarios where having
 612       a non-authorized node in the PLC database would be a problem. In one,
 613       a researcher inadvertently uses a rogue node (those who installed it
 614       could easily have root access) to run an experiment, and
 615       that experiment is compromised across all of PlanetLab, or the
 616       results of their research are tampered with. In another, a
 617       rogue node is used for malicious purposes, such as a spam relay, and
 618       the (initial) blame is directed at PLC, simply because of the
 619       association.</para>
 620
 621       <para>As shown previously, simply authorizing an entire network is
 622       insufficient, as there is then no way to identify who authorized an
 623       individual node on that subnet. Having the PlanetLab administrators add
 624       all nodes by hand incurs too much overhead, given the number of
 625       nodes and the current growth of PlanetLab. It also leaves the
 626       administrators in a position where they may not have the contact
 627       information for the responsible party. A reasonable compromise is to
 628       require either the PIs or technical contacts at each site to enter
 629       their own nodes using the existing PLC interfaces. Given that one of the
 630       existing steps for bringing a node online involves generating a
 631       floppy-based network configuration file on the PlanetLab website, this
 632       process can be extended to also add a record of the node with little
 633       additional impact on PIs and tech contacts. At this point, the per-node
 634       shared secret and the node identity necessary for node authentication
 635       would be generated and saved at PLC as well.</para>
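
      <para>As a minimal sketch of what this extended step might look like,
      the following Python fragment records a new node and builds its
      configuration file in a single operation. The function name
      <function>register_node</function>, the dictionary standing in for the
      PLC database, the identity scheme, and the configuration field names are
      all hypothetical; none of them are prescribed by this document.</para>

      <programlisting language="python"><![CDATA[
```python
import secrets


def register_node(plc_db, added_by, hostname, ip_address):
    """Add a node record to a stand-in PLC database and return
    (node_id, configuration file text). All names are illustrative."""
    node_id = max(plc_db, default=0) + 1   # hypothetical identity scheme
    node_key = secrets.token_hex(32)       # random 256-bit shared secret
    plc_db[node_id] = {
        "hostname": hostname,
        "ip_address": ip_address,
        "added_by": added_by,              # party responsible for this node
        "key": node_key,                   # PLC keeps its own copy
    }
    # The same identity and secret go onto the floppy with the network settings.
    config_text = "\n".join([
        'HOSTNAME="%s"' % hostname,
        'IP_ADDRESS="%s"' % ip_address,
        'NODE_ID="%d"' % node_id,
        'NODE_KEY="%s"' % node_key,
    ])
    return node_id, config_text
```
]]></programlisting>

      <para>Storing the identity of whoever added the node next to its key is
      what would later allow PLC to say which party is responsible for a
      given machine.</para>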
 636     </section>
 637
 638     <section>
 639       <title>How To Remove Nodes</title>
 640
 641       <para>An administrator, PI, or technical contact may need to
 642       remove a node from the system. This can be done simply by
 643       removing the node record from the PLC database, thereby preventing it
 644       from successfully authenticating at boot time. In addition, a node could
 645       be effectively disabled (but not removed) by deleting the key
 646       for that node from the database. Once restarted, it would not be able to
 647       come back online until a new key is generated.</para>
 648     </section>
 649
 650     <section>
 651       <title>Node Installation</title>
 652
 653       <para>The node installer shall be integrated into the Boot Manager,
 654       rather than continuing as a standalone component. This will allow the
 655       boot manager, when appropriate, to invoke the installer directly.</para>
 656     </section>
 657   </section>
 658
 659   <section>
 660     <title>Conclusion</title>
 661
 662     <para>As outlined above, this new system effectively encapsulates a new
 663     policy for node identity, and a new mechanism for verifying the node
 664     identity and authenticating node-initiated PLC changes. In total, the boot
 665     manager will consist of:</para>
 666
 667     <orderedlist>
 668       <listitem>
 669         <para>A set of interfaces at PLC that are used to perform
 670         authenticated, node-initiated changes.</para>
 671       </listitem>
 672
 673       <listitem>
 674         <para>A set of interfaces at PLC that are used to add new nodes to the
 675         system.</para>
 676       </listitem>
 677
 678       <listitem>
 679         <para>A package downloaded by the boot cd at every boot, which is used
 680         to install nodes, update configurations, or boot nodes, using the
 681         interfaces above.</para>
 682       </listitem>
 683
 684       <listitem>
 685         <para>The policy for identifying nodes, and when that identity should
 686         change.</para>
 687       </listitem>
 688     </orderedlist>
 689
 690     <para>Given the above recommendations, the bootstrap process and the
 691     chain of trust for adding a new node are as detailed below. A site,
 692     a principal investigator, and a tech contact are assumed to be already
 693     present and authorized.</para>
 694
 695     <orderedlist>
 696       <listitem>
 697         <para>The technical contact downloads a boot cd for the new node.
 698         Since the HTTPS certificate for the public web server is signed by a
 699         trusted third party, the image can be verified by either ensuring it
 700         was downloaded via HTTPS, or by downloading the PlanetLab public key
 701         and verifying a signed copy of the cd, also available on the
 702         website.</para>
 703       </listitem>
 704
 705       <listitem>
 706         <para>The now validated boot cd contains the CA certificate for the
 707         boot server, so any host-initiated communication that is using this
 708         certificate on the cd can be sure that the server is in fact the
 709         PlanetLab boot server.</para>
 710       </listitem>
 711
 712       <listitem>
 713         <para>The PI logs into their account on the PlanetLab website, also
 714         over HTTPS with the SSL certificate verified. Once logged in, they
 715         use a tool to generate a configuration file for the new node, which
 716         includes the network settings and node identity. During this
 717         configuration file generation, a record of the node's existence is
 718         entered into PLC, and a random shared secret is generated for this
 719         machine. The shared secret is saved in the PLC database, and is also
 720         included in this configuration file.</para>
 721       </listitem>
 722
 723       <listitem>
 724         <para>Both the cd and the new configuration file (on a floppy disk)
 725         are inserted into the machine. The machine is configured such that it
 726         always starts off the cd, and never the floppy disk or the machine's
 727         hard disks.</para>
 728       </listitem>
 729
 730       <listitem>
 731         <para>After the boot cd finishes bringing the machine online, loading
 732         all hardware and network settings from the floppy, it contacts the
 733         boot server using HTTPS and the certificate on the cd, and downloads
 734         and executes the boot manager.</para>
 735       </listitem>
 736
 737       <listitem>
 738         <para>The boot manager then contacts PLC to get the current state of
 739         the node it is currently running on.</para>
 740       </listitem>
 741
 742       <listitem>
 743         <para>Based on this state, the boot manager can either continue
 744         booting the node (if already installed), install the machine if
 745         necessary, or take any other action as appropriate. Since this is a
 746         new machine, the installation will be initiated.</para>
 747       </listitem>
 748
 749       <listitem>
 750         <para>After successful installation, the boot manager needs to change
 751         the state of the node such that the next time it starts, it will
 752         instead continue the normal boot process. The boot manager contacts
 753         PLC and requests a change of node state. This request consists of the
 754         node identity, data pertaining to the request itself, and a message
 755         authentication code based on the shared secret from the floppy disk
 756         and the request data.</para>
 757       </listitem>
 758
 759       <listitem>
 760         <para>PLC, in order to authenticate the request,
 761         generates its own message authentication code based on the submitted
 762         data and its own copy of the shared secret. If the message
 763         authentication codes match, then the requested action is performed and
 764         the boot manager is notified of success.</para>
 765       </listitem>
 766
 767       <listitem>
 768         <para>If the node is already installed, and no actions are necessary,
 769         the machine is booted. To protect the shared secret on the floppy disk
 770         from users of the machine, the kernel during runtime cannot access the
 771         floppy disk. At this point, control of the system is removed from the
 772         boot manager and run-time software takes control.</para>
 773       </listitem>
 774     </orderedlist>
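
    <para>The request authentication described in the steps above can be
    sketched as follows. This is an illustration only: it assumes an HMAC
    construction with SHA-256 as the hash function, and that the receiving
    side is PLC. The function names, the message layout, and the dictionary
    standing in for PLC's table of per-node secrets are hypothetical and are
    not specified by this document.</para>

    <programlisting language="python"><![CDATA[
```python
import hashlib
import hmac


def sign_request(shared_secret, node_id, request_data):
    """Node side: compute a MAC over the node identity and request data."""
    message = ("%d:%s" % (node_id, request_data)).encode("utf-8")
    key = shared_secret.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(plc_keys, node_id, request_data, received_mac):
    """PLC side: recompute the MAC with PLC's own copy of the secret."""
    secret = plc_keys.get(node_id)
    if secret is None:
        # Node record removed, or its key deleted: nothing can authenticate.
        return False
    expected = sign_request(secret, node_id, request_data)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, received_mac)
```
]]></programlisting>

    <para>Because a node whose key has been deleted from the database has no
    entry to verify against, the same check would also reject requests from
    disabled or removed nodes.</para>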
 775
 776     <para>Any action the boot manager may need to take that requires some
 777     value to be changed in PLC can use the steps outlined in 8 and 9. As
 778     an extra precaution to prevent unauthorized nodes from booting, the
 779     process in step 7 should also use the authentication steps in 8 and
 780     9.</para>
 781
 782     <para>Given that the shared secret on the floppy disk can only be accessed
 783     in the cd environment (when the boot manager is running and the boot cd
 784     kernel provides floppy disk access), any operation that a node can perform
 785     that results in a change in data at PLC must be performed during this
 786     stage. During runtime, a node can still present its identity to PLC to
 787     receive node-specific packages or configuration files, but no interface
 788     that provides these packages or files can change any record or data at
 789     PLC.</para>
 790   </section>
 791
 792   <bibliography>
 793     <biblioentry>
 794       <abbrev>1</abbrev>
 795
 796       <title><ulink
 797       url="http://rhlinux.redhat.com/anaconda">Anaconda</ulink></title>
 798     </biblioentry>
 799
 800     <biblioentry>
 801       <abbrev>2</abbrev>
 802
 803       <title>Message Authentication using Hash Functions - The HMAC
 804       construction</title>
 805
 806       <authorgroup>
 807         <author>
 808           <firstname>Mihir</firstname>
 809
 810           <surname>Bellare</surname>
 811         </author>
 812
 813         <author>
 814           <firstname>Ran</firstname>
 815
 816           <surname>Canetti</surname>
 817         </author>
 818
 819         <author>
 820           <firstname>Hugo</firstname>
 821
 822           <surname>Krawczyk</surname>
 823         </author>
 824       </authorgroup>
 825
 826       <date>Spring 1996</date>
 827     </biblioentry>
 828   </bibliography>
 829 </article>