Stephen Soltesz [Tue, 11 Dec 2007 22:46:14 +0000 (22:46 +0000)]
add a global version of getListFromFile()
Stephen Soltesz [Tue, 11 Dec 2007 22:45:50 +0000 (22:45 +0000)]
Cache more stuff from plc in local files.
Stephen Soltesz [Tue, 11 Dec 2007 22:45:09 +0000 (22:45 +0000)]
record a node's boot_state according to PLC's db.
Stephen Soltesz [Tue, 11 Dec 2007 22:44:32 +0000 (22:44 +0000)]
Added support for sending Ctrl-C to some of the BayTechs, with the help of pexpect.py
Stephen Soltesz [Tue, 11 Dec 2007 22:43:55 +0000 (22:43 +0000)]
This is a better module for dealing with SSH logins using 'expect'-like
processing. This should replace pyssh eventually.
Stephen Soltesz [Tue, 11 Dec 2007 22:40:46 +0000 (22:40 +0000)]
Disabled the print when loading a pkl file
Stephen Soltesz [Tue, 11 Dec 2007 22:40:16 +0000 (22:40 +0000)]
made the 'get' function global to allow calls from other modules.
Stephen Soltesz [Tue, 11 Dec 2007 22:39:15 +0000 (22:39 +0000)]
added getPersons() wrapper
Stephen Soltesz [Tue, 11 Dec 2007 22:38:51 +0000 (22:38 +0000)]
Stephen Soltesz [Wed, 28 Nov 2007 22:24:48 +0000 (22:24 +0000)]
Added a fix for HPiLO that got lost somehow.
Stephen Soltesz [Wed, 28 Nov 2007 18:40:43 +0000 (18:40 +0000)]
Added two more APC models for brazil and berlin.
Stephen Soltesz [Tue, 27 Nov 2007 20:22:46 +0000 (20:22 +0000)]
Take out code to show passwords
Stephen Soltesz [Tue, 27 Nov 2007 20:22:22 +0000 (20:22 +0000)]
typo. Missed a shell=True arg to Popen.
Stephen Soltesz [Tue, 27 Nov 2007 18:33:23 +0000 (18:33 +0000)]
steps to make racadm workable.
Stephen Soltesz [Tue, 27 Nov 2007 18:30:46 +0000 (18:30 +0000)]
Better code for PCU types. Class based with specific exceptions for different
error conditions. Support for HPiLO, DRAC via racadm, and special cases for a
variety of weird configurations.
Stephen Soltesz [Tue, 27 Nov 2007 18:29:50 +0000 (18:29 +0000)]
Updated findbadpcu.py with changes made in reboot.py. Simpler interface and
return values.
Stephen Soltesz [Mon, 26 Nov 2007 23:51:09 +0000 (23:51 +0000)]
updated readme for DELL RAC3/4
Stephen Soltesz [Mon, 12 Nov 2007 21:21:05 +0000 (21:21 +0000)]
Adding subdirectories for remote commands to control ILO and DRAC cards over
HTTPS. The iloxml should probably be a subdirectory of cmdhttps...
Stephen Soltesz [Wed, 7 Nov 2007 21:22:38 +0000 (21:22 +0000)]
Policy.py includes updates to better handle PCUs
emailTxt includes new messages related to PCUs
Stephen Soltesz [Wed, 7 Nov 2007 21:21:28 +0000 (21:21 +0000)]
Added 'FORCED' to handle some special actions
Stephen Soltesz [Wed, 7 Nov 2007 20:33:44 +0000 (20:33 +0000)]
Add a retry to the apc_reboot() for which there are different models.
Stephen Soltesz [Wed, 7 Nov 2007 19:52:12 +0000 (19:52 +0000)]
Added new sequence for apc_reboot()
Stephen Soltesz [Wed, 7 Nov 2007 18:22:05 +0000 (18:22 +0000)]
added some new cr
Stephen Soltesz [Wed, 7 Nov 2007 18:06:17 +0000 (18:06 +0000)]
trying to get ipal_reboot() to function properly for cambridge nodes.
Stephen Soltesz [Mon, 5 Nov 2007 22:33:08 +0000 (22:33 +0000)]
Ignore empty 'portstatus' dicts. This just means the ports are down.
Stephen Soltesz [Mon, 5 Nov 2007 22:32:19 +0000 (22:32 +0000)]
Allow queries using sitefilter regular expressions, rather than a single
loginbase. Allows displaying common sites like 'cernet*'.
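
A minimal sketch of filtering loginbases with a regular expression, as this commit describes. The function name `filter_sites` and the sample loginbases are hypothetical; the log does not say whether the original code uses `re.match` or `re.search`, so prefix matching with `re.match` is an assumption.

```python
import re

def filter_sites(loginbases, sitefilter):
    """Return the loginbases matched by the regular expression `sitefilter`."""
    pattern = re.compile(sitefilter)
    # re.match only anchors at the start of the string, so a pattern such as
    # "cernet" selects every loginbase that begins with "cernet".
    return [lb for lb in loginbases if pattern.match(lb)]

# Hypothetical loginbases, for illustration only.
sites = ["cernet1", "cernet2", "princeton", "mit"]
matched = filter_sites(sites, "cernet")
print(matched)
```

Note that 'cernet*' read as a regular expression means "cerne" followed by zero or more "t" characters, not a shell-style wildcard; with prefix matching it still selects the same loginbases here.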
Stephen Soltesz [Mon, 5 Nov 2007 22:30:35 +0000 (22:30 +0000)]
Take PCUs into account. Need to test.
Stephen Soltesz [Mon, 5 Nov 2007 22:29:50 +0000 (22:29 +0000)]
removes function definitions consolidated in reboot.py
Stephen Soltesz [Mon, 5 Nov 2007 22:29:28 +0000 (22:29 +0000)]
New message for PCU errors. Refers to the pl-virtual-03 pcu status page
Stephen Soltesz [Mon, 5 Nov 2007 22:28:53 +0000 (22:28 +0000)]
added several utility functions for rebooting nodes from Monitor's diagnose and
action scripts.
Stephen Soltesz [Mon, 5 Nov 2007 19:17:54 +0000 (19:17 +0000)]
minor changes to reflect the new Drupal-Book format for the Tech Guide
Stephen Soltesz [Mon, 5 Nov 2007 17:16:28 +0000 (17:16 +0000)]
collects all nodes associated with a list of loginbase patterns
Stephen Soltesz [Fri, 2 Nov 2007 21:51:59 +0000 (21:51 +0000)]
Minor description of the dependencies that Monitor has for connecting to:
* RT
* MySQL
* and local database output formats.
Stephen Soltesz [Fri, 2 Nov 2007 21:48:37 +0000 (21:48 +0000)]
Run the findbad* commands and copy the files to the appropriate locations.
Stephen Soltesz [Fri, 2 Nov 2007 21:40:02 +0000 (21:40 +0000)]
corrected a bug in reporting nmreset errors.
Stephen Soltesz [Fri, 2 Nov 2007 21:17:48 +0000 (21:17 +0000)]
Changes necessary for the new operating environment. rt_tickets returns an empty string to signify an error, and diagnose.py exits on failure. It can't return None, since None is treated specially by the pickle class. That's a bug.
Stephen Soltesz [Fri, 2 Nov 2007 21:15:48 +0000 (21:15 +0000)]
changed the import statement to the correct file
Stephen Soltesz [Fri, 2 Nov 2007 21:14:59 +0000 (21:14 +0000)]
Moved an import statement into 'main()' to avoid cmdline errors for imports by other modules
Stephen Soltesz [Fri, 2 Nov 2007 18:18:59 +0000 (18:18 +0000)]
Includes some checks for NM consistency via the 'last_updated' field in PLCdb.
Stephen Soltesz [Fri, 2 Nov 2007 18:17:52 +0000 (18:17 +0000)]
name tweak
Stephen Soltesz [Fri, 2 Nov 2007 18:17:25 +0000 (18:17 +0000)]
changed url generated for 'PCU's to refer to pl-virtual-03 rather than
my local machine.
Stephen Soltesz [Fri, 2 Nov 2007 18:16:04 +0000 (18:16 +0000)]
bounce into pl-virtual-03
Stephen Soltesz [Fri, 2 Nov 2007 18:11:55 +0000 (18:11 +0000)]
syncplcdb gets info from the PLC db necessary for site, node, and pcu
associations.
findbadpcu.py should output in native python pickle format, and be converted
later using pkl2php.py. This will facilitate my using the input for diagnose
and action.py
Stephen Soltesz [Fri, 2 Nov 2007 18:07:11 +0000 (18:07 +0000)]
add additional options
Stephen Soltesz [Fri, 2 Nov 2007 16:34:28 +0000 (16:34 +0000)]
pkl2php is a script that reads in a python pickle file and spits out the
equivalent as a php serialize file, for data sharing between python and php.
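
A rough sketch of the idea behind pkl2php: load a Python pickle and emit PHP's `serialize()` text format. This is not the script's actual code; the function name `php_serialize` is hypothetical, and only None, bool, int, float, str, list/tuple, and dict values are handled.

```python
import pickle

def php_serialize(value):
    """Encode a small subset of Python values in PHP's serialize() format."""
    if value is None:
        return "N;"
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "b:%d;" % int(value)
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, float):
        return "d:%r;" % value
    if isinstance(value, str):
        # PHP records the length in bytes, not in characters.
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    if isinstance(value, (list, tuple)):
        # PHP has a single array type; sequences become integer-keyed arrays.
        value = dict(enumerate(value))
    if isinstance(value, dict):
        body = "".join(php_serialize(k) + php_serialize(v)
                       for k, v in value.items())
        return "a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %s" % type(value).__name__)

# Round-trip through pickle to mimic reading a .pkl file from disk.
data = pickle.loads(pickle.dumps({"node": "up", "count": 2}))
serialized = php_serialize(data)
print(serialized)
```

On the PHP side, the resulting string would be read back with `unserialize()`.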
Stephen Soltesz [Fri, 2 Nov 2007 15:10:25 +0000 (15:10 +0000)]
www interface and support libraries for some of monitor's data. Specifically:
- bad nodes
- bad pcus
- and actions taken.
Stephen Soltesz [Wed, 24 Oct 2007 18:10:54 +0000 (18:10 +0000)]
findbadpcu.py : adding files to svn
automate.sh : local automation script
Stephen Soltesz [Tue, 16 Oct 2007 18:24:44 +0000 (18:24 +0000)]
Add support for serializing to PHPSerialize format. Helps exchange info
between python and php scripts.
Stephen Soltesz [Tue, 16 Oct 2007 18:23:34 +0000 (18:23 +0000)]
Reboot.py:
I've added additional functions to better handle baytech, APC and HP ILO
pcus. Also, improved the error reporting. Additional error handling needs
to be added, to get better diagnostic messages for configuration errors.
pyssh/__init__.py:
added some necessary arguments to the 'ssh' command for password
prompting.
Stephen Soltesz [Wed, 29 Aug 2007 17:26:50 +0000 (17:26 +0000)]
+ diagnose.py: added --refresh option so that cached values can be refreshed, and either
preserved or not for future runs. Previously it was necessary to remove the
cached values manually.
+ emailTxt.py: tried to clarify what was needed for the bootcd and plnode.txt
file. I think some confusion is coming up based on the all-in-one bootcd.
+ findbad.py: lock calls to the plcAPI, to avoid hammering it. Also, be more
selective about the return values requested from Nodes and Sites. I was
getting everything.
+ mailer.py: extra debug messages.
+ monitor.py: this file is deprecated. Modifications are incidental and not
important.
+ plc.py: add a filter argument to getSites and getNodes to allow specific
fields, rather than everything.
+ policy.py: lots of little fixes. moved more logic into Diagnose() from
Action(). Still need to fix Diagnose to act on sites when nodes are
up/improved.
+ soltesz.py: added refresh function, and return value for timed-out commands
from popen() calls.
Stephen Soltesz [Wed, 8 Aug 2007 13:36:46 +0000 (13:36 +0000)]
+ findbad.py: this actively probes all machines in the PLC db, using ping,
ssh, and then various commands on the machine to determine the actual bootstate.
These records are saved to disk for diagnose.py
+ diagnose.py: reads entries from findbad and previous actions, merging the
two together to determine if machines have improved, or gotten worse. All
actions to be performed are recorded and written to a diagnose_out pickle file
for action.py
+ action.py: reads the diagnose_out file from diagnose.py and performs the
actions. It permanently records the results in the act_all pickle file.
These three in combination are Monitor.
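
The hand-off between the three scripts can be sketched as a chain of pickle files. This is only an illustration of the data flow described above: the file names `diagnose_out` and `act_all` come from the commit message, while `findbad.pkl`, the node states, the action names, and the decision rule are all hypothetical.

```python
import os
import pickle
import tempfile

def save(path, obj):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load(path, default):
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)

def findbad(workdir, observed):
    # Stand-in for the active probing: just store the observed states.
    save(os.path.join(workdir, "findbad.pkl"), observed)

def diagnose(workdir):
    # Merge fresh observations with the record of previous actions.
    observed = load(os.path.join(workdir, "findbad.pkl"), {})
    history = load(os.path.join(workdir, "act_all.pkl"), {})
    todo = {}
    for node, state in observed.items():
        if state != "boot" and node not in history:
            todo[node] = "notify"   # node got worse, nothing done yet
        elif state == "boot" and node in history:
            todo[node] = "close"    # node improved after earlier action
    save(os.path.join(workdir, "diagnose_out.pkl"), todo)

def action(workdir):
    # Perform (here: merely record) each action and keep a permanent log.
    todo = load(os.path.join(workdir, "diagnose_out.pkl"), {})
    history = load(os.path.join(workdir, "act_all.pkl"), {})
    for node, act in todo.items():
        history.setdefault(node, []).append(act)
    save(os.path.join(workdir, "act_all.pkl"), history)
    return history

workdir = tempfile.mkdtemp()
findbad(workdir, {"node1": "down", "node2": "boot"})
diagnose(workdir)
history = action(workdir)
print(history)
```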
Stephen Soltesz [Wed, 8 Aug 2007 13:32:43 +0000 (13:32 +0000)]
+ added 'production' namespace to the non-debug pickle files. This keeps
everything grouped together on a file list, and makes the mode very explicit.
Stephen Soltesz [Wed, 8 Aug 2007 13:31:32 +0000 (13:31 +0000)]
+ some cleanup. some dirtying.
Stephen Soltesz [Wed, 8 Aug 2007 13:30:42 +0000 (13:30 +0000)]
+ split the policy file into three classes: Merge(), Diagnose(), and Action().
This split is more natural and allows all the diagnosis/state-transition code to
live in one place. Action() is very simple, just taking the records from
Diagnose() and performing them.
Stephen Soltesz [Wed, 8 Aug 2007 13:28:55 +0000 (13:28 +0000)]
+ updated enableSliceCreation and enableSlices to reverse the effect of site
squeezing.
Stephen Soltesz [Wed, 8 Aug 2007 13:28:06 +0000 (13:28 +0000)]
+ add additional support for RT tickets, closing, changing Subject, and CCs.
emailViaRT() is the only needed call. If ticket_id is given, it uses
this, otherwise, a new ticket is created.
Stephen Soltesz [Wed, 8 Aug 2007 13:26:46 +0000 (13:26 +0000)]
+ add better messages for what to expect in the future
Stephen Soltesz [Wed, 8 Aug 2007 13:26:24 +0000 (13:26 +0000)]
+ format time record
Stephen Soltesz [Wed, 8 Aug 2007 13:25:57 +0000 (13:25 +0000)]
+ use OptionParser in optparse python module instead of getopt
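
A minimal sketch of command-line parsing with `optparse.OptionParser`, the approach this commit switches to. The option names below are illustrative assumptions, not the options the affected script actually defines.

```python
from optparse import OptionParser

def parse_args(argv):
    """Parse a list of command-line arguments (without the program name)."""
    parser = OptionParser(usage="usage: %prog [options]")
    # Boolean flag: present means True, absent means the default (False).
    parser.add_option("--refresh", action="store_true", dest="refresh",
                      default=False, help="refresh cached values")
    # Option that takes a value; metavar only affects the --help text.
    parser.add_option("--nodelist", dest="nodelist", metavar="FILE",
                      default=None, help="read node names from FILE")
    return parser.parse_args(argv)

options, args = parse_args(["--refresh", "--nodelist", "nodes.txt"])
print(options.refresh, options.nodelist, args)
```

Compared with getopt, the parser generates `--help` output and applies defaults itself, so the calling code only reads attributes from `options`.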
Stephen Soltesz [Wed, 8 Aug 2007 13:25:11 +0000 (13:25 +0000)]
+ allow None arguments to constructor, and generate good defaults
Stephen Soltesz [Mon, 30 Jul 2007 13:51:20 +0000 (13:51 +0000)]
shouldn't be in cvs.
Stephen Soltesz [Tue, 3 Jul 2007 19:59:02 +0000 (19:59 +0000)]
+ new XMLRPC_SERVER name to boot.planet-lab.org
Stephen Soltesz [Tue, 3 Jul 2007 19:58:34 +0000 (19:58 +0000)]
+ use the emailViaRT() for email rather than standard email
Stephen Soltesz [Tue, 3 Jul 2007 19:57:59 +0000 (19:57 +0000)]
+ added temporary fix for ignoring tickets with a blacklist
Stephen Soltesz [Tue, 3 Jul 2007 19:57:16 +0000 (19:57 +0000)]
+ introduced rt command line emailViaRT
Stephen Soltesz [Tue, 3 Jul 2007 19:56:45 +0000 (19:56 +0000)]
+ updated tech guide url to go directly to NodeInstallation
Stephen Soltesz [Fri, 29 Jun 2007 12:42:22 +0000 (12:42 +0000)]
+ monitor.py -- modified the following three to use record-based events,
rather than node-based
+ comon.py -- currently only looks at dbg nodes.
+ policy.py -- separated diagnoseSite() from actOnSite()
+ rt.py -- Retrieve all tickets once
+ config.py -- store for command line arguments used by other utilities.
Awkward.
+ emailTxt.py -- new messages for escalation.
+ mailer.py -- added a bcc option and hooks for config() options
+ plc.py -- added a few extra fields and utility functions
Stephen Soltesz [Fri, 29 Jun 2007 12:38:36 +0000 (12:38 +0000)]
+ blacklist.py -- manages a node blacklist on which no actions should ever be
taken
+ bootcds.py -- collects bootcd information from debug state nodes
+ bwlimit.py -- fetch all nodes with broken bwlimits.
+ dumpact.py -- pretty print the act_all.pkl db generated by monitor.py
+ getnodekey.py -- generate a known_hosts file based on the ssh_rsa_key field
of the PLC node db.
+ printpdb.py -- another pretty printer for pickle files.
+ soltesz.py -- utility functions for pickles, config, etc.
Stephen Soltesz [Fri, 29 Jun 2007 12:32:58 +0000 (12:32 +0000)]
- I don't know how these ended up in cvs.
Faiyaz Ahmed [Wed, 16 May 2007 01:53:46 +0000 (01:53 +0000)]
Rewrite of policy engine.
Marc Fiuczynski [Thu, 19 Apr 2007 20:43:00 +0000 (20:43 +0000)]
added 'cleanSlices' to remove disabled users from a slice
Faiyaz Ahmed [Fri, 6 Apr 2007 17:38:14 +0000 (17:38 +0000)]
Increase threshold to a week for slice creation, 2 weeks for suspension.
Faiyaz Ahmed [Fri, 6 Apr 2007 16:16:54 +0000 (16:16 +0000)]
Update to new API.
Faiyaz Ahmed [Mon, 2 Apr 2007 20:59:37 +0000 (20:59 +0000)]
plctool - Marc's CLI util.
config.py - debug=false
Faiyaz Ahmed [Mon, 2 Apr 2007 20:57:57 +0000 (20:57 +0000)]
Migrate to new API.
Faiyaz Ahmed [Mon, 2 Apr 2007 20:28:50 +0000 (20:28 +0000)]
Fixed syntax error in passing PCU info.
Marc Fiuczynski [Thu, 22 Feb 2007 17:09:33 +0000 (17:09 +0000)]
- set API URL to www.planet-lab.org
- add authCheck method
Marc Fiuczynski [Mon, 19 Feb 2007 17:42:21 +0000 (17:42 +0000)]
fleshed out slice enable/disable support
Marc Fiuczynski [Mon, 12 Feb 2007 19:59:00 +0000 (19:59 +0000)]
Replace enableSliceCreation/removeSliceCreation with a single setSliceMax function.
Marc Fiuczynski [Mon, 12 Feb 2007 19:54:56 +0000 (19:54 +0000)]
o Fixed removeSliceCreation and enableSliceCreation functions to work with
new API.
Marc Fiuczynski [Mon, 12 Feb 2007 19:15:08 +0000 (19:15 +0000)]
o Fixed slices() function to use new API.
Marc Fiuczynski [Thu, 8 Feb 2007 22:43:11 +0000 (22:43 +0000)]
updated a number of functions to use new API
Marc Fiuczynski [Thu, 8 Feb 2007 19:59:03 +0000 (19:59 +0000)]
- Fix siteId() to work with new API
Marc Fiuczynski [Thu, 8 Feb 2007 19:43:09 +0000 (19:43 +0000)]
- Fix nodesDbg to use GetNodes API because Anon* API is now gone.
- Add renewAllSlices function to move forward all slice expiration dates that
are sooner than the date given as an argument.
- Add "allow_none=True" argument to xmlrpclib.Server so that None arg can be
marshalled via API.
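
A short illustration of why `allow_none=True` matters. XML-RPC has no standard representation for None, so Python's marshaller refuses it unless the extension is enabled. The sketch uses Python 3's `xmlrpc.client` (the successor of the `xmlrpclib` module named in the commit) and calls `dumps()` directly so that no server is needed; the helper name `marshal` is hypothetical.

```python
import xmlrpc.client

def marshal(params, allow_none):
    """Return the XML-RPC request body, or an error string if marshalling fails."""
    try:
        return xmlrpc.client.dumps(params, methodname="GetNodes",
                                   allow_none=allow_none)
    except TypeError as err:
        return "error: %s" % err

strict = marshal((None,), allow_none=False)   # marshalling None is rejected
relaxed = marshal((None,), allow_none=True)   # None is encoded as <nil/>
print(strict)
print(relaxed)
```

The same keyword is accepted by `xmlrpc.client.ServerProxy`, which is where a client would normally set it.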
Marc Fiuczynski [Thu, 1 Feb 2007 14:25:56 +0000 (14:25 +0000)]
check if maxslices arg is passed to enableSliceCreation
Marc Fiuczynski [Thu, 1 Feb 2007 14:20:19 +0000 (14:20 +0000)]
check if maxslices arg is passed to enableSliceCreation
Marc Fiuczynski [Wed, 24 Jan 2007 19:29:44 +0000 (19:29 +0000)]
updated so that plc.py can also be used nicely from the command line
Faiyaz Ahmed [Wed, 17 Jan 2007 19:46:40 +0000 (19:46 +0000)]
Update log.
Faiyaz Ahmed [Wed, 17 Jan 2007 19:33:04 +0000 (19:33 +0000)]
Added check so we don't keep resending the same email
Faiyaz Ahmed [Wed, 17 Jan 2007 16:03:30 +0000 (16:03 +0000)]
Except on MTA error and continue.
Faiyaz Ahmed [Thu, 11 Jan 2007 21:39:07 +0000 (21:39 +0000)]
Changed time to act on emails. WE GET TOO MUCH EMAIL ALREADY!
Faiyaz Ahmed [Wed, 10 Jan 2007 20:08:44 +0000 (20:08 +0000)]
* Emails users when slice renewal/creation is suspended, and when their slices are suspended.
Faiyaz Ahmed [Wed, 10 Jan 2007 20:06:30 +0000 (20:06 +0000)]
Hosed this file on alfred when rins'ing. It's RO DB access so (hopefully) doesn't pose a serious lack of security.
* Contains auth info for RT.
Faiyaz Ahmed [Tue, 14 Nov 2006 19:38:34 +0000 (19:38 +0000)]
*** empty log message ***
Faiyaz Ahmed [Tue, 14 Nov 2006 19:36:09 +0000 (19:36 +0000)]
SSH and telnet library
Faiyaz Ahmed [Tue, 14 Nov 2006 19:27:09 +0000 (19:27 +0000)]
*** empty log message ***
Faiyaz Ahmed [Tue, 14 Nov 2006 19:20:13 +0000 (19:20 +0000)]
* Sets nodes to reboot, uses PCU if available. Defaults to POD/email (with site squeezing)
* Slice emails, site slice creation revoke, freeze running slices
* Changed mailto target for summary email
Faiyaz Ahmed [Fri, 27 Oct 2006 20:24:24 +0000 (20:24 +0000)]
* Emails PI, then Slices if the node does not come up after a certain number of days.
* Beginnings of slice freeze and node rins via PLC api. Still need to finish PCU stuff.
Faiyaz Ahmed [Tue, 24 Oct 2006 20:27:32 +0000 (20:27 +0000)]
Uses CoMon's ability to find 'upness' to email. Changed queueing between threads; no more huge sleeps to maintain synch.
* Individual email messages per bucket
* emailed{} now pickled so as not to email the same site more than once.
* PLC anon API to get site basenames for email
* Searches RT for open/new in support or offline for tickets with hostname
* Beginnings of real check for dbg
* Started squeezing of slices via PLC api.
* Work in progress. Not ready for human consumption.
Faiyaz Ahmed [Tue, 24 Oct 2006 20:19:06 +0000 (20:19 +0000)]
*** empty log message ***