monitor.git
Stephen Soltesz [Mon, 12 Nov 2007 21:21:05 +0000 (21:21 +0000)]
Adding subdirectories for remote commands to control ILO and DRAC cards over
HTTPS.  The iloxml should probably be a subdirectory of cmdhttps...

Stephen Soltesz [Wed, 7 Nov 2007 21:22:38 +0000 (21:22 +0000)]
Policy.py includes updates to better handle PCUs

emailTxt includes new messages related to PCUs

Stephen Soltesz [Wed, 7 Nov 2007 21:21:28 +0000 (21:21 +0000)]
Added 'FORCED' to handle some special actions

Stephen Soltesz [Wed, 7 Nov 2007 20:33:44 +0000 (20:33 +0000)]
Add a retry to the apc_reboot() for which there are different models.
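
A minimal sketch of one way such a retry could work, assuming that different APC models need different menu sequences and that a failed sequence raises an exception. The names `reboot_with_fallback`, `SEQUENCES`, and `fake_run`, and the key presses themselves, are hypothetical and not taken from reboot.py.

```python
def reboot_with_fallback(sequences, run_sequence):
    """Try each command sequence in order; return the name of the first that works."""
    failures = []
    for name, steps in sequences:
        try:
            run_sequence(steps)
            return name
        except Exception as err:  # broad on purpose: any failure triggers the next attempt
            failures.append("%s: %s" % (name, err))
    raise RuntimeError("all reboot sequences failed: " + "; ".join(failures))


# Hypothetical menu sequences for two APC models; the key presses are invented.
SEQUENCES = [
    ("model_a", ["1", "2", "3"]),
    ("model_b", ["1", "3", "6"]),
]


def fake_run(steps):
    # Stand-in for a telnet/ssh session: only the second sequence "works".
    if steps != ["1", "3", "6"]:
        raise IOError("unexpected menu prompt")


print(reboot_with_fallback(SEQUENCES, fake_run))
```

Collecting the individual failures before raising keeps the diagnostic detail for every model that was tried.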

Stephen Soltesz [Wed, 7 Nov 2007 19:52:12 +0000 (19:52 +0000)]
Added new sequence for apc_reboot()

Stephen Soltesz [Wed, 7 Nov 2007 18:22:05 +0000 (18:22 +0000)]
added some new cr

Stephen Soltesz [Wed, 7 Nov 2007 18:06:17 +0000 (18:06 +0000)]
trying to get ipal_reboot() to function properly for cambridge nodes.

Stephen Soltesz [Mon, 5 Nov 2007 22:33:08 +0000 (22:33 +0000)]
Ignore empty 'portstatus' dicts.  This just means the ports are down.

Stephen Soltesz [Mon, 5 Nov 2007 22:32:19 +0000 (22:32 +0000)]
Allow queries using sitefilter regular expressions, rather than a single
loginbase.  Allows displaying common sites like 'cernet*'.
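
A minimal sketch of pattern-based site selection, assuming loginbases are plain strings and the filter is applied with Python's `re` module. The function name `filter_loginbases` and the sample loginbases are hypothetical.

```python
import re


def filter_loginbases(loginbases, sitefilter):
    """Return the loginbases that the sitefilter regular expression matches from the start."""
    pattern = re.compile(sitefilter)
    return [lb for lb in loginbases if pattern.match(lb)]


# Hypothetical loginbases, used only for illustration.
sites = ["cernet1", "cernet2", "princeton", "mit"]
print(filter_loginbases(sites, "cernet*"))
```

Read as a regular expression, `cernet*` means `cerne` followed by any number of `t` characters. Because `re.match` only anchors at the start of the string, it behaves here like a prefix filter.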

Stephen Soltesz [Mon, 5 Nov 2007 22:30:35 +0000 (22:30 +0000)]
Take PCUs into account.  Need to test.

Stephen Soltesz [Mon, 5 Nov 2007 22:29:50 +0000 (22:29 +0000)]
removes function definitions consolidated in reboot.py

Stephen Soltesz [Mon, 5 Nov 2007 22:29:28 +0000 (22:29 +0000)]
New message for PCU errors.  Refers to the pl-virtual-03 pcu status page

Stephen Soltesz [Mon, 5 Nov 2007 22:28:53 +0000 (22:28 +0000)]
added several utility functions for rebooting nodes from Monitor's diagnose and
action scripts.

Stephen Soltesz [Mon, 5 Nov 2007 19:17:54 +0000 (19:17 +0000)]
minor changes to reflect the new Drupal-Book format for the Tech Guide

Stephen Soltesz [Mon, 5 Nov 2007 17:16:28 +0000 (17:16 +0000)]
collects all nodes associated with a list of loginbase patterns

Stephen Soltesz [Fri, 2 Nov 2007 21:51:59 +0000 (21:51 +0000)]
Minor description of the dependencies that Monitor has for connecting to:

  * RT
  * MySQL
  * and local database output formats.

Stephen Soltesz [Fri, 2 Nov 2007 21:48:37 +0000 (21:48 +0000)]
Run the findbad* commands and copy the files to the appropriate locations.

Stephen Soltesz [Fri, 2 Nov 2007 21:40:02 +0000 (21:40 +0000)]
corrected a bug in reporting nmreset errors.

Stephen Soltesz [Fri, 2 Nov 2007 21:17:48 +0000 (21:17 +0000)]
Changes necessary for the new operating environment.  rt_tickets returns an empty string to signify an error, and diagnose.py exits on failure.  It can't return None, since None is treated specially by the pickle class.  That's a bug.

Stephen Soltesz [Fri, 2 Nov 2007 21:15:48 +0000 (21:15 +0000)]
changed the import statement to the correct file

Stephen Soltesz [Fri, 2 Nov 2007 21:14:59 +0000 (21:14 +0000)]
Moved an import statement into 'main()' to avoid cmdline errors for imports by other modules

Stephen Soltesz [Fri, 2 Nov 2007 18:18:59 +0000 (18:18 +0000)]
Includes some checks for NM consistency via the 'last_updated' field in PLCdb.

Stephen Soltesz [Fri, 2 Nov 2007 18:17:52 +0000 (18:17 +0000)]
name tweak

Stephen Soltesz [Fri, 2 Nov 2007 18:17:25 +0000 (18:17 +0000)]
changed url generated for 'PCU's to refer to pl-virtual-03 rather than
my local machine.

Stephen Soltesz [Fri, 2 Nov 2007 18:16:04 +0000 (18:16 +0000)]
bounce into pl-virtual-03

Stephen Soltesz [Fri, 2 Nov 2007 18:11:55 +0000 (18:11 +0000)]
syncplcdb gets info from the PLC db necessary for site, node, and pcu
associations.

findbadpcu.py should output in native python pickle format, and be converted
later using pkl2php.py.  This will facilitate my using the input for diagnose
and action.py

Stephen Soltesz [Fri, 2 Nov 2007 18:07:11 +0000 (18:07 +0000)]
add additional options

Stephen Soltesz [Fri, 2 Nov 2007 16:34:28 +0000 (16:34 +0000)]
pkl2php is a script that reads in a python pickle file and spits out the
equivalent as a php serialize file, for data sharing between python and php.
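
A minimal sketch of the pickle-to-PHP conversion described here, written for Python 3 and covering only None, booleans, integers, floats, strings, lists, tuples, and dicts. The function names `php_serialize` and `pkl2php_bytes` are hypothetical; this is not the actual pkl2php.py.

```python
import pickle


def php_serialize(value):
    """Serialize a small subset of Python values into PHP's serialize() text format."""
    if value is None:
        return "N;"
    if isinstance(value, bool):  # must precede the int check: bool is a subclass of int
        return "b:%d;" % int(value)
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, float):
        return "d:%r;" % value
    if isinstance(value, str):
        # PHP string lengths are byte counts, not character counts.
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    if isinstance(value, (list, tuple)):
        value = dict(enumerate(value))  # PHP arrays are ordered maps
    if isinstance(value, dict):
        # Assumes keys are ints or strings, the only key types PHP arrays accept.
        body = "".join(php_serialize(k) + php_serialize(v) for k, v in value.items())
        return "a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %s" % type(value).__name__)


def pkl2php_bytes(pickle_bytes):
    """Unpickle trusted data and return its PHP-serialized form."""
    return php_serialize(pickle.loads(pickle_bytes))


print(pkl2php_bytes(pickle.dumps({"node": "up", "count": 2})))
```

On the PHP side, the resulting text can be read back with `unserialize()`. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.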

Stephen Soltesz [Fri, 2 Nov 2007 15:10:25 +0000 (15:10 +0000)]
www interface and support libraries for some of monitor's data.  Specifically:

 - bad nodes
 - bad pcus
 - and actions taken.

Stephen Soltesz [Wed, 24 Oct 2007 18:10:54 +0000 (18:10 +0000)]
findbadpcu.py : adding files to svn
automate.sh   : local automation script

Stephen Soltesz [Tue, 16 Oct 2007 18:24:44 +0000 (18:24 +0000)]
Add support for serializing to PHPSerialize format.  Helps exchange info
between python and php scripts.

Stephen Soltesz [Tue, 16 Oct 2007 18:23:34 +0000 (18:23 +0000)]
Reboot.py:
I've added additional functions to better handle baytech, APC and HP ILO
pcus.  Also, improved the error reporting.  Additional error handling needs
to be added, to get better diagnostic messages for configuration errors.

pyssh/__init__.py:
added some necessary arguments to the 'ssh' command for password
prompting.

Stephen Soltesz [Wed, 29 Aug 2007 17:26:50 +0000 (17:26 +0000)]
+ diagnose.py: added --refresh option so that cached values can be refreshed, and either
preserved or not for future runs.  Previously it was necessary to remove the
cached values manually.
+ emailTxt.py: tried to clarify what was needed for the bootcd and plnode.txt
file.  I think some confusion is coming up based on the all-in-one bootcd.
+ findbad.py: lock calls to the plcAPI, to avoid hammering it.  Also, be more
selective about the return values requested from Nodes and Sites.  I was
getting everything.
+ mailer.py: extra debug messages.
+ monitor.py: this file is deprecated.  modifications are incidental and not
important.
+ plc.py: add a filter argument to getSites and getNodes to allow specific
fields, rather than everything.
+ policy.py: lots of little fixes.  moved more logic into Diagnose() from
Action().  Still need to fix Diagnose to act on sites when nodes are
up/improved.
+ soltesz.py: added refresh function, and return value for timed-out commands
from popen() calls.
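
A minimal sketch of a pickle-backed cache with a refresh switch, in the spirit of the --refresh option described for diagnose.py. The helper name `cached`, its signature, and the file name are hypothetical and not taken from soltesz.py.

```python
import os
import pickle
import tempfile


def cached(path, compute, refresh=False):
    """Return the value stored at `path`, recomputing it when missing or when refresh is set."""
    if not refresh and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute()
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def slow_lookup():
    # Stand-in for an expensive query; counts how often it really runs.
    calls.append(1)
    return {"lookups": len(calls)}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "nodes.pkl")
    first = cached(path, slow_lookup)                 # computed and stored
    second = cached(path, slow_lookup)                # served from the cache file
    third = cached(path, slow_lookup, refresh=True)   # recomputed on request
```

With a flag like this, stale values no longer have to be removed by hand before a run.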

Stephen Soltesz [Wed, 8 Aug 2007 13:36:46 +0000 (13:36 +0000)]
+ findbad.py: this actively probes all machines in the PLC db, using ping,
ssh, and then various commands on the machine to determine the actual bootstate.
These records are saved to disk for diagnose.py
+ diagnose.py: reads entries from findbad and previous actions, merging the
two together to determine if machines have improved, or gotten worse.  All
actions to be performed are recorded and written to a diagnose_out pickle file
for action.py
+ action.py: reads the diagnose_out file from diagnose.py and performs the
actions.  It permanently records the results in the act_all pickle file.

These three in combination are Monitor.
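
The three stages can be sketched as functions that hand data to each other through pickle files. This is a minimal illustration, not Monitor's code: the file name `findbad.pkl`, the `probe` callback, and the "notify once" policy are assumptions, while `diagnose_out` and `act_all` follow the names used in the message above.

```python
import os
import pickle
import tempfile


def load(path, default):
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def save(path, obj):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def findbad(workdir, probe):
    # Stage 1: record the observed boot state of every node.
    save(os.path.join(workdir, "findbad.pkl"), probe())


def diagnose(workdir):
    # Stage 2: merge observations with past actions and decide what to do next.
    observed = load(os.path.join(workdir, "findbad.pkl"), {})
    history = load(os.path.join(workdir, "act_all.pkl"), {})
    todo = {}
    for node, state in observed.items():
        if state != "boot" and node not in history:
            todo[node] = "notify"  # hypothetical policy: notify once per bad node
    save(os.path.join(workdir, "diagnose_out.pkl"), todo)
    return todo


def action(workdir):
    # Stage 3: perform the planned actions and append them to the permanent record.
    todo = load(os.path.join(workdir, "diagnose_out.pkl"), {})
    history = load(os.path.join(workdir, "act_all.pkl"), {})
    for node, act in todo.items():
        history.setdefault(node, []).append(act)
    save(os.path.join(workdir, "act_all.pkl"), history)
    return history


with tempfile.TemporaryDirectory() as workdir:
    findbad(workdir, lambda: {"node1": "boot", "node2": "dbg"})
    first_plan = diagnose(workdir)
    record = action(workdir)
    second_plan = diagnose(workdir)  # node2 already has an action on record
```

Keeping each stage's output on disk means a stage can be rerun or inspected on its own without repeating the earlier ones.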

Stephen Soltesz [Wed, 8 Aug 2007 13:32:43 +0000 (13:32 +0000)]
+ added 'production' namespace to the non-debug pickle files.  This keeps
everything grouped together on a file list, and makes the mode very explicit.

Stephen Soltesz [Wed, 8 Aug 2007 13:31:32 +0000 (13:31 +0000)]
+ some cleanup. some dirtying.

Stephen Soltesz [Wed, 8 Aug 2007 13:30:42 +0000 (13:30 +0000)]
+ split the policy file into three classes: Merge(), Diagnose(), and Action().
This split is more natural and allows all the diagnosis/state-transition code to
live in one place.  Action() is very simple, just taking the records from
Diagnose() and performing them.

Stephen Soltesz [Wed, 8 Aug 2007 13:28:55 +0000 (13:28 +0000)]
+ updated enableSliceCreation and enableSlices to reverse the effect of site
squeezing.

Stephen Soltesz [Wed, 8 Aug 2007 13:28:06 +0000 (13:28 +0000)]
+ add additional support for RT tickets, closing, changing Subject, and CCs.
emailViaRT() is the only needed call.  If ticket_id is given, it uses
this, otherwise, a new ticket is created.
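
A minimal sketch of the "reuse the ticket if given, otherwise create one" behaviour, using an in-memory stand-in instead of a real RT server. The class `InMemoryRT`, the function `email_via_rt`, and its signature are hypothetical; the real emailViaRT() may take different arguments.

```python
import itertools


class InMemoryRT:
    """Stand-in for a ticket tracker; it only stores messages in a dict."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.tickets = {}

    def create(self, subject):
        ticket_id = next(self._next_id)
        self.tickets[ticket_id] = {"subject": subject, "messages": []}
        return ticket_id

    def reply(self, ticket_id, text):
        self.tickets[ticket_id]["messages"].append(text)


def email_via_rt(rt, subject, text, ticket_id=None):
    """Send `text` on an existing ticket when ticket_id is given, otherwise open a new one."""
    if ticket_id is None:
        ticket_id = rt.create(subject)
    rt.reply(ticket_id, text)
    return ticket_id


rt = InMemoryRT()
tid = email_via_rt(rt, "node1 is down", "first notice")
same = email_via_rt(rt, "node1 is down", "second notice", ticket_id=tid)
```

Returning the ticket id from every call lets the caller store it and pass it back in later, so follow-ups stay on one ticket.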

Stephen Soltesz [Wed, 8 Aug 2007 13:26:46 +0000 (13:26 +0000)]
+ add better messages for what to expect in the future

Stephen Soltesz [Wed, 8 Aug 2007 13:26:24 +0000 (13:26 +0000)]
+ format time record

Stephen Soltesz [Wed, 8 Aug 2007 13:25:57 +0000 (13:25 +0000)]
+ use OptionParser in optparse python module instead of getopt
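
A minimal sketch of option parsing with `optparse.OptionParser`, written for Python 3, where the module remains in the standard library. The options `--debug` and `--nodelist` and the sample arguments are hypothetical, not the flags Monitor defines.

```python
from optparse import OptionParser


def parse_args(argv):
    """Parse a list of command-line arguments (without the program name)."""
    parser = OptionParser(usage="%prog [options] [hostname ...]")
    parser.add_option("--debug", action="store_true", dest="debug", default=False,
                      help="run without taking real actions")
    parser.add_option("--nodelist", dest="nodelist", metavar="FILE", default=None,
                      help="read hostnames from FILE")
    return parser.parse_args(argv)


options, args = parse_args(["--debug", "--nodelist", "nodes.txt", "node1.example.org"])
print(options.debug, options.nodelist, args)
```

Compared with getopt, the parser generates the help text and default values from the option declarations, so they stay in one place.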

Stephen Soltesz [Wed, 8 Aug 2007 13:25:11 +0000 (13:25 +0000)]
+ allow None arguments to constructor, and generate good defaults

Stephen Soltesz [Mon, 30 Jul 2007 13:51:20 +0000 (13:51 +0000)]
shouldn't be in cvs.

Stephen Soltesz [Tue, 3 Jul 2007 19:59:02 +0000 (19:59 +0000)]
+ new XMLRPC_SERVER name to boot.planet-lab.org

Stephen Soltesz [Tue, 3 Jul 2007 19:58:34 +0000 (19:58 +0000)]
+ use the emailViaRT() for email rather than standard email

Stephen Soltesz [Tue, 3 Jul 2007 19:57:59 +0000 (19:57 +0000)]
+ added temporary fix for ignoring tickets with a blacklist

Stephen Soltesz [Tue, 3 Jul 2007 19:57:16 +0000 (19:57 +0000)]
+ introduced rt command line emailViaRT

Stephen Soltesz [Tue, 3 Jul 2007 19:56:45 +0000 (19:56 +0000)]
+ updated tech guide url to go directly to NodeInstallation

Stephen Soltesz [Fri, 29 Jun 2007 12:42:22 +0000 (12:42 +0000)]
+ monitor.py -- modified the following three to use record-based events,
rather than node-based
+ comon.py  -- currently only looks at dbg nodes.
+ policy.py -- separated diagnoseSite() from actOnSite()
+ rt.py  -- Retrieve all tickets once
+ config.py  -- store for command line arguments used by other utilities.
Awkward.
+ emailTxt.py -- new messages for escalation.
+ mailer.py -- added a bcc option and hooks for config() options
+ plc.py  -- added a few extra fields and utility functions

Stephen Soltesz [Fri, 29 Jun 2007 12:38:36 +0000 (12:38 +0000)]
+ blacklist.py -- manages a node blacklist on which no actions should ever be
taken
+ bootcds.py -- collects bootcd information from debug state nodes
+ bwlimit.py -- fetch all nodes with broken bwlimits.
+ dumpact.py -- pretty print the act_all.pkl db generated by monitor.py
+ getnodekey.py -- generate a known_hosts file based on the ssh_rsa_key field
of the PLC node db.
+ printpdb.py -- another pretty printer for pickle files.
+ soltesz.py -- utility functions for pickles, config, etc.

Stephen Soltesz [Fri, 29 Jun 2007 12:32:58 +0000 (12:32 +0000)]
- I don't know how these ended up in cvs.

Faiyaz Ahmed [Wed, 16 May 2007 01:53:46 +0000 (01:53 +0000)]
Rewrite of policy engine.

Marc Fiuczynski [Thu, 19 Apr 2007 20:43:00 +0000 (20:43 +0000)]
added 'cleanSlices' to remove disabled users from a slice

Faiyaz Ahmed [Fri, 6 Apr 2007 17:38:14 +0000 (17:38 +0000)]
Increase threshold to a week for slice creation, 2 weeks for suspension.
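
A minimal sketch of how two such thresholds could drive escalation, assuming they are measured against how long a problem has persisted; the commit does not say what the clock starts from. The constant names, the function `penalty_for`, and the action labels are hypothetical.

```python
from datetime import timedelta

# Values from the commit message; the names are invented for this sketch.
SLICE_CREATION_THRESHOLD = timedelta(weeks=1)
SUSPENSION_THRESHOLD = timedelta(weeks=2)


def penalty_for(downtime):
    """Return the strongest penalty whose threshold the downtime has reached, or None."""
    if downtime >= SUSPENSION_THRESHOLD:
        return "suspend_slices"
    if downtime >= SLICE_CREATION_THRESHOLD:
        return "disable_slice_creation"
    return None


print(penalty_for(timedelta(days=10)))
```

Checking the larger threshold first ensures that a long outage maps to the stronger penalty rather than stopping at the first match.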

Faiyaz Ahmed [Fri, 6 Apr 2007 16:16:54 +0000 (16:16 +0000)]
Update to new API.

Faiyaz Ahmed [Mon, 2 Apr 2007 20:59:37 +0000 (20:59 +0000)]
plctool - Marc's CLI util.
config.py - debug=false

Faiyaz Ahmed [Mon, 2 Apr 2007 20:57:57 +0000 (20:57 +0000)]
Migrate to new API.

Faiyaz Ahmed [Mon, 2 Apr 2007 20:28:50 +0000 (20:28 +0000)]
Fixed syntax error in passing PCU info.

Marc Fiuczynski [Thu, 22 Feb 2007 17:09:33 +0000 (17:09 +0000)]
- set API URL to www.planet-lab.org
- add authCheck method

Marc Fiuczynski [Mon, 19 Feb 2007 17:42:21 +0000 (17:42 +0000)]
fleshed out slice enable/disable support

Marc Fiuczynski [Mon, 12 Feb 2007 19:59:00 +0000 (19:59 +0000)]
Replace enableSliceCreation/removeSliceCreation with a single setSliceMax function.

Marc Fiuczynski [Mon, 12 Feb 2007 19:54:56 +0000 (19:54 +0000)]
o Fixed removeSliceCreation and enableSliceCreation functions to work with
  new API.

Marc Fiuczynski [Mon, 12 Feb 2007 19:15:08 +0000 (19:15 +0000)]
o Fixed slices() function to use new API.

Marc Fiuczynski [Thu, 8 Feb 2007 22:43:11 +0000 (22:43 +0000)]
updated a number of functions to use new API

Marc Fiuczynski [Thu, 8 Feb 2007 19:59:03 +0000 (19:59 +0000)]
- Fix siteId() to work with new API

Marc Fiuczynski [Thu, 8 Feb 2007 19:43:09 +0000 (19:43 +0000)]
- Fix nodesDbg to use GetNodes API because Anon* API is now gone.
- Add renewAllSlices function to move forward all slice expiration dates that
   are sooner than the date given as an argument.
- Add "allow_none=True" argument to xmlrpclib.Server so that None arg can be
   marshalled via API.
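
A minimal sketch of what `allow_none=True` changes, shown with Python 3's `xmlrpc.client` (the successor to the Python 2 `xmlrpclib` module named above). The server URL is a placeholder, and no request is sent. `GetNodes` is used only as a method name for the encoded call.

```python
import xmlrpc.client

# Without allow_none, a None argument cannot be encoded at all.
try:
    xmlrpc.client.dumps((None,), methodname="GetNodes")
    rejected = False
except TypeError:
    rejected = True

# With allow_none=True, None is encoded as the <nil/> extension element.
payload = xmlrpc.client.dumps((None,), methodname="GetNodes", allow_none=True)

# The same switch on the proxy object; constructing it does not open a connection.
api = xmlrpc.client.ServerProxy("http://localhost/PLCAPI/", allow_none=True)

print(rejected)
print(payload)
```

`<nil/>` is an extension to the XML-RPC specification, so the server has to understand it as well.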

Marc Fiuczynski [Thu, 1 Feb 2007 14:25:56 +0000 (14:25 +0000)]
check if maxslices arg is passed to enableSliceCreation

Marc Fiuczynski [Thu, 1 Feb 2007 14:20:19 +0000 (14:20 +0000)]
check if maxslices arg is passed to enableSliceCreation

Marc Fiuczynski [Wed, 24 Jan 2007 19:29:44 +0000 (19:29 +0000)]
updated so that plc.py can also be used conveniently from the command line

Faiyaz Ahmed [Wed, 17 Jan 2007 19:46:40 +0000 (19:46 +0000)]
Update log.

Faiyaz Ahmed [Wed, 17 Jan 2007 19:33:04 +0000 (19:33 +0000)]
Added check so we don't keep resending the same email

Faiyaz Ahmed [Wed, 17 Jan 2007 16:03:30 +0000 (16:03 +0000)]
Except on MTA error and continue.

Faiyaz Ahmed [Thu, 11 Jan 2007 21:39:07 +0000 (21:39 +0000)]
Changed time to act on emails.  WE GET TOO MUCH EMAIL ALREADY!

Faiyaz Ahmed [Wed, 10 Jan 2007 20:08:44 +0000 (20:08 +0000)]
* Emails users when slice renewal/creation is suspended, and when their slices are suspended.

Faiyaz Ahmed [Wed, 10 Jan 2007 20:06:30 +0000 (20:06 +0000)]
Hosed this file on alfred when rins'ing.  It's RO DB access, so it (hopefully) doesn't pose a serious lack of security.

*  Contains auth info for RT.

Faiyaz Ahmed [Tue, 14 Nov 2006 19:38:34 +0000 (19:38 +0000)]
*** empty log message ***

Faiyaz Ahmed [Tue, 14 Nov 2006 19:36:09 +0000 (19:36 +0000)]
SSH and telnet library

Faiyaz Ahmed [Tue, 14 Nov 2006 19:27:09 +0000 (19:27 +0000)]
*** empty log message ***

Faiyaz Ahmed [Tue, 14 Nov 2006 19:20:13 +0000 (19:20 +0000)]
*  Sets nodes to reboot, uses PCU if available.  Defaults to POD/email (with site squeezing)
*  Slice emails, site slice creation revoke, freeze running slices
*  Changed mailto target for summary email

Faiyaz Ahmed [Fri, 27 Oct 2006 20:24:24 +0000 (20:24 +0000)]
* Emails PI, then Slices if the node does not come up after a certain number of days.
* Beginnings of slice freeze and node rins via PLC api.  Still need to finish PCU stuff.

Faiyaz Ahmed [Tue, 24 Oct 2006 20:27:32 +0000 (20:27 +0000)]
Uses CoMon's ability to find 'upness' to email.  Changed queueing between threads;  no more huge sleeps to maintain synch.

* Individual email messages per bucket
* emailed{} now pickled so as not to email the same site more than once.
* PLC anon API to get site basenames for email
* Searches RT for open/new in support or offline for tickets with hostname
* Beginnings of real check for dbg
* Started squeezing of slices via PLC api.
*  Work in progress.  Not ready for human consumption.

Faiyaz Ahmed [Tue, 24 Oct 2006 20:19:06 +0000 (20:19 +0000)]
*** empty log message ***

Faiyaz Ahmed [Tue, 3 Oct 2006 21:45:59 +0000 (21:45 +0000)]
This commit was generated by cvs2svn to compensate for changes in r2,
which included commits to RCS files with non-trunk default branches.

Planet-Lab Support [Tue, 3 Oct 2006 21:45:59 +0000 (21:45 +0000)]
New repository initialized by cvs2svn.