Stephen Soltesz [Mon, 4 Aug 2008 15:32:45 +0000 (15:32 +0000)]
Added a check for bad dns on the node that prevents bootmanager from booting.
added a findbad.py check after running grouprins in automate_pl03.sh
additional todos.
Stephen Soltesz [Fri, 1 Aug 2008 22:09:25 +0000 (22:09 +0000)]
wrong path on install
Stephen Soltesz [Fri, 1 Aug 2008 22:08:06 +0000 (22:08 +0000)]
handle monitorconfig.py correctly
Stephen Soltesz [Fri, 1 Aug 2008 22:03:31 +0000 (22:03 +0000)]
latest addition of /var/lib/ for current and archived data files
Stephen Soltesz [Fri, 1 Aug 2008 20:48:32 +0000 (20:48 +0000)]
This commit changes the 'soltesz.py' module into 'moncommands.py' and
'database.py'
Also, the findbad*.py scripts include a timeout that should allow them to exit
even if they hang somewhere on the 'futex' bug.
Also, the mailer, rt, and others are updated to use monitorconfig.py as the
source of their username and password information rather than random files
here and there. This also allows us to keep this information out of svn.
Stephen Soltesz [Fri, 1 Aug 2008 20:37:43 +0000 (20:37 +0000)]
renamed from 'soltesz' to reflect its function and content. Should be able
to hide the re-implementation of some of the backend ultimately.
Stephen Soltesz [Fri, 1 Aug 2008 20:36:50 +0000 (20:36 +0000)]
gone.
Stephen Soltesz [Fri, 1 Aug 2008 20:36:28 +0000 (20:36 +0000)]
to avoid getting this messed up with my local copy all the time.
Stephen Soltesz [Fri, 1 Aug 2008 20:34:56 +0000 (20:34 +0000)]
allow reference to the data dir and the install path.
Stephen Soltesz [Fri, 1 Aug 2008 20:31:52 +0000 (20:31 +0000)]
the proper name for this file
Stephen Soltesz [Fri, 1 Aug 2008 20:30:23 +0000 (20:30 +0000)]
adding to maintain history from 'soltesz.py' but it's a bad name choice.
Stephen Soltesz [Fri, 1 Aug 2008 17:56:01 +0000 (17:56 +0000)]
latest todo items
Stephen Soltesz [Fri, 1 Aug 2008 17:55:25 +0000 (17:55 +0000)]
ignore fields
Stephen Soltesz [Fri, 1 Aug 2008 17:36:10 +0000 (17:36 +0000)]
a unified configuration file for monitor scripts.
all usernames and passwords should go here
Stephen Soltesz [Thu, 31 Jul 2008 20:56:25 +0000 (20:56 +0000)]
Remove binary files.
Stephen Soltesz [Thu, 31 Jul 2008 20:56:01 +0000 (20:56 +0000)]
Initial import of Rpyc library. License is public domain, so it's no problem.
Stephen Soltesz [Thu, 31 Jul 2008 20:54:47 +0000 (20:54 +0000)]
Stephen Soltesz [Thu, 31 Jul 2008 20:48:50 +0000 (20:48 +0000)]
major steps to complete for the packaging of monitor-server
Stephen Soltesz [Thu, 31 Jul 2008 20:42:36 +0000 (20:42 +0000)]
Added additional statements to the svn:ignore propset
Stephen Soltesz [Thu, 31 Jul 2008 20:40:57 +0000 (20:40 +0000)]
deprecated.
Stephen Soltesz [Thu, 31 Jul 2008 20:40:22 +0000 (20:40 +0000)]
AM nagios/plc2nagios.py
a script I wrote a while ago to translate the plc db into a nagios
configuration file. might be helpful for someone else trying a better
approach with nagios
M syncplcdb.py
fixed a bug to avoid an inconsistency in the PLCDB wrt federation
migration.
AM kill.cmd.sh
continue running even if a command fails.
_M bootcd
renamed, and added to the repository. also added the ignore set property.
M getconf.py
renamed to look in bootcd dir.
A docs
AM docs/ipalprotocol.pdf
A docs/ilo2-auto-export-buffer-setup.pdf
documents that might be helpful for others maintaining the PCUs
AM rtinfo.py
sketch of code to read through a rt db cache and show useful info like
'last updated by email', which is not visible through the gui.
M reboot.py
updated to include custom code for the new PCU in plab1-itec.uni-klu.ac.at
_M ssh
A nodediff.py
template for comparing the nodes up or down between two time periods.
Stephen Soltesz [Thu, 31 Jul 2008 20:26:41 +0000 (20:26 +0000)]
Stephen Soltesz [Wed, 30 Jul 2008 22:01:08 +0000 (22:01 +0000)]
Added the AMT sample app from the IntelAMTSDK. It pulls in all cpp and
include files necessary to compile it.
Stephen Soltesz [Wed, 30 Jul 2008 20:55:58 +0000 (20:55 +0000)]
I will try to get the rpm to work with a lower-case name
Stephen Soltesz [Wed, 30 Jul 2008 20:55:23 +0000 (20:55 +0000)]
Massive commit of all changes, and added files for the Monitor-server package.
Stephen Soltesz [Wed, 30 Jul 2008 20:05:07 +0000 (20:05 +0000)]
Adding third-party module used for Monitor's web pages.
Stephen Soltesz [Wed, 30 Jul 2008 20:02:24 +0000 (20:02 +0000)]
added for the first time
Stephen Soltesz [Wed, 30 Jul 2008 19:36:04 +0000 (19:36 +0000)]
add spec files for the server-side rpm package of monitor
Stephen Soltesz [Mon, 21 Jul 2008 16:30:31 +0000 (16:30 +0000)]
The most current version of everything.
Stephen Soltesz [Fri, 18 Jul 2008 18:00:30 +0000 (18:00 +0000)]
Tagging module Monitor - Monitor-1.0-5
Incremental improvements
Stephen Soltesz [Thu, 10 Jul 2008 18:16:07 +0000 (18:16 +0000)]
Completes support for the ePowerSwitch series.
Does not support the 8XM, from site 'fem'.
Stephen Soltesz [Thu, 3 Jul 2008 22:53:24 +0000 (22:53 +0000)]
Includes support for IntelAMT as well as better support for existing IPAL over
a proprietary interface at port 9100.
Stephen Soltesz [Mon, 30 Jun 2008 20:44:30 +0000 (20:44 +0000)]
Take out pcu handling in this file, since it is handled separately by
grouprins.py now
Stephen Soltesz [Tue, 24 Jun 2008 21:05:40 +0000 (21:05 +0000)]
Script designed to help transfer the 'power-users' from the public plc into a
private plc, complete with all their sites, slices, and pre-registered ssh
keys. The goal was to make their experience of the test-plc equal to the
public-plc, such that all they needed to do was log into the node without
visiting the test-plc's interface.
Stephen Soltesz [Tue, 24 Jun 2008 19:24:24 +0000 (19:24 +0000)]
Tool to find stray node network entries in the PLC db. There are currently
289 nn entries that are not associated with a valid node. This seems like an
error to me.
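
The check described above amounts to a set difference. The following is a
minimal sketch, not the tool itself: the function name and the record field
names ('node_id', 'nodenetwork_id') are assumptions made for illustration.

```python
def find_stray_nodenetworks(nodes, nodenetworks):
    """Return the node network records whose node_id matches no known node.

    Both arguments are lists of dicts. The 'node_id' key is an assumed
    field name, not taken from the original tool.
    """
    valid_ids = {node["node_id"] for node in nodes}
    # A record with a missing or None node_id also counts as stray.
    return [nn for nn in nodenetworks if nn.get("node_id") not in valid_ids]
```

Reporting the strays rather than deleting them leaves the decision to remove
them to the operator.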
Stephen Soltesz [Mon, 23 Jun 2008 18:22:39 +0000 (18:22 +0000)]
text sketch of the sqlobject model to be designed for monitor
Stephen Soltesz [Mon, 23 Jun 2008 17:20:55 +0000 (17:20 +0000)]
Stephen Soltesz [Mon, 23 Jun 2008 17:05:42 +0000 (17:05 +0000)]
Include other options for the iLO, since 'reset' doesn't work when the machine
is powered off. TODO: add the check to power the host On if it is off.
Stephen Soltesz [Mon, 23 Jun 2008 17:04:48 +0000 (17:04 +0000)]
a template for a tool that will spit out the configuration for a node to see
if it has any errors.
Stephen Soltesz [Mon, 23 Jun 2008 17:04:08 +0000 (17:04 +0000)]
commit of tools that I use but that are not documented or guaranteed to work
for anyone else.
Stephen Soltesz [Mon, 23 Jun 2008 17:00:06 +0000 (17:00 +0000)]
simple script to collect the info Scott requested when a site leaves PL.
Stephen Soltesz [Mon, 23 Jun 2008 16:57:53 +0000 (16:57 +0000)]
Massive commit. Just put all local changes into svn.
Stephen Soltesz [Mon, 16 Jun 2008 18:48:34 +0000 (18:48 +0000)]
add timeout
Stephen Soltesz [Tue, 20 May 2008 19:43:20 +0000 (19:43 +0000)]
For dumping the diagnose_out file.
Stephen Soltesz [Tue, 20 May 2008 19:42:15 +0000 (19:42 +0000)]
allow RT module to be removed.
Stephen Soltesz [Tue, 20 May 2008 19:37:20 +0000 (19:37 +0000)]
These modules are not used.
Stephen Soltesz [Tue, 20 May 2008 19:34:03 +0000 (19:34 +0000)]
for access to the www.printbadnodes module
Stephen Soltesz [Mon, 19 May 2008 18:45:23 +0000 (18:45 +0000)]
clean kernel parsing.
Stephen Soltesz [Mon, 19 May 2008 18:43:26 +0000 (18:43 +0000)]
Adding the model for log records
Stephen Soltesz [Mon, 19 May 2008 18:37:48 +0000 (18:37 +0000)]
update
Stephen Soltesz [Mon, 19 May 2008 18:36:27 +0000 (18:36 +0000)]
adding files
Stephen Soltesz [Mon, 19 May 2008 17:54:33 +0000 (17:54 +0000)]
Tagging module Monitor - Monitor-1.0-4
tagging everything for OneLab tech-transfer.
Stephen Soltesz [Mon, 19 May 2008 17:53:26 +0000 (17:53 +0000)]
new messages for alpha node groups, etc.
Stephen Soltesz [Mon, 19 May 2008 17:52:56 +0000 (17:52 +0000)]
mass commit
Stephen Soltesz [Tue, 13 May 2008 18:16:11 +0000 (18:16 +0000)]
Run process with timeout, and allow an arbitrary path for the source of the
pickle files, instead of the default PICKLE_PATH
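
One way to run a process with a timeout is sketched below. It uses the
standard subprocess module from current Python and is not the code from this
commit; the helper name run_with_timeout is an assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd (a list of arguments) and return (exit_code, stdout).

    exit_code is None when the process was killed for exceeding the timeout.
    This helper is hypothetical and only illustrates the idea.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
        return done.returncode, done.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing lingers.
        return None, ""


if __name__ == "__main__":
    # A command that sleeps longer than the limit is cut off.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

Returning None for the exit code lets a cron job tell a hung command apart
from one that merely failed.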
Stephen Soltesz [Tue, 13 May 2008 18:13:55 +0000 (18:13 +0000)]
fixed call to hpilo script. I think I added a timeout too.
now works correctly with findbad.py cron job. Doesn't hang indefinitely now.
Stephen Soltesz [Tue, 13 May 2008 18:11:59 +0000 (18:11 +0000)]
Read nodes from a given file, for batch updates when using nodequery and
nodereboot or grouprins.py
Stephen Soltesz [Tue, 13 May 2008 18:10:44 +0000 (18:10 +0000)]
Stephen Soltesz [Tue, 13 May 2008 18:09:47 +0000 (18:09 +0000)]
Improvements for older records. Consolidated code related to ending a
record.
Stephen Soltesz [Fri, 9 May 2008 21:31:19 +0000 (21:31 +0000)]
Tagging module Monitor - Monitor-1.0-3
Marc Fiuczynski [Tue, 6 May 2008 02:55:18 +0000 (02:55 +0000)]
A few changes to improve upon the script:
- try to make it stand alone python script
- uses xmlrpc directly; no longer needs to import plc module
- fetches nodenetworks for all hosts and caches it locally
to avoid having to invoke the API n times (where n is the
# of nodes at the PLC).
Still needs:
- a proper help/usage message printed
- a way to export full functionality (e.g., delete)
- a way to specify XMLRPC_SERVER as a command line option, as
now it by default assumes www.planet-lab.org/PLCAPI
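
The caching idea in the list above can be sketched as follows. This is an
illustration under assumptions, not the script's code: the function names, the
pickle cache file, and the 'node_id' field are hypothetical, and fetch_all
stands in for whatever single API call returns every node network record.

```python
import os
import pickle
from collections import defaultdict


def load_or_fetch(cache_path, fetch_all):
    """Return all records, calling fetch_all() only when no cache file exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    records = fetch_all()  # one bulk call instead of one call per node
    with open(cache_path, "wb") as cache_file:
        pickle.dump(records, cache_file)
    return records


def index_by_node(records):
    """Group records by their (assumed) 'node_id' field for local lookups."""
    by_node = defaultdict(list)
    for record in records:
        by_node[record["node_id"]].append(record)
    return dict(by_node)
```

With the index in memory, a per-node lookup is a dictionary access rather than
a round trip to the server. The sketch never expires the cache file, so a real
tool would need some way to refresh it.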
Stephen Soltesz [Mon, 5 May 2008 17:58:09 +0000 (17:58 +0000)]
Tagging module Monitor - Monitor-1.0-2
Stephen Soltesz [Mon, 5 May 2008 17:01:20 +0000 (17:01 +0000)]
last typo
Stephen Soltesz [Mon, 5 May 2008 16:58:42 +0000 (16:58 +0000)]
fixes to make them more stand-alone and general.
Thierry Parmentelat [Mon, 5 May 2008 12:09:39 +0000 (12:09 +0000)]
check consistency of specfiles:
* set pldistro in release when needed (Monitor)
* remove it when already part of the rpm name (bootcd, noderepo)
Stephen Soltesz [Fri, 2 May 2008 19:18:20 +0000 (19:18 +0000)]
Major improvements. Actually useful for daily operations.
Stephen Soltesz [Wed, 23 Apr 2008 21:00:10 +0000 (21:00 +0000)]
Tagging module Monitor - Monitor-1.0-1
This should be ready for 4.2rc2
Stephen Soltesz [Mon, 14 Apr 2008 17:59:45 +0000 (17:59 +0000)]
Add a field for the currently observed status as well as the PLC db
configuration.
Stephen Soltesz [Mon, 14 Apr 2008 17:59:17 +0000 (17:59 +0000)]
Add an option to end a monitor record for a node. This results in the
accounting starting over.
Stephen Soltesz [Mon, 14 Apr 2008 17:58:36 +0000 (17:58 +0000)]
Added a convenience script for making a single command line call.
Stephen Soltesz [Fri, 11 Apr 2008 21:02:53 +0000 (21:02 +0000)]
instructs user how to create the 'auth.py' file.
Stephen Soltesz [Fri, 11 Apr 2008 20:59:45 +0000 (20:59 +0000)]
This is a template script for adding the 'Site Assistant' user into the myPLC
db, creating an rsa key, uploading it to the user account, and eventually
doing some other post-processing setup for monitor.
Stephen Soltesz [Wed, 9 Apr 2008 17:19:37 +0000 (17:19 +0000)]
- additional functions for displaying the pcu.
Stephen Soltesz [Wed, 9 Apr 2008 17:17:59 +0000 (17:17 +0000)]
- add reporting of pcu state
Stephen Soltesz [Wed, 9 Apr 2008 17:16:13 +0000 (17:16 +0000)]
- add a checked time to each record.
Stephen Soltesz [Wed, 9 Apr 2008 17:15:53 +0000 (17:15 +0000)]
- some code cleaning.
- fixed the bug that missed entries in act_all with no previous records.
- take RT tickets into account better.
Stephen Soltesz [Wed, 9 Apr 2008 17:14:48 +0000 (17:14 +0000)]
- tweaks.
Stephen Soltesz [Wed, 9 Apr 2008 17:13:58 +0000 (17:13 +0000)]
- added a checked time value
- added new kernel version. need a better way to do this.
Stephen Soltesz [Wed, 9 Apr 2008 17:13:05 +0000 (17:13 +0000)]
- additional regular rotations.
Stephen Soltesz [Wed, 9 Apr 2008 17:09:33 +0000 (17:09 +0000)]
- added commands to get and set the ticket status so this can be done automatically on node restoration.
Stephen Soltesz [Wed, 9 Apr 2008 17:08:58 +0000 (17:08 +0000)]
indent change
Stephen Soltesz [Wed, 9 Apr 2008 17:08:38 +0000 (17:08 +0000)]
- cleaning of code.
- save all of the RT db.
Stephen Soltesz [Wed, 9 Apr 2008 17:06:36 +0000 (17:06 +0000)]
- Some code cleaning to remove old ipal implementation.
- Better pcuid mappings to different pcus.
- takes a command line argument when run as a program: 'reboot.py <hostname>'
Stephen Soltesz [Wed, 9 Apr 2008 16:56:35 +0000 (16:56 +0000)]
add simple command line tools for manipulating node groups, and for querying
the information collected by monitor for a given node.
Stephen Soltesz [Wed, 9 Apr 2008 13:58:37 +0000 (13:58 +0000)]
take away the lowercase 'm'onitor.spec.
Stephen Soltesz [Tue, 8 Apr 2008 21:20:31 +0000 (21:20 +0000)]
Added two requirements.
Stephen Soltesz [Tue, 8 Apr 2008 20:59:25 +0000 (20:59 +0000)]
capitalize for build?
Stephen Soltesz [Tue, 8 Apr 2008 20:50:19 +0000 (20:50 +0000)]
used the wrong spec file as a template.
Stephen Soltesz [Tue, 8 Apr 2008 20:30:58 +0000 (20:30 +0000)]
Initial add of monitor spec, init, and cron file for the monitor root account scripts
Stephen Soltesz [Mon, 7 Apr 2008 20:49:49 +0000 (20:49 +0000)]
Simpler interface to api. Given a single object, it preserves the auth
variable and passes it to all subsequent calls transparently.
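
A wrapper of this kind might look like the sketch below. It is an assumption
about the approach, not the committed code: the class name AuthProxy is
hypothetical, and the wrapped server can be any object whose methods take the
auth structure as their first argument, for example an xmlrpc.client.ServerProxy.

```python
class AuthProxy:
    """Forward method calls to a server object, prepending a stored auth value.

    Hypothetical helper for illustration; not the interface from this commit.
    """

    def __init__(self, auth, server):
        self._auth = auth
        self._server = server

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself,
        # i.e. the remote method names.
        method = getattr(self._server, name)

        def call(*args):
            return method(self._auth, *args)

        return call
```

A caller would then write api.GetNodes(filter) instead of
server.GetNodes(auth, filter); GetNodes is used here only as an example method
name.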
Stephen Soltesz [Fri, 4 Apr 2008 20:18:59 +0000 (20:18 +0000)]
a key for the monitor user.
Stephen Soltesz [Fri, 21 Mar 2008 17:26:49 +0000 (17:26 +0000)]
Basic script to collect ssh_rsa_keys for all nodes and dump into a known_hosts
file. Problems:
* needs to be updated periodically.
* needs to co-exist with a user's non-pl entries in known_hosts
* there doesn't seem to be a way to configure ssh to read two known_hosts files.
Stephen Soltesz [Thu, 28 Feb 2008 21:14:40 +0000 (21:14 +0000)]
Add a new BayTech prompt type.
Stephen Soltesz [Tue, 11 Dec 2007 22:50:16 +0000 (22:50 +0000)]
This should be a global view of all things Monitor is doing, with an
instantaneous view of the health of a Site, Node, and its PCUs, and of what
actions have been taken by Monitor or what external states are blocking its
progress.
Stephen Soltesz [Tue, 11 Dec 2007 22:49:15 +0000 (22:49 +0000)]
Added a variety of filters to limit the nodes displayed. Also, added a 'nodesonly' option
Stephen Soltesz [Tue, 11 Dec 2007 22:48:37 +0000 (22:48 +0000)]
added some minor status message at the end
Stephen Soltesz [Tue, 11 Dec 2007 22:47:47 +0000 (22:47 +0000)]
just assume that the host is up by using -P0 arg to nmap. Without this, nmap missed some hosts that really were up.
Stephen Soltesz [Tue, 11 Dec 2007 22:47:04 +0000 (22:47 +0000)]
better support for PCUs
Stephen Soltesz [Tue, 11 Dec 2007 22:46:14 +0000 (22:46 +0000)]
add a global version of getListFromFile()
Stephen Soltesz [Tue, 11 Dec 2007 22:45:50 +0000 (22:45 +0000)]
Cache more stuff from plc in local files.