Stephen Soltesz [Tue, 29 Jun 2010 22:01:36 +0000 (22:01 +0000)]
add checkrt to indicate when a site has new or open tickets
add checkescalation to infer the penalty applied to a site based on the state
of its site and slices
add extra RT configuration fields to auth.py
Stephen Soltesz [Tue, 29 Jun 2010 20:54:48 +0000 (20:54 +0000)]
add support for the myops object tags. Applies to sites, slices, and persons.
Sites with 'exempt_site_until' are not disabled
Persons with 'exempt_site_until' are not emailed
Slices with 'exempt_slice_until' are not suspended
This feature will replace the 'blacklist' command line tool.
Currently, there is no GUI support for Person or Site Tags.
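A minimal sketch of how such an exemption check might look. It assumes the tag value is a date string in `YYYY-MM-DD` form; the `is_exempt` helper and the dictionary layout are illustrative and are not taken from the Monitor source.

```python
from datetime import datetime

def is_exempt(tags, tagname, now=None):
    """Return True if `tagname` is set and its date is still in the future.

    `tags` is a hypothetical mapping of tag name -> value. The value is
    assumed to be a 'YYYY-MM-DD' string; the real tag format may differ.
    """
    now = now or datetime.now()
    value = tags.get(tagname)
    if not value:
        return False
    try:
        until = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Unparseable values are treated as "no exemption".
        return False
    return now < until
```

A caller would skip disabling a site when `is_exempt(site_tags, 'exempt_site_until')` is true, and likewise for persons and slices with their respective tags.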
Stephen Soltesz [Mon, 28 Jun 2010 15:47:47 +0000 (15:47 +0000)]
add a warning when given loginbase returns nothing
add two time functions to convert strings to timestamp or datetime objects
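The two conversion helpers could look roughly like this. The function names and the input format are assumptions made for illustration; the commit does not show the actual signatures.

```python
import calendar
import time
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed input format

def str_to_datetime(value, fmt=TIME_FORMAT):
    """Parse a time string into a naive datetime object."""
    return datetime.strptime(value, fmt)

def str_to_timestamp(value, fmt=TIME_FORMAT):
    """Parse a time string into seconds since the epoch, treating it as UTC."""
    return calendar.timegm(time.strptime(value, fmt))
```

`calendar.timegm` is used instead of `time.mktime` so the result does not depend on the local timezone of the machine running the script.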
Stephen Soltesz [Fri, 25 Jun 2010 21:17:43 +0000 (21:17 +0000)]
add real checks for RebootNodeWithPCU. Report errors returned by API
add notes_url to pcu service
Stephen Soltesz [Fri, 25 Jun 2010 15:40:50 +0000 (15:40 +0000)]
add comon_analysis graph
Stephen Soltesz [Mon, 21 Jun 2010 20:59:37 +0000 (20:59 +0000)]
Stephen Soltesz [Mon, 21 Jun 2010 20:37:41 +0000 (20:37 +0000)]
a simple auth file for accessing remote plc
Stephen Soltesz [Mon, 21 Jun 2010 20:30:59 +0000 (20:30 +0000)]
simplify plc_users_to_nagios imports as with plc_hosts...
Stephen Soltesz [Mon, 21 Jun 2010 20:27:16 +0000 (20:27 +0000)]
typo
Stephen Soltesz [Mon, 21 Jun 2010 20:26:05 +0000 (20:26 +0000)]
make plc.py simpler to reduce the dependencies for plc_hosts_to_nagios.
add cron script to regenerate config files daily.
add dependencies and setup to monitor-nagios rpm
improve monitor-nagios.init script (I still think it may need to only be run once).
Stephen Soltesz [Mon, 21 Jun 2010 18:13:46 +0000 (18:13 +0000)]
add an escalation for a bad pcu status.
every observed service has an associated action
Stephen Soltesz [Fri, 18 Jun 2010 23:05:43 +0000 (23:05 +0000)]
add check to see if mysqld is running in init script
Stephen Soltesz [Fri, 18 Jun 2010 22:57:02 +0000 (22:57 +0000)]
create a skeleton init script for monitor-nagios. not sure if this really
needs to run every time, since setup only needs to happen once.
Stephen Soltesz [Fri, 18 Jun 2010 22:11:35 +0000 (22:11 +0000)]
typo
Stephen Soltesz [Fri, 18 Jun 2010 22:09:43 +0000 (22:09 +0000)]
attempting to separate server and nagios packages explicitly
Stephen Soltesz [Fri, 18 Jun 2010 21:55:13 +0000 (21:55 +0000)]
update nagios scripts with new paths
add monitor-nagios package to spec file
remove pcucontrol from setup.py
Stephen Soltesz [Fri, 18 Jun 2010 21:44:49 +0000 (21:44 +0000)]
move files into function-specific directories
Stephen Soltesz [Fri, 18 Jun 2010 21:43:17 +0000 (21:43 +0000)]
move nagios files to nagios dir
Stephen Soltesz [Fri, 18 Jun 2010 21:40:16 +0000 (21:40 +0000)]
add a nagios dir to the monitor tree
Stephen Soltesz [Fri, 18 Jun 2010 21:24:39 +0000 (21:24 +0000)]
add a module for generating nagios configuration objects from python objects
improved generation for plc sites/hosts
separated site escalation from notification
host reboot stubs
host pcu service check stubs
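Generating nagios configuration objects from python objects can be sketched as follows. The `NagiosObject` class is hypothetical and only illustrates the idea; the module added in this commit may be organized differently.

```python
class NagiosObject(object):
    """Hypothetical wrapper that renders one nagios 'define' block."""

    def __init__(self, kind, **attrs):
        self.kind = kind      # e.g. 'host', 'service', 'contact'
        self.attrs = attrs    # directive name -> value

    def render(self):
        lines = ["define %s {" % self.kind]
        # Sort the directives so the generated file is stable between runs.
        for key in sorted(self.attrs):
            lines.append("    %-20s %s" % (key, self.attrs[key]))
        lines.append("}")
        return "\n".join(lines)
```

For example, `NagiosObject('host', host_name='node1.example.org', address='10.0.0.1').render()` yields a `define host { ... }` block that can be written to a configuration file.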
Stephen Soltesz [Fri, 18 Jun 2010 21:21:08 +0000 (21:21 +0000)]
move some routines from plccache to generic to avoid pulling in db routines
Stephen Soltesz [Fri, 18 Jun 2010 21:19:44 +0000 (21:19 +0000)]
add external commands as stubs for the nagios plugins
Stephen Soltesz [Fri, 4 Jun 2010 23:16:01 +0000 (23:16 +0000)]
convert some sites and users into a nagios configuration
added hostescalation, automated reboot, custom notify commands
needs more testing
Stephen Soltesz [Fri, 4 Jun 2010 21:56:10 +0000 (21:56 +0000)]
add logging to reboot.py
Stephen Soltesz [Thu, 3 Jun 2010 18:31:01 +0000 (18:31 +0000)]
rename and split plc2nagios file
Stephen Soltesz [Thu, 3 Jun 2010 17:35:30 +0000 (17:35 +0000)]
add some service escalation templates
Stephen Soltesz [Tue, 25 May 2010 21:15:27 +0000 (21:15 +0000)]
add generic routines for manipulating lists from PLCAPI
Barış Metin [Fri, 21 May 2010 08:39:49 +0000 (08:39 +0000)]
fcdistro -> distroname
Stephen Soltesz [Thu, 20 May 2010 19:26:57 +0000 (19:26 +0000)]
Branch 3.0 for module Monitor created (as new trunk) from tag Monitor-3.0-35
Stephen Soltesz [Thu, 20 May 2010 19:25:55 +0000 (19:25 +0000)]
Setting tag Monitor-3.0-35
Add CSV link on Advanced query
Preparing to branch
Stephen Soltesz [Thu, 20 May 2010 17:46:14 +0000 (17:46 +0000)]
add a CSV Format link to the advanced query page
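A sketch of how query results might be serialized for such a link, assuming each result row is available as a dict keyed by field name. The `rows_to_csv` helper is illustrative and is not the controller code itself.

```python
import csv
import io

def rows_to_csv(fields, rows):
    """Render `rows` (dicts) as CSV text with `fields` as the header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        # Missing fields become empty cells rather than raising KeyError.
        writer.writerow([row.get(f, "") for f in fields])
    return out.getvalue()
```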
Barış Metin [Wed, 12 May 2010 15:00:59 +0000 (15:00 +0000)]
Setting tag Monitor-3.0-34
* copy selections to clipboard on Advanced Query page
* RPM Pattern as regexp
* scan ipmi port
Barış Metin [Tue, 11 May 2010 20:02:10 +0000 (20:02 +0000)]
match rpm pattern with regexp
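Matching the RPM pattern as a regular expression rather than a plain substring could look like the sketch below. The `match_rpms` name and the list-of-strings input are assumptions for illustration.

```python
import re

def match_rpms(pattern, rpm_names):
    """Return the package names that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [name for name in rpm_names if regex.search(name)]
```

With a regexp, a query such as `^kernel-2\.6\.\d+` selects kernel packages without also matching `kernel-devel-...`, which a plain substring search for `kernel` would include.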
Barış Metin [Tue, 11 May 2010 19:37:27 +0000 (19:37 +0000)]
move "copy to clipboard" button to table header
Barış Metin [Mon, 10 May 2010 17:33:44 +0000 (17:33 +0000)]
scan ipmi port too
Barış Metin [Thu, 6 May 2010 10:26:36 +0000 (10:26 +0000)]
in Advanced Query, select rows and copy values to clipboard in csv format.
Thierry Parmentelat [Wed, 5 May 2010 07:55:43 +0000 (07:55 +0000)]
set default pagesize on all views to 999
Barış Metin [Tue, 27 Apr 2010 10:46:24 +0000 (10:46 +0000)]
Setting tag Monitor-3.0-33
handle hostname changes
Barış Metin [Mon, 26 Apr 2010 08:59:53 +0000 (08:59 +0000)]
handle hostname changes
Thierry Parmentelat [Tue, 20 Apr 2010 08:27:27 +0000 (08:27 +0000)]
Setting tag Monitor-3.0-32
from this version, suitable for 5.0
requires bootcd with the new 5.0 naming style 3-part nodefamily
Thierry Parmentelat [Tue, 20 Apr 2010 08:25:12 +0000 (08:25 +0000)]
for 5.0, requires bootcd with new 3-part nodefamily
Stephen Soltesz [Mon, 12 Apr 2010 14:50:50 +0000 (14:50 +0000)]
Setting tag Monitor-3.0-31
added fix for node delete/add causing conflicts in MyOps db.
added statistics scripts
Stephen Soltesz [Thu, 8 Apr 2010 19:34:35 +0000 (19:34 +0000)]
fixes bug in myops for a node with different node_id. This occurs when
deleting and then adding a node with the same name in plc.
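The situation can be illustrated with a small sketch: records are looked up by hostname, and a mismatch in `node_id` signals that the node was deleted and re-added. The cache layout and the `sync_node_record` helper are hypothetical; the real fix operates on the MyOps database.

```python
def sync_node_record(cache, plc_node):
    """Ensure `cache` holds a record matching the node_id reported by PLC.

    Returns True when a fresh record was created.
    """
    hostname = plc_node["hostname"]
    record = cache.get(hostname)
    if record is None or record["node_id"] != plc_node["node_id"]:
        # New node, or the same hostname was deleted and re-added in PLC:
        # start from a fresh record instead of reusing the stale one.
        cache[hostname] = {"node_id": plc_node["node_id"], "history": []}
        return True
    return False
```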
Barış Metin [Tue, 6 Apr 2010 13:15:50 +0000 (13:15 +0000)]
fix path
Stephen Soltesz [Thu, 25 Mar 2010 19:51:00 +0000 (19:51 +0000)]
add myops_restoration
Stephen Soltesz [Sat, 13 Mar 2010 20:00:27 +0000 (20:00 +0000)]
fixed typo on logger name for exceptions.
Stephen Soltesz [Tue, 2 Mar 2010 19:30:13 +0000 (19:30 +0000)]
add new scripts
Barış Metin [Tue, 16 Feb 2010 14:28:43 +0000 (14:28 +0000)]
oops... fix path.
Stephen Soltesz [Thu, 11 Feb 2010 20:14:07 +0000 (20:14 +0000)]
R routines for printing some statistics
Stephen Soltesz [Thu, 11 Feb 2010 20:12:52 +0000 (20:12 +0000)]
Stephen Soltesz [Thu, 11 Feb 2010 20:08:28 +0000 (20:08 +0000)]
test
Stephen Soltesz [Thu, 21 Jan 2010 20:22:14 +0000 (20:22 +0000)]
add more info to sliceavg
parserpms does a better job of sorting and converting entries with multiple versions
Stephen Soltesz [Thu, 21 Jan 2010 20:15:57 +0000 (20:15 +0000)]
add a conversion class for datetime and time stamps, since I need this all the time.
'Created' value in mailer.py is causing problems for PLE
move print statements to stderr in plccache.py and comon.py
add an 'escapeName' routine in dbpickle to allow filepaths in output names
fix bug in scanapi that missed debug node if there was no bootmanager.log
add checks for yum.config files
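The idea behind an 'escapeName' routine can be sketched as below, assuming that it only needs to neutralize path separators so that a name containing a file path maps to a single flat file name. The body is a guess at the behavior, not a copy of dbpickle.

```python
def escape_name(name):
    """Make `name` safe to use as a single file name component.

    Assumption: replacing '/' with '_' is sufficient for the names involved.
    """
    return name.replace("/", "_")
```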
Stephen Soltesz [Thu, 21 Jan 2010 19:47:29 +0000 (19:47 +0000)]
replace some print statements to stderr
add HistorySiteRecord to checksync
Barış Metin [Thu, 21 Jan 2010 10:50:38 +0000 (10:50 +0000)]
Setting tag Monitor-3.0-30
* fix paths for automate script
Barış Metin [Wed, 20 Jan 2010 14:40:11 +0000 (14:40 +0000)]
fix paths
Barış Metin [Tue, 22 Dec 2009 17:12:17 +0000 (17:12 +0000)]
Setting tag Monitor-3.0-29
- separate pcucontrol as an svn module
- restore easy_install back into post install stage of server-deps
- template improvements for web interface
Barış Metin [Tue, 22 Dec 2009 15:54:28 +0000 (15:54 +0000)]
move easy_install calls back to post install.
running easy_install didn't work as I thought it would; every now and
again it fails and breaks our build.
Barış Metin [Tue, 22 Dec 2009 12:14:39 +0000 (12:14 +0000)]
require pcucontrol.
Barış Metin [Tue, 22 Dec 2009 12:03:57 +0000 (12:03 +0000)]
remove pcucontrol from Monitor.spec
Barış Metin [Tue, 22 Dec 2009 12:02:27 +0000 (12:02 +0000)]
move pcucontrol package into pcucontrol module.
Stephen Soltesz [Fri, 18 Dec 2009 21:13:30 +0000 (21:13 +0000)]
move nodelist.kid headers into node_template.kid to remove redundancy.
comment-out the boot/down summary at the top of the nodelist.kid page; ...
Thierry Parmentelat [Fri, 18 Dec 2009 18:17:36 +0000 (18:17 +0000)]
work around the lack of libm.a on f12
Barış Metin [Fri, 18 Dec 2009 16:08:30 +0000 (16:08 +0000)]
merged pcucontrol into monitor-server. although monitor-pcucontrol may
be utilized as a separate package it makes managing the %files more
complicated for the moment. if we had need to generalize it at some
point, we can manage it in a separate rpm (and/or svn module?)
Barış Metin [Thu, 17 Dec 2009 21:12:39 +0000 (21:12 +0000)]
ok, don't break anything on f8 too :)
Barış Metin [Thu, 17 Dec 2009 21:02:10 +0000 (21:02 +0000)]
fix f12 build
Barış Metin [Thu, 17 Dec 2009 16:27:38 +0000 (16:27 +0000)]
Setting tag Monitor-3.0-28
do not need buildrequires. a new tag to fix centos builds
Barış Metin [Thu, 17 Dec 2009 14:40:38 +0000 (14:40 +0000)]
comment out buildrequires
Barış Metin [Thu, 17 Dec 2009 11:52:31 +0000 (11:52 +0000)]
Setting tag Monitor-3.0-27
fix rpm build issues
Barış Metin [Thu, 17 Dec 2009 11:42:59 +0000 (11:42 +0000)]
setuptools don't really care about --build-directory.
It's just easier to export TMPDIR. Thanks to Thierry.
Barış Metin [Thu, 17 Dec 2009 09:59:00 +0000 (09:59 +0000)]
add *egg/ directories to the package. easy_install can bring in
additional dependencies (that's the case for f12 build).
Barış Metin [Wed, 16 Dec 2009 15:41:21 +0000 (15:41 +0000)]
Setting tag Monitor-3.0-26
too many changes, but mostly moved stuff around. there are some small fixes here and there.
Barış Metin [Wed, 16 Dec 2009 14:51:54 +0000 (14:51 +0000)]
handle IndexError in getpcu
Barış Metin [Mon, 14 Dec 2009 23:22:29 +0000 (23:22 +0000)]
require easy_install at build time
Barış Metin [Mon, 14 Dec 2009 15:57:04 +0000 (15:57 +0000)]
hope I got the merge right...
svn merge -r 15903:16132 https://svn.planet-lab.org/svn/Monitor/branches/monitor-20091130 .
Stephen Soltesz [Mon, 7 Dec 2009 23:04:38 +0000 (23:04 +0000)]
report any expired sites & nodes
Stephen Soltesz [Mon, 7 Dec 2009 21:01:32 +0000 (21:01 +0000)]
add two cases for resolving nodes that run out of disk space during boot-strap
Stephen Soltesz [Mon, 7 Dec 2009 21:00:56 +0000 (21:00 +0000)]
only enable a site if the 'enabled' field is False.
NOTE: This will address ticket: https://svn.planet-lab.org/ticket/592
Stephen Soltesz [Mon, 7 Dec 2009 20:59:46 +0000 (20:59 +0000)]
added supported_ports to class definition
removed references to self.transport.verbose since there is no self.transport
Stephen Soltesz [Thu, 3 Dec 2009 02:45:58 +0000 (02:45 +0000)]
reformat time install_date to timestamp when returned by advanced query
1/3 of online nodes in PLC did not have the /usr/boot/plnode.txt file, so add
a secondary check to install_date to check for another file in /usr/boot/
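A sketch of the fallback logic: try each candidate file in order and report the change time of the first one that exists. The helper name and the list-based interface are assumptions. The commit does not name the second file, so none is filled in here.

```python
import os
import time

def install_date(candidates):
    """Return the ctime of the first existing path, formatted like time.ctime().

    `candidates` is an ordered list of paths, e.g. starting with
    '/usr/boot/plnode.txt' and followed by whichever other file under
    /usr/boot/ is chosen as the secondary check.
    """
    for path in candidates:
        try:
            return time.ctime(os.stat(path).st_ctime)
        except OSError:
            # Missing or unreadable file: fall through to the next candidate.
            continue
    return "unknown"
```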
Stephen Soltesz [Mon, 30 Nov 2009 16:48:22 +0000 (16:48 +0000)]
sort last_changed by correct times
Stephen Soltesz [Sat, 21 Nov 2009 02:07:43 +0000 (02:07 +0000)]
I think this applies svn ignore property everywhere.
Stephen Soltesz [Sat, 21 Nov 2009 02:01:30 +0000 (02:01 +0000)]
deprecate www directory and its legacy scripts
Stephen Soltesz [Sat, 21 Nov 2009 01:38:00 +0000 (01:38 +0000)]
remove a lot of deprecated files ;
move non-user or setup scripts to extra/ directory
Stephen Soltesz [Sat, 21 Nov 2009 00:58:05 +0000 (00:58 +0000)]
Stephen Soltesz [Fri, 20 Nov 2009 23:18:33 +0000 (23:18 +0000)]
Setting tag Monitor-3.0-25
add option for site status to include both node & pcu status
improve ticket handling
template gadget.xml for a site-specific google-gadget summary
Stephen Soltesz [Fri, 20 Nov 2009 22:36:17 +0000 (22:36 +0000)]
added templating to google gadget xml file in monitor-server; previously it
was hard-coded to monitor.planet-lab.org ; now PLE can have their own google
gadget.
added policy to close tickets if all nodes & pcus at a site are ok, to prevent
some leaking tickets.
Barış Metin [Fri, 20 Nov 2009 10:43:40 +0000 (10:43 +0000)]
correct message
Barış Metin [Fri, 20 Nov 2009 10:35:29 +0000 (10:35 +0000)]
- check if the site is in 'pending' state on all site actions
- clean-up
Stephen Soltesz [Thu, 19 Nov 2009 20:42:07 +0000 (20:42 +0000)]
add checkpcu option to findall.py & clarify help text.
Stephen Soltesz [Thu, 19 Nov 2009 19:14:19 +0000 (19:14 +0000)]
add option to check pcu status as part of the condition for marking a site as
'good' or 'down'
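How the option changes the outcome can be shown with a simplified sketch. The all-or-nothing rule below is an assumption made for illustration; the actual policy for deciding when a site is 'down' may use different thresholds.

```python
UP_STATES = ("online", "good")

def site_status(node_statuses, pcu_statuses, check_pcu=False):
    """Return 'good' or 'down' for a site.

    Simplified rule: the site is 'good' when every node is up and, if
    `check_pcu` is set, every PCU is up as well.
    """
    ok = all(s in UP_STATES for s in node_statuses)
    if check_pcu:
        ok = ok and all(s in UP_STATES for s in pcu_statuses)
    return "good" if ok else "down"
```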
Stephen Soltesz [Mon, 9 Nov 2009 20:34:34 +0000 (20:34 +0000)]
try looking for a shorter prompt initially, since local admin reports that
this is typical behavior of the pcu.
Barış Metin [Wed, 28 Oct 2009 13:46:14 +0000 (13:46 +0000)]
fix repository link and comment out wiki pages as they're not present
Barış Metin [Thu, 22 Oct 2009 07:59:29 +0000 (07:59 +0000)]
- add install_date
Setting tag Monitor-3.0-24
DIFF=========
Index: nodequery.py
===================================================================
--- nodequery.py (.../tags/Monitor-3.0-23) (revision 15400)
+++ nodequery.py (.../trunk) (revision 15400)
@@ -38,6 +38,8 @@
fbnode['bootcd_version'] = "unknown"
if not fbnode['boot_server']:
fbnode['boot_server'] = "unknown"
+ if not fbnode['install_date']:
+ fbnode['install_date'] = "unknown"
fbnode['pcu'] = color_pcu_state(fbnode)
if not fields:
@@ -60,7 +62,7 @@
#print "ERROR!!!!!!!!!!!!!!!!!!!!!"
pass
- print "%(hostname)-45s | %(date_checked)11.11s | %(boot_state)5.5s| %(observed_status)8.8s | %(ssh_status)5.5s | %(pcu)6.6s | %(bootcd_version)6.6s | %(boot_server)s | %(kernel_version)s" % fbnode
+ print "%(hostname)-45s | %(date_checked)11.11s | %(boot_state)5.5s| %(observed_status)8.8s | %(ssh_status)5.5s | %(pcu)6.6s | %(bootcd_version)6.6s | %(boot_server)s | %(install_date)s | %(kernel_version)s" % fbnode
else:
format = ""
for f in fields:
Index: web/MonitorWeb/monitorweb/controllers.py
===================================================================
--- web/MonitorWeb/monitorweb/controllers.py (.../tags/Monitor-3.0-23) (revision 15400)
+++ web/MonitorWeb/monitorweb/controllers.py (.../trunk) (revision 15400)
@@ -54,6 +54,7 @@
kernel_version = widgets.CheckBox(label="Kernel")
bootcd_version = widgets.CheckBox(label="BootCD")
boot_server = widgets.CheckBox(label="Boot Server")
+ install_date = widgets.CheckBox(label="Installation Date")
observed_status = widgets.CheckBox(label="Observed Status")
uptime = widgets.CheckBox(label="Uptime")
traceroute = widgets.CheckBox(label="Traceroute")
Index: web/MonitorWeb/monitorweb/templates/nodescanhistory.kid
===================================================================
--- web/MonitorWeb/monitorweb/templates/nodescanhistory.kid (.../tags/Monitor-3.0-23) (revision 15400)
+++ web/MonitorWeb/monitorweb/templates/nodescanhistory.kid (.../trunk) (revision 15400)
@@ -63,6 +63,7 @@
<th class="sortable plekit_table">kernel</th>
<th class="sortable plekit_table">BootCD</th>
<th class="sortable plekit_table">Boot Server</th>
+ <th class="sortable plekit_table">Installation Date</th>
<th class="sortable plekit_table">Last_contact</th>
</tr>
</thead>
@@ -78,6 +79,7 @@
<td nowrap="true" py:content="node.kernel"></td>
<td nowrap="true" py:content="node.node.bootcd_version"></td>
<td nowrap="true" py:content="node.node.boot_server"></td>
+ <td nowrap="true" py:content="node.node.install_date"></td>
<td id="node-${node.node.observed_status}" py:content="diff_time(node.node.plc_node_stats['last_contact'])"></td>
</span>
</tr>
Index: web/MonitorWeb/monitorweb/templates/node_template.kid
===================================================================
--- web/MonitorWeb/monitorweb/templates/node_template.kid (.../tags/Monitor-3.0-23) (revision 15400)
+++ web/MonitorWeb/monitorweb/templates/node_template.kid (.../trunk) (revision 15400)
@@ -16,6 +16,7 @@
<th>kernel</th>
<th>BootCD</th>
<th>Boot Server</th>
+ <th>Installation Date</th>
<th>last_contact</th>
</span>
<span py:if="node is not None">
@@ -43,6 +44,7 @@
<td nowrap="true" py:content="node.kernel"></td>
<td nowrap="true" py:content="node.node.bootcd_version"></td>
<td nowrap="true" py:content="node.node.boot_server"></td>
+ <td nowrap="true" py:content="node.node.install_date"></td>
<td id="node-${node.node.observed_status}" py:content="diff_time(node.node.plc_node_stats['last_contact'])"></td>
</span>
</span>
Index: upgrade/monitor-server-3.0-23.sql
===================================================================
--- upgrade/monitor-server-3.0-23.sql (.../tags/Monitor-3.0-23) (revision 0)
+++ upgrade/monitor-server-3.0-23.sql (.../trunk) (revision 15400)
@@ -0,0 +1,3 @@
+
+ALTER TABLE findbadnoderecord ADD COLUMN install_date varchar DEFAULT NULL;
+ALTER TABLE findbadnoderecord_history ADD COLUMN install_date varchar DEFAULT NULL;
Index: monitor/database/info/findbad.py
===================================================================
--- monitor/database/info/findbad.py (.../tags/Monitor-3.0-23) (revision 15400)
+++ monitor/database/info/findbad.py (.../trunk) (revision 15400)
@@ -39,6 +39,7 @@
kernel_version = Field(String,default=None)
bootcd_version = Field(String,default=None)
boot_server = Field(String,default=None)
+ install_date = Field(String,default=None)
nm_status = Field(String,default=None)
fs_status = Field(String,default=None)
iptables_status = Field(String,default=None)
Index: monitor/scanapi.py
===================================================================
--- monitor/scanapi.py (.../tags/Monitor-3.0-23) (revision 15400)
+++ monitor/scanapi.py (.../trunk) (revision 15400)
@@ -238,6 +238,7 @@
echo ' "bmlog":"'`ls /tmp/bm.log`'",'
echo ' "bootcd_version":"'`cat /mnt/cdrom/bootme/ID`'",'
echo ' "boot_server":"'`cat /mnt/cdrom/bootme/BOOTSERVER`'",'
+ echo ' "install_date":"'`python -c "import os,time,stat; print time.ctime(os.stat('/usr/boot/plnode.txt')[stat.ST_CTIME])"`'",'
echo ' "nm_status":"'`ps ax | grep nm.py | grep -v grep`'",'
echo ' "dns_status":"'`host boot.planet-lab.org 2>&1`'",'
echo ' "iptables_status":"'`iptables -t mangle -nL | awk '$1~/^[A-Z]+$/ {modules[$1]=1;}END{for (k in modules) {if (k) printf "%s ",k;}}'`'",'
@@ -262,6 +263,7 @@
else:
values.update({'kernel_version': "", 'bmlog' : "", 'bootcd_version' : '',
'boot_server' : '',
+ 'install_date' : '',
'nm_status' : '',
'fs_status' : '',
'uptime' : '',
Barış Metin [Tue, 20 Oct 2009 16:46:32 +0000 (16:46 +0000)]
add install_date field
Barış Metin [Mon, 19 Oct 2009 08:31:00 +0000 (08:31 +0000)]
- remove monitor-client.cron
- remove unused monitor-client init script
- fix UP/DOWN summary on nodes page.
- make node page display all nodes by default
- add boot_server field
- add myops_ssh_key to the keychain
- use ext_consortium_id to distinguish pending sites.
Setting tag Monitor-3.0-23
DIFF=========
Index: monitor.cron
===================================================================
--- monitor.cron (.../tags/Monitor-3.0-22) (revision 15357)
+++ monitor.cron (.../trunk) (revision 15357)
@@ -1,5 +0,0 @@
-# Runs once a day at 12pm to fetch the monitor account keys in case it was
-# inaccessible previously due to a network outage.
-
-0 12 * * * root /etc/init.d/monitor start > /dev/null 2>&1
-
Index: monitor-client.init
===================================================================
--- monitor-client.init (.../tags/Monitor-3.0-22) (revision 15357)
+++ monitor-client.init (.../trunk) (revision 15357)
@@ -1,53 +0,0 @@
-#!/bin/bash
-#
-# monitor Enables the monitor account by setting up the ssh key from the enabled PLC.
-#
-# Load before nm, vcached, and vservers, vserver-reference
-# chkconfig: 3 59 80
-# description: Fetch monitor ssh key to enable access to machine via monitor immediately.
-#
-# Stephen Soltesz <soltesz@cs.princeton.edu>
-# Copyright (C) 2008 The Trustees of Princeton University
-#
-# $Id$
-#
-
-case "$1" in
- start|restart|reload)
- ;;
- stop|status)
- exit 0
- ;;
- *)
- echo $"Usage: $0 {start|stop|restart|status}"
- exit 1
- ;;
-esac
-
-#
-# NOTE: This user is not used by monitor for the moment so better not create it.
-#
-
-# # NOTE: If user already exists, this just exits with status 9. I think it's
-# # ok to simply let this command check and error out.
-# # Parse PLC configuration
-# if [ -r /etc/planetlab/plc_config ] ; then
-# . /etc/planetlab/plc_config
-# else
-# PLC_NAME="PlanetLab"
-# PLC_SLICE_PREFIX="pl"
-# PLC_BOOT_HOST="boot.planet-lab.org"
-# fi
-
-# USER="${PLC_SLICE_PREFIX}_monitor"
-# /usr/sbin/useradd -p "" -m $USER &> /dev/null || :
-
-# if [ ! -d /home/$USER/.ssh ] ; then
-# mkdir /home/$USER/.ssh
-# chmod 700 /home/$USER/.ssh
-# chown $USER.$USER /home/$USER/.ssh
-# fi
-
-# URL="http://${PLC_BOOT_HOST}/PlanetLabConf/keys.php?$USER"
-# curl -s "$URL" > /home/$USER/.ssh/authorized_keys
-# chown $USER.$USER /home/$USER/.ssh/authorized_keys
Index: nodequery.py
===================================================================
--- nodequery.py (.../tags/Monitor-3.0-22) (revision 15357)
+++ nodequery.py (.../trunk) (revision 15357)
@@ -36,6 +36,8 @@
fbnode['bootcd_version'] = fbnode['bootcd_version'].split()[-1]
else:
fbnode['bootcd_version'] = "unknown"
+ if not fbnode['boot_server']:
+ fbnode['boot_server'] = "unknown"
fbnode['pcu'] = color_pcu_state(fbnode)
if not fields:
@@ -58,7 +60,7 @@
#print "ERROR!!!!!!!!!!!!!!!!!!!!!"
pass
- print "%(hostname)-45s | %(date_checked)11.11s | %(boot_state)5.5s| %(observed_status)8.8s | %(ssh_status)5.5s | %(pcu)6.6s | %(bootcd_version)6.6s | %(kernel_version)s" % fbnode
+ print "%(hostname)-45s | %(date_checked)11.11s | %(boot_state)5.5s| %(observed_status)8.8s | %(ssh_status)5.5s | %(pcu)6.6s | %(bootcd_version)6.6s | %(boot_server)s | %(kernel_version)s" % fbnode
else:
format = ""
for f in fields:
Index: web/MonitorWeb/monitorweb/controllers.py
===================================================================
--- web/MonitorWeb/monitorweb/controllers.py (.../tags/Monitor-3.0-22) (revision 15357)
+++ web/MonitorWeb/monitorweb/controllers.py (.../trunk) (revision 15357)
@@ -53,6 +53,7 @@
external_dns_status = widgets.CheckBox(label="Hostname Resolves?")
kernel_version = widgets.CheckBox(label="Kernel")
bootcd_version = widgets.CheckBox(label="BootCD")
+ boot_server = widgets.CheckBox(label="Boot Server")
observed_status = widgets.CheckBox(label="Observed Status")
uptime = widgets.CheckBox(label="Uptime")
traceroute = widgets.CheckBox(label="Traceroute")
Index: web/MonitorWeb/monitorweb/templates/nodescanhistory.kid
===================================================================
--- web/MonitorWeb/monitorweb/templates/nodescanhistory.kid (.../tags/Monitor-3.0-22) (revision 15357)
+++ web/MonitorWeb/monitorweb/templates/nodescanhistory.kid (.../trunk) (revision 15357)
@@ -62,6 +62,7 @@
<th class="sortable plekit_table">Stat</th>
<th class="sortable plekit_table">kernel</th>
<th class="sortable plekit_table">BootCD</th>
+ <th class="sortable plekit_table">Boot Server</th>
<th class="sortable plekit_table">Last_contact</th>
</tr>
</thead>
@@ -76,6 +77,7 @@
<td py:content="node.node.plc_node_stats['boot_state']">boot</td>
<td nowrap="true" py:content="node.kernel"></td>
<td nowrap="true" py:content="node.node.bootcd_version"></td>
+ <td nowrap="true" py:content="node.node.boot_server"></td>
<td id="node-${node.node.observed_status}" py:content="diff_time(node.node.plc_node_stats['last_contact'])"></td>
</span>
</tr>
Index: web/MonitorWeb/monitorweb/templates/node_template.kid
===================================================================
--- web/MonitorWeb/monitorweb/templates/node_template.kid (.../tags/Monitor-3.0-22) (revision 15357)
+++ web/MonitorWeb/monitorweb/templates/node_template.kid (.../trunk) (revision 15357)
@@ -15,6 +15,7 @@
<th>pcu</th>
<th>kernel</th>
<th>BootCD</th>
+ <th>Boot Server</th>
<th>last_contact</th>
</span>
<span py:if="node is not None">
@@ -41,6 +42,7 @@
</td>
<td nowrap="true" py:content="node.kernel"></td>
<td nowrap="true" py:content="node.node.bootcd_version"></td>
+ <td nowrap="true" py:content="node.node.boot_server"></td>
<td id="node-${node.node.observed_status}" py:content="diff_time(node.node.plc_node_stats['last_contact'])"></td>
</span>
</span>
Index: web/MonitorWeb/monitorweb/templates/nodelist.kid
===================================================================
--- web/MonitorWeb/monitorweb/templates/nodelist.kid (.../tags/Monitor-3.0-22) (revision 15357)
+++ web/MonitorWeb/monitorweb/templates/nodelist.kid (.../trunk) (revision 15357)
@@ -17,8 +17,8 @@
</script>
<center>
- <b py:content="'BOOT: %d' % len([agg for agg in query if agg.node.observed_status == 'BOOT'])"></b> |
- <b py:content="'DOWN: %d' % len([agg for agg in query if agg.node.observed_status == 'DOWN'])"></b><br/>
+ <b py:content="'UP: %d' % len([agg for agg in query if agg.node.status in ('online', 'good')])"></b> |
+ <b py:content="'DOWN: %d' % len([agg for agg in query if agg.node.status not in ('online', 'good')])"></b><br/>
</center>
<table id="nodelist" cellpadding="0" border="0" class="plekit_table sortable-onload-2 colstyle-alt no-arrow paginationcallback-nodelist_paginator max-pages-10 paginate-25">
@@ -58,6 +58,7 @@
<th class="sortable plekit_table">pcu</th>
<th class="sortable plekit_table">kernel</th>
<th class="sortable plekit_table">BootCD</th>
+ <th class="sortable plekit_table">Boot Server</th>
<th class="sortable-sortLastContact plekit_table">Last_contact</th>
</tr>
</thead>
Index: web/MonitorWeb/monitorweb/templates/nodefast.kid
===================================================================
--- web/MonitorWeb/monitorweb/templates/nodefast.kid (.../tags/Monitor-3.0-22) (revision 15357)
+++ web/MonitorWeb/monitorweb/templates/nodefast.kid (.../trunk) (revision 15357)
@@ -17,16 +17,16 @@
</script>
<center>
- <b py:content="'BOOT: %d' % len([agg for agg in query if agg.node.status == 'good'])"></b> |
- <b py:content="'DOWN: %d' % len([agg for agg in query if agg.node.status == 'down'])"></b><br/>
+ <b py:content="'UP: %d' % len([agg for agg in query if agg.node.status in ('online', 'good')])"></b> |
+ <b py:content="'DOWN: %d' % len([agg for agg in query if agg.node.status not in ('online', 'good')])"></b><br/>
</center>
-<table id="nodelist" cellpadding="0" border="0" class="plekit_table sortable-onload-2 colstyle-alt no-arrow paginationcallback-nodelist_paginator max-pages-10 paginate-25">
+<table id="nodelist" cellpadding="0" border="0" class="plekit_table sortable-onload-2 colstyle-alt no-arrow paginationcallback-nodelist_paginator max-pages-10 paginate-999">
<thead>
<tr class='pagesize_area'><td class='pagesize_area' colspan='10'>
<form class='pagesize' action='satisfy_xhtml_validator'><fieldset>
- <input class='pagesize_input' type='text' id="nodelist_pagesize" value='25'
+ <input class='pagesize_input' type='text' id="nodelist_pagesize" value='999'
onkeyup='plekit_pagesize_set("nodelist","nodelist_pagesize", 25);'
size='3' maxlength='3' />
<label class='pagesize_label'> items/page </label>
Index: web/MonitorWeb/dev.cfg
===================================================================
--- web/MonitorWeb/dev.cfg (.../tags/Monitor-3.0-22) (revision 15357)
+++ web/MonitorWeb/dev.cfg (.../trunk) (revision 15357)
@@ -31,7 +31,7 @@
autoreload.package="monitorweb"
-server.socket_host="monitor.planet-lab.org"
+server.socket_host="www.planet-lab.eu"
server.socket_port=8082
#server.webpath="/monitor/"
#base_url_filter.on = False
Index: upgrade/monitor-server-3.0-22.sql
===================================================================
--- upgrade/monitor-server-3.0-22.sql (.../tags/Monitor-3.0-22) (revision 0)
+++ upgrade/monitor-server-3.0-22.sql (.../trunk) (revision 15357)
@@ -0,0 +1,5 @@
+-- If there's an existing database, these commands will upgrade it to the
+-- current version
+
+ALTER TABLE findbadnoderecord ADD COLUMN boot_server varchar DEFAULT NULL;
+ALTER TABLE findbadnoderecord_history ADD COLUMN boot_server varchar DEFAULT NULL;
Index: Monitor.spec
===================================================================
--- Monitor.spec (.../tags/Monitor-3.0-22) (revision 15357)
+++ Monitor.spec (.../trunk) (revision 15357)
@@ -129,8 +129,8 @@
%install
rm -rf $RPM_BUILD_ROOT
#################### CLIENT
-install -D -m 755 monitor-client.init $RPM_BUILD_ROOT/%{_initrddir}/monitor
-install -D -m 644 monitor.cron $RPM_BUILD_ROOT/%{_sysconfdir}/cron.d/monitor
+#install -D -m 755 monitor-client.init $RPM_BUILD_ROOT/%{_initrddir}/monitor
+#install -D -m 644 monitor.cron $RPM_BUILD_ROOT/%{_sysconfdir}/cron.d/monitor
install -D -m 755 timeout.pl $RPM_BUILD_ROOT/usr/bin/timeout.pl
@@ -208,8 +208,8 @@
%files client
%defattr(-,root,root)
-%{_initrddir}/monitor
-%{_sysconfdir}/cron.d/monitor
+#%{_initrddir}/monitor
+#%{_sysconfdir}/cron.d/monitor
/usr/bin/timeout.pl
%files pcucontrol
Index: monitor/wrapper/plc.py
===================================================================
--- monitor/wrapper/plc.py (.../tags/Monitor-3.0-22) (revision 15357)
+++ monitor/wrapper/plc.py (.../trunk) (revision 15357)
@@ -14,6 +14,11 @@
import traceback
from monitor import database
+# note: this needs to be consistent with the value in PLEWWW/planetlab/includes/plc_functions.php
+PENDING_CONSORTIUM_ID = 0
+# not used in monitor
+#APPROVED_CONSORTIUM_ID = 999999
+
try:
from monitor import config
debug = config.debug
@@ -116,12 +121,12 @@
except:
print "Call %s FAILED: Using old cached data" % cachename
load_old_cache = True
-
+
if load_old_cache:
values = database.dbLoad(cachename)
else:
database.dbDump(cachename, values)
-
+
return values
else:
values = database.dbLoad(cachename)
@@ -324,6 +329,22 @@
#'last_updated', 'peer_node_id', 'ssh_rsa_key' ])
return nodes
+
+# Check if the site is a pending site that needs to be approved.
+def isPendingSite(loginbase):
+ api = xmlrpclib.Server(auth.server, verbose=False)
+ try:
+ site = api.GetSites(auth.auth, loginbase)[0]
+ except Exception, exc:
+ logger.info("ERROR: No site %s" % loginbase)
+ return False
+
+ if not site['enabled'] and site['ext_consortium_id'] == PENDING_CONSORTIUM_ID:
+ return True
+
+ return False
+
+
'''
Sets boot state of a node.
'''
@@ -400,6 +421,7 @@
def enableSlices(nodename):
api = xmlrpclib.Server(auth.server, verbose=False, allow_none=True)
+
for slice in slices(siteId(nodename)):
logger.info("Enabling slices %s" % slice)
try:
@@ -417,6 +439,7 @@
logger.info("enableSlices: %s" % exc)
print "exception: %s" % exc
+
#I'm commenting this because this really should be a manual process.
#'''
#Enable suspended site slices.
@@ -428,6 +451,12 @@
# api.SliceAttributeAdd(auth.auth, slice, "plc_slice_state", {"state" : "suspended"})
#
def enableSiteSliceCreation(loginbase):
+ if isPendingSite(loginbase):
+ msg = "INFO: enableSiteSliceCreation: Pending Site (%s)" % loginbase
+ print msg
+ logger.info(msg)
+ return
+
api = xmlrpclib.Server(auth.server, verbose=False, allow_none=True)
try:
logger.info("Enabling slice creation for site %s" % loginbase)
@@ -442,10 +471,7 @@
api = xmlrpclib.Server(auth.server, verbose=False, allow_none=True)
try:
loginbase = siteId(nodename)
- logger.info("Enabling slice creation for site %s" % loginbase)
- if not debug:
- logger.info("\tcalling UpdateSite(%s, enabled=True)" % loginbase)
- api.UpdateSite(auth.auth, loginbase, {'enabled': True})
+ enableSiteSliceCreation(loginbase)
except Exception, exc:
print "ERROR: enableSliceCreation: %s" % exc
logger.info("ERROR: enableSliceCreation: %s" % exc)
@@ -453,13 +479,20 @@
'''
Removes site's ability to create slices. Returns previous max_slices
'''
-def removeSiteSliceCreation(sitename):
- print "removeSiteSliceCreation(%s)" % sitename
+def removeSiteSliceCreation(loginbase):
+ print "removeSiteSliceCreation(%s)" % loginbase
+
+ if isPendingSite(loginbase):
+ msg = "INFO: removeSiteSliceCreation: Pending Site (%s)" % loginbase
+ print msg
+ logger.info(msg)
+ return
+
api = xmlrpclib.Server(auth.server, verbose=False)
try:
- logger.info("Removing slice creation for site %s" % sitename)
+ logger.info("Removing slice creation for site %s" % loginbase)
if not debug:
- api.UpdateSite(auth.auth, sitename, {'enabled': False})
+ api.UpdateSite(auth.auth, loginbase, {'enabled': False})
except Exception, exc:
logger.info("removeSiteSliceCreation: %s" % exc)
@@ -471,12 +504,7 @@
api = xmlrpclib.Server(auth.server, verbose=False)
try:
loginbase = siteId(nodename)
- #numslices = api.GetSites(auth.auth, {"login_base": loginbase},
- # ["max_slices"])[0]['max_slices']
- logger.info("Removing slice creation for site %s" % loginbase)
- if not debug:
- #api.UpdateSite(auth.auth, loginbase, {'max_slices': 0})
- api.UpdateSite(auth.auth, loginbase, {'enabled': False})
+ removeSiteSliceCreation(loginbase)
except Exception, exc:
logger.info("removeSliceCreation: %s" % exc)
Index: monitor/database/info/findbad.py
===================================================================
--- monitor/database/info/findbad.py (.../tags/Monitor-3.0-22) (revision 15357)
+++ monitor/database/info/findbad.py (.../trunk) (revision 15357)
@@ -38,6 +38,7 @@
# INTERNAL
kernel_version = Field(String,default=None)
bootcd_version = Field(String,default=None)
+ boot_server = Field(String,default=None)
nm_status = Field(String,default=None)
fs_status = Field(String,default=None)
iptables_status = Field(String,default=None)
Index: monitor/scanapi.py
===================================================================
--- monitor/scanapi.py (.../tags/Monitor-3.0-22) (revision 15357)
+++ monitor/scanapi.py (.../trunk) (revision 15357)
@@ -237,6 +237,7 @@
echo ' "kernel_version":"'`uname -a`'",'
echo ' "bmlog":"'`ls /tmp/bm.log`'",'
echo ' "bootcd_version":"'`cat /mnt/cdrom/bootme/ID`'",'
+ echo ' "boot_server":"'`cat /mnt/cdrom/bootme/BOOTSERVER`'",'
echo ' "nm_status":"'`ps ax | grep nm.py | grep -v grep`'",'
echo ' "dns_status":"'`host boot.planet-lab.org 2>&1`'",'
echo ' "iptables_status":"'`iptables -t mangle -nL | awk '$1~/^[A-Z]+$/ {modules[$1]=1;}END{for (k in modules) {if (k) printf "%s ",k;}}'`'",'
@@ -260,6 +261,7 @@
break
else:
values.update({'kernel_version': "", 'bmlog' : "", 'bootcd_version' : '',
+ 'boot_server' : '',
'nm_status' : '',
'fs_status' : '',
'uptime' : '',
Index: automate-default.sh
===================================================================
--- automate-default.sh (.../tags/Monitor-3.0-22) (revision 15357)
+++ automate-default.sh (.../trunk) (revision 15357)
@@ -56,6 +56,7 @@
# if no agent is running, set it up.
ssh-agent > ${MONITOR_SCRIPT_ROOT}/agent.sh
source ${MONITOR_SCRIPT_ROOT}/agent.sh
+ ssh-add /etc/planetlab/myops_ssh_key.rsa
ssh-add /etc/planetlab/debug_ssh_key.rsa
ssh-add /etc/planetlab/root_ssh_key.rsa
fi
Thierry Parmentelat [Fri, 16 Oct 2009 17:15:35 +0000 (17:15 +0000)]
commented out unused value
Barış Metin [Fri, 16 Oct 2009 10:33:10 +0000 (10:33 +0000)]
use ext_consortium_id to distinguish pending sites.
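The check introduced by this change (visible as `isPendingSite` in the Monitor-3.0-23 diff above) reduces to a predicate on the site record. The sketch below leaves out the XML-RPC call and works on a plain dict; the `is_pending` name is illustrative.

```python
# Value as given in the plc.py diff; it has to stay consistent with the
# constant used on the PLC web side.
PENDING_CONSORTIUM_ID = 0

def is_pending(site):
    """Return True for a site that is disabled and still awaiting approval.

    `site` is a dict with at least the 'enabled' and 'ext_consortium_id'
    keys, as returned for a site record.
    """
    return (not site["enabled"]) and site["ext_consortium_id"] == PENDING_CONSORTIUM_ID
```

Both conditions matter: a site that monitor itself disabled as a penalty is also not enabled, so the consortium id is what separates it from a site that was never approved.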
Stephen Soltesz [Thu, 15 Oct 2009 18:34:18 +0000 (18:34 +0000)]
remove monitor-client.cron
update spec file accordingly
Stephen Soltesz [Thu, 15 Oct 2009 18:31:36 +0000 (18:31 +0000)]
remove unused monitor-client init script
update spec file accordingly