2-20-08
Use this to capture the rates from the individual slices.

    while true; do /sbin/tc -s class show dev eth0 | grep -A 3 11f8 | grep -A 1 "class htb" | grep Sent | cut -d " " -f 3; sleep 1; done > GRD_pl_drl_34

2-15-08
Damn sysnet33 db won't come up. Says /tmp is full in the chroot myplc/plc/root/var/log/pglog, but it's not full. So I'm trying to replace the loopback filesystem with something clean from the RPM, using cpio --no-absolute-filenames.

    rpm2cpio | cpio -diu

would do it, but -u means overwrite without prompting, so I'll issue

    rpm2cpio | cpio -di --no-absolute-filenames

OK, did this, then I copied over root.img to where I have it installed. And it all worked! Sweet. The image is mostly independent of the data!

ADDING A PL NODE
1.) Turn off crond for restarting ulogd.
    In pl_netflow:/etc/cron.d/netflow, edited the line that calls init.d/netflow restart because I don't want it to keep restarting ulogd as I test it.
2.) Allow pl_netflow to run tc.
    I am also changing the pl_netflow.conf in /etc/vservers/pl_netflow.conf so that
        S_CAPS="CAP_NET_ADMIN"
    Then, from the root slice, vserver pl_netflow restart.
3.) Copy over the ./start script from the pl_netflow /root directory.
4.) Put bwlimit_drl.py into /usr/share/util-vserver/bwlimit2.py on the root slice.
    Put bwlimit_drl.py into /usr/share/util-vserver/bwlimit_drl.py on the pl_netflow slice.
    In the future, make the names consistent.
5.) Pull the sch_netem.ko over and insmod it.
        cd /lib/modules/2.6.12-1.1398_FC4.5.planetlab/kernel/net/sched/
        [root@sysnet32 sched]# scp -i ~/.ssh/root_ssh_key.rsa root@sysnet34:/lib/modules/2.6.12-1.1398_FC4.5.planetlab/kernel/net/sched/sch_netem_plab.ko .
        insmod sch_netem_plab.ko
6.) [root@vserver:pl_netflow ~]/etc/init.d/ulogd stop
7.) vserver pl_netflow enter
8.) Copy over ulogd. Run ./start
9.) Edit the ulogd configuration script.

2-12-08
bwlimit.py cannot look up the slice names in the pl_netflow slice b/c there are no entries in the /etc/passwd file for the slices. Of course there aren't.
Even ulogd_NETFLOW.c replaces /etc/passwd with /dev/null. So in the short term I will copy over the root's /etc/passwd and place it where the new bwlimit_drl exists:

    /usr/share/util-vserver/passwd_drl

I can have the root periodically copy this file into the pl_netflow slice in a cron job, or whenever a new slice is created. This is a hack. There are probably better ways, but it's a moving target. Now hack bwlimit_drl to use that file to look up slice contexts.

HTB DOCS
http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm

2-10-08
We probably have to put the netem at the bottom of the tbf. If we put it in the middle, say off of the 1:10 class, then we need to direct all traffic that would have been directed to a lower class in the htb to the 1:10 class. Then the traffic has to be directed to the lower classes and be accounted for directly. This screws with how traffic classification is done; I don't even know if it will obey the tb appropriately either, since the best you can do is put another htb qdisc below the netem class. And there is no way to directly have that qdisc borrow from the "parent" one that's just been terminated by the netem qdisc. Got that?

2-7-08
Screwing around with all the kernel configs has forced the build process to rebuild all the kernels. Blech.

BUILDING NETEM for plab machines:
Kevin made an Ubuntu chroot install on his Linux machine and built gcc 3.3 in it, and then copied over the Linux source tree that had the .config stolen from sysnet34 /proc/config.gz. We also had to change the magic string so that it believed it was built for FC 4.5, not 4.7.

Use
    insmod /lib/modules/2.6.12-1.1398_FC4.5.planetlab/kernel/net/sched/sch_netem_plab.ko
see sysnet35:/root/myplc/build/SOURCES/CRAP/linux-2.6.12/.config

We added a node at the root qdisc and it appears to delay packets successfully. So now let's figure out how to add it to the bwlimit.py code so that it is always present and can be tweaked (replaced) when running ulogd.
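A minimal sketch of one way the bwlimit.py code could keep a netem qdisc "always present and tweakable": generate a tc qdisc replace command, which creates the qdisc if it is missing and updates it otherwise. The helper name netem_cmd, the handle, and the delay value are hypothetical, and it assumes the sch_netem_plab.ko module registers under the usual qdisc name netem. The function only builds the argument list; nothing is executed.

```python
def netem_cmd(dev, parent, handle, delay_ms, tc="/sbin/tc"):
    """Build the tc argv that installs or updates a netem qdisc.

    parent is "root" or a class id such as "1:11f8"; handle is the hex
    major number given to the new qdisc. All values are illustrative.
    """
    if parent == "root":
        where = ["root"]
    else:
        where = ["parent", parent]
    return ([tc, "qdisc", "replace", "dev", dev] + where +
            ["handle", "%s:" % handle, "netem", "delay", "%dms" % delay_ms])


# Example: 100 ms of delay under one slice class (class id taken from the
# 2-20-08 capture loop; the handle is an arbitrary choice).
print(" ".join(netem_cmd("eth0", "1:11f8", "11f8", 100)))
# /sbin/tc qdisc replace dev eth0 parent 1:11f8 handle 11f8: netem delay 100ms
```

Returning a list rather than a string means the result can be handed straight to subprocess without shell quoting, and calling it again with a different delay yields the "tweak" command.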
----
    tc qdisc del dev eth0 root handle 1:
    tc qdisc add dev eth0 parent 1: handle 2: htb default 1fff

And everything then falls out of 2:
But this will netem everything out of the host, leaving nothing for "exempt" slices like the root. Not what we want. Instead, we place a netem qdisc at the leaf of every non-exempt slice traffic class in the HTB.

2-5-08
In the pl_netflow vserver, /etc/cron.d/netflow keeps restarting the netflow init.d script, which keeps restarting ulogd if started by hand. Fucking buggy.
In the root vserver, see the pl_mom cron job, which runs bwmon and pl_mop. pl_mop makes sure pl_netflow is running.

!!!!!!!!!!
In sysnet34 pl_netflow:/etc/cron.d/netflow, edited the line that calls init.d/netflow restart because I don't want it to keep restarting ulogd as I test it.
!!!!!

I am also changing the pl_netflow.conf in /etc/vservers/pl_netflow.conf so that
    S_CAPS="CAP_NET_ADMIN"
Then, from the root slice, vserver pl_netflow restart.

then parent ceil, looks like no borrowing occurs unless the cburst is very low. If you set cburst to 0, then tc sets cburst to something that allows the class to reach its ceiling.

HOWTO: Set a ceiling on a particular slice.
    /usr/share/util-vserver/bwlimit.py on pl_drl 1 8bit 500kbit 8bit 1gib
The first arg is share. Works in the way you expect.
The next is min guaranteed.
The next is max or ceiling. This is the one to set.
The next two are min/ceiling for special destinations.

HOWTO: Set a ceiling on all slices leaving a node.
1.) We can change the ceiling for the 1:10 subclass. This should limit all traffic for all slices to non-exempted destinations.
        /sbin/tc class replace dev eth0 parent 1:1 classid 1:10 htb rate 8bit ceil 500kbit
    This does not work.
2.) We can change the ceiling for the 1:1 root class.
        /sbin/tc class replace dev eth0 parent 1: classid 1:1 htb rate 500kbit
    modified this to
        /sbin/tc class replace dev eth0 parent 1: classid 1:1 htb rate 8bit ceil 500kbit
    With no effect. Mucking with classid 1:1 seems to make the machine very unhappy.
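A hedged sketch of the class-id arithmetic these notes imply, and of the tc line that would set a ceiling on one slice's own class instead of on 1:10 or 1:1. The notes record class 1:11f8 for context 504 and a default class of 1fff, which is consistent with minor = 0x1000 + context; that mapping is inferred from these notes, not read out of bwlimit.py. The function names and the parent 1:10 default are assumptions, and nothing is executed.

```python
def slice_classid(xid, major=1, base=0x1000):
    """Class id for a slice context, ASSUMING minor = base + xid."""
    if not 0 <= xid <= 0xfff:
        raise ValueError("context id out of range: %r" % (xid,))
    return "%x:%x" % (major, base + xid)


def ceil_cmd(dev, xid, rate, ceil, parent="1:10", tc="/sbin/tc"):
    """Build the tc argv that replaces one slice class's rate and ceiling."""
    return [tc, "class", "replace", "dev", dev, "parent", parent,
            "classid", slice_classid(xid), "htb", "rate", rate, "ceil", ceil]


print(slice_classid(504))    # 1:11f8
print(slice_classid(0xfff))  # 1:1fff
print(" ".join(ceil_cmd("eth0", 504, "8bit", "500kbit")))
# /sbin/tc class replace dev eth0 parent 1:10 classid 1:11f8 htb rate 8bit ceil 500kbit
```

If the mapping holds, this is the by-hand equivalent of the max argument that bwlimit.py takes for a slice.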
bwlimit seems to set the bottom classes' ceilings to the ceiling of the 1:10 block. Then why have the 1:10 block?
Do 1 & 2, and neither has any effect.
It looks like only the bottom-most class has any effect.
If I re-init the bwlimit.py script, then a rule parent 1:10 leaf 1:1fff seems to get all the traffic. And it is the limit that seems to matter.
I see. 1fff is the default_xid. So when something has not been caught earlier, this is where it goes to be limited and caught.

tcp netperf tests for rate limiting from a slice.
Experiment with 34->35. Logged in to the pl_drl slice to issue the netperf.

    vserver pl_drl status
    vserver pl_drl_two status

vservers aren't running until you log in to them, so the above commands won't return anything useful until that point.

12-18-08
    /etc/init.d/nscd restart or reload (caches ldap information)
This is for the sysnet machines.

The files that you care about are in the release. They are
    /usr/share/util-vserver/bwlimit.py
This is what makes the calls to tc to do the traffic control.

OK: Logging in to the drl slice, and setting up the netperf server/client.
    ssh -i ~/.ssh/myplc_admin.rsa -l pl_drl sysnet34.ucsd.edu

So I'm sending netperf outbound flows from the pl_drl slice and it is getting recorded in the class htb 1:11f8 (which is context 504). When you do a vserver list, though, it says there's only a 503 (though when you log in to 503, it says the "security context" is 504).

    /usr/share/util-vserver/bwlimit.py -d eth0 -v init
    /usr/share/util-vserver/bwlimit.py -d eth0 on pl_drl 1 1kbit 10kbit 1kbit 10kbit

11-14-07
planetlab website login kyocum@cs.ucsd.edu ate smurfs
as well sysnet35 kyocum - big Dog - and killing name root

Symlink cvs source code to a ulogd/drl directory.
    for file in '*.h'; do echo $file; rm /root/myplc/ulogd/drl/$file ; done
    for file in '*.c'; do echo $file; ln -s /root/ratelimiting/myplc/$file /root/myplc/ulogd/drl/. ; done

11-11-07
Brought back 32 and 33 from power offs.
32 is mounting everything read-only. 34 is working fine, but it was never powered off.

10-29-07
Back from NSDI, SOSP, Texas, and firestorm.
Will modify ulogd to perform DRL. Will first use GRD, as we can drop based on simple probabilities. Though we should be able to see all the packets that are coming in to the machine using ulogd.

9-16-07
    vserver pl_netflow enter
    scp root@sysnet35:~/myplc/build/RPMS/i386/ulogd-1.02-11.i386.rpm .
Where does the output go?

9-14-07 #2
Now let's try to install the ulogd srpm that was created on one of the myplc nodes.
    sysnet32,34: myplc nodes
    sysnet35: build box
    sysnet33: myplc controller
On 34, entering vserver for ulogd, downloading srpm, and trying to install.
    /etc/rc.d/init.d/ulogd stop
    rpm -e --nodeps ulogd
    rpm -Uvh ulogd-1.02-11.i386.rpm
    /etc/rc.d/init.d/ulogd start

9-14-07
Have installed FC4 on sysnet35. Am following Marc's instructions in his email. Check out rc1 build and rc1 ulogd.
OK, I'm trying to build things using Marc's instructions and not the entire myplc build environment that is in the GUIDE. I'll try that next.
OK, I've set it all up. I'm using mirrors.kernel.org and fedora/core/4/os
--
OK, now I'm about to cd into build and make ulogd . . .

    xPWD=$PWD
    cvs -d :pserver:anonymous@cvs.planet-lab.org:/cvs co -r planetlab-4_1-rc1 build
    cvs -d :pserver:anonymous@cvs.planet-lab.org:/cvs co -r planetlab-4_1-rc1 ulogd
    cd build
    mkdir SPEC SOURCES
    cd SPEC
    # note that you will need these symlinks to use the full paths --- NOT relative paths
    # which is why I am using $xPWD
    ln -fs $xPWD/ulogd/ulogd.spec ulogd.spec
    cd ../SOURCES
    ln -fs $xPWD/ulogd ulogd
    cd ..
    make ulogd

Assuming the dependencies are set up right, this should first suck down the kernel and mysql modules via CVS, build them (which may take some time), and then build ulogd.
Success!!!!!

Now, check out ratelimiting, but add root to cvs group number 30008.
    groupadd -g 30008 cvs (add group def)

8-17-07
Made two slices.
pl_drl_one and pl_drl_two
To log in to the slices I uploaded a key, myplc_admin.{rsa,pub}
    sake:~/.ssh grant$ ssh -i myplc_admin.rsa -l pl_drl_one sysnet34

We want to modify ULOGD. Can't run it.
    root@sysnet34 sbin]# ./ulogd --help
    ./ulogd: error while loading shared libraries: libproper.so.0: cannot open shared object file: No such file or directory
Trying chroot to the netflow slice first. That seems to do the trick.
    bash-3.00# ./ulogd
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `raw'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `oob'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `ip'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `tcp'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `icmp'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `udp'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `ahesp'
    Fri Aug 17 20:05:08 2007 <3> ulogd.c:300 registering interpreter `gre'
    Fri Aug 17 20:05:08 2007 <5> ulogd.c:355 registering output `netflow'
    ERROR: Unable to create netlink socket: Bad file descriptor

8-21-07
Get yum to work on sysnet34.
In /etc/yum.conf, changed
    reposdir=/etc/yum.repos.d/
http://onelab-build.inria.fr/websvn/filedetails.php?repname=OneLab&path=%2Fnew_plc_www%2Ftrunk%2FPlanetLabConf%2Fyum.conf.php
has a crazy yum.conf file.
    yum install cvs
    yum install gcc
    [root@sysnet34 2.6.12-1.1398_FC4.5.planetlab]# cd /usr/src/
    [root@sysnet34 src]# mkdir kernels
Install the kernel source. The lib/modules directory symlinks to /usr/src/kernels/, where we place 2.6.12-1.1398_FC4.5.planetlab
Checkout kernel:
    cvs co -r planetlab-4_1-rc1 linux-2.6
placed in /usr/src/kernels/

Building PLAB kernel:
fuck yum, no ncurses devel, no yum suppository.
    wget http://coblitz.planet-lab.org/pub/fedora/linux/core/updates/4/x86_64/ncurses-devel-5.4-19.fc4.i386.rpm
    rpm -i ncurses----
OK, now the stupid make menuconfig worked, but we're going to copy a kernel config from /proc/config.gz
    zcat config.gz > /root/kernels/linux-2.6/.config
    make O=/root/kernels/linux-2.6/build/ oldconfig
So none of that worked. Copy config to .config in the top-level directory, then run menuconfig, then save config, then compile. That works. Yay!
    ./build/include/linux/version.h
so config that shit.

Configuring ulogd
    ./configure --with-kernel=/root/kernels/linux-2.6/build/
config success!
Download RPM and install from location above.
    mysql-devel-4.1.20-1.FC4.1.x86_64.rpm
    rpm -i <>
OK, had to re-enable yum.conf to use yum.repos.d, and had to modify yum.repos.d/fedora-devel so that it wasn't used. Then I could
    yum install openssl-devel
and mysql-devel.
The linker is bitching about mysql objects not being present.
    trying yum install mysql.i386 (already installed)
    trying yum install ulogd-pgsql.i386
    yum install ulogd-mysql.i386
    yum install mysqlclient10-devel.i386

OK, all that is shit. Marc sent me a yumrepos.tgz that is on my desktop. I put all the files in the right place on 34. He then sent me a yum install line, which is this:

    yum -y install beecrypt-devel bzip2 coreutils cpio createrepo curl curl-devel cvs db4-devel dev diffutils dnsmasq docbook-utils-pdf dosfstools doxygen expect gcc-c++ gd glibc glibc-common gnupg gperf gzip httpd install iptables less libpcap libpcap-devel libtool linuxdoc-tools mailx make metadata mkisofs mod_python mod_ssl mysql mysql-devel mysql-server nasm ncurses-devel openssh openssl php php-devel php-gd php-pgsql postgresql postgresql-devel postgresql-python postgresql-server python python-devel PyXML readline-devel redhat-rpm-config rpm rpm-build rpm-devel rsync sendmail sendmail-cf sharutils sudo tar tetex-latex time vconfig vixie-cron wget xmlsec1 xmlsec1-openssl yum

which should make the box a build box.
Of course, what I should really be doing is making a fresh FC4 node as my build environment, but whatever.
------
OK, trying to figure out if there is a way to "start" the pl_netflow vserver. Don't know if it's currently running. Perhaps it is.
vserver-stat shows 4 vservers, besides the root context, 0. #503 is pl_netflow. The other two are the two drl slices.
Each vserver has a configuration file, shown here:
    [root@sysnet34 sbin]# more /etc/vservers/pl_netflow.conf
To enter a vserver context you can say
    vserver enter
To run ulogd
    vserver pl_netflow enter
    cd /usr/sbin
    ./ulogd
The configuration file is in /etc/ulogd.conf
    /var/log/ulogd.log

8-07
Installing myPLC. Trying to build rate limiting infrastructure on top of it.
sysnet33 has only mounted /dev/xen_vg/dom0 on /, and that is only 2GB. There is another LVM volume called /dev/xen_vg/lvol0. It has 10GB. We will try to mount that.
Display with lvdisplay.
The underlying disk partition is /dev/sda3 (physical volume); display PVs with pvdisplay.
Adding a line in /etc/fstab to mount lvol0 on /lvol0. This partition has plenty of space. Everything looks good.

Use linux authconfig to see if it's using ldap for user authentication.
To disable selinux:
http://www.crypt.gen.nz/selinux/disable_selinux.html
It was already disabled, see /etc/selinux/config

INSTALLED in /lvol0
I edited /lvol0/myplc/etc/sysconfig/plc
I symlinked /etc/init.d/plc --> installed
I symlinked /etc/sysconfig/plc --> installed
I edited etc/sysconfig/plc to point to my installation in /lvol0/myplc
root@localhost.localdomain
Made a new account as well. kyocum@cs.ucsd.edu 8smurf2
---
Trying to install the images for the new nodes.
They are in /lvol0/myplc/plc/root/data/var/www/html/download
On sysnet33, yum install mkisofs
    ./bootcustom.sh DRL.iso plnode34.txt
---
Machines come up in debug mode. They have insufficient resources. Probably disk space.
Need to modify the boot script that is sent over from myplc (beckerr), or we can add a "/minhw" to the node type in the controller.
The boot program is at:
    -bash-3.00$ pwd
    /lvol0/myplc/plc/data/var/www/html/boot/bootmanager.sh
Nope, just contains binary of bootmanager.
    /lvol0/myplc/plc/root/etc/plc.d/bootmanager
is where it's at. Nope.
    /usr/share/bootmanager/

Generally, running BootManager by hand is the standard procedure. Set the machine to "Boot" state via your PLC API or web server. When the machine fails to boot, hit Ctrl-C, login as root/root, cd /tmp/source, and run "./BootManager.py". Figure out where it dies, then start adding prints in the appropriate places to figure out why.