CKRM I/O controller

Last updated: Sep 21, 2004


Intro
-----

CKRM's I/O scheduler is developed as a delta over a modified version of
the Complete Fair Queuing scheduler (CFQ) that implements I/O priorities.
The latter's original posting can be found at:
    http://www.ussg.iu.edu/hypermail/linux/kernel/0311.1/0019.html

Please note that this is not the CFQ version currently in the linus kernel
(2.6.8.1 at time of writing) which provides equal, not prioritized,
bandwidth allocation amongst processes. Since the CFQ in the kernel is
likely to eventually move towards I/O priority implementation, CKRM has not
renamed the underlying I/O scheduler and simply replaces
drivers/block/cfq-iosched.c with the modified version.


Installation
------------

1. Configure "Disk I/O Resource Controller" under CKRM (see
   Documentation/ckrm/installation)

2. After booting into the new kernel, load ckrm-io
    # modprobe ckrm-io

3. Verify that reading /rcfs/taskclass/shares displays values for the
   I/O controller (res=cki).

4. Mount sysfs for monitoring bandwidth received (temporary solution till
   a userlevel tool is developed)
    # mount -t sysfs none /sys


Usage
-----

For brevity, we assume we are in the /rcfs/taskclass directory for all the
code snippets below.

Initially, the systemwide default class gets 100% of the I/O bandwidth.

    $ cat stats
    20 total ioprio
    20 unused/default ioprio

The first value is the share of a class, as a parent. The second is the
share of its default subclass. Initially the two are equal. As named
subclasses get created and assigned shares, the default subclass' share
(which equals the "unused" portion of the parent's allocation) dwindles.

CFQ assigns one of 20 I/O priorities to all I/O requests. Each priority
level gets a fixed proportion of the total bandwidth in increments of 5%.
e.g.
    ioprio=1 gets 5%,
    ioprio=2 gets 10%.....
    all the way through ioprio=19 getting 95%

ioprio=0 gets bandwidth only if no other priority level submits I/O i.e. it
can get starved.
ioprio=20 is considered realtime I/O and always gets priority.

CKRM's I/O scheduler distributes these 20 priority levels amongst the
hierarchy of classes according to the relative share of each class. Thus,
root starts out with the total allocation of 20. As children get created
and shares are assigned to them, root's allocation reduces. At any time,
the sum of absolute share values of all classes equals 20.


Class creation
--------------

    $ mkdir a

Its initial share is zero. The parent's share values will be unchanged.
Note that even classes with zero share get unused bandwidth under CFQ.


Setting a new class share
-------------------------

    $ echo "res=cki,guarantee=20" > /rcfs/taskclass/a/shares
    Set cki shares to 20 -1 -1 -1

    $ cat a/shares
    res=cki,guarantee=20,limit=100,total_guarantee=100,max_limit=100

The limit and max_limit fields can be ignored as they are not implemented.
The absolute share of a is 20% of the parent's absolute total (20), i.e. 4,
and can be seen through

    $ cat a/stats
    4 total ioprio
    4 unused/default ioprio

Since a gets 4, the share of the parent's default subclass diminishes
accordingly. Thus

    $ cat stats
    20 total ioprio
    16 unused/default ioprio


Monitoring
----------

Each priority level's request service rate can be viewed through sysfs
(mounted during installation). To view the servicing of priority 4's
requests,

    $ while : ; do cat /sys/block//queue/iosched/p4 ; sleep 1 ; done
    rq (10,15) sec (20,30) q (40,50)

where
    rq  = cumulative I/O requests received (10) and serviced (15)
    sec = cumulative sectors requested (20) and served (30)
    q   = cumulative number of times the queue was created (40) /
          destroyed (50)

The rate at which requests or sectors are serviced should differ for
different priority levels. The difference in received and serviced values
indicates queue depth - with insufficient depth, differentiation between
I/O priority levels will not be observed.

The rate of q creation is not significant for CKRM.
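Since the counters described under Monitoring are cumulative, a service
rate has to be derived from the difference between two successive samples.
The following is a minimal sketch of that arithmetic, assuming each sample
is a single line in exactly the "rq (a,b) sec (c,d) q (e,f)" form shown in
the Monitoring example. The function names parse_prio_line and
service_delta are hypothetical helpers and not part of CKRM, and the real
sysfs output may be laid out differently.

```shell
# Hypothetical helper (not part of CKRM): split one sample line of the form
#   rq (10,15) sec (20,30) q (40,50)
# into its six counters, printed space separated in the same order.
parse_prio_line() {
    # Replace parentheses and commas with spaces, then keep the numbers.
    echo "$1" | tr '(),' '   ' | awk '{ print $2, $3, $5, $6, $8, $9 }'
}

# Hypothetical helper: given an older and a newer sample line, print how
# many requests and how many sectors were serviced between the two.
service_delta() {
    prev=$(parse_prio_line "$1")
    curr=$(parse_prio_line "$2")
    # After joining, fields 1-6 are the old counters and 7-12 the new ones.
    # Serviced requests are fields 2 and 8, serviced sectors 4 and 10.
    echo "$prev $curr" | awk '{ print $8 - $2, $10 - $4 }'
}
```

With samples taken one second apart, the two numbers printed by
service_delta are requests per second and sectors per second for that
priority level. Comparing them across the priority files of two classes is
one way to check whether differentiation is being observed.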
Caveats
-------

CFQ's I/O differentiation is still being worked upon, so it's better to
choose widely separated share values to observe differences in delivered
I/O bandwidth.

CFQ, and consequently CKRM, does not provide limits yet. So it is not
possible to completely limit an I/O hog process by putting it in a class
with a low I/O share. Only if the competing classes maintain sufficient
queue depth (i.e. a high I/O issue rate) will they get preferential
treatment. However, they may still see latency degradation due to seeks
caused by servicing of the low priority class. When limits are implemented,
this behaviour will be rectified.

Please post questions on the CKRM I/O scheduler on ckrm-tech@lists.sf.net.