Process Scheduling in Linux (Part 2)

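I have recently been studying the process scheduling algorithms in the Linux kernel. In 2.4 and earlier, the scheduler picked the next process essentially by priority, and the selection cost O(n) time because it had to walk every runnable process to decide which one wins. With many processes this became expensive. The early 2.6 kernels introduced the well-known O(1) scheduler: each CPU's runqueue holds two priority arrays, active and expired, and each array keeps a separate list of runnable tasks for every priority level. With a bitmap of non-empty priorities and a pointer swap between the two arrays, picking the next process takes constant time, which is where the name O(1) comes from. Kernel 2.6.23 then introduced CFS (the Completely Fair Scheduler), which dropped the old fixed mapping from priority to time slice in favor of a virtual runtime, and which keeps the runnable tasks in a red-black tree.

The rest of this post walks through the implementation (based on 2.6.24), starting with what happens when a new process is created. In do_fork(), once the parent's resources have been copied according to clone_flags, the kernel calls wake_up_new_task(p, clone_flags), which puts the new process on the runqueue (here, the red-black tree) and then checks whether it should preempt the currently running task. The relevant code is: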

if (!(clone_flags & CLONE_STOPPED))
	wake_up_new_task(p, clone_flags);
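Before following the call chain, the O(1) idea mentioned in the introduction can be made concrete with a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: the `toy_*` names are hypothetical, the per-priority queues are plain singly linked lists, and `__builtin_ctzll` assumes GCC or Clang. It shows why selection is constant time: the bitmap has a fixed number of words, and an exhausted active array is replaced by swapping two pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PRIO 140                        /* 0..99 real-time, 100..139 normal */
#define BITMAP_WORDS ((MAX_PRIO + 63) / 64)

struct toy_task {
	int prio;                  /* smaller value = higher priority */
	struct toy_task *next;     /* next task queued at the same priority */
};

struct toy_prio_array {
	uint64_t bitmap[BITMAP_WORDS];   /* bit p is set while queue p is non-empty */
	struct toy_task *head[MAX_PRIO];
	struct toy_task *tail[MAX_PRIO];
};

struct toy_rq {
	struct toy_prio_array arrays[2];
	struct toy_prio_array *active;   /* tasks that still have time slice left */
	struct toy_prio_array *expired;  /* tasks that used up their time slice */
};

static void toy_rq_init(struct toy_rq *rq)
{
	memset(rq, 0, sizeof(*rq));
	rq->active = &rq->arrays[0];
	rq->expired = &rq->arrays[1];
}

/* Append a task to the tail of its priority queue and mark the bitmap. */
static void toy_enqueue(struct toy_prio_array *a, struct toy_task *t)
{
	assert(t->prio >= 0 && t->prio < MAX_PRIO);
	t->next = NULL;
	if (a->tail[t->prio])
		a->tail[t->prio]->next = t;
	else
		a->head[t->prio] = t;
	a->tail[t->prio] = t;
	a->bitmap[t->prio / 64] |= (uint64_t)1 << (t->prio % 64);
}

/* Lowest set bit = highest priority; the loop bound is a constant. */
static int toy_first_prio(const struct toy_prio_array *a)
{
	for (int w = 0; w < BITMAP_WORDS; w++)
		if (a->bitmap[w])
			return w * 64 + __builtin_ctzll(a->bitmap[w]);
	return -1;
}

/* Remove and return the best task, swapping the arrays if active is empty. */
static struct toy_task *toy_pick_next(struct toy_rq *rq)
{
	int prio = toy_first_prio(rq->active);

	if (prio < 0) {
		struct toy_prio_array *tmp = rq->active;
		rq->active = rq->expired;
		rq->expired = tmp;
		prio = toy_first_prio(rq->active);
		if (prio < 0)
			return NULL;       /* nothing runnable in either array */
	}

	struct toy_prio_array *a = rq->active;
	struct toy_task *t = a->head[prio];
	a->head[prio] = t->next;
	if (!a->head[prio]) {
		a->tail[prio] = NULL;
		a->bitmap[prio / 64] &= ~((uint64_t)1 << (prio % 64));
	}
	return t;
}
```

With tasks of priority 120 and 100 in the active array and one of priority 130 in the expired array, successive calls to `toy_pick_next()` return the priority-100 task, then the priority-120 task, and then, after the pointer swap, the priority-130 task.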

Following the call chain, we step into wake_up_new_task.

/*
 * wake_up_new_task - wake up a newly created task for the first time.
 *
 * This function will do some initial scheduler statistics housekeeping
 * that must be done for every newly created context, then puts the task
 * on the runqueue and wakes it.
 */
void fastcall wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
{
	unsigned long flags;
	struct rq *rq;

	rq = task_rq_lock(p, &flags);
	BUG_ON(p->state != TASK_RUNNING);
	update_rq_clock(rq);

	p->prio = effective_prio(p);

	if (!p->sched_class->task_new || !current->se.on_rq) {
		activate_task(rq, p, 0);
	} else {
		/*
		 * Let the scheduling class do new task startup
		 * management (if any):
		 */
		p->sched_class->task_new(rq, p);
		inc_nr_running(p, rq);
	}
	check_preempt_curr(rq, p);
	task_rq_unlock(rq, &flags);
}

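First, the runqueue of the CPU that process p belongs to is locked (with a spinlock). Then update_rq_clock(rq) refreshes the runqueue's clock and related statistics (these belong to the queue as a whole rather than to any single process). Next, effective_prio(p) computes the process's priority (CFS no longer maps priority directly to a time slice, but the priority still serves as a weight that distinguishes how important each process is). The function looks like this: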

/*
 * Calculate the current priority, i.e. the priority
 * taken into account by the scheduler. This value might
 * be boosted by RT tasks, or might be boosted by
 * interactivity modifiers. Will be RT if the task got
 * RT-boosted. If not then it returns p->normal_prio.
 */
static int effective_prio(struct task_struct *p)
{
	p->normal_prio = normal_prio(p);
	/*
	 * If we are RT tasks or we were boosted to RT priority,
	 * keep the priority unchanged. Otherwise, update priority
	 * to the normal priority:
	 */
	if (!rt_prio(p->prio))
		return p->normal_prio;
	return p->prio;
}

As shown, it first obtains p's normal_prio through normal_prio(p). That function is:

/*
 * Calculate the expected normal priority: i.e. priority
 * without taking RT-inheritance into account. Might be
 * boosted by interactivity modifiers. Changes upon fork,
 * setprio syscalls, and whenever the interactivity
 * estimator recalculates.
 */
static inline int normal_prio(struct task_struct *p)
{
	int prio;

	if (task_has_rt_policy(p))
		prio = MAX_RT_PRIO-1 - p->rt_priority;
	else
		prio = __normal_prio(p);
	return prio;
}

Real-time and non-real-time processes have to be told apart here. The code for task_has_rt_policy() is:

static inline int rt_policy(int policy)
{
	if (unlikely(policy == SCHED_FIFO) || unlikely(policy == SCHED_RR))
		return 1;
	return 0;
}

static inline int task_has_rt_policy(struct task_struct *p)
{
	return rt_policy(p->policy);
}
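The unlikely() annotation used in rt_policy() can be tried outside the kernel. The sketch below assumes GCC or Clang (for `__builtin_expect`) and defines the macros in the same shape the kernel uses; the policy constants carry the numeric values Linux assigns them. The hint never changes the result of the test, it only tells the compiler which outcome to expect when laying out the branches.

```c
#include <assert.h>

/* Branch-prediction hints; !!(x) normalizes any non-zero value to 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Scheduling policy numbers as used by Linux. */
#define SCHED_NORMAL 0
#define SCHED_FIFO   1
#define SCHED_RR     2

/* User-space copy of the check: real-time policies are the rare case. */
static inline int rt_policy(int policy)
{
	if (unlikely(policy == SCHED_FIFO) || unlikely(policy == SCHED_RR))
		return 1;
	return 0;
}
```

Calling `rt_policy(SCHED_NORMAL)` returns 0 and `rt_policy(SCHED_RR)` returns 1, exactly as it would without the hints; only the generated code layout may differ.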

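The check is whether the process's scheduling policy is SCHED_FIFO or SCHED_RR (the biggest difference between these two real-time policies is whether a time slice is involved; RR stands for round robin). Note the unlikely macro: it expands to a GCC-specific optimization hint (__builtin_expect) telling the compiler that the condition is expected to be false most of the time, so the compiler can lay out the common, non-real-time path as the straight-line code and move the rarely taken branch out of the way. It shows how much attention the kernel developers pay to detail.

Back to normal_prio: if task_has_rt_policy() finds that the process is a real-time process, it returns MAX_RT_PRIO-1 - p->rt_priority (for real-time processes, a larger rt_priority means a higher priority), and this value is assigned to p->normal_prio. (Why is a real-time priority stored in normal_prio? For now I will just quote an explanation I found online and look into it further later.)

prio and normal_prio are dynamic priorities, while static_prio is the static priority. static_prio is assigned when the process is created and does not change while the process runs unless it is changed explicitly. normal_prio is computed from static_prio and the scheduling policy. prio is the priority the scheduler class actually considers; in some situations a process's priority has to be raised temporarily (real-time mutexes), which is why this field exists. For a non-real-time process whose priority has not been boosted dynamically, the three values are equal. For all three, a smaller value means a higher priority. A child normally inherits its static priority from the parent, and the child's prio is taken from the parent's normal_prio.

If the process is not a real-time process, normal_prio returns __normal_prio(), whose code is: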

/*
 * __normal_prio - return the priority that is based on the static prio
 */
static inline int __normal_prio(struct task_struct *p)
{
	return p->static_prio;
}

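In other words, for a non-real-time process the return value is simply its static priority. Having taken this long detour to work out what is returned, let us go back and see where the return value ends up: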

p->prio = effective_prio(p);
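The whole chain from effective_prio() down to __normal_prio() can be modeled in user space to see concrete numbers. This is a simplified sketch, not kernel code: the `model_*` names and the `struct model_task` layout are hypothetical, while MAX_RT_PRIO (100) and the NICE_TO_PRIO mapping follow the kernel's definitions.

```c
#include <assert.h>

#define MAX_RT_PRIO 100                     /* priorities 0..99 are real-time */
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)

enum model_policy { MODEL_NORMAL, MODEL_FIFO, MODEL_RR };

struct model_task {
	enum model_policy policy;
	int rt_priority;    /* 1..99, only meaningful for real-time policies */
	int static_prio;    /* 100..139, derived from the nice value */
	int normal_prio;    /* priority without any temporary boost */
	int prio;           /* priority the scheduler actually uses */
};

/* A priority value below MAX_RT_PRIO lies in the real-time range. */
static int model_rt_prio(int prio)
{
	return prio < MAX_RT_PRIO;
}

/* Mirrors normal_prio() and __normal_prio() from the text. */
static int model_normal_prio(const struct model_task *p)
{
	if (p->policy == MODEL_FIFO || p->policy == MODEL_RR) {
		assert(p->rt_priority >= 1 && p->rt_priority <= 99);
		return MAX_RT_PRIO - 1 - p->rt_priority;
	}
	return p->static_prio;
}

/* Mirrors effective_prio(): keep an RT (or RT-boosted) prio unchanged. */
static int model_effective_prio(struct model_task *p)
{
	p->normal_prio = model_normal_prio(p);
	if (!model_rt_prio(p->prio))
		return p->normal_prio;
	return p->prio;
}
```

In this model a normal task at nice 0 ends up with prio 120, a SCHED_FIFO task with rt_priority 99 ends up with prio 0, and a normal task whose prio has been temporarily boosted to 10 keeps prio 10 while its normal_prio stays at 120.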

So scheduling decisions are based on prio. That is all for today; what happens after the priority has been determined will be covered next time.