Android CPU使用率:top和dump cpuinfo的不同

CPU是系统非常重要的资源,在Android中,查看CPU使用情况,可以使用top命令和dump cpuinfo。我记得很久以前,就发现这两者存在不同,初步猜测应该是算法上存在差异。最近需要采集应用CPU的使用率,看了一下两种CPU的计算方法。

1、top

top是比较经典的CPU计算方法,top的代码在androidm/system/core/toolbox/top.c下面,输出process的cpu使用率在print_procs里面:

static void print_procs(void) {
...
    for (i = 0; i < num_new_procs; i++) {
        if (new_procs[i]) {
            old_proc = find_old_proc(new_procs[i]->pid, new_procs[i]->tid);
            if (old_proc) {
                new_procs[i]->delta_utime = new_procs[i]->utime - old_proc->utime;
                new_procs[i]->delta_stime = new_procs[i]->stime - old_proc->stime;
            } else {
                new_procs[i]->delta_utime = 0;
                new_procs[i]->delta_stime = 0;
            }
            new_procs[i]->delta_time = new_procs[i]->delta_utime + new_procs[i]->delta_stime;
        }
    }

    total_delta_time = (new_cpu.utime + new_cpu.ntime + new_cpu.stime + new_cpu.itime
                        + new_cpu.iowtime + new_cpu.irqtime + new_cpu.sirqtime)
                     - (old_cpu.utime + old_cpu.ntime + old_cpu.stime + old_cpu.itime
                        + old_cpu.iowtime + old_cpu.irqtime + old_cpu.sirqtime);

 ...

if (!threads) {
            printf("%5d %2d %3" PRIu64 "%% %c %5d %6" PRIu64 "K %6" PRIu64 "K %3s %-8.8s %s
",
                   proc->pid, proc->prs, proc->delta_time * 100 / total_delta_time, proc->state, proc->num_threads,
                   proc->vss / 1024, proc->rss * getpagesize() / 1024, proc->policy, user_str, proc->name[0] != 0 ? proc->name : proc->tname);
        } else {
            printf("%5d %5d %2d %3" PRIu64 "%% %c %6" PRIu64 "K %6" PRIu64 "K %3s %-8.8s %-15s %s
",
                   proc->pid, proc->tid, proc->prs, proc->delta_time * 100 / total_delta_time, proc->state,
                   proc->vss / 1024, proc->rss * getpagesize() / 1024, proc->policy, user_str, proc->tname, proc->name);
        }
...
}

CPU的计算是proc->delta_time * 100 / total_delta_time。

先看total_delta_time由:

    total_delta_time = (new_cpu.utime + new_cpu.ntime + new_cpu.stime + new_cpu.itime
                        + new_cpu.iowtime + new_cpu.irqtime + new_cpu.sirqtime)
                     - (old_cpu.utime + old_cpu.ntime + old_cpu.stime + old_cpu.itime
                        + old_cpu.iowtime + old_cpu.irqtime + old_cpu.sirqtime);

而这些变量的值,是在read_procs通过读取/proc/stat的jiffies得到:

static void read_procs(void) {
...
    proc_dir = opendir("/proc");
    if (!proc_dir) die("Could not open /proc.
");

    new_procs = calloc(INIT_PROCS * (threads ? THREAD_MULT : 1), sizeof(struct proc_info *));
    num_new_procs = INIT_PROCS * (threads ? THREAD_MULT : 1);

    file = fopen("/proc/stat", "r");
    if (!file) die("Could not open /proc/stat.
");
    fscanf(file, "cpu  %lu %lu %lu %lu %lu %lu %lu", &new_cpu.utime, &new_cpu.ntime, &new_cpu.stime,
            &new_cpu.itime, &new_cpu.iowtime, &new_cpu.irqtime, &new_cpu.sirqtime);
    fclose(file);

而proc->delta_time是两次读取/proc/pid/stat相减得到:

static int read_stat(char *filename, struct proc_info *proc) {
...
  /* Scan rest of string. */
    sscanf(close_paren + 1,
           " %c " "%*d %*d %*d %*d %*d %*d %*d %*d %*d %*d "
           "%" SCNu64
           "%" SCNu64 "%*d %*d %*d %*d %*d %*d %*d "
           "%" SCNu64
           "%" SCNu64 "%*d %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d "
           "%d",
           &proc->state,
           &proc->utime,
           &proc->stime,
           &proc->vss,
           &proc->rss,
           &proc->prs);

    return 0;
}

可见,top是一段时间内,计算process的cpu jiffies与总的cpu jiffies差值得到。

2、dump cpuinfo

dump cpuinfo是Android特有的命令(我一直都android的各种dump、trace非常感兴趣,快玩物丧志了。。。)。dump cpuinfo命令的实现在androidm/frameworks/base/core/Java/com/android/internal/os/ProcessCpuTracker.java类里面,方法是printCurrentState:

final public String printCurrentState(long now) {
...
        int N = mWorkingProcs.size();
        for (int i=0; i<N; i++) {
            Stats st = mWorkingProcs.get(i);
            printProcessCPU(pw, st.added ? " +" : (st.removed ? " -": "  "),
                    st.pid, st.name, (int)st.rel_uptime,
                    st.rel_utime, st.rel_stime, 0, 0, 0, st.rel_minfaults, st.rel_majfaults);
            if (!st.removed && st.workingThreads != null) {
                int M = st.workingThreads.size();
                for (int j=0; j<M; j++) {
                    Stats tst = st.workingThreads.get(j);
                    printProcessCPU(pw,
                            tst.added ? "   +" : (tst.removed ? "   -": "    "),
                            tst.pid, tst.name, (int)st.rel_uptime,
                            tst.rel_utime, tst.rel_stime, 0, 0, 0, 0, 0);
                }
            }
        }

...
}

而printProcessCPU输出process CPU的使用情况:

private void printProcessCPU(PrintWriter pw, String prefix, int pid, String label,
            int totalTime, int user, int system, int iowait, int irq, int softIrq,
            int minFaults, int majFaults) {
        pw.print(prefix);
        if (totalTime == 0) totalTime = 1;
        printRatio(pw, user+system+iowait+irq+softIrq, totalTime);
...
}

user+system+iowait+irq+softIrq 比totalTime。 
看下st个变量的赋值,在collectStats里面:

private int[] collectStats(String statsFile, int parentPid, boolean first,
            int[] curPids, ArrayList<Stats> allProcs) {

        int[] pids = Process.getPids(statsFile, curPids);
...
 final long uptime = SystemClock.uptimeMillis();
   final long[] procStats = mProcessStatsData;
                    if (!Process.readProcFile(st.statFile.toString(),
                            PROCESS_STATS_FORMAT, null, procStats, null)) {
                        continue;
                    }

...
                    if (DEBUG) Slog.v("Load", "Stats changed " + st.name + " pid=" + st.pid
                            + " utime=" + utime + "-" + st.base_utime
                            + " stime=" + stime + "-" + st.base_stime
                            + " minfaults=" + minfaults + "-" + st.base_minfaults
                            + " majfaults=" + majfaults + "-" + st.base_majfaults);

                    st.rel_uptime = uptime - st.base_uptime;
                    st.base_uptime = uptime;
                    st.rel_utime = (int)(utime - st.base_utime);
                    st.rel_stime = (int)(stime - st.base_stime);
                    st.base_utime = utime;
                    st.base_stime = stime;
                    st.rel_minfaults = (int)(minfaults - st.base_minfaults);
                    st.rel_majfaults = (int)(majfaults - st.base_majfaults);
                    st.base_minfaults = minfaults;
                    st.base_majfaults = majfaults;
                    st.working = true;
...
}

st.rel_utime 和 st.rel_stime还是通过读/proc/pid/stat相减得到,而st.rel_uptime却是通过 SystemClock.uptimeMillis()差值,并不是跟top一样,通过得到总CPU jiffies。

看到这,也就能明白,top跟dump cpuinfo的区别在于:top分母有的是总测CPU jiffies,而dump cpuinfo是uptime,是时间,而并非jiffies,也能解释为什么top出来的cpu,大部分时间会比dump cpuinfo的原因。

原文地址:https://www.cnblogs.com/mymelon/p/5871002.html