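Problem: the device is expected to reach 70k CPS, but it currently delivers less than 20k. This needs to be fixed.
Reading the code has not helped so far, so the first step is to work out roughly which category the problem falls into and then analyze from there.
1. Testing currently runs at 10k CPS. Start with the CPU utilization.
top shows that sys (kernel-mode) time takes up a large share of the CPU. Pressing 1 on the Cpu(s) line in top to see per-CPU usage did not work: it reported "terminal is not big enough".
The CPU load is a bit high. The device has 48 cores, so in principle a load of about 48*0.7 = 34 is ideal, but the average load is already at 1 (apparently per core).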
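The 0.7-per-core rule of thumb above can be written down as a quick check. This is only a sketch: the helper names `ideal_load` and `load_per_core` are made up for illustration, and the 0.7 factor is the heuristic from the text, not a hard limit.

```python
import os


def ideal_load(cores: int, factor: float = 0.7) -> float:
    """Target load average for a machine with `cores` CPUs (rule of thumb)."""
    return cores * factor


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average so that 1.0 means one runnable task per core."""
    return load_avg / cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    try:
        one_min, _, _ = os.getloadavg()  # 1, 5 and 15 minute load averages
    except OSError:
        one_min = float("nan")  # load average not available on this platform
    print(f"cores={cores} target={ideal_load(cores):.1f} "
          f"load1={one_min:.2f} per_core={load_per_core(one_min, cores):.2f}")
```

For the 48-core device, `ideal_load(48)` is 33.6, which the text rounds to 34.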
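See the earlier article on CPU inspection tools.
mpstat -P ALL 1 100 shows the usage of every CPU. CPU 0 and CPU 18 stand out with high values in the %soft column, that is, softirq load is high on those two cores.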
Average:  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:  all    4.87    0.00   74.90    0.00    0.00    1.90    0.00    0.00    0.00   18.33
Average:    0    3.14    0.00   48.00    0.00    0.00   28.86    0.00    0.00    0.00   20.00
Average:    1    3.71    0.00   74.29    0.00    0.00    1.00    0.00    0.00    0.00   21.00
Average:    2    3.71    0.00   72.75    0.00    0.00    1.14    0.00    0.00    0.00   22.40
Average:    3    4.42    0.00   64.62    0.00    0.00    1.14    0.00    0.00    0.00   29.81
Average:    4    3.72    0.00   72.49    0.00    0.00    1.00    0.00    0.00    0.00   22.78
Average:    5    4.14    0.00   73.57    0.00    0.00    1.14    0.00    0.00    0.00   21.14
Average:    6    3.29    0.00   74.25    0.00    0.00    1.00    0.00    0.00    0.00   21.46
Average:    7    3.14    0.00   74.00    0.00    0.00    1.00    0.00    0.00    0.00   21.86
Average:    8    3.42    0.00   74.04    0.00    0.00    1.28    0.00    0.00    0.00   21.26
Average:    9    3.99    0.00   75.18    0.00    0.00    1.28    0.00    0.00    0.00   19.54
Average:   10    3.15    0.00   76.54    0.00    0.00    1.14    0.00    0.00    0.00   19.17
Average:   11    3.00    0.00   74.47    0.00    0.00    1.71    0.00    0.00    0.00   20.83
Average:   12    5.13    0.00   74.50    0.00    0.00    1.71    0.00    0.00    0.00   18.66
Average:   13    3.57    0.00   75.57    0.00    0.00    1.71    0.00    0.00    0.00   19.14
Average:   14    3.86    0.00   74.39    0.00    0.00    1.43    0.00    0.00    0.00   20.31
Average:   15    3.99    0.00   74.32    0.00    0.00    2.14    0.00    0.00    0.00   19.54
Average:   16    4.43    0.00   74.29    0.00    0.00    2.00    0.00    0.00    0.00   19.29
Average:   17    3.14    0.00   74.47    0.00    0.00    2.00    0.00    0.00    0.00   20.40
Average:   18    4.71    0.00   59.43    0.00    0.00   15.43    0.00    0.00    0.00   20.43
Average:   19    3.29    0.00   74.71    0.00    0.00    1.86    0.00    0.00    0.00   20.14
Average:   20    3.00    0.00   74.57    0.00    0.00    1.86    0.00    0.00    0.00   20.57
Average:   21    3.00    0.00   74.29    0.00    0.00    2.29    0.00    0.00    0.00   20.43
Average:   22    3.00    0.00   75.29    0.00    0.00    2.00    0.00    0.00    0.00   19.71
Average:   23    3.28    0.00   75.32    0.00    0.00    2.00    0.00    0.00    0.00   19.40
Average:   24    3.43    0.00   75.54    0.00    0.00    1.14    0.00    0.00    0.00   19.89
Average:   25    3.28    0.00   75.32    0.00    0.00    1.71    0.00    0.00    0.00   19.69
Average:   26    3.14    0.00   74.89    0.00    0.00    1.85    0.00    0.00    0.00   20.11
Average:   27    3.58    0.00   74.68    0.00    0.00    1.43    0.00    0.00    0.00   20.31
Average:   28    3.29    0.00   76.29    0.00    0.00    1.57    0.00    0.00    0.00   18.86
Average:   29    3.43    0.00   76.11    0.00    0.00    1.72    0.00    0.00    0.00   18.74
Average:   30    3.57    0.00   75.86    0.00    0.00    1.57    0.00    0.00    0.00   19.00
Average:   31    3.57    0.00   76.00    0.00    0.00    2.00    0.00    0.00    0.00   18.43
Average:   32    4.42    0.00   78.74    0.00    0.00    0.00    0.00    0.00    0.00   16.83
Average:   33    4.85    0.00   77.18    0.00    0.00    0.00    0.00    0.00    0.00   17.97
Average:   34    4.71    0.00   77.57    0.00    0.00    0.00    0.00    0.00    0.00   17.71
Average:   35    4.57    0.00   78.29    0.00    0.00    0.00    0.00    0.00    0.00   17.14
Average:   36    5.42    0.00   78.46    0.00    0.00    0.00    0.00    0.00    0.00   16.12
Average:   37    5.57    0.00   79.00    0.00    0.00    0.00    0.00    0.00    0.00   15.43
Average:   38    5.71    0.00   77.71    0.00    0.00    0.00    0.00    0.00    0.00   16.57
Average:   39    6.00    0.00   78.43    0.00    0.00    0.00    0.00    0.00    0.00   15.57
Average:   40    6.44    0.00   78.83    0.00    0.00    0.00    0.00    0.00    0.00   14.74
Average:   41    7.12    0.00   79.06    0.00    0.00    0.00    0.00    0.00    0.00   13.82
Average:   42    8.71    0.00   78.71    0.00    0.00    0.00    0.00    0.00    0.00   12.57
Average:   43    9.00    0.00   78.57    0.00    0.00    0.00    0.00    0.00    0.00   12.43
Average:   44    9.71    0.00   78.86    0.00    0.00    0.00    0.00    0.00    0.00   11.43
Average:   45   11.43    0.00   78.29    0.00    0.00    0.00    0.00    0.00    0.00   10.29
Average:   46   12.14    0.00   78.57    0.00    0.00    0.00    0.00    0.00    0.00    9.29
Average:   47   13.43    0.00   78.71    0.00    0.00    0.00    0.00    0.00    0.00    7.86
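Spotting the two outlier cores by eye in 49 rows is error-prone, so a small parser can do it. The sketch below is an illustration only: it reads the "Average:" lines of mpstat output and lists the CPUs whose %soft exceeds a threshold. The function names and the 10% threshold are assumptions, not something mpstat defines. The sample is a handful of rows copied from the output above.

```python
def parse_mpstat_average(text):
    """Return {cpu_label: {column_name: value}} from mpstat 'Average:' lines."""
    header = None
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "Average:":
            continue
        if fields[1] == "CPU":          # header row names the columns
            header = fields[2:]
            continue
        if header is None:
            continue
        rows[fields[1]] = dict(zip(header, map(float, fields[2:])))
    return rows


def softirq_hotspots(rows, threshold=10.0):
    """List (cpu, %soft) pairs above `threshold`, busiest first.

    The 10% default is an arbitrary illustration value.
    """
    hot = [(cpu, r["%soft"]) for cpu, r in rows.items()
           if cpu != "all" and r["%soft"] > threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


SAMPLE = """\
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 4.87 0.00 74.90 0.00 0.00 1.90 0.00 0.00 0.00 18.33
Average: 0 3.14 0.00 48.00 0.00 0.00 28.86 0.00 0.00 0.00 20.00
Average: 1 3.71 0.00 74.29 0.00 0.00 1.00 0.00 0.00 0.00 21.00
Average: 18 4.71 0.00 59.43 0.00 0.00 15.43 0.00 0.00 0.00 20.43
Average: 32 4.42 0.00 78.74 0.00 0.00 0.00 0.00 0.00 0.00 16.83
"""

if __name__ == "__main__":
    for cpu, soft in softirq_hotspots(parse_mpstat_average(SAMPLE)):
        print(f"CPU {cpu}: %soft = {soft}")
```

On the sample rows this reports CPU 0 and CPU 18, matching the observation above. Softirq work concentrated on two cores usually means NIC interrupts land only there, so /proc/interrupts and /proc/softirqs are natural next places to look.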
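The top output above also showed a zombie process, which was an unwelcome surprise.
Use ps -e -o stat,ppid,pid,cmd | grep -e '^[Zz]' to find out which process it is.
# ps -e -o stat,ppid,pid,cmd | egrep '^[Zz]'
Z     9897 36918 [ethtool] <defunct>
(Most Linux systems also mark zombie processes as defunct.)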
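The same filter can be applied in a script, which also makes the parent PID easy to pick out. This is a minimal sketch: `find_zombies` and `live_zombies` are hypothetical helper names, and it assumes the column order `stat,ppid,pid,cmd` used in the command above.

```python
import subprocess


def find_zombies(ps_output):
    """Extract zombie entries from `ps -e -o stat,ppid,pid,cmd` output."""
    zombies = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)     # stat, ppid, pid, rest-of-command
        if len(parts) < 4 or not parts[0].upper().startswith("Z"):
            continue                    # skips the header and live processes
        stat, ppid, pid, cmd = parts
        zombies.append({"stat": stat, "ppid": int(ppid),
                        "pid": int(pid), "cmd": cmd})
    return zombies


def live_zombies():
    """Run ps on this machine and return its zombies (needs procps `ps`)."""
    out = subprocess.run(["ps", "-e", "-o", "stat,ppid,pid,cmd"],
                         capture_output=True, text=True, check=True).stdout
    return find_zombies(out)


SAMPLE = """\
STAT  PPID   PID CMD
Ss       0     1 /sbin/init
Z     9897 36918 [ethtool] <defunct>
"""

if __name__ == "__main__":
    for z in find_zombies(SAMPLE):
        print(f"zombie pid={z['pid']} parent={z['ppid']} cmd={z['cmd']}")
```

A zombie can only be reaped by its parent, so the PPID (9897 here) is the process to inspect: it forked ethtool and has not yet waited on it.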
Next, take a look with vmstat.
man vmstat documents its usage and the meaning of each field:
FIELD DESCRIPTION FOR VM MODE
   Procs
       r: The number of runnable processes (running or waiting for run time).
       b: The number of processes in uninterruptible sleep.
   Memory
       swpd: the amount of virtual memory used.
       free: the amount of idle memory.
       buff: the amount of memory used as buffers.
       cache: the amount of memory used as cache.
       inact: the amount of inactive memory. (-a option)
       active: the amount of active memory. (-a option)
   Swap
       si: Amount of memory swapped in from disk (/s).
       so: Amount of memory swapped to disk (/s).
   IO
       bi: Blocks received from a block device (blocks/s).
       bo: Blocks sent to a block device (blocks/s).
   System
       in: The number of interrupts per second, including the clock.
       cs: The number of context switches per second.
   CPU
       These are percentages of total CPU time.
       us: Time spent running non-kernel code. (user time, including nice time)
       sy: Time spent running kernel code. (system time)
       id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
       wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
       st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd      free  buff  cache  si  so  bi    bo     in     cs us sy id wa
49  0    0 124260720  9648 168352   0   0   1     1     36     46  4 63 33  0
32  0    0 124263832  9648 168364   0   0   0    20 199467 389128  5 76 19  0
49  0    0 124263456  9648 168372   0   0   0     4 200207 398071  5 77 19  0
39  0    0 124264864  9664 168360   0   0   0   256 199046 387918  5 75 21  0
37  0    0 124250384  9664 168376   0   0   0    16 201828 399777  5 74 21  0
33  0    0 124258768  9664 168380   0   0   0     8 201014 396513  5 75 20  0
34  0    0 124251792  9664 168384   0   0   0     4 200561 393866  5 76 19  0
46  0    0 124240544  9664 168404   0   0   0     0 198971 360904  5 78 17  0
45  0    0 124260144  9696 168376   0   0   0   212 201883 391899  5 76 19  0
34  0    0 124254576  9696 168408   0   0   0    16 201658 398830  5 76 19  0
32  0    0 124254160  9696 168412   0   0   0    24 200428 399979  5 79 16  0
41  0    0 124257888  9696 168424   0   0   0     4 200593 399821  5 76 19  0
42  0    0 124256712  9696 168424   0   0   0     4 201132 411719  5 77 18  0
36  0    0 124251472  9712 168412   0   0   0   419 200511 391821  5 76 20  0
41  0    0 124254512  9720 168404   0   0   0    28 199560 395600  5 75 20  0
35  0    0 124252152  9720 168452   0   0   0   168 202091 395793  5 76 19  0
48  0    0 124249872  9720 168452   0   0   0     4 200647 389197  5 78 17  0
37  0    0 124270032  9736 168448   0   0   0  1236 201570 392591  5 77 18  0
42  0    0 124269472  9752 168444   0   0   0   108 199694 392387  5 75 20  0
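What this shows:
r, the number of runnable processes (running or waiting for run time), peaks at 49 in this sample, roughly one runnable task per core on this 48-core machine.
in and cs are both high, in the six-digit range (in: interrupts per second, including the clock interrupt; cs: context switches per second).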
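These observations can be reproduced from the raw vmstat text with a few lines of parsing. The following is a sketch with made-up helper names (`parse_vmstat`, `summarize`). It drops the first sample because vmstat's first report gives averages since boot, which is why that row shows in=36 and cs=46. The sample is the first four rows of the output above.

```python
def parse_vmstat(text):
    """Turn plain `vmstat <interval>` output into a list of {column: int}."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue                    # blank line or group banner
        if fields[0] == "r":
            columns = fields            # column-name row
            continue
        if columns and len(fields) == len(columns):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def summarize(samples, cores):
    """Peak run queue and mean in/cs, skipping the since-boot first sample."""
    steady = samples[1:] if len(samples) > 1 else samples
    n = len(steady)
    avg_cs = sum(s["cs"] for s in steady) / n
    return {
        "max_r": max(s["r"] for s in steady),
        "avg_in": sum(s["in"] for s in steady) / n,
        "avg_cs": avg_cs,
        "cs_per_core": avg_cs / cores,
    }


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd free buff cache si so bi bo in cs us sy id wa
49 0 0 124260720 9648 168352 0 0 1 1 36 46 4 63 33 0
32 0 0 124263832 9648 168364 0 0 0 20 199467 389128 5 76 19 0
49 0 0 124263456 9648 168372 0 0 0 4 200207 398071 5 77 19 0
39 0 0 124264864 9664 168360 0 0 0 256 199046 387918 5 75 21 0
"""

if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE), cores=48))
```

On these rows the summary gives roughly 200k interrupts/s and 390k context switches/s, which is about 8k context switches per second per core.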
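IO and CPU usually move in opposite directions: when CPU utilization is high, IO tends to be low, and when IO is heavy, CPU utilization drops. There are three main kinds of IO:
disk file IO, driver IO (for example the NIC), and the memory paging rate.
Next comes the network.
Network traffic is currently low: sar -n DEV shows little send or receive traffic.
If CPU, IO, memory usage and network bandwidth are all low but performance still will not go up, the program itself is probably at fault, for example lock contention that puts threads to sleep and forces rescheduling.
So: for now the main issue appears to be high kernel-mode CPU usage together with a large number of context switches (cs).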
Take a look with pidstat.
The process is multithreaded, so pay attention to the options used (-t shows per-thread statistics).
root@localhost /opt/data/debug_fp # pidstat -t -w -p 9616
Linux 2.6.39-gentoo-r3-wafg2-47214 (localhost)  04/06/21  _x86_64_  (48 CPU)

17:12:43   UID   TGID    TID   cswch/s nvcswch/s  Command
17:12:43     0   9616      -      2.03      0.00  wafd
17:12:43     0      -   9616      2.03      0.00  |__wafd
17:12:43     0      -   9617      0.00      0.00  |__wafd
17:12:43     0      -   9618      0.20      0.00  |__wafd
17:12:43     0      -   9639      0.00      0.00  |__wafd
17:12:43     0      -   9680      1.00      0.28  |__wafd
17:12:43     0      -   9681      1.17      0.00  |__wafd
17:12:43     0      -   9682      1.00      0.01  |__wafd
17:12:43     0      -   9683      1.98      0.00  |__wafd
17:12:43     0      -   9684      1.00      0.00  |__wafd
17:12:43     0      -   9685   2193.54      6.95  |__wafd
17:12:43     0      -   9686   3280.16      6.77  |__wafd
17:12:43     0      -   9687   3298.50      6.51  |__wafd
17:12:43     0      -   9688   3668.60      6.86  |__wafd
17:12:43     0      -   9689   3309.01      4.34  |__wafd
17:12:43     0      -   9690   3330.93      4.01  |__wafd
17:12:43     0      -   9691   3353.79      3.43  |__wafd
17:12:43     0      -   9692   3370.80      3.20  |__wafd
17:12:43     0      -   9693   3384.94      3.04  |__wafd
17:12:43     0      -   9694   3403.42      2.95  |__wafd
17:12:43     0      -   9695   3422.77      2.95  |__wafd
17:12:43     0      -   9696   3454.69      2.91  |__wafd
17:12:43     0      -   9697   3195.38     10.86  |__wafd
17:12:43     0      -   9698   3327.54      7.39  |__wafd
17:12:43     0      -   9699   3405.50      6.12  |__wafd
17:12:43     0      -   9700   3449.54      5.21  |__wafd
17:12:43     0      -   9701   3483.75      4.90  |__wafd
17:12:43     0      -   9702   3512.70      4.53  |__wafd
17:12:43     0      -   9703   3609.01      6.73  |__wafd
17:12:43     0      -   9704   3557.42      3.80  |__wafd
17:12:43     0      -   9705   3602.27      3.74  |__wafd
17:12:43     0      -   9706   3626.98      3.87  |__wafd
17:12:43     0      -   9707   3622.26      3.44  |__wafd
17:12:43     0      -   9708   3625.70      3.35  |__wafd
17:12:43     0      -   9709   3678.24      3.02  |__wafd
17:12:43     0      -   9710   3721.95      3.18  |__wafd
17:12:43     0      -   9711   3736.38      3.16  |__wafd
17:12:43     0      -   9712   3747.18      3.08  |__wafd
17:12:43     0      -   9713   3733.81      3.10  |__wafd
17:12:43     0      -   9714   3723.45      3.08  |__wafd
17:12:43     0      -   9715   3710.86      3.13  |__wafd
17:12:43     0      -   9716   3697.94      3.13  |__wafd
17:12:43     0      -   9717   3688.42      2.96  |__wafd
17:12:43     0      -   9718   3655.82      3.02  |__wafd
17:12:43     0      -   9719   3604.03      3.03  |__wafd
17:12:43     0      -   9720   3540.53      3.10  |__wafd
17:12:43     0      -   9722   3473.61      4.80  |__wafd
17:12:43     0      -   9723   3452.84      4.52  |__wafd
17:12:43     0      -   9724   3406.79      4.43  |__wafd
17:12:43     0      -   9725   3333.05      4.42  |__wafd
17:12:43     0      -   9726   3238.23      4.62  |__wafd
17:12:43     0      -   9727   3133.44      4.78  |__wafd
17:12:43     0      -   9728   2933.01      5.44  |__wafd
17:12:43     0      -   9729   2847.38      5.21  |__wafd
17:12:43     0      -   9730   2631.84      5.52  |__wafd
17:12:43     0      -   9731   2404.50      5.78  |__wafd
17:12:43     0      -   9732   2191.32      6.12  |__wafd
17:12:43     0      -   9733   1975.52      6.71  |__wafd
17:12:43     0      -   9734      1.00      0.00  |__wafd
17:12:43     0      -   9735      1.00      0.00  |__wafd
17:12:43     0      -   9736      1.00      0.02  |__wafd
17:12:43     0      -   9830      1.82      0.28  |__wafd
17:12:43     0      -  32529      9.89      0.01  |__wafd
17:12:43     0      -  32530      9.89      0.01  |__wafd

root@localhost /opt/data/debug_fp # pidstat -t -p 9616
Linux 2.6.39-gentoo-r3-wafg2-47214 (localhost)  04/06/21  _x86_64_  (48 CPU)

17:13:26   UID   TGID    TID    %usr %system  %guest    %CPU   CPU  Command
17:13:26     0   9616      -  176.83 2911.44    0.00 3088.27     1  wafd
17:13:26     0      -   9616    0.00    0.00    0.00    0.01     1  |__wafd
17:13:26     0      -   9617    0.00    0.00    0.00    0.00     1  |__wafd
17:13:26     0      -   9618    0.00    0.00    0.00    0.00    12  |__wafd
17:13:26     0      -   9639    0.00    0.00    0.00    0.00     9  |__wafd
17:13:26     0      -   9680    0.03    0.10    0.00    0.13    17  |__wafd
17:13:26     0      -   9681    0.00    0.01    0.00    0.01    13  |__wafd
17:13:26     0      -   9682    0.06    0.01    0.00    0.07     3  |__wafd
17:13:26     0      -   9683    0.01    0.00    0.00    0.01     0  |__wafd
17:13:26     0      -   9684    0.00    0.00    0.00    0.00    16  |__wafd
17:13:26     0      -   9685    1.99   23.26    0.00   25.25     0  |__wafd
17:13:26     0      -   9686    2.37   58.09    0.00   60.46     1  |__wafd
17:13:26     0      -   9687    2.61   55.47    0.00   58.08     2  |__wafd
17:13:26     0      -   9688    3.18   49.06    0.00   52.24     3  |__wafd
17:13:26     0      -   9689    2.42   58.50    0.00   60.92     4  |__wafd
17:13:26     0      -   9690    2.36   59.61    0.00   61.97     5  |__wafd
17:13:26     0      -   9691    2.37   60.13    0.00   62.49     6  |__wafd
17:13:26     0      -   9692    2.43   60.45    0.00   62.88     7  |__wafd
17:13:26     0      -   9693    2.44   60.68    0.00   63.13     8  |__wafd
17:13:26     0      -   9694    2.46   60.90    0.00   63.37     9  |__wafd
17:13:26     0      -   9695    2.46   61.04    0.00   63.50    10  |__wafd
17:13:26     0      -   9696    2.42   61.40    0.00   63.82    11  |__wafd
17:13:26     0      -   9697    2.41   56.13    0.00   58.54    12  |__wafd
17:13:26     0      -   9698    2.41   57.94    0.00   60.36    13  |__wafd
17:13:26     0      -   9699    2.46   58.85    0.00   61.31    14  |__wafd
17:13:26     0      -   9700    2.48   59.54    0.00   62.03    15  |__wafd
17:13:26     0      -   9701    2.50   60.02    0.00   62.52    16  |__wafd
17:13:26     0      -   9702    2.49   60.40    0.00   62.89    17  |__wafd
17:13:26     0      -   9703    3.22   55.62    0.00   58.84    18  |__wafd
17:13:26     0      -   9704    2.45   60.94    0.00   63.39    19  |__wafd
17:13:26     0      -   9705    2.41   60.96    0.00   63.37    20  |__wafd
17:13:26     0      -   9706    2.39   61.26    0.00   63.65    21  |__wafd
17:13:26     0      -   9707    2.42   61.54    0.00   63.96    22  |__wafd
17:13:26     0      -   9708    2.47   61.84    0.00   64.31    23  |__wafd
17:13:26     0      -   9709    2.75   61.91    0.00   64.66    24  |__wafd
17:13:26     0      -   9710    2.49   62.15    0.00   64.64    25  |__wafd
17:13:26     0      -   9711    2.51   62.39    0.00   64.90    26  |__wafd
17:13:26     0      -   9712    2.55   62.56    0.00   65.11    27  |__wafd
17:13:26     0      -   9713    2.69   62.85    0.00   65.54    28  |__wafd
17:13:26     0      -   9714    2.83   63.00    0.00   65.83    29  |__wafd
17:13:26     0      -   9715    2.93   63.14    0.00   66.07    30  |__wafd
17:13:26     0      -   9716    3.07   63.31    0.00   66.38    31  |__wafd
17:13:26     0      -   9717    3.13   63.64    0.00   66.77    32  |__wafd
17:13:26     0      -   9718    3.31   63.78    0.00   67.09    33  |__wafd
17:13:26     0      -   9719    3.59   63.94    0.00   67.53    34  |__wafd
17:13:26     0      -   9720    3.92   64.10    0.00   68.02    35  |__wafd
17:13:26     0      -   9722    4.18   63.41    0.00   67.59    36  |__wafd
17:13:26     0      -   9723    4.24   63.78    0.00   68.02    37  |__wafd
17:13:26     0      -   9724    4.50   64.01    0.00   68.52    38  |__wafd
17:13:26     0      -   9725    4.86   64.12    0.00   68.98    39  |__wafd
17:13:26     0      -   9726    5.25   64.27    0.00   69.51    40  |__wafd
17:13:26     0      -   9727    5.89   64.23    0.00   70.12    41  |__wafd
17:13:26     0      -   9728    6.96   64.14    0.00   71.10    42  |__wafd
17:13:26     0      -   9729    7.12   64.41    0.00   71.53    43  |__wafd
17:13:26     0      -   9730    7.98   64.45    0.00   72.44    44  |__wafd
17:13:26     0      -   9731    8.96   64.51    0.00   73.47    45  |__wafd
17:13:26     0      -   9732   10.01   64.65    0.00   74.66    46  |__wafd
17:13:26     0      -   9733   11.19   64.83    0.00   76.02    47  |__wafd
17:13:26     0      -   9734    0.00    0.00    0.00    0.00     1  |__wafd
17:13:26     0      -   9735    0.00    0.00    0.00    0.00    35  |__wafd
17:13:26     0      -   9736    0.00    0.09    0.00    0.09    19  |__wafd
17:13:26     0      -   9830    0.14    0.01    0.00    0.15     0  |__wafd
17:13:26     0      -  32529    0.01    0.02    0.00    0.03     0  |__wafd
17:13:26     0      -  32530    0.01    0.02    0.00    0.03    16  |__wafd
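So: in some threads cswch/s (voluntary context switches per second) is high and kernel-mode CPU usage is high, and both are driven by the engine process.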
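To pick the busy threads out of `pidstat -t -w` output without scanning by eye, something like the following works. It is a sketch: `busy_threads` is a hypothetical helper, the 1000/s cutoff is an arbitrary illustration value, and it assumes 24-hour timestamps as in the output above (an AM/PM column would shift the fields). The sample rows are copied from that output.

```python
def busy_threads(pidstat_output, min_cswch=1000.0):
    """Return (tid, cswch/s, nvcswch/s) for threads at or above `min_cswch`.

    Thread rows look like: time UID - TID cswch/s nvcswch/s |__name
    """
    result = []
    for line in pidstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[2] != "-":
            continue                    # header, banner, or process-level row
        try:
            tid = int(fields[3])
            cswch = float(fields[4])
            nvcswch = float(fields[5])
        except ValueError:
            continue
        if cswch >= min_cswch:
            result.append((tid, cswch, nvcswch))
    return sorted(result, key=lambda row: row[1], reverse=True)


SAMPLE = """\
17:12:43 UID TGID TID cswch/s nvcswch/s Command
17:12:43 0 9616 - 2.03 0.00 wafd
17:12:43 0 - 9684 1.00 0.00 |__wafd
17:12:43 0 - 9685 2193.54 6.95 |__wafd
17:12:43 0 - 9712 3747.18 3.08 |__wafd
17:12:43 0 - 9733 1975.52 6.71 |__wafd
"""

if __name__ == "__main__":
    for tid, cswch, nvcswch in busy_threads(SAMPLE):
        print(f"tid={tid} cswch/s={cswch} nvcswch/s={nvcswch}")
```

Since cswch/s counts voluntary switches (the thread blocked or yielded) and nvcswch/s counts involuntary ones (the scheduler preempted it), a profile dominated by cswch/s, as here, suggests the worker threads keep blocking in the kernel rather than being crowded off the CPU.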
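Use perf to look at the process in more detail.
After all, high CPU usage caused by IO-intensive work is handled differently from high CPU usage caused by parallel computation.
perf stat shows the overall behavior of a running program; perf record records information about the specified events while the program runs; perf report produces a report on the program's behavior from the events captured by perf record.
perf top: shows live performance statistics for the running system.
perf top -a -g
perf top -g -p $pid
perf record -F 99 -a -g
perf record -F 99 -a -p $pid
perf trace is similar to strace in function.
perf stat results:
IPC: the ratio Instructions/Cycles. The higher the value the better, since it indicates the program is making full use of the processor's capabilities. (?? A quick Google search did not turn up an explanation.)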
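As a worked illustration of the ratio, IPC is just the instruction count divided by the cycle count over the same interval. The counter values in the sketch below are invented for the example and are not taken from this device's perf stat output.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: retired instructions / elapsed CPU cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


if __name__ == "__main__":
    # Hypothetical counters, as perf stat would report for the
    # 'instructions' and 'cycles' events over the same interval.
    instructions = 1_200_000_000
    cycles = 2_000_000_000
    print(f"IPC = {ipc(instructions, cycles):.2f}")
```

A commonly quoted rule of thumb, to be taken with caution, is that an IPC well below 1 suggests the CPU spends much of its time stalled (for example waiting on memory or contended cache lines), while a higher IPC suggests the workload is limited by the instructions themselves.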
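perf top -p $pid results:
perf record -F 99 -a -p 9616 results: perf record collects statistics at the level of individual functions, and perf report displays them.
So far: the threads are context-switching frequently and kernel-mode CPU utilization is high.
High kernel-mode usage: is it caused by system calls and kernel threads? The engine process is currently making system calls constantly.
However, when the engine is bound to 8 CPU cores, CPS is actually higher than when it is bound to all 48 cores.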
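How the engine was bound to 8 cores is not stated. One way to repeat the experiment from a script is sketched below, assuming Linux and enough privileges. `first_n_cpus` and `pin_all_threads` are hypothetical helper names. On Linux `os.sched_setaffinity` applies to a single thread ID, so the sketch walks /proc/<pid>/task to cover every thread of a multithreaded process.

```python
import os


def first_n_cpus(available, n):
    """Pick the `n` lowest-numbered CPUs from the `available` set."""
    if n <= 0:
        raise ValueError("n must be positive")
    return set(sorted(available)[:n])


def thread_ids(pid):
    """All thread IDs of a process, read from /proc (Linux only)."""
    return [int(name) for name in os.listdir(f"/proc/{pid}/task")]


def pin_all_threads(pid, n):
    """Restrict every thread of `pid` to the first `n` CPUs it may use now."""
    target = first_n_cpus(os.sched_getaffinity(pid), n)
    for tid in thread_ids(pid):
        os.sched_setaffinity(tid, target)   # affinity is set per thread
    return target
```

For example, `pin_all_threads(9616, 8)` would confine the wafd process from the pidstat output to its first 8 allowed CPUs. Comparing CPS and cs before and after is a cheap test of whether cross-core contention grows with the number of cores in use.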
Why is that?
Pick one thread and strace it: there are indeed some problems.
The output of strace -c -f -p xxx is not included here.