求程序员工资的中位数 附两种众数求法

除了平均数外,众数和中位数也是衡量集合的标尺之一,中位数是将集合升序排列后恰好位于正中间的元素,如果集合总数为偶数,则取中间两个元素的平均值作为中位数,下文将就用SQL去求中位数展开讨论。

首先建表:

create table tb_coder(
    id number(4,0) not null primary key,
    name nvarchar2(20) not null,
    salary integer not null)

然后充值:

insert into tb_coder
select rownum,dbms_random.string('*',dbms_random.value(2,20)),dbms_random.value(3000,20000)
from dual
connect by level<11
order by dbms_random.random;

然后我们看看表中数据:

SQL> select * from tb_coder order by salary;

        ID NAME                                         SALARY
---------- ---------------------------------------- ----------
         9 DTVMMQSOAJA                                    4947
        10 CSIZK                                          5424
         2 TYLWH                                          5676
         7 EPUIILOYENSD                                  10316
         3 MSKHEGSWNIFZWB                                15153
         4 OFZKCXNWZWSJR                                 15326
         5 HOJLJWIOYNDBB                                 16101
         8 XJQTKPDXAIOOWJ                                16843
         6 RWRGMEIRGZN                                   18599
         1 OZKFBBFYWGEERAUGSNH                           19710

已选择10行。

现在可以观察得知,总个数为10,是偶数,中位数就该是处于中间两个数15153和15326的平均值:15329.5

经过思考,我发现解决问题的关键是得知中间在哪,为此我添加了两列以帮助思考:

SQL> select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary;

        ID     SALARY        SEQ     REVSEQ
---------- ---------- ---------- ----------
         9       4947          1         10
        10       5424          2          9
         2       5676          3          8
         7      10316          4          7
         3      15153          5          6
         4      15326          6          5
         5      16101          7          4
         8      16843          8          3
         6      18599          9          2
         1      19710         10          1

已选择10行。

进过观察可以知道,用seq减去revseq再求绝对值,值大的一定靠两边,值小的一定靠中间。于是再查:

SQL> select a.*,abs(a.seq-a.revseq) as diff from
  2  (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a;

        ID     SALARY        SEQ     REVSEQ       DIFF
---------- ---------- ---------- ---------- ----------
         9       4947          1         10          9
        10       5424          2          9          7
         2       5676          3          8          5
         7      10316          4          7          3
         3      15153          5          6          1
         4      15326          6          5          1
         5      16101          7          4          3
         8      16843          8          3          5
         6      18599          9          2          7
         1      19710         10          1          9

已选择10行。

从上面已经明显看出,diff值最小的就是我们要找的15153和15326两条记录。

如果我取diff值最小的记录,求平均值不就是中位数了吗,于是有了下面的SQL:

SQL> select avg(b.salary) from
  2  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
  3  where b.diff=(select min(c.diff) from
  4  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c);

AVG(B.SALARY)
-------------
      15239.5

结果等于预期,这时偶数个情况,那么奇数个呢?

让我们先删除最后一行:

SQL> delete from tb_coder where id=10;

已删除 1 行。

SQL> commit;

提交完成。

再看结果集:

SQL> select * from tb_coder order by salary;

        ID NAME                                         SALARY
---------- ---------------------------------------- ----------
         9 DTVMMQSOAJA                                    4947
         2 TYLWH                                          5676
         7 EPUIILOYENSD                                  10316
         3 MSKHEGSWNIFZWB                                15153
         4 OFZKCXNWZWSJR                                 15326
         5 HOJLJWIOYNDBB                                 16101
         8 XJQTKPDXAIOOWJ                                16843
         6 RWRGMEIRGZN                                   18599
         1 OZKFBBFYWGEERAUGSNH                           19710

已选择9行。

只剩下九条,那么处于中间的15326应该是中位数。

看看是不是:

SQL> select avg(b.salary) from
  2  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
  3  where b.diff=(select min(c.diff) from
  4  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c);

AVG(B.SALARY)
-------------
        15326

结果等于预期!

附:另一种中位数求法,这种方法比较简洁,但是是别人的方法《SQL进阶教程》P61-P62,不是我自己想出来的。

select avg(distinct salary)
from 
(
    select t1.salary
    from tb_coder t1,tb_coder t2
    group by t1.salary
    having sum(case when t1.salary>=t2.salary then 1 else 0 end)>=count(*)/2
        and  sum(case when t1.salary<=t2.salary then 1 else 0 end)>=count(*)/2
)

另外众数也是衡量集合属性的标尺之一,下面列出了两种求众数的方法供大家参考(一千元分一个档):

--求众数(使用谓词)
select a.sal,count(*) as cnt from
(select round(salary/1000)*1000 as sal from tb_coder) a
group by a.sal
having count(*) >= all(select count(*) from (select round(salary/1000)*1000 as sal from tb_coder) b group by b.sal)

--求众数(使用rownum)
select b.sal,b.cnt from
( select a.sal,count(*) as cnt from
(select round(salary/1000)*1000 as sal from tb_coder) a
group by a.sal
order by cnt desc ) b
where rownum=1

--2020年4月5日--

以上用到的全部SQL:

create table tb_coder(
    id number(4,0) not null primary key,
    name nvarchar2(20) not null,
    salary integer not null)
    
insert into tb_coder
select rownum,dbms_random.string('*',dbms_random.value(2,20)),dbms_random.value(3000,20000)
from dual
connect by level<11
order by dbms_random.random;

select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary


select a.*,abs(a.seq-a.revseq) as diff from
(select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a

select b.* from
(select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
order by b.diff

select b.* from
(select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
where b.diff=(select min(c.diff) from
(select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c)

select avg(b.salary) from
(select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
where b.diff=(select min(c.diff) from
(select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c)
原文地址:https://www.cnblogs.com/heyang78/p/12637767.html