sas中的sql(1) 基本语法

　　Sas proc sql与寻常sas语句的的不同之处

1：The PROC SQL step does not require a RUN statement. PROC SQL executes each query automatically

2：Unlike many other SAS procedures, PROC SQL continues to run after you submit a step. To end the procedure, you must submit another PROC step, a DATA step, or a QUIT statement

PROC SQL; *sql过程中的变量不区分大小写，表与表之间，变量与变量之间都必须要用逗号分开；

CREATE TABLE table-name AS

SELECT column-1<, . . . column-n>

FROM table-1 | view-1<, ... table-n | view-n>

<WHERE expression>

<GROUP BY column-1<, ... column-n>>

having experssion

<ORDER BY column-1<, ... column-n>>;

Unlike other SAS procedures, the order of clauses within a SELECT statement in PROC SQL is important

select:In the SELECT clause, you can both specify existing columns (columns that are already stored in a table) and create new columns,和input后面插入自己想要的变量一样，new columns exist only for the duration of the query, unless a table or a view is created（新建的列只在查询中）。

where:In the WHERE clause, you can specify any column(s) from the underlying table(s). The columns specified in the WHERE clause do not have to be specified in the SELECT clause(where中的变量可以为表中的任意一个变量)。

order by:后面可跟多个变量，用逗号分隔，对升序或降序的指定的关键字放在变量后，desc asc,也可以指定不在select clause中的变量,对于在clause中的变量比如

select a,b,c from table; order by中可用名字代替也可以数字代替 order by a=order by 1,对于多个变量的排序，用分号间隔，数值和字母可以混用 order by 1, age;。

group by:和Summary Functions用在一起，才有分组功能，否则sas会将其自动转换为order by功能。

 proc sql;    
     select membertype, sum(milestraveled) as TotalMiles   
                 from sasuser.frequentflyers        
        group by membertype;      /*group by后面接select中没有经过summary函数的变量才有用，否则不能得出分组效果*/
quit;

Having:A HAVING clause works with the GROUP BY clause to restrict the groups that are displayed in the output, based on one or more specified conditions。和group by连用，作为限制性条件，控制输出内容

proc sql;      
select jobcode,     
       avg(salary) as AvgSalary     format=dollar11.2        
        from sasuser.payrollmaster     
        group by jobcode        
        having avg(salary) > (select avg(salary) from sasuser.payrollmaster);
quit; 

/*先执行子查询中的语句，计算出出所有salary的平均值，然后在having子句中，用每个group的avg工资与子查询得出的总的avg工资进行比较，筛选出想要的结果*/

单表操作

proc sql;
    create table class as 
    select /*select和from中间全部是要获取的变量,分为两种，一种是数据集本身已有的，另一种是由其衍生的变量*/ 
    name,
    case when sex eq '女' then '1'
         when sex eq '男' then '2'
         else '3'
    end
    as sex_tran label="性别转换", /*as做为关键字，形成一个变量，label是sas中的关键字*/
    sum(weight-20) as new1 format=16.2,
    mean(height-20) as new2 format=16.2,/*一般来说sum mean后面一定会出现group by 因为是要根据组别进行汇总*/
    (calculated new1-calculated new2) as new /*calculated为关键字，表示new1是衍生的变量，如果需要对其进行操作的话要加上calculated*/
    from sashelp.class(where=(height between 50 and 70)) /*sas中between and是闭区间*/
    group by name,calculated sex_tran/*除开group by后面的字段外，select中其他字段必须是数值型，不然无法进行group*/
    /*group系统默认按照升序排列后面的字段*/
    order by calculated sex_tran desc/*对想要进行降序的字段可额外进行order by*/
;
quit;

having的用法，实际中很少应用，能用data步的就用data步

/*having语句一般跟在group by后面，如果单独来用，其作用和where差不多*/
proc sql;
    create table have1 as
    select 
        id,
        max(txn_seq) as max_txn,
        sum(amount) as sum_amount
    from temp.having1
    group by id
    having calculated sum_amount ge 70
;    
quit;

/*目标是去除每一组的最后一条观测，但是使用having在分组相关变量有重复的情况下不保险，所以能用data步处理尽量用data步*/
proc sql;
    create table have2 as
    select 
        id,
        txn_seq,
        amount
    from temp.having2
    group by id
    having txn_seq eq max(txn_seq)
;    
quit;

创建和更新表