The SAS DATA Step Data-Reading Flow, Explained

data me;
    /* Top of the step: on every iteration x is still missing here */
    put _n_= x=;

    /*******1******/
    /* INPUT is the key step. It causes SAS to read one record of raw data
       into the input buffer. Then, following the instructions in the INPUT
       statement, SAS reads the data values from the input buffer (at the
       current pointer position) and assigns them to variables in the
       program data vector (PDV). */
    input x;

    /*******2******/
    /* Statements that work on the values just read go between INPUT and CARDS */
    put x=;

    /* At the end of each iteration SAS: 1. writes the PDV to the data set,
       2. returns to the top of the DATA step, 3. increments _N_ by 1,
       4. resets _ERROR_ to 0, 5. resets the PDV variables to missing. */

    /*******3******/
    /* Each data line is one record. Each execution of INPUT reads one record
       into the input buffer and then assigns values to the PDV variables
       named in the INPUT statement. */
    cards;
1
2
3
;
run;

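Execution order of INPUT and the other executable statements in a DATA step

The statements execute in the order in which they are written.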

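1: put _n_= x=; executes and writes _N_=1 x=. to the log. INPUT then executes and reads the first data line after CARDS (x now has a value). put x=; executes and writes x=1. Execution reaches RUN: SAS writes the observation, resets the PDV, and returns to the top of the DATA step.

2: put _n_= x=; executes and writes _N_=2 x=. (x is missing because the PDV was reset at the end of the previous iteration). INPUT executes and reads the second data line (x now has a value). put x=; executes and writes x=2. Execution reaches RUN: SAS writes the observation, resets the PDV, and returns to the top of the DATA step.

3: put _n_= x=; executes and writes _N_=3 x=. (x is missing because the PDV was reset at the end of the previous iteration). INPUT executes and reads the third data line (x now has a value). put x=; executes and writes x=3. Execution reaches RUN: SAS writes the observation, resets the PDV, and returns to the top of the DATA step.

4: put _n_= x=; executes and writes _N_=4 x=. (x is missing because the PDV was reset at the end of the previous iteration). INPUT executes and tries to read a fourth data line, but the data has been exhausted, so execution goes straight to RUN and the DATA step ends. No fourth observation is written.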

What happens during the compilation phase

When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code (the binary instructions the computer can execute). SAS then processes the code further and creates the following three items:

input buffer

A logical area in memory into which SAS reads each record of data from a raw data file when the program executes. It holds one record at a time and is created only when the step reads raw data. When SAS reads from a SAS data set, no input buffer is created; the data is written directly to the program data vector.
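To make the difference between the input buffer and the PDV visible, here is a small sketch. It assumes made-up data lines that carry extra text after the number, and it uses the PUT statement's _INFILE_ specification, which writes the last input record that was read.

```sas
data _null_;
    /* List input reads only the first field of each record into x */
    input x;

    /* _INFILE_ shows the whole record currently held in the input buffer */
    put _infile_;

    /* x= shows the single value that was copied into the PDV */
    put x=;

    datalines;
1 first
2 second
3 third
;
run;
```

For each record the log should show the full data line followed by the value of x alone, which suggests that the buffer holds the whole record while the PDV receives only what INPUT asks for.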

PDV (program data vector)

A logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements, and assigns the values to the appropriate variables in the program data vector. From there, SAS writes the values to a SAS data set as a single observation.

Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_:

_N_: counts the number of times the DATA step begins to iterate.

_ERROR_: 0 = no error, 1 = an error occurred.
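The contents of the PDV, including both automatic variables, can be dumped with PUT _ALL_. The sketch below is illustrative only: the data set name pdv_demo, the computed variable y, and the deliberately invalid data line are all assumptions made for the example.

```sas
data pdv_demo;
    input x;          /* data set variable read from the input buffer */
    y = x * 2;        /* computed variable created by an assignment   */

    /* _ALL_ writes every variable in the PDV, including _N_ and _ERROR_ */
    put _all_;

    datalines;
1
abc
3
;
run;
```

On the second record the value abc cannot be read as a number, so SAS sets x to missing and _ERROR_ to 1 for that iteration. On the third iteration _ERROR_ is back to 0.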

descriptor information

Information about each SAS data set, including data set attributes and variable attributes. SAS creates and maintains the descriptor information.
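One common way to look at descriptor information is PROC CONTENTS. In the sketch below, the data set name tiny and the label text are hypothetical and chosen only for illustration.

```sas
data tiny;
    input x;
    /* A variable attribute that is stored in the descriptor portion */
    label x = 'Value read from the data lines';
    datalines;
1
2
3
;
run;

/* Prints data set attributes (name, number of observations, creation time)
   and variable attributes (name, type, length, label) */
proc contents data=tiny;
run;
```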

The Execution Phase

By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the Execution Phase of a simple DATA step is described as follows:

1. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins and the _N_ automatic variable is incremented by 1.

2. SAS sets the newly created program variables to missing in the program data vector (PDV).

3. SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the PDV.

4. SAS executes any subsequent programming statements for the current record.

5. At the end of the statements, an output, a return, and a reset occur automatically: SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the PDV. NOTE: variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.

   When SAS resets the PDV:

   (1) the values of variables created by the INPUT statement are set to missing;

   (2) values created by a sum statement or named in a RETAIN statement are automatically retained;

   (3) _N_ is incremented by 1 and the value of _ERROR_ is reset to 0.

6. SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.

7. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
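The reset rules in step 5 can be observed by writing the PDV to the log at the top and at the bottom of each iteration. The following is a minimal sketch. The data set names reset_demo and src and the variables y, total, and a are invented for the example.

```sas
/* Variables from INPUT and assignment are reset; a sum variable is retained */
data reset_demo;
    put 'top:    ' _n_= x= y= total=;
    input x;          /* created by INPUT: reset to missing each iteration  */
    y = x * 10;       /* created by assignment: reset to missing as well    */
    total + x;        /* sum statement: starts at 0 and is retained         */
    put 'bottom: ' _n_= x= y= total=;
    datalines;
1
2
3
;
run;

/* A small data set to read back with SET */
data src;
    input a;
    datalines;
1
2
3
;
run;

/* Variables read with SET are not reset between iterations */
data _null_;
    put 'top:    ' _n_= a=;
    set src;
run;
```

In the first step, the "top" lines should show x and y as missing on every iteration while total carries the running sum forward. In the last step, a is missing only at the top of the first iteration; from the second iteration on, it still holds the value read by the previous SET.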

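When an iteration of the DATA step ends, only these three actions (output, return, reset) take place. Moving the input buffer pointer to the next record is not one of them. SAS moves on to the next record only because the current record has been completely read.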
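The double trailing @ line-hold specifier of the INPUT statement gives a way to see this. It keeps the current record in the input buffer across iterations, so the pointer stays where it is until the record is used up. The sketch below assumes a hypothetical data set named held and a single data line containing three values.

```sas
data held;
    /* @@ holds the current record across iterations of the DATA step,
       so each iteration continues from the current pointer position */
    input x @@;
    put _n_= x=;
    datalines;
1 2 3
;
run;
```

The step should iterate once per value and create three observations from the single record, even though output, return, and reset happen at the end of every iteration.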

Original article: https://www.cnblogs.com/yican/p/4052644.html