erl0001 - Erlang design principles: process, port, io

Erlang Rationale (reposted from cryolite's blog on ITEYE; P.S.: a great read)
by Robert Virding

This is a description of some of the basic properties and features of Erlang and an attempt to describe the rationale behind them. Erlang grew as we better understood the original problem we were trying to solve, telephony, and as we evolved the basic concepts for solving the problem.

One major point I hope to show here is that most of the features of Erlang, both the language and the system, are neither isolated properties nor developed in isolation. They were designed to interact with each other. For example, processes, process communication, distribution and error handling are all based on common principles, which allows them to interact more or less seamlessly with each other; and pattern matching, which is ubiquitous, always works the same way irrespective of where it is used and is the only way to bind variables.

Terminology

I try to avoid using standard terms in a non-standard context. So “objects” refers to objects in the standard OO sense, and “processes” refers to Erlang and OS processes as opposed to threads, which we never have to deal with in Erlang.

First principles

This is not a history of Erlang; read Joe’s HOPL paper for that (I am still incapable of making small changes to anything). But knowing the initial problem and our solution to it will help in understanding Erlang. The problem was telephony, i.e. large switches, and we felt that a language/system for programming telephony should have these properties:
Lightweight concurrency – This is critical: the system should be able to handle a large number of processes, and process creation, context switching and inter-process communication must be cheap and fast.
Error handling – This is critical: the system must be able to detect and handle errors.
Continuous evolution of the system – We want to upgrade the system while it is running, with no loss of service.
High-level language – to get real benefits.
Asynchronous communication – The problem domain used asynchronous communication.
Process isolation – We don’t want what is happening in one process to affect any other process.


Some other properties we thought were important:
The language should be simple – Simple in the sense that there should be a small number of basic principles; if these are right, the language will be powerful but easy to comprehend and use. Small is good.
We should provide tools for building systems, not solutions – We would provide the basic operations needed for building communication protocols and error handling.

All these properties were so basic and important that we felt that they had to be built in from the beginning and supported by the language. Adding them afterwards would just not cut it.

From the beginning we saw Erlang as a language to control hardware.

Erlang “Things”

It was decided early on that there would be only two basic types of “things” in Erlang: normal immutable data structures and processes. Everything in Erlang was meant to be either a data structure or a process. We also added mutable things (like the process dictionary), but we did not advertise the fact much, and we discouraged use of the process dictionary. Note that while the process dictionary itself is mutable (it is really just an implicit dictionary), the data stored in it is not.
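
A small sketch of that last point, using the real BIFs put/2, get/1 and erase/1 (the module name pdict_demo and the key numbers are hypothetical): the dictionary entry can be rebound, but the list stored in it is never modified in place.

```erlang
-module(pdict_demo).
-export([demo/0]).

%% Hypothetical demo: the dictionary is mutable, the terms in it are not.
demo() ->
    _ = erase(numbers),                   % start from a clean key
    Original = [1, 2, 3],
    undefined = put(numbers, Original),   % put/2 returns the previous value
    Extended = [0 | get(numbers)],        % builds a NEW list; Original is untouched
    Original = get(numbers),              % the stored term has not changed
    Original = put(numbers, Extended),    % but the key can be rebound
    {Original, get(numbers)}.
```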

Immutable data structures

These are the normal Erlang terms. Having them as immutable made everything much simpler, both conceptually and in the implementation.

Personal thought: immutable data suits a high-level language, while mutable data gets you into all sorts of trouble and difficulties. Just read descriptions of other languages that have it, and the difficulties they have in describing what gets changed and when, for example Python’s copy.copy and copy.deepcopy. Mutable data is, however, much easier to comprehend in a low-level language like C (K&R C), where you directly see which data is passed by value and which is passed by reference, i.e. is mutable.

Processes

A process is something which obeys process semantics:
All communication is through asynchronous message passing.
Links/monitors for error detection/handling.
Obey/transmit exit signals.
Parallel independent execution.


There were to be no “back doors” in how processes communicate, and all messages sent were complete; there were no partial messages. This is in fact essential for building robust systems: you know that either the message has been sent or it hasn’t, and there is no uncertainty. Note, however, that you don’t know whether the message has been received.
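
As a rough illustration of not knowing whether a message was received, this sketch (hypothetical module send_demo) sends to a process that has already terminated; the send expression still just returns the message and raises no error.

```erlang
-module(send_demo).
-export([send_to_dead/0]).

%% Sending with ! does not tell the sender whether anyone received it.
send_to_dead() ->
    Pid = spawn(fun() -> ok end),                      % terminates at once
    Ref = erlang:monitor(process, Pid),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,  % wait until it is dead
    false = is_process_alive(Pid),
    Pid ! hello.                                       % no error; the message is lost
```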

N.B. Nothing is said here about how processes are implemented, where, or in what language; only about how the rest of Erlang perceives them. If something obeys process semantics, then it is a process.

Process communication

All communication between processes was meant to be asynchronous: messages, exit signals and control signals (link, unlink, etc.). This means that all BIFs dealing with processes are also meant to be asynchronous; they can really only check their arguments, not the result of sending off their message/control signal. This is why BIFs like link/1 used to report non-existent processes through exit signals rather than through return values.
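
The exact error behaviour of link/1 has changed between releases, so this sketch uses erlang:monitor/2 instead, which shows the same design in current OTP: the call returns a reference immediately even if the target is already dead, and the failure is reported later as a 'DOWN' message with reason noproc. The module name monitor_demo is hypothetical.

```erlang
-module(monitor_demo).
-export([monitor_dead/0]).

monitor_dead() ->
    Pid = spawn(fun() -> ok end),
    Ref0 = erlang:monitor(process, Pid),
    receive {'DOWN', Ref0, process, Pid, _} -> ok end,  % Pid is now surely dead
    Ref = erlang:monitor(process, Pid),                 % still "succeeds"
    true = is_reference(Ref),
    %% The error arrives as a message, not as a return value.
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 1000 -> timeout
    end.
```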

Having all communication asynchronous also allowed us to avoid a number of problems with built-in synchronous communication:
The level of security in the message protocol.
How to decide when an error has occurred.
Many protocols are asynchronous.

This principle was really broken in only one place: sending a message to a registered name, where the system checks whether a process is actually registered under that name (if none is, the send raises a badarg error).
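
A minimal sketch of this exception (hypothetical module regsend_demo): sending to an atom that no process is registered under raises badarg, whereas a send to a pid never fails.

```erlang
-module(regsend_demo).
-export([send_to_name/1]).

%% The one send that checks its target: an unregistered name gives badarg.
send_to_name(Name) when is_atom(Name) ->
    try Name ! hello of
        hello -> sent
    catch
        error:badarg -> badarg
    end.
```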

Error Handling

Distribution

While it was Klacke who first implemented distribution, we had planned for it before that, while we were thinking about process communication and error handling. We decided that truly asynchronous communication was best for distribution.

Distribution is based on the concept of loosely coupled nodes.

Distribution was always meant to be transparent, if so desired. This meant process communication and error handling must work the same for distributed processes as for local processes.

Ports

Ports are the mechanism for communicating with the outside world, i.e. anything “outside” of Erlang. Ports were designed to obey process semantics, as this was the best way to make them fit into the standard Erlang execution model. In fact, there was a serious discussion about whether there should be a separate port data type or whether ports should be Pids. We finally decided to make them a separate data type, as we felt there could be times when it would be necessary to detect whether we were communicating with a process or a port. But apart from a type test and the open_port function, ports behaved as processes; they were processes in the Erlang sense. As processes are concurrent things, sending a message will never hang the sender, which can happen if the interface is based on function calls; it is up to the implementation to ensure this never happens. It also meant that no mechanisms new to Erlang were needed for handling ports.
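
A minimal sketch of talking to a port in the same message style as a process, assuming a Unix-like system where the external program cat is available (the module name port_demo is hypothetical). open_port/2, is_port/1 and the {Pid, {command, Data}}, {Pid, close}, {Port, {data, Data}} and {Port, closed} messages are the standard port interface.

```erlang
-module(port_demo).
-export([echo/1]).

%% Send Bin through "cat" and collect the echoed bytes.
echo(Bin) when is_binary(Bin) ->
    Port = open_port({spawn, "cat"}, [binary]),
    true = is_port(Port) andalso not is_pid(Port),  % a distinct type...
    Port ! {self(), {command, Bin}},                % ...driven by messages
    Reply = collect(Port, byte_size(Bin), <<>>),
    Port ! {self(), close},
    receive {Port, closed} -> ok end,
    Reply.

%% Output may arrive in several chunks, so gather until we have it all.
collect(_Port, Want, Acc) when byte_size(Acc) >= Want ->
    Acc;
collect(Port, Want, Acc) ->
    receive
        {Port, {data, Data}} -> collect(Port, Want, <<Acc/binary, Data/binary>>)
    after 2000 ->
        {timeout, Acc}
    end.
```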

As ports were meant to look like processes, their interface was message/link based, just like that of processes. Also, ports shouldn’t know more than what is communicated by messages.

The model for ports was UNIX streams. Having ports as processes made it transparent what was actually being communicated with: you could transparently insert filters between the application and the port for processing data.
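
To illustrate the filter idea, here is a sketch of a hypothetical filter process (module port_filter) that owns a cat port, upper-cases ASCII data on the way in, and relays the port's replies tagged with its own pid. An application speaking the port message protocol can use the filter's pid wherever it would otherwise use the port. Assumes a Unix-like system with cat.

```erlang
-module(port_filter).
-export([start/1, upcase/1]).

%% The filter opens the port itself, so it is the port's connected process.
start(Client) ->
    spawn(fun() ->
                  Port = open_port({spawn, "cat"}, [binary]),
                  loop(Client, Port)
          end).

loop(Client, Port) ->
    receive
        {Client, {command, Data}} ->          % application -> filter -> port
            Port ! {self(), {command, upcase(Data)}},
            loop(Client, Port);
        {Port, {data, Data}} ->               % port -> filter -> application
            Client ! {self(), {data, Data}},  % looks like a port reply
            loop(Client, Port);
        {Client, close} ->
            port_close(Port),
            Client ! {self(), closed}
    end.

%% The "processing" step: upper-case ASCII letters.
upcase(Bin) when is_binary(Bin) ->
    << <<(if C >= $a, C =< $z -> C - 32; true -> C end)>> || <<C>> <= Bin >>.
```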

This makes the comments in the erlang module documentation, about how the port_XXX functions are cleaner and much more logical than the message interface, a bit strange: the message protocols came first, and the functions came later.

Typing

We all came from a dynamically typed past, so making Erlang dynamically typed seemed the most natural thing to do. Although this has been the cause of much discussion and has set us a little apart in the academic functional language world, I still think it was the right decision. Also, with loosely coupled distribution, where nodes were more or less allowed to come and go as they pleased, it would be hard to statically type communication between nodes.

Modules, code and loading

I/O System

The i/o system is process based, as it should be in Erlang. This makes it very versatile; for example, starting a shell on one node and having its i/o redirected to another node is trivial.

The central part of the i/o system is the i/o-server. It is the process to which the application sends i/o requests, and which maps these into suitable messages to the i/o device/port. An i/o-server must handle device-specific requests as well as generic i/o requests, for example those sent by the application through the module ‘io’. It is the i/o-server that matches the characters read/written in the application’s i/o requests against the actual device, which may need buffering to create requests of an appropriate size. Only in very trivial cases, like fixed-size records, is this not needed. All necessary buffering of data between requests, both input and output, is done by the i/o-server.

Splitting the i/o system in this fashion has two important benefits:
It made it possible to use generic i/o functions. The function for doing formatted output does not need to know what type of device it is generating output for; the i/o-server handles the specifics of the device. The same applies to input: we only need one function to scan for Erlang tokens, and the i/o-server provides it with input characters as needed. The alternative would be to have one set of i/o functions for each type of device, which would quickly become completely unmanageable.
It meant that an i/o-server for a specific device is generic and can be used for many different types of data requests. It is not necessary to open a device for a certain type of interaction, for example lines or Erlang forms. So it is possible to interlace requests to read Erlang tokens, lines, fixed-length records and anything else through one i/o-server to a device.
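
As a small illustration of the first benefit, the sketch below (hypothetical module generic_io_demo; the file name passed in is up to the caller) sends the same io:format/3 call to a file device and to standard_io. file:open/2 without the raw option returns a pid, i.e. the file device really is an i/o-server process.

```erlang
-module(generic_io_demo).
-export([write_both/1]).

write_both(Path) ->
    {ok, Fd} = file:open(Path, [write]),
    true = is_pid(Fd),                                % the device is a process
    ok = io:format(Fd, "~p tokens~n", [3]),           % formatted output to a file
    ok = io:format(standard_io, "~p tokens~n", [3]),  % same call, another device
    ok = file:close(Fd),
    {ok, Bin} = file:read_file(Path),
    ok = file:delete(Path),
    Bin.
```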

The generic part of the i/o-server is truly generic and can handle any read or write requests. The module ‘io’ together with the module ‘io_lib’ implements a basic Erlang-oriented i/o interface, but an i/o-server can implement any form of interface through the following protocols.

Basic message format
{io_request, From, ReplyAs, Request}
{io_reply, ReplyAs, Reply}

These are the messages for sending a request to an i/o-server and returning the reply. The ReplyAs term is created by the client and sent to the server so that the client can identify which request a reply belongs to. It can be any term, and the i/o-server only uses it in its replies. Doing it this way is both efficient and very versatile. A client can, for example, send many requests to a server and then selectively receive the replies to these requests in any order it wishes, which can be useful if the i/o-server has non-blocking requests. It also allows the i/o-server to pass requests on to other processes, which can then reply directly to the client.
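
A sketch of a client using this raw protocol directly (module raw_io is hypothetical). One assumption to note: current OTP i/o-servers expect the encoding-tagged request {put_chars, Encoding, Chars}, a later form of the {put_chars, IoList} request described below, so that is what the sketch sends. pipelined/1 sends three requests before reading any reply, then picks the replies out in reverse order by their ReplyAs tags.

```erlang
-module(raw_io).
-export([request/2, pipelined/1]).

%% One request, one reply; make_ref() gives a unique ReplyAs tag.
request(IoServer, Request) ->
    ReplyAs = make_ref(),
    IoServer ! {io_request, self(), ReplyAs, Request},
    receive
        {io_reply, ReplyAs, Reply} -> Reply
    end.

%% Send several requests first, then receive the replies selectively.
pipelined(IoServer) ->
    Tags = [begin
                T = {line, N},
                IoServer ! {io_request, self(), T,
                            {put_chars, unicode, io_lib:format("line ~p~n", [N])}},
                T
            end || N <- [1, 2, 3]],
    [receive {io_reply, Tag, Reply} -> {Tag, Reply} end
     || Tag <- lists:reverse(Tags)].
```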

Generic requests to an i/o-server

{put_chars, IoList}
{put_chars, Module, Function, Args}

These requests are for output. The first takes an iolist generated by the application and sends it to the device, while with the second the i/o-server calls apply(Module, Function, Args), which must return an iolist that is then sent to the device. This allows the application to decide where the work of generating the output data is done. For example, in the module ‘io’, calling io:fwrite(…) causes {put_chars,io_lib,fwrite,[…]} to be sent to the i/o-server, which then evaluates the fwrite.

These reply with either ‘ok’ or ‘{error,Error}’.
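
Putting the two output requests together, here is a sketch of a hypothetical i/o-server (module string_io) that collects output in a string instead of writing to a device. It handles the two forms described above (treated as latin1, which is an assumption) as well as the encoding-tagged forms that current versions of io send, so ordinary io:put_chars/2 and io:format/3 calls work against it. Any other request gets {error, request}.

```erlang
-module(string_io).
-export([start/0, contents/1]).

start() ->
    spawn(fun() -> loop([]) end).

%% Hypothetical helper (not part of the i/o protocol): fetch what was written.
contents(Server) ->
    Server ! {contents, self()},
    receive {Server, Text} -> Text end.

loop(Acc) ->
    receive
        {io_request, From, ReplyAs, Request} ->
            {Reply, Acc1} = handle(Request, Acc),
            From ! {io_reply, ReplyAs, Reply},   % ReplyAs is only echoed back
            loop(Acc1);
        {contents, From} ->
            From ! {self(), lists:append(lists:reverse(Acc))},
            loop(Acc)
    end.

%% The forms from the text, without an encoding...
handle({put_chars, Chars}, Acc) -> emit(latin1, Chars, Acc);
handle({put_chars, M, F, A}, Acc) -> emit(latin1, apply(M, F, A), Acc);
%% ...and the encoding-tagged forms sent by current 'io'.
handle({put_chars, Enc, Chars}, Acc) -> emit(Enc, Chars, Acc);
handle({put_chars, Enc, M, F, A}, Acc) -> emit(Enc, apply(M, F, A), Acc);
handle(_Other, Acc) -> {{error, request}, Acc}.

emit(Enc, Chars, Acc) ->
    case unicode:characters_to_list(Chars, Enc) of
        List when is_list(List) -> {ok, [List | Acc]};
        _Error -> {{error, put_chars}, Acc}
    end.
```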

{get_until, Prompt, Module, Function, ExtraArgs}
This request is for input. The Prompt is included in the input request. If a prompt is not needed for that device then it is ignored by the i/o-server.

The major problem when doing input is that in most cases you don’t know how many characters are needed from the device to complete the input request. For example, reading an Erlang form can require anything from a few characters up to tens of thousands (for a truly enormous function). Even a line can be of variable length. For some types of devices, for example files, it may be feasible to read in the whole input in one go, but in many cases this is impossible; for example, requiring users to enter their whole shell input in one go before any processing is done is not realistic.

Two ways of handling this are:
Call the input function with an argument which is a function to be called when more characters are needed.
Make the input function re-entrant, so that if there are not enough characters it returns a continuation, which can be passed back in together with more characters to continue the collecting.

The second alternative was chosen as it completely separates the input function from collecting characters from the device and also allows the i/o-server to retain control during input. For example even during input processing the i/o-server can still easily look for and handle output requests, if the device allows it.

The continuation is a data structure created by the input function which it can use to continue collecting input when called with more characters. If implemented today it would most likely be a fun, but these did not exist then.

You call the input function with some characters and it tries to collect enough characters:
   apply(Module, Function, [Continuation, Characters | ExtraArgs])   %% apply/3, because the argument list is built at run time

and it returns
{done, Result, RestChars}
{more, Continuation}


If the input function could get enough characters, it returns ‘done’, the result to return to the caller, and the remaining characters to keep for the next input request. If there are not enough characters, it returns ‘more’ and a new continuation. The input function is then called again with the new continuation and more characters, and this continues until enough characters have been collected. The continuation is initially set to []. A generic loop would look something like:
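%% GetChars is a fun supplied by the caller that reads the next chunk of
%% characters from the device; it is the only device-specific part.
%% Chars holds any characters left over from the previous request.
get_until(Cont, Chars, Mod, Func, Args, GetChars) ->
    case apply(Mod, Func, [Cont, Chars | Args]) of
        {done, Result, RestChars} ->    % Result may be an error
            {done, Result, RestChars};
        {more, NewCont} ->
            MoreChars = GetChars(),     % device specific
            get_until(NewCont, MoreChars, Mod, Func, Args, GetChars)
    end.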
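
The protocol can be exercised without a real device. In the sketch below (hypothetical module line_collector), collect_line/2 is a re-entrant input function whose continuation is simply the reversed list of characters collected so far, and read_lines/2 drives the same loop over a list of chunks standing in for the device, carrying RestChars from one request into the next. End of input is not handled.

```erlang
-module(line_collector).
-export([collect_line/2, read_lines/2]).

%% Re-entrant input function: returns {done, Line, RestChars} or {more, Cont}.
collect_line(Cont, [$\n | Rest]) -> {done, lists:reverse([$\n | Cont]), Rest};
collect_line(Cont, [C | Rest])   -> collect_line([C | Cont], Rest);
collect_line(Cont, [])           -> {more, Cont}.

%% Read N lines from a list of chunks (a simulated device).
read_lines(N, Chunks) -> read_lines(N, [], Chunks, []).

read_lines(0, _Buffer, _Chunks, Lines) ->
    lists:reverse(Lines);
read_lines(N, Buffer, Chunks, Lines) ->
    {Line, Rest, Chunks1} = get_until([], Buffer, Chunks),  % Cont starts as []
    read_lines(N - 1, Rest, Chunks1, [Line | Lines]).       % keep leftovers

get_until(Cont, Chars, Chunks) ->
    case apply(?MODULE, collect_line, [Cont, Chars]) of
        {done, Result, RestChars} -> {Result, RestChars, Chunks};
        {more, NewCont} ->
            [Next | Chunks1] = Chunks,   % crashes if input runs out (no eof)
            get_until(NewCont, Next, Chunks1)
    end.
```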


For example, io_lib:collect_line/2 obeys this protocol and returns a line; erl_scan:tokens/3 obeys this protocol and returns a list of tokens up to and including a ‘dot’; and the lexical analyzer generator ‘leex’ creates functions token/3 and tokens/3 which also obey this protocol.
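
erl_scan:tokens/3 can be fed input piecemeal in exactly this way. A sketch (module scan_demo is hypothetical) that feeds chunks until a form ending in a dot has been scanned:

```erlang
-module(scan_demo).
-export([scan_chunks/1]).

%% Feed chunks to erl_scan:tokens/3, starting with the continuation [].
scan_chunks(Chunks) -> scan([], Chunks).

scan(Cont, [Chunk | Chunks]) ->
    case erl_scan:tokens(Cont, Chunk, 1) of
        {done, {ok, Tokens, _EndLocation}, _Rest} -> {ok, Tokens};
        {done, Other, _Rest} -> Other;        % a scan error or eof
        {more, Cont1} -> scan(Cont1, Chunks)  % need more characters
    end;
scan(_Cont, []) ->
    need_more_input.
```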
